Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdenv: allow for jobservers across multiple nix builds #314888

Open
wants to merge 2 commits into
base: staging
Choose a base branch
from

Conversation

RaitoBezarius
Copy link
Member

Description of changes

Retake of #143820 where I unfortunately fucked up trying to debug kernel issues and btrfs.

Original motivations by @pennae:

make -jN -lN in stdenv is a very blunt instrument. it works well when max-jobs=1, but as nix-level paralellism increases it becomes increasingly deficient. starting from a low-load situation we start max-jobs * N compilers, loadavg goes through the roof, the -lN load limit kicks in and inhibits new compilers starting until loadavg has fallen below N—at which point all make instances spawn a lot of new compilers and loadavg goes through the roof again. this oscillation leaves the system underutilized in low phases and overcommitted in high phases.

testing the current stdenv against a jobserver with 26 tokens on a 12C/24T machine shows that parallel builds of llvm_{8..11} run about 7% faster (35:52min for stdenv, 33:30min with jobserver), a larger build of llvm{5..13} is about about 11% faster (1:27h for stdenv, 1:17h with jobserver). (removing the -l from stdenv also improves utilization but is less efficient. preliminary testing here shows that -l${1.5 * N} may be a good alternative to -lN as used currently, #141266 could be a good vector to go for that instead of this whole mess. [more testing says that -l2N would be a minimum to get better utilization, but so far every -l setting we've tried has produced some underutilization except excessive large numbers like 6N or higher])

nothing in here should be regarded as a final suggestion in any way, it's more of a "hey look, this might just work". as such it's extremely rough around the edges, eg to use the jobserver the experimenter currently has to bring a /jobserver fifo filled with tokens into the nix sandbox:

nix-build -E 'with import ./. {}; pkgs.callPackage ./pkgs/os-specific/linux/nixos-jobserver {}'
touch /tmp/jobserver
sudo ./result -t 24 -g nixbld /tmp/jobserver&
nix build --extra-sandbox-paths "/jobserver=/tmp/jobserver" #...

is this something worth pursuing? a 10% speedup for hydra does seem tempting

todos before this is more generally usable:

  • re-add $NIX_BUILD_CORES support (ioctl on the jobserver fd should do it)
  • maybe port from cuse to fuse to not require root for everything (mmap problems may be solved by reporting jobserver file size as 0 at all times, needs testing)
  • usability of tools (error messages, runtime configuration, etc)
  • add support for other build systems (cargo can honor MAKEFLAGS, maybe others too)
  • nixos module
  • make configurable which env var(s) jscall adds jobserver information to
  • metrics?
  • <your suggestion here>

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = true
  • Tested, as applicable:
    • Build without a job server
    • Build with a job server
    • ~Reproducibility of ISO image
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` 6.topic: rust 6.topic: stdenv Standard environment labels May 26, 2024
@lilyinstarlight lilyinstarlight added the significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label May 26, 2024
@@ -1355,6 +1355,30 @@ name="/nix/store/9s9r019176g7cvn2nvcw41gsp862y6b4-coreutils-8.24"
someVar=$(stripHash $name)
```

### `runInJobServer` \<command\> \-\-\-\- \<defArgs\> \-\-\-\- \<args\> {#fun-runInJobServer}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### `runInJobServer` \<command\> \-\-\-\- \<defArgs\> \-\-\-\- \<args\> {#fun-runInJobServer}
### `runInJobserver` \<command\> \-\-\-\- \<defArgs\> \-\-\-\- \<args\> {#fun-runInJobServer}

(multiple places)

config = mkIf cfg.enable {
nix.sandboxPaths = [ "/build-support/jobserver=${tokenFile}?" ];

systemd.services.nixos-jobserver = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running this as root is most unwise in practice but running it as another user seems to be nearly impossible. when fuse doesn't fuck us over systemd does.


struct stat jobserver_st = {
.st_ino = FUSE_ROOT_ID,
.st_mode = S_IFREG | 0660,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.st_mode = S_IFREG | 0660,
.st_mode = S_IFREG | 0666,

won't work otherwise

if [[ -n "$enableParallelChecking" ]]; then
runInJobserver \
make ---- \
-j${NIX_BUILD_CORES} -l${NIX_BUILD_CORES} ---- \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-j${NIX_BUILD_CORES} -l${NIX_BUILD_CORES} ---- \
-j${NIX_BUILD_CORES} ---- \

multiple places

@pennae
Copy link
Contributor

pennae commented May 27, 2024

cargo apparently no longer accepts jobserver fds that aren't pipes. that's somewhat problematic.

initially only make and cargo support using the jobserver. other build
systems may follow suit later.
@pennae pennae force-pushed the stdenv-jobserver branch from 8a4a67a to cd4036f Compare May 27, 2024 19:27
Co-authored-by: Raito Bezarius <[email protected]>
@pennae pennae force-pushed the stdenv-jobserver branch from cd4036f to e97220e Compare May 28, 2024 19:49
@h7x4 h7x4 added the 8.has: module (new) This PR adds a module in `nixos/` label Jun 1, 2024
@aviallon
Copy link
Contributor

Is this work dead? It looked very promising.

@yu-re-ka
Copy link
Contributor

I have some patches lying around locally to make it work with ninja, so anyone interested in picking this up, feel free to write me about it.

@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Sep 10, 2024
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: rust 6.topic: stdenv Standard environment 8.has: documentation This PR adds or changes documentation 8.has: module (new) This PR adds a module in `nixos/` 8.has: module (update) This PR changes an existing module in `nixos/` 8.has: package (new) This PR adds a new package 10.rebuild-darwin: 501+ 10.rebuild-darwin: 5001+ 10.rebuild-darwin-stdenv This PR causes stdenv to rebuild 10.rebuild-linux: 501+ 10.rebuild-linux: 5001+ 10.rebuild-linux-stdenv This PR causes stdenv to rebuild significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants