treewide: drop -l$NIX_BUILD_CORES #192447
Conversation
(previously: #174473)
I'm not a fan of this, but OK, I hope. It does solve a real problem with how the Hydra.nixos.org builders are set up. It probably makes things worse for people who want(ed) to combine small and big builds on the same machine "fast", with many cores. After merging this, we could try to rethink some other approach, possibly based on PR #184886. Also, we might see more real-life feedback in the meantime.
More robust designs for load limiting are probably a good thing to explore. Let's go ahead and merge this once ofborg is green, and if it causes problems we can revert -- no sweat.
I think the proper path forward would be to introduce a `NIX_MAX_CORES` or some similar option which would default to `$JOBS * $CORES`, as there are many cases where you want to run many jobs because most builds have long single-threaded sections, but it can be really detrimental to have many jobs suddenly spike in thread + RAM usage.
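A minimal sketch of what such a cap might look like from inside a build. This is hypothetical: neither `NIX_MAX_CORES` nor `NIX_BUILD_JOBS` exists today; they stand in for "cores" and "max-jobs" as seen from within the sandbox.

```sh
# Hypothetical load cap: parallelise each build with its own core share, but
# let the load limit reflect the whole machine's budget (jobs * cores).
NIX_BUILD_CORES="${NIX_BUILD_CORES:-4}"
NIX_BUILD_JOBS="${NIX_BUILD_JOBS:-8}"    # hypothetical: max-jobs as seen in the sandbox
NIX_MAX_CORES="${NIX_MAX_CORES:-$(( NIX_BUILD_JOBS * NIX_BUILD_CORES ))}"

make -j"$NIX_BUILD_CORES" -l"$NIX_MAX_CORES"
```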
Passing `-l$NIX_BUILD_CORES` improperly limits the overall system load. For a build machine which is configured to run `$B` builds where each build gets `total cores / B` cores (`$C`), passing `-l $C` to make will improperly limit the load to `$C` instead of `$B * $C`.

This effect becomes quite pronounced on machines with 80 cores, with 40 simultaneous builds and a cores limit of 2. On a machine with this configuration, Nix will run 40 builds and make will limit the overall system load to approximately 2. A build machine with this many cores can happily run with a load approaching 80.

A non-solution is to oversubscribe the machine, by picking a larger `$C`. However, there is no way to divide the number of cores in a way which fairly subdivides the available cores when `$B` is greater than 1.

There has been exploration of passing a jobserver in to the sandbox, or sharing a jobserver between all the builds. This is one option, but relatively complicated and only supports make. Lots of other software uses its own implementation of `-j` and doesn't support either `-l` or the Make jobserver.

For the case of an interactive user machine, the user should limit overall system load using `$B`, `$C`, and optionally systemd's cpu/network/io limiting features.

Making this change should significantly improve the utilization of our build farm, and improve the throughput of Hydra.
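To make the arithmetic above concrete, here is a sketch of what each of the 40 builders effectively runs before and after the change. The numbers come from the example's `B = 40`, `C = 2`; the exact command line of course varies per package.

```sh
# Before: each build passed both a job count and a load-average limit.
make -j"$NIX_BUILD_CORES" -l"$NIX_BUILD_CORES"   # with cores = 2: make -j2 -l2
# -l is a *system-wide* load threshold, so every make stops spawning jobs once
# the machine's load average reaches ~2, and the 80-core host sits mostly idle.

# After: only the per-build job count remains.
make -j"$NIX_BUILD_CORES"                        # with cores = 2: make -j2
# 40 builds x 2 jobs each ≈ load 80, which matches the hardware.
```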
If this causes problems, let's consider a revert and see what happens :).
This was pushed with the IMO wrong description about getting the system load to a value of 2, but it's not worth changing history now. Anyway, I materialized my ideas about an imperfect solution into PR #192799.
I created an issue, NixOS/nix#7091, for getting a longer-term solution in Nix.
My suggestion isn't getting support so far, if I understand it correctly, but note that on NixOS the default nix configuration for the local machine sets both of these options.
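For readers following along, both knobs can also be set explicitly. A small illustration with placeholder values (these are not the NixOS defaults the comment above refers to):

```sh
# Placeholder values: 8 concurrent derivations ($B), 4 cores each ($C).
# The same settings can live in nix.conf as "max-jobs = 8" and "cores = 4".
nix-build '<nixpkgs>' -A hello --max-jobs 8 --cores 4
```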
Much of what I do with NixOS involves frequently performing fairly massive rebuilds on my workstation, and this change has made that substantially more unpleasant. I don't know what the solution is, but I wanted to chime in to say that this is a significant problem for my personal workflow.
FWIW, @edolstra has some work in progress on cgroup support in Nix. This may help with this kind of resource-control problem. In the meantime, @ElvishJerricco, maybe setting cores and max-jobs to the square root of the number of cores you have, or a little more, is a reasonable compromise?
@lheckemann I do not think that would be a reasonable utilization of my hardware (EDIT: for reference, 16c/32t, so high enough that I can get a lot of parallelism out of it, though not nearly as high as many build servers). For instance, one thing I find myself doing often is rebuilding NixOS with a new variation of systemd. Such builds come with long periods where a single derivation could be using all my cores, and long periods where there are many derivations building at the same time. There is no single cores/max-jobs combination that suits both. Similarly, cgroups would not help, as I understand it. They can't stop …
I think the current build system with cores and max-jobs is just not smart enough to always utilize most of your hardware. Right now you need to optimize for either cores or max-jobs.
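For concreteness, the square-root compromise suggested a few comments up splits that trade-off roughly like this. A sketch with illustrative numbers, not a recommendation:

```sh
# Pick cores = max-jobs ≈ sqrt(total threads), e.g. 32 threads -> 6 and 6.
threads=$(nproc)
n=$(awk -v t="$threads" 'BEGIN { print int(sqrt(t) + 0.999) }')   # ceil(sqrt(t))
# Worst case n*n ≈ threads jobs in flight, so the machine is never wildly
# oversubscribed; the cost is that a lone big build only gets n of the threads.
echo "max-jobs = $n"
echo "cores = $n"
```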
We could control make and ninja with a central jobserver, as the following working proof of concept shows:
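The proof of concept itself is not reproduced above. Purely as a rough sketch of the idea, assuming GNU make ≥ 4.4 (whose named-pipe jobserver can be shared between otherwise unrelated invocations; ninja would need a jobserver-capable build), sharing one token pool between two builds could look like:

```sh
#!/usr/bin/env bash
set -eu

slots=$(nproc)            # total parallelism budget for the whole machine
fifo=$(mktemp -u)         # path for the shared jobserver pipe
mkfifo "$fifo"
exec 3<>"$fifo"           # keep it open read-write so the token writes don't block

# Pre-load one token per slot; each client runs its first job for free and
# takes/returns a token for every additional job.
printf '%*s' "$slots" '' | tr ' ' '+' >&3

# Two independent builds draw from the same pool, so their combined
# parallelism stays around $slots however the work is distributed.
MAKEFLAGS="-j --jobserver-auth=fifo:$fifo" make -C project-a &
MAKEFLAGS="-j --jobserver-auth=fifo:$fifo" make -C project-b &
wait
```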
@SuperSandro2000 All I can say is that the workloads I'm talking about with nixpkgs did utilize my hardware very well before this change, and now it is not possible to reach a balance. I do not know what the solution is; I just know it's now a lot worse for me.
I found the opposite: previously … I find …
@trofi I've measured it and it is slightly faster without …
Yeah, that makes sense. My RAM/core ratio is 8GB/core (+6GB/core of zram just in case). I would guess 4GB/core should be fine for most packages, provided …
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
Linking #328677 for those subscribed to this.
Description of changes
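The full rationale is given in the commit message quoted in the conversation above. As a rough illustration of the change itself (not the exact diff applied treewide in this PR), build hooks that enable parallel building stop passing the load-limit flag:

```sh
# Before: parallel builds added both a job count and a load limit to make's flags.
# flagsArray+=(${enableParallelBuilding:+-j${NIX_BUILD_CORES} -l${NIX_BUILD_CORES}})

# After: only the job count remains; overall load is left to the Nix-level
# max-jobs / cores configuration.
flagsArray+=(${enableParallelBuilding:+-j${NIX_BUILD_CORES}})
```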
Things done
- `sandbox = true` set in `nix.conf`? (See Nix manual)
- `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see nixpkgs-review usage
- Tested basic functionality (`./result/bin/`)
- Ran `nixos/doc/manual/md-to-db.sh` to update generated release notes