libstore: Add load-limit setting to dynamically control parallelism #6855
base: master
The downside of this approach is that every concurrent Nix derivation will use up to 8 parallel jobs, without regard to load. Maybe the best solution is to make the GNU jobserver available in the sandbox... |
Shouldn't nix-daemon take care of counting the leftover cores? I expect that when I do |
Removing the load average should only be done if a system-wide job server limits the jobs. A proof of concept was implemented in NixOS/nixpkgs#143820. The current setting (jobs equal to load) leads to unused CPU load, when:
I think this PR can be seen as a simple solution for more control over the utilization of the system.
The PoC in NixOS/nixpkgs#143820 sets both |
If the default is |
we're open to moving the jobserver prototype we made in NixOS/nixpkgs#143820 into nix itself. unfortunately the gnu jobserver protocol isn't universally supported, ghc for example uses semaphores instead of pipes. unfortunately we can't support both the gnu protocol and sysv-semaphore-like protocols with the same implementation without kernel support (ie, fuse or other drivers). |
Triaged in Nix team meeting:
We're not sure yet if this approach is right, since the load average and Make's `-l` flag are somewhat controversial (see linked PRs and issues). Ideally we'd have a parallelism manager in Nix itself, and @edolstra intends to look into resource control using cgroups.
We'll discuss if exposing the additional parameter to builders is a meaningful stopgap solution.
Complete discussion
- @fricklerhandwerk: following @pennae's comment would this at the worst mean hand-rolling something like our own jobserver protocol?
- @thufschmitt: yes, and it may not be all that hard, but would degrade reproducibility as we'd lose control over parameters passed to the builder
- @edolstra: the GNU jobserver also seems to have unfixable problems, although we could do it better in Nix, at least in principle
- a parallelism manager may as well reduce parallelism in builders overall, reducing the number of reproducibility issues that arise from parallelism
- @edolstra: cgroups may be the proper way to do things
- @thufschmitt: agreed, but won't work on Darwin
- @edolstra: the real issue here though is load average
- have to think about how that would interact with Make
- this particular PR does not really solve the problem and potentially creates new ones
machine's load average. For instance, in Nixpkgs, if this value is >0,
the builder passes the `-lN` flag to GNU Make. In this case, if the
load average of the machine exceeds `N`, the amount of parallelism will
be dynamically reduced to 1.
Suggested change:
- machine's load average. For instance, in Nixpkgs, if this value is >0,
- the builder passes the `-lN` flag to GNU Make. In this case, if the
- load average of the machine exceeds `N`, the amount of parallelism will
- be dynamically reduced to 1.
+ machine's load average.
+ For instance, a builder could use the value to set the `-l` flag to GNU Make.
+ In this case, if the load average of the machine exceeds `NIX_LOAD_LIMIT`, the amount of parallelism will be dynamically reduced to 1.
No need to point to a specific implementation, as we'd have to keep track of whether that statement is still correct.
load average of the machine exceeds `N`, the amount of parallelism will
be dynamically reduced to 1.

By default, it is set to the number of cores available to Nix.
Along the lines of #7091 we may want to instead fall back to the `cores` setting and only then system cores, i.e. `settings.buildCores`.
available. In this case, `load-limit` should be set to `0` to prevent
the `-lN` flag from being passed to GNU Make.
Suggested change:
- available. In this case, `load-limit` should be set to `0` to prevent
- the `-lN` flag from being passed to GNU Make.
+ available. In this case, `load-limit` should be set to `-1`, which GNU Make would interpret as "no limit".
Setting it to zero won't necessarily prevent it from being passed to Make; that would depend on how a builder is implemented. Right now `-l` is not set at all in Nixpkgs builders.
Make's source code says that `-1` (the default) means no limit.
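To make the point about builder-dependence concrete, here is a minimal sketch of how one builder might map the two settings onto GNU Make flags. The function name `make_parallel_flags` and the convention that `None`, `0`, or a negative value means "omit `-l`" are assumptions for illustration only; they are not taken from Nix or Nixpkgs, and another builder could just as well pass the value through unchanged.

```python
from typing import List, Optional


def make_parallel_flags(cores: int, load_limit: Optional[float]) -> List[str]:
    """Build the parallelism flags a hypothetical builder might pass to GNU Make."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    flags = [f"-j{cores}"]
    # Assumed convention (not from Nix or Nixpkgs): None, 0, or a negative
    # value means "no load limit", so the -l flag is left out entirely.
    if load_limit is not None and load_limit > 0:
        flags.append(f"-l{load_limit:g}")
    return flags


if __name__ == "__main__":
    print(make_parallel_flags(8, 8))   # both flags
    print(make_parallel_flags(8, 0))   # only -j under the assumed convention
```

Under this convention the documentation could describe `0` as "no limit", but that only holds for builders that implement the same check.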
load average is 24 and where Nix is limited to 8 cores. By default,
`-j8 -l8` will be passed to GNU Make. Since the load average exceeds 8,
Suggested change:
- load average is 24 and where Nix is limited to 8 cores. By default,
- `-j8 -l8` will be passed to GNU Make. Since the load average exceeds 8,
+ load average is 24 and where Nix is limited to 8 cores. By default,
+ `-j8 -l8` would be passed to GNU Make. Since the load average exceeds 8,
Since we're still in the example where the builder is a Make invocation.
I'm not a cgroups expert by any means and I haven't tried too hard, but I couldn't find a satisfying tool in there. Well, limiting RAM could be one way, as I think the RAM exhaustion is the main risk of too aggressive parallelism here. As for CPUs... cgroups offer limiting to just a particular subset of system's CPUs (not count but particular subset, sadly); such model seemed hard to apply well here, but I suspect the detections like nproc would then report and use the size of this subset at least. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-04-28-nix-team-meeting-minutes-50/27698/1 |
Is there any appetite for this? It’s a simple change and the UX of running large rebuilds without |
@emilazy I very frequently find myself wanting this. CGroups are not a viable solution to this. Each derivation should have access to all the cores on the system, in case it's the only build running, so we can't limit the CPUs with CGroups. But the system will run out of memory if you have too many highly parallelized builds; CGroups can more gracefully limit the memory of builds, but that will result in build failures, not proper sharing of cores. I understand that |
Yeah, I agree. Apparently recent versions of GNU Make have a new job server protocol that lacks the syscall performance pitfalls of the earlier ones, so in the long run that seems like the way to go to get proper load balancing across a wide range of build tools, but that’s going to take engineering work to get it implemented and plumb it throughout the ecosystem and in the meantime we could add this one single setting and flip it on in Nixpkgs and make doing large builds on workstations so much more pleasant. (And CGroups aren’t even an option at all on macOS, of course.) |
I believe you can set it up so that CPU is shared equally between builds if there are many of them running concurrently, with all CPUs being available to share between them (so a single build running on its own could still use all of the CPU).
I am working on reviving this after discussion on Matrix. |
Closes: NixOS#7091 Closes: NixOS#6855 Closes: NixOS#8105 Co-authored-by: Alex Wied <[email protected]>
Linking #11143 here for any subscribers to this. |
On busy machines where Nix co-exists with other workloads, parallelism may not work as intended. For example, consider a 64 core machine whose load average is 24 and where Nix is limited to 8 cores. By default, `-j8 -l8` will be passed to GNU Make. Since the load average exceeds 8, no parallelism will take place despite the fact that 8 cores are available. In this case, `load-limit` should be set to `0` to prevent the `-lN` flag from being passed to GNU Make.

See also: NixOS/nixpkgs#174473
Closes #7091
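
The 64-core example in the description can be worked through with a toy model. This is only a sketch under stated assumptions: `effective_jobs` is a hypothetical helper, it treats the load average as a fixed number, and it collapses Make's per-job check into a single decision, whereas GNU Make re-checks the load each time it considers starting another job. A limit of `None` or anything not greater than zero is assumed here to mean that no `-l` flag is passed.

```python
from typing import Optional


def effective_jobs(jobs: int, load_limit: Optional[float], load_average: float) -> int:
    """Toy model of how many jobs Make runs at once under -jJOBS -lLOAD_LIMIT."""
    # Assumption: no positive limit means the builder passes no -l flag at all.
    if load_limit is None or load_limit <= 0:
        return jobs
    # With -l, Make does not start a new job while others are running and the
    # load average is at or above the limit, so it drops to one job at a time.
    return 1 if load_average >= load_limit else jobs


if __name__ == "__main__":
    # 64-core machine, load average 24, Nix limited to 8 cores.
    print(effective_jobs(8, 8, 24.0))  # default -j8 -l8: serial build
    print(effective_jobs(8, 0, 24.0))  # no -l flag: all 8 jobs
```

In the first call the load of 24 is above the limit of 8, so the build runs serially even though 40 of the 64 cores are idle; in the second, the limit is disabled and all 8 jobs run.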