libstore: Add load-limit setting to dynamically control parallelism #6855

centromere · 2022-08-02T20:01:41Z

On busy machines where Nix coexists with other workloads, build parallelism may not work as intended. For example, consider a 64-core machine whose load average is 24 and where Nix is limited to 8 cores. By default, `-j8 -l8` will be passed to GNU Make. Since the load average exceeds 8, Make will not run jobs in parallel even though 8 cores are available. In this case, `load-limit` should be set to `0` to prevent the `-lN` flag from being passed to GNU Make.
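
As a rough illustration of the behaviour described above, here is a minimal Python sketch of how a builder could derive Make's parallelism flags from the two settings. The function name `make_parallel_flags` is hypothetical, and treating `0` as "omit `-l`" is the convention proposed in this description, not something every builder is guaranteed to follow.

```python
def make_parallel_flags(cores: int, load_limit: int) -> list[str]:
    """Return the parallelism flags a builder might pass to GNU Make.

    Assumption: a load limit of 0 means "do not pass -l at all",
    as proposed in the PR description.
    """
    flags = [f"-j{cores}"]
    if load_limit > 0:
        # Only add a load limit when one is actually configured.
        flags.append(f"-l{load_limit}")
    return flags


# The example from the description: Nix limited to 8 cores.
print(make_parallel_flags(8, 8))  # load-limit left at its default (= cores)
print(make_parallel_flags(8, 0))  # load-limit disabled
```

With the default, Make receives both flags; with `load-limit = 0`, only `-j8` is passed, so the machine's load average of 24 no longer suppresses parallelism.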

See also: NixOS/nixpkgs#174473

Closes #7091

edolstra · 2022-08-03T13:20:29Z

The downside of this approach is that every concurrent Nix derivation will use up to 8 parallel jobs, without regard to load.

Maybe the best solution is to make the GNU jobserver available in the sandbox...

veprbl · 2022-08-03T16:43:12Z

Shouldn't nix-daemon take care of counting the leftover cores? I expect that when I run `nix build --cores 8`, I'm telling a build that it's allowed to occupy up to 8 cores (regardless of whether it's running GNU Make or a TensorFlow job).

ck3d · 2022-08-03T18:27:25Z

Removing the load-average limit should only be done if a system-wide job server limits the jobs. A proof of concept was implemented in NixOS/nixpkgs#143820.

The current setting (jobs equal to load) leaves CPU capacity unused when:

- The system has a lot of I/O tasks (e.g. busy hard drives).
- The system runs CPU-intensive processes.

I think this PR can be seen as a simple way to get more control over system utilization.

centromere · 2022-08-03T19:35:55Z

The PoC in NixOS/nixpkgs#143820 sets both -j and -l to ${NIX_BUILD_CORES}. Even with a jobserver running, will Make properly ask for tokens if the load average exceeds NIX_BUILD_CORES?

markuskowa · 2022-08-16T20:30:41Z

> The downside of this approach is that every concurrent Nix derivation will use up to 8 parallel jobs, without regard to load.

If the default is load-limit == cores, nothing would change. However, on machines with mixed workloads it is highly desirable to be able to remove this limitation (see NixOS/nixpkgs#174473).

pennae · 2022-08-19T14:03:40Z

we're open to moving the jobserver prototype we made in NixOS/nixpkgs#143820 into nix itself. unfortunately the gnu jobserver protocol isn't universally supported, ghc for example uses semaphores instead of pipes. unfortunately we can't support both the gnu protocol and sysv-semaphore-like protocols with the same implementation without kernel support (ie, fuse or other drivers).

fricklerhandwerk

Triaged in Nix team meeting:

We're not sure yet if this approach is right, since the load average and Make's -l flag are somewhat controversial (see linked PRs and issues). Ideally we'd have a parallelism manager in Nix itself, and @edolstra intends to look into resource control using cgroups.

We'll discuss if exposing the additional parameter to builders is a meaningful stopgap solution.

Complete discussion

- @fricklerhandwerk: following @pennae's comment would this at the worst mean hand-rolling something like our own jobserver protocol?
  - @thufschmitt: yes, and it may not be all that hard, but would degrade reproducibility as we'd lose control over parameters passed to the builder
  - @edolstra: the GNU jobserver also seems to have unfixable problems, although we could do it better in Nix, at least in principle
    - a parallelism manager may as well reduce parallelism in builders overall, reducing the number of reproducibility issues that arise from parallelism
- @edolstra: cgroups may be the proper way to do things
  - @thufschmitt: agreed, but won't work on Darwin
  - @edolstra: the real issue here though is load average
    - have to think about how that would interact with Make
  - this particular PR does not really solve the problem and potentially creates new ones

fricklerhandwerk · 2023-04-28T16:07:21Z

src/libstore/globals.hh

+        machine's load average. For instance, in Nixpkgs, if this value is >0,
+        the builder passes the `-lN` flag to GNU Make. In this case, if the
+        load average of the machine exceeds `N`, the amount of parallelism will
+        be dynamically reduced to 1.


Suggested change

-        machine's load average. For instance, in Nixpkgs, if this value is >0,
-        the builder passes the `-lN` flag to GNU Make. In this case, if the
-        load average of the machine exceeds `N`, the amount of parallelism will
-        be dynamically reduced to 1.
+        machine's load average.
+        For instance, a builder could use the value to set the `-l` flag to GNU Make.
+        In this case, if the load average of the machine exceeds `NIX_LOAD_LIMIT`, the amount of parallelism will be dynamically reduced to 1.

No need to point to a specific implementation, since we'd then have to keep track of whether that statement is still correct.

fricklerhandwerk · 2023-04-28T16:22:03Z

src/libstore/globals.hh

+        load average of the machine exceeds `N`, the amount of parallelism will
+        be dynamically reduced to 1.
+
+        By default, it is set to the number of cores available to Nix.


Along the lines of #7091 we may want to instead fall back to the cores setting and only then system cores, i.e. settings.buildCores.
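
The fallback order suggested here could look roughly like the following Python sketch. The function name `effective_load_limit` and the use of `None` for "not set" are assumptions made for illustration; the actual setting would live in `globals.hh`. The sketch also assumes the existing convention that `cores = 0` means "use all available cores".

```python
import os
from typing import Optional


def effective_load_limit(load_limit: Optional[int], cores: Optional[int]) -> int:
    """Resolve the load limit: explicit value, then `cores`, then system cores.

    Assumptions: None means "not configured"; cores == 0 means "all cores".
    """
    if load_limit is not None:
        return load_limit
    if cores is not None and cores > 0:
        return cores
    # Neither setting gives a concrete number: fall back to the machine itself.
    return os.cpu_count() or 1


print(effective_load_limit(4, 8))     # explicit load-limit wins
print(effective_load_limit(None, 8))  # falls back to the cores setting
```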

fricklerhandwerk · 2023-04-28T16:36:18Z

src/libstore/globals.hh

+        available. In this case, `load-limit` should be set to `0` to prevent
+        the `-lN` flag from being passed to GNU Make.


Suggested change

-        available. In this case, `load-limit` should be set to `0` to prevent
-        the `-lN` flag from being passed to GNU Make.
+        available. In this case, `load-limit` should be set to `-1`, which GNU Make would interpret as "no limit".

Setting it to zero won't necessarily prevent it from being passed to Make; that depends on how a builder is implemented. Right now `-l` is not set at all in Nixpkgs builders.
Make's source code says that the -1 (default) means no limit.
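
To make the role of the limit concrete, here is a simplified Python model of the decision Make takes before starting an additional job. It is a sketch of the documented behaviour, not Make's actual code (the real implementation also smooths the load estimate); the function name `may_start_another_job` and its parameters are hypothetical.

```python
import os
from typing import Optional


def may_start_another_job(running_jobs: int, max_load: float,
                          load_now: Optional[float] = None) -> bool:
    """Simplified model of Make's -l check.

    Assumptions: a negative max_load means "no limit"; a job may always
    start when nothing else is running, so the build still makes progress.
    """
    if max_load < 0:
        return True
    if running_jobs == 0:
        return True
    if load_now is None:
        # 1-minute load average of the current machine (Unix only).
        load_now = os.getloadavg()[0]
    return load_now < max_load


# The PR's example: load average 24, limit 8, some jobs already running.
print(may_start_another_job(running_jobs=3, max_load=8, load_now=24.0))
# Same machine with the limit disabled via a negative value.
print(may_start_another_job(running_jobs=3, max_load=-1, load_now=24.0))
```

Under this model the first call refuses to start another job, which is why the build effectively becomes serial, while the second call allows it.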

fricklerhandwerk · 2023-04-28T16:38:09Z

src/libstore/globals.hh

+        load average is 24 and where Nix is limited to 8 cores. By default,
+        `-j8 -l8` will be passed to GNU Make. Since the load average exceeds 8,


Suggested change

         load average is 24 and where Nix is limited to 8 cores. By default,
-        `-j8 -l8` will be passed to GNU Make. Since the load average exceeds 8,
+        `-j8 -l8` would be passed to GNU Make. Since the load average exceeds 8,

Since we're still in the example where the builder is a Make invocation.

vcunat · 2023-04-28T16:57:19Z

I'm not a cgroups expert by any means and I haven't tried too hard, but I couldn't find a satisfying tool in there. Well, limiting RAM could be one way, as I think the RAM exhaustion is the main risk of too aggressive parallelism here. As for CPUs... cgroups offer limiting to just a particular subset of system's CPUs (not count but particular subset, sadly); such model seemed hard to apply well here, but I suspect the detections like nproc would then report and use the size of this subset at least.

nixos-discourse · 2023-04-28T17:02:50Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-04-28-nix-team-meeting-minutes-50/27698/1

emilazy · 2024-07-07T20:04:00Z

Is there any appetite for this? It’s a simple change and the UX of running large rebuilds without -l on a workstation machine is really poor. Nobody will have to use it if they don’t want to; it could even default to (jobs × cores) to yield essentially the same behaviour as the status quo.
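
The suggested default is simple arithmetic. The Python sketch below uses a hypothetical helper `default_load_limit`; `max_jobs` and `cores` stand for the existing `max-jobs` and `cores` settings. The reasoning is an informal one: if Nix's own builds can run at most this many Make jobs at once, a limit of this size should rarely be hit by Nix alone.

```python
def default_load_limit(max_jobs: int, cores: int) -> int:
    """Upper bound on simultaneously running Make jobs across all builds."""
    return max_jobs * cores


# 4 concurrent derivations, each allowed 8 cores.
print(default_load_limit(4, 8))  # 32
```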

ElvishJerricco · 2024-07-18T09:02:03Z

@emilazy I very frequently find myself wanting this. CGroups are not a viable solution to this. Each derivation should have access to all the cores on the system, in case it's the only build running, so we can't limit the CPUs with CGroups. But the system will run out of memory if you have too many highly parallelized builds; CGroups can more gracefully limit the memory of builds, but that will result in build failures, not proper sharing of cores.

I understand that -l is not a perfect solution, but IME, testing mass rebuilds on nixpkgs locally was far easier before -l was removed.

emilazy · 2024-07-18T10:43:53Z

Yeah, I agree. Apparently recent versions of GNU Make have a new job server protocol that lacks the syscall performance pitfalls of the earlier ones, so in the long run that seems like the way to go to get proper load balancing across a wide range of build tools, but that’s going to take engineering work to get it implemented and plumb it throughout the ecosystem and in the meantime we could add this one single setting and flip it on in Nixpkgs and make doing large builds on workstations so much more pleasant.

(And CGroups aren’t even an option at all on macOS, of course.)

adrian-gierakowski · 2024-07-18T17:22:39Z

> CGroups are not a viable solution to this. Each derivation should have access to all the cores on the system, in case it's the only build running, so we can't limit the CPUs with CGroups.

I believe you can set it up so that CPU is shared equally between builds if there are many of them running concurrently, with all CPUs being available to share between them (so a single build running on its own could still use all of the CPU).
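
One way to picture this kind of sharing is weight-based scheduling, as offered by the cgroup v2 `cpu.weight` knob. Whether that is the mechanism meant here is an assumption. The Python sketch below only models the resulting split when every build is CPU-bound; the helper name `cpu_share` is hypothetical.

```python
def cpu_share(weights: list[int], total_cpus: int) -> list[float]:
    """CPUs each build receives under contention, proportional to its weight.

    Idealised model: every build is CPU-bound and can use any core.
    """
    total_weight = sum(weights)
    return [total_cpus * w / total_weight for w in weights]


print(cpu_share([100], 64))                 # a lone build gets every core
print(cpu_share([100, 100, 100, 100], 64))  # four equal builds split evenly
```

Note that this only divides CPU time; it does not reduce the number of processes each build spawns, so the memory concern raised above is not addressed by it.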

emilazy · 2024-07-18T21:35:56Z

I am working on reviving this after discussion on Matrix.

Closes: NixOS#7091 Closes: NixOS#6855 Closes: NixOS#8105 Co-authored-by: Alex Wied <[email protected]>

emilazy · 2024-07-20T18:41:59Z

Linking #11143 here for any subscribers to this.

This was referenced Aug 2, 2022

stdenv: Pass -lN to GNU Make based on NIX_LOAD_LIMIT NixOS/nixpkgs#184886

Closed

stdenv/setup.sh: fix parallel make NixOS/nixpkgs#174473

Closed

libstore: Add load-limit setting to dynamically control parallelism

cc0beae

centromere force-pushed the load-limit branch from 601e940 to cc0beae Compare August 8, 2022 16:52

ck3d mentioned this pull request Sep 24, 2022

stdenv: reintroduce limiting by system load NixOS/nixpkgs#192799

Draft

vcunat linked an issue Sep 24, 2022 that may be closed by this pull request

Allow configuration of load limit for nix builds #7091

Open

fricklerhandwerk added performance store Issues and pull requests concerning the Nix store labels Mar 3, 2023

github-actions bot removed the store Issues and pull requests concerning the Nix store label Apr 28, 2023

fricklerhandwerk reviewed Apr 28, 2023

emilazy added a commit to emilazy/nix that referenced this pull request Jul 20, 2024

libstore: add load-limit setting to control parallelism

cdadc73

Closes: NixOS#7091 Closes: NixOS#6855 Closes: NixOS#8105 Co-authored-by: Alex Wied <[email protected]>

emilazy mentioned this pull request Jul 20, 2024

libstore: add load-limit setting to control parallelism #11143

Open

emilazy added a commit to emilazy/nix that referenced this pull request Jul 20, 2024

libstore: add load-limit setting to control parallelism

58e5be9

Closes: NixOS#7091 Closes: NixOS#6855 Closes: NixOS#8105 Co-authored-by: Alex Wied <[email protected]>

emilazy added a commit to emilazy/nix that referenced this pull request Jul 20, 2024

libstore: add load-limit setting to control parallelism

34f2477

Closes: NixOS#7091 Closes: NixOS#6855 Closes: NixOS#8105 Co-authored-by: Alex Wied <[email protected]>

edolstra requested review from Ericson2314 and edolstra as code owners November 12, 2024 19:47
