stdenv/setup.sh: fix parallel make #174473

markuskowa · 2022-05-25T11:12:28Z

Description of changes

The stdenv setup script passes both -j and -l to make, each set to $NIX_BUILD_CORES (e.g. from nix-build's --cores option).
However, -l sets an upper bound on the system load average. Once that load is exceeded, make effectively runs with -j 1.
This leads to unwanted behavior and slow builds. For example, on a system with 48 cores and a load of 24, make will
only run one job at a time if $NIX_BUILD_CORES is set to less than 24, leaving the system underutilized.
It is not clear to me why -l should be set to $NIX_BUILD_CORES. This PR removes the -l option from the setup script.

For reference from the GNU make manual:
"When the system is heavily loaded, you will probably want to run fewer jobs than when it is lightly loaded. You can use the ‘-l’ option to tell make to limit the number of jobs to run at once, based on the load average. The ‘-l’ or ‘--max-load’ option is followed by a floating-point number. For example,

-l 2.5

will not let make start more than one job if the load average is above 2.5. The ‘-l’ option with no following number removes the load limit, if one was given with a previous ‘-l’ option."
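To make the quoted rule concrete, here is a minimal sketch of the decision it describes, written as a shell function. It is a simplified model for illustration only, not GNU make's actual implementation; the function name may_start_job and its three arguments are made up for this example.

```shell
#!/usr/bin/env bash
# Simplified model of the -l rule quoted above (hypothetical helper,
# not GNU make's real code).
# Usage: may_start_job LOAD_AVERAGE MAX_LOAD RUNNING_JOBS
may_start_job() {
    local load=$1 limit=$2 running=$3
    # One job may always run, otherwise the build could never make progress.
    if [ "$running" -eq 0 ]; then
        return 0
    fi
    # Load averages are floating point, so compare with awk instead of (( )).
    awk -v load="$load" -v limit="$limit" 'BEGIN { exit !(load <= limit) }'
}

# Scenario from the description: load average 24, make invoked with -l8,
# one job already running.
if may_start_job 24.0 8 1; then
    echo "start another job"
else
    echo "wait: load is above the limit"
fi
```

Under this model, a build started with -j8 -l8 on a machine whose load already sits at 24 never gets past a single running job, which is the behavior the description complains about.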

Things done

SuperSandro2000 · 2022-05-25T20:48:02Z

pkgs/stdenv/generic/setup.sh

@@ -1075,7 +1075,7 @@ buildPhase() {
        # Old bash empty array hack
        # shellcheck disable=SC2086
        local flagsArray=(
-            ${enableParallelBuilding:+-j${NIX_BUILD_CORES} -l${NIX_BUILD_CORES}}


Did you think about setting this to -l 2.5 or maybe -l 5.0? I am not sure if that would make sense or not.

A constant value would not make sense here IMHO. Which value makes sense depends on the system and its configuration (i.e. the number of cores and nix's max-jobs/cores settings): a value of 2.0 may be a good choice for a laptop but not for a big server with lots of CPU cores.
If max-jobs is set to the number of cores in the system, it could make sense to set -l <number cores> to avoid overly high system loads.
It would be interesting to know why -l${NIX_BUILD_CORES} was set here in the first place. Maybe @Ericson2314 knows more?

Isn't -l${NIX_BUILD_CORES} a good protection against overloading/DoS-ing build machines? Maybe it could be configurable at runtime, but I personally think the default is good.

It is good to have safeguards in place. However, when the safeguards cause my builds to run with the equivalent of -j1 on a 64-core machine, it is no longer feasible to use Nix in any way in a professional context.

> cause my builds to run with the equivalent of -j1 on a 64-core machine

Are you sure? I would imagine it starting out with 64 jobs and if/when system load > 64, then new jobs are delayed until load falls below 64. But that'd mean it should run the overall build with >>1 job.

It would be cool to visualize it :-)

I am not able to run builds with cores = 64 because the OOM killer is invoked:

If I run the builds with cores = 8, -j8 -l8 will be passed to make. This is not good because the system has a load average higher than 8, which causes the builds to slow to a crawl.
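A quick way to check whether a given -l value would currently hold make back is to compare it with the load average the kernel reports. The following is a rough sketch, assuming a Linux system with /proc/loadavg and assuming the 1-minute value is the relevant one; load_exceeds_limit is a hypothetical helper name, and the optional file argument exists only to make the function easy to try with sample data.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper: succeeds if the 1-minute load average
# is above LIMIT, i.e. if `make -lLIMIT` would hold back additional jobs.
# Usage: load_exceeds_limit LIMIT [LOADAVG_FILE]
load_exceeds_limit() {
    local limit=$1 file=${2:-/proc/loadavg}
    local one_minute=""
    # The first field of /proc/loadavg is the 1-minute load average.
    read -r one_minute _ < "$file"
    # Return 2 if the file was missing or empty.
    [ -n "$one_minute" ] || return 2
    awk -v load="$one_minute" -v limit="$limit" 'BEGIN { exit !(load > limit) }'
}

# Example: would -l8 throttle a build on this machine right now?
if [ -r /proc/loadavg ] && load_exceeds_limit 8; then
    echo "load is above 8: make -j8 -l8 would be throttled"
fi
```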

> Isn't -l${NIX_BUILD_CORES} a good protection against overloading/DoS-ing build machines? Maybe it could be configurable at runtime, but I personally think the default is good.

It is certainly a protection against overloading. However, the question is whether it is an efficient protection. The default may be good for the main Hydra build farm. On servers with mixed load, this default does not work well. For example, on a 48-core machine that dedicates half of its cores to a constant, non-build load and the other half to nix-build jobs, it causes the gross underutilization described above. Running nix with -l24 does not give the desired result of using 24 cores for the build job; instead, nix (or make) uses only a single core.

@centromere @markuskowa: Good point.

centromere · 2022-07-15T07:23:49Z

Is there something that can be done to move this forward? I am currently affected by this bug.

SuperSandro2000

@Ericson2314 what do you think?

I am really unsure how this will interact with hydra.

markuskowa · 2022-07-15T11:25:39Z

> @Ericson2314 what do you think?
>
> I am really unsure how this will interact with hydra.

Feedback from someone familiar with the Hydra build farm (and its load problems) is absolutely needed here. To be clear: from what I can judge here, this certainly will have a non-negligible impact on the Hydra load patterns.

vcunat · 2022-07-16T05:57:48Z

This isn't just about Hydra. And I don't think it's good to go without any -l limit. By default that would parallelize up to the square of your number of cores, and that would be quite likely to exhaust RAM.

So indeed the point is to protect the machine from overloading, even though it's quite a crude method. My experience of using the current setting on a 32-core machine is OK, but I can imagine it could be considered limiting if you have lots of non-CPU load (e.g. from rotating drives).

For an easy step, I think it would be nice to allow overriding make's -l with a setting separate from its -j, which each machine could configure. (EDIT: sounds like #124166) For better scheduling, different nix builds would have to "communicate together" through something better than this current-load metric. An attempt was made in PR #143820.
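The "square of your number of cores" figure comes from two settings multiplying: up to max-jobs derivations can build at once, and each of them may run make with -j set to cores. A small sketch of that arithmetic, assuming for illustration that both settings equal the machine's core count (worst_case_jobs is a made-up helper name):

```shell
#!/usr/bin/env bash
# Worst-case number of simultaneous make jobs when nothing limits the load.
# Usage: worst_case_jobs MAX_JOBS CORES
worst_case_jobs() {
    local max_jobs=$1 cores=$2
    echo $(( max_jobs * cores ))
}

# Assumed example values: a 32-core machine with max-jobs = cores = 32.
worst_case_jobs 32 32   # prints 1024
```

With -l in place, each make instance stops adding jobs once the load average passes the limit, which is the crude protection described above.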

markuskowa · 2022-07-31T18:18:23Z

> For an easy step, I think it would be nice to allow to override make's -l by a setting separate from its -j which each machine could configure. (EDIT: sounds like #124166) For a better scheduling, different nix builds
> would have to "communicate together" by something better than this current-load metric. An attempt was in PR #143820

I am closing this PR, since it is not mergeable in its current form and would probably cause real trouble on Hydra's build farm.
The above-mentioned options, such as making -l overridable or, even better, the job server solution, seem much more appropriate.

centromere · 2022-08-01T17:00:34Z

@markuskowa Any change to stdenv will invalidate (almost) every nar in the cache, yes? If so, is it feasible to make any changes whatsoever to stdenv without causing pain to the build farm?

vcunat · 2022-08-01T17:19:24Z

No, we change stdenv several times a month. Mass rebuilds are common.

centromere · 2022-08-01T22:00:59Z

Okay. What do y'all think of this plan?

Add an integer nix.conf setting named core-limit.
Inject NIX_CORE_LIMIT into the build environment:

env["NIX_CORE_LIMIT"] = (format("%d") % settings.coreLimit).str();

Rework stdenv as follows:

NIX_CORE_LIMIT="${NIX_CORE_LIMIT:-$NIX_BUILD_CORES}"
export NIX_CORE_LIMIT

local flagsArray=(
    ${enableParallelBuilding:+-j${NIX_BUILD_CORES}}
...
)

if ((NIX_CORE_LIMIT > 0)); then
    flagsArray+=("-l${NIX_CORE_LIMIT}")
fi
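To see how the proposed stdenv logic would behave for different settings, it can be sketched as a standalone function. This is only an illustration of the plan above, not the code that was eventually submitted: NIX_CORE_LIMIT is the variable proposed in this comment, build_make_flags is a made-up name, and the flags are returned as a plain string instead of stdenv's flagsArray so the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Sketch of the proposed flag logic (hypothetical helper, not stdenv code).
# Reads enableParallelBuilding, NIX_BUILD_CORES and NIX_CORE_LIMIT.
build_make_flags() {
    local cores=${NIX_BUILD_CORES:-1}
    # Without an explicit limit, keep the current behaviour (-l equal to -j).
    local limit=${NIX_CORE_LIMIT:-$cores}
    local flags=""
    if [ -n "${enableParallelBuilding:-}" ]; then
        flags="-j${cores}"
    fi
    # A limit of 0 means "pass no -l at all".
    if [ "$limit" -gt 0 ]; then
        flags="${flags:+$flags }-l${limit}"
    fi
    echo "$flags"
}

# Example: 8 build cores on a 48-core machine with a load limit of 48.
# The subshell keeps the example variables from leaking into the caller.
( enableParallelBuilding=1; NIX_BUILD_CORES=8; NIX_CORE_LIMIT=48; build_make_flags )
```

With these example values the function yields -j8 -l48, so make may keep eight jobs running until the whole machine is loaded, rather than backing off at a load of 8.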

ck3d · 2022-08-02T07:22:07Z

I had the same solution in mind. Only the name should change: "core limit" should be better aligned with the wording that make and ninja use, i.e. "load average".

centromere · 2022-08-02T20:04:11Z

@ck3d @vcunat @markuskowa I've submitted some PRs to address this:

NixOS/nix#6855

#184886

stdenv/setup: fix parallel make

4df989c

markuskowa requested review from Ericson2314 and matthewbauer as code owners May 25, 2022 11:12

github-actions bot added the 6.topic: stdenv label May 25, 2022

ofborg bot added the 10.rebuild-darwin-stdenv, 10.rebuild-linux-stdenv, 10.rebuild-darwin: 501+, 10.rebuild-darwin: 5001+, 10.rebuild-linux: 501+, and 10.rebuild-linux: 5001+ labels May 25, 2022

SuperSandro2000 reviewed May 25, 2022

SuperSandro2000 reviewed Jul 15, 2022

markuskowa closed this Jul 31, 2022

This was referenced Aug 2, 2022

libstore: Add load-limit setting to dynamically control parallelism NixOS/nix#6855

Open

stdenv: Pass -lN to GNU Make based on NIX_LOAD_LIMIT #184886

Closed

markuskowa deleted the fix-parallel-make branch August 12, 2022 09:11

lheckemann mentioned this pull request Sep 22, 2022

treewide: drop -l$NIX_BUILD_CORES #192447

Merged

13 tasks

This was referenced Jul 20, 2024

libstore: add load-limit setting to control parallelism NixOS/nix#11143

Open

{stdenv,ninja}: add support for NIX_LOAD_LIMIT #328677

Draft


SuperSandro2000 May 25, 2022 (edited)

markuskowa May 25, 2022 (edited)

bjornfor Jul 15, 2022

centromere Jul 15, 2022

bjornfor Jul 15, 2022

centromere Jul 15, 2022

markuskowa Jul 15, 2022

bjornfor Jul 15, 2022
