libstore: add load-limit setting to control parallelism #11143
Conversation
Untested Nixpkgs PR to show the other side of the interface: NixOS/nixpkgs#328677.
Oops, looks like some of the dead code I cleaned up before pushing was actually load‐bearing, thanks C++ 🙃 Working on it…
Force-pushed from cdadc73 to 58e5be9.
Commit message: Closes: NixOS#7091 Closes: NixOS#6855 Closes: NixOS#8105 (Co-authored-by: Alex Wied <[email protected]>)
Force-pushed from 58e5be9 to 34f2477.
I don’t understand why the test is failing only on Linux. Do I need to do more for cache‐busting?
Thanks for the work. I can't wait to drop my local patches.
> # Release X.Y (202?-??-??)
>
> - Add a `load-limit` setting to control builder parallelism. This has
>   also been backported to the 2.18 and later release branches.
Release notes should now be done as separate files in `doc/manual/rl-next/`; they'll be concatenated when we do a release.
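For reference, a note in `doc/manual/rl-next/` is a small Markdown file with a bit of front matter. The front-matter keys and wording below are my illustration of what such a file might look like for this change, not something taken from this PR; an existing file in that directory is the authoritative template:

```markdown
---
synopsis: "Add a `load-limit` setting to control builder parallelism"
prs: 11143
issues: [7091, 6855]
---

Nix can now be configured to avoid starting new builds while the system load
average is above a given threshold, instead of only limiting the number of
concurrent build jobs.
```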
Needs a decision to either merge or close. How is the experience for those who have been carrying this patch?
The patch doesn’t really do anything currently, because it depends on the linked changes to the Nixpkgs standard environment, which in turn depend on this PR being accepted. (Perhaps @aviallon rebuilds the world locally to be able to use this, or just hacks it in for specific problem builds?) However, it allows people to regain the behaviour from before the unconditional load limit in stdenv was removed a few years ago for the benefit of Hydra. Many people I have talked to found the old behaviour far preferable on workstation machines when doing large‐scale builds, which is why this is now the third pull request trying to work with Nix to add proper support for this feature.

I have not yet subjected NixOS/nixpkgs#328677 to intense scrutiny from Nixpkgs reviewers because I don’t yet know whether this PR will go anywhere, but from what I can tell from talking about it in the development channels and reading old PRs and issues, nobody objects to the basic idea and a good number of people are eager to see this functionality.

This isn’t ready to merge, both because I need to address the review comment and because I’m waiting on responses to all my questions in the PR message and #11143 (comment) from people who understand the Nix codebase better than me. If progress is made on those, I’ll update this PR accordingly and seek wider review of my stdenv changes.
Idea approved. This is ready for wider review and testing against Nixpkgs and larger workloads. Small changes for docs requested above and test fixups. Assigning @edolstra for review.
Sorry for the spam, but is there any reason why this got stale? To answer @emilazy, what I did was hack this feature into some specific problematic packages (hello, anything using LTO) by modifying …
I don’t know why this has stalled. I’ve been waiting for my questions from the PR message to be answered and for general review from @edolstra. Testing would be great, but parts of the approach will presumably have to change depending on the review and the answers to the things I’m uncertain about. Review of how the approach works out on the Nixpkgs side in NixOS/nixpkgs#328677 would also be good.
Motivation
This is a rework of #6855. It isn’t a perfect solution (e.g. if you're building a bunch of Rust stuff they’ll still use as many cores as they want), and it’d be great to have an implementation of the new and improved GNU Make job server protocol at some point, but it seems like there’s widespread desire for this to come back in some form as it provides a much better UX for workstation users doing large builds, especially when you don’t have copious amounts of spare RAM. I have a draft Nixpkgs stdenv change to use this variable that I will post and link here.
I would like to backport this down to 2.18 so that NixOS and nix-darwin users get a good experience out of the box. I can handle the manual backport work for any versions the bot fails on.
Context
Some notes and questions:
1. I kept the default at `getDefaultCores()` rather than using the `cores` setting. That seems like a reasonable middle ground: system load is a bit of a laggy, imprecise measure, so using `cores` directly would likely make CPU go to waste by default, but most workstation users probably want to keep total CPU utilization below 100% so they can still use their machine. This way, we don’t regress on the concerns laid out in “stdenv/setup.sh: fix parallel make” (nixpkgs#174473) and “treewide: drop `-l$NIX_BUILD_CORES`” (nixpkgs#192447). It’d also be kind of a fuss code‐wise to make the default depend on another setting like that. I’m open to bikeshedding on this, though; `none` and the value of `cores` also seem like plausible defaults, although I think the former would be optimizing too much for Hydra over the UX for people with normal machines that aren’t exclusively used for build farms. Given that the default for `cores` is to use all the cores on the machine, and that limiting `cores` doesn’t limit total system load at all right now, I think this is probably fine.

2. I’m a bit unsure about how `getDefaultCores()` interacts with remote builders, and about the difference between having `cores` unset and setting `cores = 0`. If you have remote builders and the default `cores`, does each remote builder limit to the number of cores on the machine doing the evaluation? Surely not, right? The Nixpkgs stdenv explicitly handles `cores = 0`, normalizing it to the result of `$(nproc)`. Is this just a backwards‐compatibility hack and there’s no reason for `cores = 0` these days? I want to understand if there’s any reason for both `cores = ‹default›` and `cores = 0` to exist, so that I can understand if it makes sense to mirror this behaviour with `load-limit = 0`, as my current changes do.

3. GNU Make and Ninja have inconsistent interpretations of `0` and `-1`: with GNU Make, `-l0` means something you’d never want and `-l-1` disables the limit, while with Ninja, `-l0` disables the limit and I think `-l-1` might be invalid. I decided that it would be better to have an explicit non‐integer value to disable the load limit entirely, and to make `load-limit = 0` the same as `cores = 0`, but obviously (3) is a load‐bearing consideration here.

4. Relatedly, is `load-limit = none` okay? Nothing uses that syntax currently. It seemed less obscure than using the empty string. (See the illustrative sketches at the end of this description.)

5. I don’t understand what the `src/libstore/daemon.cc` thing is doing. I copied it from “libstore+nix-build: add load-limit setting and use its value for `NIX_LOAD_LIMIT` env var” (#8105). Please tell me what it does and if it’s good or bad!

Closes: #7091
Closes: #6855
Closes: #8105
cc @fricklerhandwerk @centromere @aviallon @reckenrode @ElvishJerricco
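To make the open questions above a bit more concrete, here is an illustrative `nix.conf` snippet. The values reflect my reading of the semantics proposed in this PR; nothing here is in a released Nix, and the default and the `none` syntax are exactly what notes 1, 3, and 4 ask about:

```
# Hypothetical nix.conf excerpts; pick one. Semantics as proposed in this PR.
load-limit = 8      # don't start new builds while the system load average is above 8
load-limit = none   # explicit non-numeric value: disable the load limit entirely
load-limit = 0      # proposed to mirror `cores = 0`; see notes 2 and 3 above
```

And a rough sketch of how the Nixpkgs side might consume the value. This is not the actual NixOS/nixpkgs#328677 change; the `NIX_LOAD_LIMIT` name comes from #8105, and `makeFlagsArray`/`ninjaFlags` are just the usual stdenv conventions, used here for illustration:

```bash
# Sketch only: translate the proposed NIX_LOAD_LIMIT into Make/Ninja flags.
# Assumes "none", the empty string, and 0 all mean "no explicit -l flag";
# the real stdenv change may normalize these differently.
if [[ -n "${NIX_LOAD_LIMIT:-}" && "$NIX_LOAD_LIMIT" != none && "$NIX_LOAD_LIMIT" != 0 ]]; then
    makeFlagsArray+=("-l$NIX_LOAD_LIMIT")             # GNU Make: hold off new jobs above this load
    ninjaFlags="${ninjaFlags:-} -l $NIX_LOAD_LIMIT"   # Ninja treats positive -l the same way
fi
```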
Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.