Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add haskell.lib.incremental utility #204020

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Gabriella439
Copy link
Contributor

This adds a new incremental utility for Haskell CI that supports incremental builds based on the approach outlined in this blog post:

https://harry.garrood.me/blog/easy-incremental-haskell-ci-builds-with-ghc-9.4/

The basic idea is that instead of Nix doing a full build for a package, we split every build into two builds:

  • A full build at an older point in time

    e.g. a daily or weekly time boundary

  • An incremental build relative to the last full build

    This incremental build reuses the build products left over from the most recent full build.

In order to do this, though, we need a way to "snap" a package's git source input to an earlier point in time (e.g. a daily boundary or weekly boundary). This would allow multiple incremental builds to share the same full rebuild if they snap to the same time boundary.

The approach I went with to make that possible was to extend Nix's builtins.fetchGit to support a new date argument and you can find the corresponding PR for that here:

NixOS/nix#7362

That is why the incremental utility added here requires a sufficiently new version of Nix (one that would incorporate that change, presuming it is merged).

This also requires GHC 9.4 or newer in order to pick up a fix to GHC's change detection logic, as described in more detail in the above blog post.

However, if you satisfy those requirements then this works exactly the way you'd expect: all of the incremental builds only have to build the diff since the last time boundary. Moreover, if CI caches the full build then developers can also run nix build locally and only have to build the diff, too.

@Gabriella439
Copy link
Contributor Author

This is a redo of #12659

@Gabriella439
Copy link
Contributor Author

Gabriella439 commented Dec 1, 2022

Note that the base branch for this is master for now instead of haskell-updates as there are a large number of build failures for us internally on the latest haskell-updates branch and I'm using our internal repository for stress testing this. Once we upgrade internally to build against the latest haskell-updates builds I'll switch this PR's base branch

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Dec 1, 2022
@@ -26,6 +26,7 @@ in
, doBenchmark ? false
, doHoogle ? true
, doHaddockQuickjump ? doHoogle && lib.versionAtLeast ghc.version "8.6"
, doInstallDist ? false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On first reading, I thought this flag was going to be about sdist.
I've never understood why the directory is called dist. It is not for distributable files, although I can see a distant connection.

So for the output name, which I believe we should try to standardize beyond just haskellPackages, I think we should find a distinct name that isn't as distorted, so as not to disturb users who expect "distributable" files in such an output. #167670 uses a derivation name suffix incrementalBuildArtifacts rather than an output name, but it is rather distended. Perhaps that name could be distilled into something shorter.

(I hope my use of dist- words was not too distracting.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think incrementalBuildArtifacts can definitely be shortened to at least buildArtifacts (since they don't necessarily need to be used for incremental builds) and it's still an accurate name. The question is if that could be shortened even further to just build or artifacts.

I would personally try to get away with shortening that to just artifacts since I believe that name has the right connotation: "artifacts" tends to mean historic byproducts that were ancillary to the final outcome (in both a programming context and non-programming context). Also, the build in buildArtifacts seems superfluous since everything produced by a Nix derivation is part of a build of some sort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm used to artifacts referring to what's typically already in the outputs, rather than the intermediate files that do not need to be exposed for a package to be useful.
How about intermediate?

historic byproducts that were ancillary to the final outcome

Isn't this true of all software, no matter how complete?

the build in buildArtifacts seems superfluous [...]

It may refer to buildPhase, but that's not obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like intermediate. Or maybe intermediates (as a contraction of intermediateFiles)

@Gabriella439 Gabriella439 force-pushed the gabriella/incremental branch 2 times, most recently from 696e082 to 4cfd4e1 Compare December 18, 2022 16:11
@roberth
Copy link
Member

roberth commented Dec 19, 2022

Is it possible to "merge" this feature into #167670?

Having multiple interfaces to achieve the same thing would be bad.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixpkgs-support-for-incremental-haskell-builds/24115/2

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixpkgs-support-for-incremental-haskell-builds/24115/7

@9999years
Copy link
Contributor

NB: Some of the changes from this PR were split out into #227985 and merged there.

This adds a new `incremental` utility for Haskell CI that
supports incremental builds based on the approach outlined
in this blog post:

https://harry.garrood.me/blog/easy-incremental-haskell-ci-builds-with-ghc-9.4/

The basic idea is that instead of Nix doing a full build
for a package, we split every build into two builds:

- A full build at an older point in time

  e.g. a daily or weekly time boundary

- An incremental build relative to the last full build

  This incremental build reuses the build products left
  over from the most recent full build.

In order to do this, though, we need a way to "snap" a
package's `git` source input to an earlier point in time
(e.g. a daily boundary or weekly boundary).  This would
allow multiple incremental builds to share the same
full rebuild if they snap to the same time boundary.

The approach I went with to make that possible was to
extend Nix's `builtins.fetchGit` to support a new `date`
argument and you can find the corresponding PR for that
here:

NixOS/nix#7362

That is why the `incremental` utility added here requires
a sufficiently new version of Nix (one that would incorporate
that change, presuming it is merged).

This also requires GHC 9.4 or newer in order to pick up a fix
to GHC's change detection logic, as described in  more detail
in the above blog post.

However, if you satisfy those requirements then this works
exactly the way you'd expect: all of the incremental builds
only have to build the diff since the last time boundary.
Moreover, if CI caches the full build then developers can
also run `nix build` locally and only have to build the diff, too.

Lower required version

… so that it works against the upstream PR

I'll change the required version to an official release
if the PR is merged.

s/pkgs/pkg/g

… as caught by @cdepillabout

Co-authored-by: Dennis Gosnell <[email protected]>

Skip the use of `tar`

We can store the `dist` directory decompressed, which
speeds up the dist export/import

This potentially requires more disk space *but* by storing
the files unpacked it may actually improve disk utilization
in some cases if `auto-optimise-store` is enabled by
permitting deduplication of `dist` files.

Add an `installDist` phase

… which is disabled by default

The motivation for this is to bring the behavior of
`enableSeparateDistOutput` more in line with the other
options where it doesn't change *whether* or not something
is exported, but rather *where* it is exported.

Now `installDist` controls whether or not the `dist`
directory is exported.

Based on this discussion:

https://github.com/NixOS/nixpkgs/pull/203499/files#r1034150076

Document `interval` argument

… as suggested by @cdepillabout

s/for use for/for use with/

… based on feedback from @MaxGabriel

Move `installDistPhase` to `postPhases`

There are two reasons for doing this:

- We can get rid of the hack to remove the dist output from the outputs

- We can ensure that any changes that happen in the install phase are
  correctly reflected in the `dist` export

Disable dylib workaround for incremental build

Improve correctness of `incremental` function

Typically we don't want to just roll back the source code that is the
input for the Haskell package because the dependencies for the package
may have changed

In other words, if you roll back the source code for the top-level
package without also rolling back the Nix-supplied dependencies
for that build then you run the risk of an unexpected build failure
(due to an older version of the Haskell package being built against
a newer version of the Nix-supplied dependencies).

What you actually want to do is to roll back the entire repository
(i.e. the Haskell source code and the supporting Nix code) to ensure
that the Haskell source code and Nix code stay in sync.

This more generalized rollback complicates the UX for the
`incremental` function.  I did my best to try to streamline
the UX so that the user just needs to specify how to locate the
matching (older) package after a rollback.

Make date relative to revision (if possible)

This way if you attempt to incrementally build an older revision
then the full rebuild will be relative to the older revision
instead of being relative to the present.

Add `extraFetchGitArgs` option

This in particular comes in handy if you want to specify
`ref = "main";` to ensure that the older build comes from
the `main` branch of your repository.
@9999years 9999years force-pushed the gabriella/incremental branch from f74b788 to 37ec2fa Compare June 7, 2023 17:57
@github-actions github-actions bot added 6.topic: agda "A dependently typed programming language / interactive theorem prover" 6.topic: cinnamon Desktop environment 6.topic: emacs Text editor 6.topic: erlang 6.topic: GNOME GNOME desktop environment and its underlying platform 6.topic: golang 6.topic: mate The MATE Desktop Environment 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: ocaml labels Jun 7, 2023
@github-actions github-actions bot removed 6.topic: policy discussion 6.topic: cinnamon Desktop environment 6.topic: emacs Text editor 6.topic: golang 6.topic: printing 6.topic: pantheon The Pantheon desktop environment 6.topic: ocaml 6.topic: systemd 6.topic: GNOME GNOME desktop environment and its underlying platform 6.topic: xfce The Xfce Desktop Environment 8.has: documentation This PR adds or changes documentation 6.topic: agda "A dependently typed programming language / interactive theorem prover" 6.topic: python 6.topic: vscode 6.topic: ruby 6.topic: qt/kde 8.has: changelog 6.topic: TeX Issues regarding texlive and TeX in general 6.topic: vim 8.has: module (update) This PR changes an existing module in `nixos/` 6.topic: rust labels Jun 14, 2023
@anka-213
Copy link
Contributor

I wonder if you could hack this using IFD without needing NixOS/nix#7362? If you have a derivation that builds a fetchGit expression, I believe that as long as it generates the same output, the derivation that imports it will get the same hash and so not need to be rebuilt.


For example, you could do something like this:

 let rev = import (runCommand "hello.nix" {} ''mkdir $out;rev=$(${git}/bin/git --bare --git-dir=${./.git} log '--before={'$(${coreutils}/bin/date --iso-8601 -u)'T00:00Z}' --format=%H -n1) ; echo '"'$rev'"' > $out/default.nix'')
 in fetchGit { url = ./.; rev= rev;}

Now, this is absolutely horrible and very impure. It also copies the whole .git directory to the store every time you run it, but it seems to work (as long as you have the repo checked out in the current directory). It would probably be better to give git a url to clone from, but that would still require a semi-shallow clone (e.g. with git clone --filter=tree:0) each time you run it, with no way to cache the operation.

It was worse than I first thought, since I had forgotten that derivations don't have any access to git info (since that would be impure and cause needless recompilation).

@Gabriella439
Copy link
Contributor Author

@anka-213: Maybe, but what's the point of doing this in a hacky way when it could be done in a principled way? There is already a PR that implements the principled way with tests and everything

@infinisil infinisil added the significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Oct 3, 2023
@pwm
Copy link

pwm commented Mar 25, 2024

@Gabriella439 what's the relation of this PR to #227985 which got merged? Do I need both for incremental builds? Thanks in advance, I'm keen to have our CI build incrementally.

@Gabriella439
Copy link
Contributor Author

Yes, you need both PRs to support incremental builds, but this PR is blocked on NixOS/nix#7362 being merged upstream

@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 4, 2024
@FliegendeWurst FliegendeWurst added the 2.status: blocked by pr/issue Another PR or issue is preventing this from being completed label Dec 12, 2024
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Dec 12, 2024
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: blocked by pr/issue Another PR or issue is preventing this from being completed 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: haskell 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.
Development

Successfully merging this pull request may close these issues.

9 participants