Upload lock held for longer than necessary #11083

Open
@jkarni

Description

Describe the bug

When building with remote builders, very often only one build makes progress while the others are stuck for quite a long time. With sudo lslocks | grep upload, you can see that processes are waiting on the .upload-lock for the machine, and stracing them confirms that they're blocked on flock(5, LOCK_EX). This can take a very long time. Interestingly, stracing the process that does hold the lock seems to indicate that it's past the upload phase anyway: I see hundreds of thousands of lines with type 105:

write(2, "@nix {\"action\":\"result\",\"fields\":[221366008,0,0,0],\"id\":17878179326722332,\"type\":105}\n", 86) = 86

I suppose this could be happening in parallel with copying files, though I somewhat doubt that.
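To illustrate the blocking behaviour described above, here is a minimal Python sketch (not Nix's actual code) of two independent opens of the same lock file contending for an exclusive flock. The file name example-machine.upload-lock and the helper names are hypothetical. It uses LOCK_NB so the second attempt fails immediately instead of hanging the way the stuck builds do.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Open `path` and try to take an exclusive flock without blocking.

    Returns the open file descriptor on success, or None if another
    open file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def demo():
    """Return (holder_got_lock, waiter_was_refused, acquired_after_release)."""
    with tempfile.TemporaryDirectory() as state_dir:
        # Hypothetical file name, standing in for a per-machine upload lock.
        lock_file = os.path.join(state_dir, "example-machine.upload-lock")

        holder = try_exclusive_lock(lock_file)  # first "build" takes the lock
        waiter = try_exclusive_lock(lock_file)  # second one is refused

        # Once the holder releases, the lock becomes available again.
        fcntl.flock(holder, fcntl.LOCK_UN)
        os.close(holder)
        retry = try_exclusive_lock(lock_file)
        acquired_after_release = retry is not None
        if retry is not None:
            os.close(retry)

        return holder is not None, waiter is None, acquired_after_release
```

Without LOCK_NB, the second call would sit in flock() until the holder releases, which is what strace shows for the waiting builds.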

Steps To Reproduce

  1. Enable a remote builder
  2. Kick off a bunch of simultaneous builds with --max-jobs 0
  3. Keep track of .upload-locks in lslocks
  4. Strace the processes to see what they're doing

Expected behavior

I expect the builds to start building more quickly.

nix-env --version output

nix-env (Nix) 2.18.4

Additional context

Some investigation shows this is happening here. The original motivation for this logic, according to comments in the Perl precursor to this module from a decade ago, is to prevent multiple processes from trying to copy the same derivation over and over again.

It seems like the lock is potentially held too long. Moreover, it's too "big" a lock: we should probably only have one lock per store path + remote. The alarm of 15 minutes also seems very long.
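As a rough sketch of what a lock per store path + remote with a shorter timeout could look like (in Python for brevity, not a proposed patch): the lock file name is derived from both the remote and the store path, so uploads of different paths to the same machine no longer contend. All names here (lock_file_for, acquire_with_timeout, the example remote URI and the placeholder store paths) are assumptions made for illustration.

```python
import fcntl
import hashlib
import os
import tempfile
import time


def lock_file_for(state_dir, remote, store_path):
    """Derive a lock file name from the (remote, store path) pair."""
    key = f"{remote}\0{store_path}".encode()
    digest = hashlib.sha256(key).hexdigest()[:32]
    return os.path.join(state_dir, f"{digest}.upload-lock")


def acquire_with_timeout(path, timeout_s, poll_s=0.05):
    """Poll for an exclusive flock; return the fd, or None on timeout."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            if time.monotonic() >= deadline:
                os.close(fd)
                return None
            time.sleep(poll_s)


def demo():
    """Return (first_path_locked, second_path_locked, same_path_timed_out)."""
    with tempfile.TemporaryDirectory() as state_dir:
        remote = "ssh://builder.example"  # hypothetical remote
        # Placeholder store paths, not real hashes.
        a = lock_file_for(state_dir, remote, "/nix/store/aaaa-foo")
        b = lock_file_for(state_dir, remote, "/nix/store/bbbb-bar")

        fd_a = acquire_with_timeout(a, 0.2)
        fd_b = acquire_with_timeout(b, 0.2)   # different path: independent lock
        fd_a2 = acquire_with_timeout(a, 0.2)  # same path: times out while held

        results = (fd_a is not None, fd_b is not None, fd_a2 is None)
        for fd in (fd_a, fd_b):
            if fd is not None:
                os.close(fd)
        return results
```

The trade-off is many small lock files instead of one per machine. The original goal of not copying the same derivation repeatedly would still be met, since two processes uploading the same path to the same remote map to the same lock file.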

Priorities

Add 👍 to issues you find important.

    Labels

    performance; remote build (The SSH store, ssh:, ssh-ng:, ... (split from protocol label 2024-07)); store (Issues and pull requests concerning the Nix store)