
Switch from conda-build to rattler-build #374

Open
wants to merge 35 commits into base: branch-0.43

Conversation


@gforsyth gforsyth commented Feb 13, 2025

Porting the conda-build recipes over to rattler-build.

Contributes to rapidsai/build-planning#47

@gforsyth gforsyth added the improvement Improves an existing functionality label Feb 13, 2025
@gforsyth gforsyth requested a review from a team as a code owner February 13, 2025 19:39
@gforsyth gforsyth added the non-breaking Introduces a non-breaking change label Feb 13, 2025
version: ${{ version }}
build:
string: cuda${{ cuda_major }}_${{ date_string }}_${{ head_rev }}
script: install_libucxx.sh
Contributor Author

There are comments in this build-script that I'm not sure how to preserve in the script directive here, so I've opted to leave that standalone script in place.

host:
- cuda-version =${{ cuda_version }}
- if: cuda_major == "11"
then: cuda-cudart-dev =${{ cuda_version }}
Contributor Author

@gforsyth gforsyth Feb 13, 2025

This fixed CUDA 11 builds for me locally. See below for the failure without this pin, where the ucxx import tests look for libcudart.so.12.
I'm not sure whether cuda-cudart-dev is constrained (or unconstrained) in some odd way version-wise, or whether rattler-build is doing something unexpected with version solving.

(from https://github.com/rapidsai/ucxx/actions/runs/13318054370/job/37196857183)

 │ │ Installing test environment
 │ │  Successfully updated the test environment
 │ │ + python $SRC_DIR/conda_build_script.py
 │ │ Traceback (most recent call last):
 │ │   File "$SRC_DIR/conda_build_script.py", line 1, in <module>
 │ │     import ucxx
 │ │   File "$PREFIX/lib/python3.10/site-packages/ucxx/__init__.py", line 30, in <module>
 │ │     from . import exceptions, types, testing  # noqa
 │ │   File "$PREFIX/lib/python3.10/site-packages/ucxx/exceptions.py", line 4, in <module>
 │ │     from ._lib.libucxx import (  # noqa
 │ │   File "$PREFIX/lib/python3.10/site-packages/ucxx/_lib/__init__.py", line 4, in <module>
 │ │     from .libucxx import _create_exceptions
 │ │ ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
 │ │ × error Script failed with status 1
 │ │ × error Work directory: '/tmp/.tmpn812nb'
 │ │ × error To debug the build, run it manually in the work directory (execute the `./conda_build.sh` or `conda_build.bat` script)

Contributor

tl;dr CUDA 11 packages aren't great.

I'm not certain exactly what's happening, but I think you're winding up with unconstrained nvidia channel packages. This shows up in CI even on the latest (passing) CUDA 11 builds, like this one. If you look at the cuda-cudart-dev packages on conda-forge, there aren't any for CUDA < 12. Presumably we're being saved at runtime by something else bringing in cudatoolkit, most likely librmm. Note that in the old recipe the condition for cuda-cudart-dev was {% if cuda_major != "11" %}. For CUDA 11 we still rely on some CUDA libraries from the host system (not installed via conda).

Let's flip this constraint to != 11 and try again.

Separately, we should double-check if we even need to include the nvidia channel any more. I've lost track of the current state of things and would need to take 20 mins to refresh myself but I believe we may not need it any more.
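The flip suggested above could be sketched like this in the recipe (assuming the same selector variables used elsewhere in this recipe):

```yaml
host:
  - cuda-version =${{ cuda_version }}
  # conda-forge has no cuda-cudart-dev packages for CUDA < 12, so only
  # add it on CUDA 12+; CUDA 11 gets libcudart from the host system
  - if: cuda_major != "11"
    then: cuda-cudart-dev =${{ cuda_version }}
```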

Contributor

(^everything Vyas said, and...)

CUDA 12 builds should not need the nvidia channel for anything. The only CUDA 11 packages we need from the nvidia channel are math libraries, so that we can get the C++ headers that don't ship with conda-forge::cudatoolkit for CUDA 11. I think that only affects RAFT/cuML/cuGraph/cuVS.

We are currently only including the nvidia channel for consistency with CUDA 11, and will drop it from our build commands once CUDA 11 is no longer supported. Perhaps we should even consider dropping nvidia now, for the packages that don't need its math library headers.

Contributor Author

What would we prefer?

  • drop -c nvidia from the rapids-rattler-channel-string and then add it piecemeal to raft, cuml, cugraph, and cuvs?
  • wait until we drop CUDA 11?

Contributor

@bdice bdice Feb 21, 2025

I think rapids-rattler-channel-string should check the value of RAPIDS_CUDA_VERSION and only add --channel nvidia on CUDA 11. We can implement that as part of the rattler-build migration, no need to wait until we drop CUDA 11.

Or, we can just drop --channel nvidia from rapids-rattler-channel-string entirely and add it to the repos where it is needed. I am okay with either approach.
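A minimal sketch of the first option. The real rapids-rattler-channel-string lives in our shared tooling and may differ; the function name, channel list, and env-var handling here are assumptions for illustration only:

```shell
# Hypothetical sketch: only add --channel nvidia when building for CUDA 11.
# The base channel list here is an assumption, not the real tool's output.
rapids_rattler_channel_string() {
  local channels="--channel rapidsai --channel conda-forge"
  # RAPIDS_CUDA_VERSION looks like "11.8.0" or "12.8.0"
  if [[ "${RAPIDS_CUDA_VERSION%%.*}" == "11" ]]; then
    channels+=" --channel nvidia"
  fi
  echo "${channels}"
}

RAPIDS_CUDA_VERSION=11.8.0 rapids_rattler_channel_string
RAPIDS_CUDA_VERSION=12.8.0 rapids_rattler_channel_string
```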

Contributor

The version of this package should be constrained by cuda-version. We should never pin CTK package versions in our recipes (for CUDA 12 only -- CUDA 11 requires exact pinnings like these). CUDA 12 packages should always float to whatever is allowed by cuda-version.

Suggested change
then: cuda-cudart-dev =${{ cuda_version }}
then: cuda-cudart-dev

Comment on lines 128 to 131
ignore_run_exports:
by_name:
- cuda-version
Contributor Author

Without this explicit cuda-version ignore, the libucxx-examples output was getting a runtime dependency of cuda-version >=12.8,<13, inherited from the compiler('cuda'):

 │ │ Run dependencies:
 │ │ ╭──────────────────┬────────────────────────────────────────────────────────╮
 │ │ │ Name             ┆ Spec                                                   │
 │ │ ╞══════════════════╪════════════════════════════════════════════════════════╡
 │ │ │ Run dependencies ┆                                                        │
 │ │ │ libucxx          ┆ ==0.43.0a37 cuda12_250218_2a6940ca (PS)                │
 │ │ │ libgcc           ┆ >=13 (RE of [cache-build: gcc_linux-64])               │
 │ │ │ __glibc          ┆ >=2.28,<3.0.a0 (RE of [cache-build: sysroot_linux-64]) │
 │ │ │ cuda-version     ┆ >=12.8,<13 (RE of [cache-build: cuda-nvcc_linux-64])   │    <---- here
 │ │ │ libstdcxx        ┆ >=13 (RE of [cache-build: gxx_linux-64])               │
 │ │ │ libgcc           ┆ >=13 (RE of [cache-build: gxx_linux-64])               │
 │ │ │ ucx              ┆ >=1.15.0,<1.16.0a0 (RE of [cache-host: ucx])           │
 │ │ │ python_abi       ┆ 3.12.* *_cp312 (RE of [cache-host: python])            │
 │ │ │ librmm           ┆ >=25.4.0a26,<25.5.0a0 (RE of [cache-host: librmm])     │
 │ │ ╰──────────────────┴────────────────────────────────────────────────────────╯

This meant that in the test_cpp job, which runs on CUDA 12.0, the just-built examples package was invalid, leading to the failures.

This looks to me like outputs in rattler-build are inheriting runtime dependencies from the build cache, which is usually fine but in this case is not what we want.

Contributor

We had a recent conversation about ARM / Tegra CUDA support in conda-forge. We may be adding new run-exports to the CUDA compiler packages on ARM.

Currently we use ignore_run_exports_from on the CUDA compiler package. Instead, we should use ignore_run_exports: with by_name: on just cuda-version. So what you're saying here is right, and we should do this universally for all packages that support CUDA compatibility within the major version (which applies to all of RAPIDS).
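A sketch of the two forms side by side; the compiler() selector line is illustrative, not copied from a real recipe:

```yaml
# conda-build (meta.yaml) style: drops every run-export coming from
# the CUDA compiler package
build:
  ignore_run_exports_from:
    - {{ compiler("cuda") }}

# rattler-build style: ignore only cuda-version by name, so any other
# run-exports the CUDA compiler adds (e.g. for ARM/Tegra) still apply
requirements:
  ignore_run_exports:
    by_name:
      - cuda-version
```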

@gforsyth gforsyth force-pushed the rattler-build branch 2 times, most recently from 8e43602 to fe45008 Compare February 19, 2025 18:50

sccache --show-adv-stats

# remove build_cache directory
rm -rf "$RAPIDS_CONDA_BLD_OUTPUT_DIR"/build_cache
Contributor

This seems like something that should be an ignore pattern in our upload script instead of something that every build script has to include.
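One way to sketch that ignore pattern on the upload side. This is hypothetical, not the actual upload script; the directory layout and file names below are made up for the demonstration:

```shell
# Hypothetical sketch of an upload-side ignore pattern; the real upload
# script and output layout may differ.
demo_dir=$(mktemp -d)
mkdir -p "${demo_dir}/linux-64" "${demo_dir}/build_cache"
touch "${demo_dir}/linux-64/libucxx-0.43.0-cuda12.conda"
touch "${demo_dir}/build_cache/stale-artifact.conda"

# Select packages to upload, skipping rattler-build's build_cache
to_upload=$(find "${demo_dir}" -name '*.conda' -not -path '*/build_cache/*')
echo "${to_upload}"
```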

Contributor Author

Agreed. We opened an issue upstream, too, to add a CLI option to clear the build-cache post-build: prefix-dev/rattler-build#1424


name: distributed-ucxx
version: ${{ version }}
build:
string: py${{ python }}_${{ date_string }}_${{ head_rev }}
Contributor

In the general case (i.e. for the rest of RAPIDS) I'd consider rapidsai/build-planning#43 out of scope for the rattler-build work, since it requires reworking our shared workflows. For ucxx specifically, though, we should take at least a quick look at solving it here. The reason is that distributed-ucxx is (AFAIK) a pure Python package built within the same recipe as a non-pure-Python package, whereas everywhere else those packages are separate conda builds. It should not be constrained by the cpython ABI below, for example. If the conclusion winds up being that we need to split this package out to properly support building a pure Python wheel, that is fine too, but I think it's worth thinking about it at least briefly right now.

Contributor Author

Building it as noarch now, and that seems to be working.
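For reference, a sketch of what the noarch output might look like; the exact build string format is an assumption:

```yaml
- package:
    name: distributed-ucxx
    version: ${{ version }}
  build:
    # pure Python: one package for all Python versions and platforms,
    # with no python_abi / cpython ABI constraint
    noarch: python
    string: py_${{ date_string }}_${{ head_rev }}
```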

@gforsyth gforsyth removed the request for review from KyleFromNVIDIA February 20, 2025 18:40
- cuda-version
tests:
- package_contents:
files:
Contributor

Can we alphabetically sort this whole list of files?

(Also we used to have lists of files to verify for cudf but we dropped it there because it was annoying to maintain. But I am okay with the status quo here.)

Comment on lines +134 to +140
ignore_run_exports:
from_package:
- cuda-cudart-dev
- python
- ucx
by_name:
- cuda-version
Contributor

None of these packages are in the build/host sections, so these run-exports shouldn't exist in the first place. Can we delete this block?

Suggested change
ignore_run_exports:
from_package:
- cuda-cudart-dev
- python
- ucx
by_name:
- cuda-version

Comment on lines +164 to +170
ignore_run_exports:
from_package:
- cuda-cudart-dev
- python
- ucx
by_name:
- cuda-version
Contributor

@bdice bdice Feb 21, 2025

Same as above. Nothing in build/host for this package has run-exports that we need to ignore. We are manually defining the run dependency on cuda-cudart.

Suggested change
ignore_run_exports:
from_package:
- cuda-cudart-dev
- python
- ucx
by_name:
- cuda-version
