Skip to content

Consider consolidating conda solves in CI tests #22

Open
@charlesbluca

Description

Currently (using cuML as an example here), the conda test environment initialization for most CI jobs looks something like creating the test environment:

rapids-dependency-file-generator \
  --output conda \
  --file_key test_python \
  --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" | tee env.yaml

rapids-mamba-retry env create --force -f env.yaml -n test

And then downloading and installing build artifacts from previous jobs on top of this environment:

CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
PYTHON_CHANNEL=$(rapids-download-conda-from-s3 python)

...

rapids-mamba-retry install \
  --channel "${CPP_CHANNEL}" \
  --channel "${PYTHON_CHANNEL}" \
  libcuml cuml

In addition to forcing us to eat the cost of a second conda environment solve, in many cases this can cause some pretty drastic changes to the environment which can be blocking - for example, consider this cuML run which fails because conda is unable to solve a downgrade from Arrow 15.0.0 (build 5) to 14.0.1.

Our current workaround for this is to manually add pinnings to the testing dependencies initially solved such that the artifact installation can be solved, but this can introduce a lot of burden in needing to:

  • identify what packages/changes are blocking artifact installation
  • open PR(s) modifying the impacted repos
  • follow up on each impacted repo to potentially remove the pinning later on

Would it be possible to consolidate some (or all) of these conda environment solves by instead:

  1. downloading the conda artifacts before creating the environment
  2. updating the dependencies.yaml (or, if this isn't possible, the generated conda environment file) to include the desired packages, making sure to explicitly specify source channel to ensure we're picking up the build artifacts
  3. creating the environment with this patched file

In my mind, the main blocker I could see to this working would be if rapids-download-conda-from-s3 requires some conda packages contained in the testing environment to work.

EDIT: Updating to capture state of other projects:

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

Labels

epicimprovementImproves an existing functionality

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions