Description
Describe the bug
I've recently started noticing some conda-python-tests-singlegpu CI jobs on PRs here taking 4+ hours to complete.
For example, on a recent run for #6238, I see:
- 11.8.0, 3.10, amd64, rockylinux8, v100, earliest-driver, oldest-deps (still running - 5h11m)
- 12.5.1, 3.12, amd64, ubuntu24.04, v100, latest-driver, latest-deps (finished - 1h42m)
- 12.0.1, 3.11, arm64, ubuntu20.04, a100, latest-driver, latest-deps (finished - 57m58s)
Note that those timings do not include time spent waiting for a runner to pick up the jobs. They only include the time actually spent occupying a runner.
Steps/Code to reproduce bug
I haven't narrowed this down further than CI yet.
I've observed this on multiple PRs in the last few hours, though. Seeing it on PRs like #6328 that only remove extraneous whitespace makes me think the cause is something that changed recently in cuML or external to cuML, not the content of those specific PRs.
Other recent runs have this behavior too:
- introduce libcuml wheels #6199 (comment)
- https://github.com/rapidsai/cuml/actions/runs/12931792996/job/36069082783
Expected behavior
I expected these test jobs to take 1-2 hours to complete, based on clicking through the last few days of successful runs at https://github.com/rapidsai/cuml/actions/workflows/pr.yaml?query=is%3Asuccess.
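For anyone triaging similar runs, here is a rough sketch of how the durations quoted in this issue could be checked against that 1-2 hour expectation. The helper names (`parse_duration`, `slow_jobs`) and the 2-hour cutoff constant are made up for illustration and are not part of any RAPIDS CI tooling; the job names and timings are copied from the run on #6238 described above.

```python
import re
from datetime import timedelta

# Matches durations written like "5h11m", "1h42m", or "57m58s".
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")

# Assumption: treat the upper end of the expected 1-2 hour range as the cutoff.
EXPECTED_MAX = timedelta(hours=2)


def parse_duration(text):
    """Convert a string such as '5h11m' into a timedelta (hypothetical helper)."""
    match = _DURATION.fullmatch(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def slow_jobs(jobs, limit=EXPECTED_MAX):
    """Return the names of jobs whose runner time exceeds the limit."""
    return [name for name, took in jobs.items() if parse_duration(took) > limit]


# Runner-occupancy times as reported in this issue.
jobs = {
    "11.8.0, 3.10, amd64, rockylinux8, v100, earliest-driver, oldest-deps": "5h11m",
    "12.5.1, 3.12, amd64, ubuntu24.04, v100, latest-driver, latest-deps": "1h42m",
    "12.0.1, 3.11, arm64, ubuntu20.04, a100, latest-driver, latest-deps": "57m58s",
}

for name in slow_jobs(jobs):
    print(f"slower than expected: {name}")
```

With the numbers from this issue, only the 11.8.0 / rockylinux8 job lands above the cutoff, and it was still running when the time was recorded.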
Environment details (please complete the following information):
N/A - RAPIDS CI
Additional context
N/A