Draft: Provide docs for cuml.accel command line feature #6322
wphicks wants to merge 12 commits into rapidsai:branch-25.04 from wphicks:docs/0cc
+363 −0
Commits (12):

* e17cd69 feat: Add CUDA source file for file processing (wphicks)
* e624e26 Add note on Pickle (wphicks)
* 5fe5cf2 Make docs build work, add notes about kmeans (betatim)
* 92c9e4b Revert commit accidentally included from other work (wphicks)
* e5471e9 More details on KMeans equivalence (betatim)
* 0c66358 Merge pull request #5 from betatim/0cc-kmeans-docs (wphicks)
* 1382653 Merge branch 'branch-25.04' into docs/0cc (wphicks)
* d9206bc Add warning about JIT compilation (wphicks)
* 6f4e2c7 Document neighbors incompatibilities. (csadorf)
* 28bc5c6 Merge pull request #6 from csadorf/docs/0cc-nearest-neighbors (wphicks)
* 7b72480 Add 0cc documentation for `decomposition` models (jcrist)
* 096e3e2 Merge pull request #10 from jcrist/0cc-decomp-docs (wphicks)
cuml.accel: Zero Code Change Acceleration for Scikit-Learn
==========================================================

Starting in RAPIDS 25.02, cuML offers a new way to accelerate existing code
based on Scikit-Learn, UMAP-Learn, and HDBScan. Instead of rewriting that code
to import equivalent cuML functionality, simply invoke your existing,
unaltered Python script as follows, and cuML will accelerate as much of the
code as possible with NVIDIA GPUs, falling back to CPU where necessary:

.. code-block:: console

   python -m cuml.accel unchanged_script.py

The same functionality is available in Jupyter notebooks using the
following magic at the beginning of the notebook (before other imports):

.. code-block:: python

   %load_ext cuml.accel
   import sklearn

**cuml.accel is currently a beta feature and will continue to improve over
time.**

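For illustration, here is a minimal, unchanged Scikit-Learn script (the
filename and data are made up for this sketch) that runs as-is under
``python -m cuml.accel``:

```python
# unchanged_script.py -- ordinary Scikit-Learn code with no cuML imports.
# Run unmodified on GPU with: python -m cuml.accel unchanged_script.py
import numpy as np
from sklearn.cluster import KMeans

# Two obvious clusters of 2-D points.
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```

The same script also runs under plain ``python``; ``cuml.accel`` changes only
where the work executes, not what the script does.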
FAQs
----

1. Why use cuml.accel instead of using cuML directly?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Many software lifecycles involve running code on a variety of hardware. Maybe
the data scientists developing a pipeline do not have access to NVIDIA GPUs,
but you want the cost and time savings of running that pipeline on NVIDIA GPUs
in production. Rather than going through a manual migration to cuML every time
the pipeline is updated, ``cuml.accel`` allows you to immediately deploy
unaltered Scikit-Learn, UMAP-Learn, and HDBScan code on NVIDIA GPUs.
Furthermore, ``cuml.accel`` will automatically fall back to CPU execution for
anything which is implemented in Scikit-Learn but not yet accelerated by cuML.

Additionally, ``cuml.accel`` offers a quick way to evaluate the minimum
acceleration cuML can provide for your workload without touching a line of
code.

2. Why use cuML directly instead of cuml.accel?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In many cases, ``cuml.accel`` offers enough of a performance boost on its own
that there is no need to migrate code to cuML. However, cuML's API offers a
variety of additional parameters that let you fine-tune GPU execution in order
to get the maximum possible performance out of NVIDIA GPUs. So for software
that will always be run with NVIDIA GPUs available, it may be worthwhile to
write your code directly with cuML.

Additionally, running code directly with cuML offers finer control over GPU
memory usage. ``cuml.accel`` will automatically use `unified or managed memory
<https://developer.nvidia.com/blog/unified-memory-cuda-beginners/>`_
for allocations in order to reduce the risk of CUDA OOM errors. In
contrast, cuML defaults to ordinary device memory, which can offer improved
performance but requires slightly more care to avoid exhausting the GPU VRAM.

3. What does ``cuml.accel`` accelerate?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``cuml.accel`` is designed to provide zero code change acceleration of any
Scikit-Learn-like estimator which has an equivalent cuML implementation,
including estimators from Scikit-Learn, UMAP-Learn, and HDBScan. Currently,
the following estimators are mostly or entirely accelerated when run under
``cuml.accel``:

* Scikit-Learn

  * ``sklearn.cluster.KMeans``
  * ``sklearn.cluster.DBSCAN``
  * ``sklearn.decomposition.PCA``
  * ``sklearn.decomposition.TruncatedSVD``
  * ``sklearn.kernel_ridge.KernelRidge``
  * ``sklearn.linear_model.LinearRegression``
  * ``sklearn.linear_model.LogisticRegression``
  * ``sklearn.linear_model.ElasticNet``
  * ``sklearn.linear_model.Ridge``
  * ``sklearn.linear_model.Lasso``
  * ``sklearn.manifold.TSNE``
  * ``sklearn.neighbors.NearestNeighbors``
  * ``sklearn.neighbors.KNeighborsClassifier``
  * ``sklearn.neighbors.KNeighborsRegressor``

* UMAP-Learn

  * ``umap.UMAP``

* HDBScan

  * ``hdbscan.HDBSCAN``

This list will continue to expand as ``cuml.accel`` development
continues. Please see `Zero Code Change Limitations <0cc_limitations.rst>`_
for known limitations.

4. Will I get the same results as I do without ``cuml.accel``?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``cuml.accel`` is designed to provide *equivalent* results to the estimators
it accelerates, but the output may have small numerical differences. To be more
specific, measures of the quality of the results (accuracy,
trustworthiness, etc.) should be approximately as good as or better than those
obtained without ``cuml.accel``, even if the exact output varies.

A baseline limitation for obtaining exact numerical equality is that in
highly parallel execution environments (e.g. GPUs), there is no guarantee that
floating point operations will happen in exactly the same order as in
non-parallel environments. This means that floating point arithmetic error
may propagate differently and lead to different outcomes. This can be
exacerbated by discretization operations in which values end up in
different categories based on floating point values.

Secondarily, some algorithms are implemented in a fundamentally different
way on GPU than on CPU in order to make efficient use of the GPU's highly
parallel compute capabilities. In such cases, ``cuml.accel`` will translate
hyperparameters appropriately to maintain equivalence with the CPU
implementation. Differences of this kind are noted in the corresponding entry
of `Zero Code Change Limitations <0cc_limitations.rst>`_ for that
estimator.

If you discover a use case where the quality of results obtained with
``cuml.accel`` is worse than that obtained without, please `report it as a bug
<https://github.com/rapidsai/cuml/issues/new?template=bug_report.md>`_, and the
RAPIDS team will investigate.

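One practical consequence: compare runs by a quality measure rather than by
exact outputs. The following sketch (pure Scikit-Learn, made-up data; two
random seeds stand in for CPU-vs-GPU runs) compares KMeans solutions by
inertia even though label assignments may be permuted between runs:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two tight, well-separated blobs of points.
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
               rng.normal(5.0, 0.1, (50, 2))])

# Different runs may permute or perturb the labeling...
a = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
b = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)

# ...but the quality of the solution (inertia) should agree closely.
rel_diff = abs(a.inertia_ - b.inertia_) / max(a.inertia_, b.inertia_)
print(rel_diff)
```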
5. How much faster is ``cuml.accel``?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This depends on the individual algorithm being accelerated and the dataset
being processed. As with cuML itself, you will generally see the most benefit
when ``cuml.accel`` is used on large datasets. Please see
`Zero Code Change Benchmarks <0cc_benchmarks.rst>`_ for some representative
benchmarks.

6. Will I run out of GPU memory if I use ``cuml.accel``?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``cuml.accel`` will use CUDA `managed memory
<https://developer.nvidia.com/blog/unified-memory-cuda-beginners/>`_ for
allocations on NVIDIA GPUs. This means that host memory can be used to augment
GPU memory, and data will be migrated automatically as necessary. This does
not mean that ``cuml.accel`` is entirely impervious to OOM errors, however.
Very large datasets can exhaust the entirety of both host and device memory.
Additionally, if device memory is heavily oversubscribed, it can lead to slow
execution. ``cuml.accel`` is designed to minimize both possibilities, but if
you observe OOM errors or slow execution on data that should fit in combined
host plus device memory for your system, please `report it
<https://github.com/rapidsai/cuml/issues/new?template=bug_report.md>`_, and
the RAPIDS team will investigate.

7. What is the relationship between ``cuml.accel`` and ``cudf.pandas``?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Both projects serve a similar role. Just as ``cuml.accel`` offers zero code
change acceleration for Scikit-Learn and similar packages, ``cudf.pandas``
offers zero code change acceleration for Pandas. They can be used together by
TODO(wphicks): FILL THIS IN ONCE THIS MECHANISM HAS BEEN IMPLEMENTED.

8. What happens if something in my script is not implemented in cuML?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``cuml.accel`` should cleanly and transparently fall back to the CPU
implementation for any methods or estimators which are not implemented in
cuML. If it does not do so, please `report it as a bug
<https://github.com/rapidsai/cuml/issues/new?template=bug_report.md>`_, and
the RAPIDS team will investigate.

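For instance, an estimator outside the accelerated list (assuming here that
``DecisionTreeClassifier`` has no cuML equivalent) simply runs on CPU as
usual, so a mixed script behaves identically with or without ``cuml.accel``:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# This estimator is assumed to be unaccelerated; it still runs unchanged,
# just on CPU, when the script is launched via python -m cuml.accel.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = clf.predict(np.array([[0.5], [2.5]]))
print(pred)
```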
9. I've discovered a bug in ``cuml.accel``. How do I report it?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Bugs affecting ``cuml.accel`` can be reported via the `cuML issue tracker
<https://github.com/rapidsai/cuml/issues/new?template=bug_report.md>`_. If you
observe a significant difference in the quality of output with and without
``cuml.accel``, please report it as a bug. These issues will be taken
especially seriously. Similarly, if runtime slows down for your estimator when
using ``cuml.accel``, the RAPIDS team will try to triage and fix the issue as
soon as possible. Note that library import time *will* be longer when using
``cuml.accel``, so please exclude that from runtime. Long import time is a
known issue and will be improved with subsequent releases of cuML.

10. If I serialize a model using ``cuml.accel``, can I load it without ``cuml.accel``?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is a common use case for ``cuml.accel``, since it may be useful to train
a model using NVIDIA GPUs but deploy it for inference in an environment that
does not have access to NVIDIA GPUs. Currently, models serialized with
``cuml.accel`` need to be converted to pure Scikit-Learn (or UMAP/HDBScan/...)
models using the following invocation:

TODO(wphicks): FILL THIS OUT

This conversion step should become unnecessary in a future release of cuML.

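The conversion invocation itself is still to be documented. For reference, the
serialization in question is an ordinary ``pickle`` round trip; a plain
Scikit-Learn sketch (no ``cuml.accel`` specifics, made-up data) looks like:

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])  # y = 2x + 1 exactly
model = LinearRegression().fit(X, y)

# Serialize and restore. Under cuml.accel the pickled object may wrap a cuML
# model, which is why a conversion step is currently needed for GPU-free hosts.
restored = pickle.loads(pickle.dumps(model))
pred = restored.predict(np.array([[3.0]]))
print(pred)
```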
cuml.accel: Known Limitations
=============================

General Limitations
-------------------

TODO(wphicks): Fill this in
TODO(wphicks): Pickle

Algorithm-Specific Limitations
------------------------------
TODO(wphicks): Fill these in. Document when each will fall back to CPU, how to
assess equivalence with CPU implementations, and significant differences in
algorithm, as well as any other known issues.

``sklearn.cluster.KMeans``
^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.cluster.DBSCAN``
^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.decomposition.PCA``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.decomposition.TruncatedSVD``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.kernel_ridge.KernelRidge``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.linear_model.LinearRegression``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.linear_model.LogisticRegression``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.linear_model.ElasticNet``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.linear_model.Ridge``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.linear_model.Lasso``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.manifold.TSNE``
^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.neighbors.NearestNeighbors``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.neighbors.KNeighborsClassifier``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``sklearn.neighbors.KNeighborsRegressor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``umap.UMAP``
^^^^^^^^^^^^^

``hdbscan.HDBSCAN``
^^^^^^^^^^^^^^^^^^^
Review comments:
Users can also explicitly turn it on, correct?
Yep! Will add that as well.