Use the default values of the scikit-learn constructor arguments #6309

betatim · 2025-02-11T13:55:14Z

When a user instantiates a new "scikit-learn" estimator with the cuml accelerator enabled, we need to fill in the default values for the arguments the user didn't pass. Here we take them from the scikit-learn constructor and then apply the hyper-parameter translator. And then initialise the cuml class.

I think from a scikit-learn user's perspective this makes sense. You have some scikit-learn code, passing values for arguments where you don't like the default and assuming that for all the other arguments you will get the default values.

It is also straightforward to reason about for accelerator developers. The starting point of the arguments and their values are the defaults from the scikit-learn class that is being proxied. Then, if needed, these values are translated to the cuml equivalent values. The translator gets to see all arguments, not just those the user passed. This allows the translator to translate cases where the default scikit-learn value needs translating.

So far we constructed the cuml class with its default values + the translated user supplied arguments.

One place where the current approach doesn't work is if we have a deprecation in cuml (say n_init parameter of KMeans). The user will get a warning about the deprecation. However, this is confusing because from the user's point of view the default value of n_init is the value that the warning tells them will be the new default.

This breaks out a change from #6142 so we can look at it independent of the changes to KMeans.

When a user isntatiates a new estimator, we need to fill in the default values for the arguments the user didn't pass. Here we take them from the scikit-learn constructor and then apply the hyper-parameter translator.

betatim · 2025-02-14T07:52:27Z

A reason to handle this, and do the work of setting up translation infrastructure for it despite the "cuml defaults are tweaked to match sklearn defaults" is that when a user explicitly passes an argument but uses the default value, then we don't get the cuml default anymore.

For example these two classes which are equivalent classifiers, default value for foo is 42 in scikit-learn and the equivalent value for foo in cuml is 21. As long as the user constructs SomeSklearnClassifier without passing foo it all works, we get the equivalent default value in cuml (21). However if the user uses SomeSklearnClassifier(foo=42), then we pass 42 to the cuml classifier, not 21.

class SomeSklearnClassifier:
  def __init__(self, foo=42):
    ...

class SomeCumlClassifier:
  def __init__(self, foo=21):
    ..

Use the default values of the scikit-learn constructor

373fbf0

When a user isntatiates a new estimator, we need to fill in the default values for the arguments the user didn't pass. Here we take them from the scikit-learn constructor and then apply the hyper-parameter translator.

betatim requested a review from a team as a code owner February 11, 2025 13:55

betatim requested review from teju85 and viclafargue February 11, 2025 13:55

github-actions bot added the Cython / Python Cython or Python issue label Feb 11, 2025

cjnolet assigned betatim Feb 13, 2025

dantegd added the cuml-cpu label Feb 18, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use the default values of the scikit-learn constructor arguments #6309

Use the default values of the scikit-learn constructor arguments #6309

betatim commented Feb 11, 2025

betatim commented Feb 14, 2025

Use the default values of the scikit-learn constructor arguments #6309

Are you sure you want to change the base?

Use the default values of the scikit-learn constructor arguments #6309

Conversation

betatim commented Feb 11, 2025

betatim commented Feb 14, 2025