Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the default values of the scikit-learn constructor arguments #6309

Open
wants to merge 1 commit into
base: branch-25.04
Choose a base branch
from

Conversation

betatim
Copy link
Member

@betatim betatim commented Feb 11, 2025

When a user instantiates a new "scikit-learn" estimator with the cuml accelerator enabled, we need to fill in the default values for the arguments the user didn't pass. Here we take them from the scikit-learn constructor and then apply the hyper-parameter translator. And then initialise the cuml class.

I think from a scikit-learn user's perspective this makes sense. You have some scikit-learn code, passing values for arguments where you don't like the default and assuming that for all the other arguments you will get the default values.

It is also straightforward to reason about for accelerator developers. The starting point of the arguments and their values are the defaults from the scikit-learn class that is being proxied. Then, if needed, these values are translated to the cuml equivalent values. The translator gets to see all arguments, not just those the user passed. This allows the translator to translate cases where the default scikit-learn value needs translating.

So far we constructed the cuml class with its default values + the translated user supplied arguments.

One place where the current approach doesn't work is if we have a deprecation in cuml (say n_init parameter of KMeans). The user will get a warning about the deprecation. However, this is confusing because from the user's point of view the default value of n_init is the value that the warning tells them will be the new default.

This breaks out a change from #6142 so we can look at it independent of the changes to KMeans.

When a user isntatiates a new estimator, we need to fill in the default
values for the arguments the user didn't pass. Here we take them from
the scikit-learn constructor and then apply the hyper-parameter
translator.
@betatim betatim requested a review from a team as a code owner February 11, 2025 13:55
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Feb 11, 2025
@betatim
Copy link
Member Author

betatim commented Feb 14, 2025

A reason to handle this, and do the work of setting up translation infrastructure for it despite the "cuml defaults are tweaked to match sklearn defaults" is that when a user explicitly passes an argument but uses the default value, then we don't get the cuml default anymore.

For example these two classes which are equivalent classifiers, default value for foo is 42 in scikit-learn and the equivalent value for foo in cuml is 21. As long as the user constructs SomeSklearnClassifier without passing foo it all works, we get the equivalent default value in cuml (21). However if the user uses SomeSklearnClassifier(foo=42), then we pass 42 to the cuml classifier, not 21.

class SomeSklearnClassifier:
  def __init__(self, foo=42):
    ...

class SomeCumlClassifier:
  def __init__(self, foo=21):
    ..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuml-cpu Cython / Python Cython or Python issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants