Use the default values of the scikit-learn constructor arguments #6309
When a user instantiates a new "scikit-learn" estimator with the cuml accelerator enabled, we need to fill in the default values for the arguments the user didn't pass. Here we take them from the scikit-learn constructor, apply the hyper-parameter translator, and then initialise the cuml class.
I think from a scikit-learn user's perspective this makes sense. You have some scikit-learn code that passes values for the arguments whose defaults you don't like, and you assume that all the other arguments get their scikit-learn default values.
It is also straightforward to reason about for accelerator developers. The starting point for the arguments and their values is the set of defaults from the scikit-learn class that is being proxied. Then, if needed, these values are translated to the cuml equivalent values. The translator gets to see all arguments, not just those the user passed. This allows the translator to handle cases where the default scikit-learn value itself needs translating.
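A minimal sketch of the flow described above, to make the order of operations concrete. Everything in it is an assumption for illustration: `SklearnLikeKMeans`, `CumlLikeKMeans`, `translate`, and `make_proxy` are hypothetical stand-ins, not the actual scikit-learn, cuml, or accelerator code, and the default values and the `"auto"` to `1` mapping are made up for the example.

```python
import inspect


def constructor_defaults(cls):
    """Collect the default value of every keyword argument of cls.__init__."""
    params = inspect.signature(cls.__init__).parameters
    return {
        name: p.default
        for name, p in params.items()
        if name != "self" and p.default is not inspect.Parameter.empty
    }


# Hypothetical stand-in for the scikit-learn class being proxied.
class SklearnLikeKMeans:
    def __init__(self, n_clusters=8, n_init="auto"):
        pass


# Hypothetical stand-in for the cuml class; its own defaults differ on purpose.
class CumlLikeKMeans:
    def __init__(self, n_clusters=8, n_init=10):
        self.n_clusters = n_clusters
        self.n_init = n_init


def translate(params):
    """Hypothetical translator: it sees every argument, including defaults."""
    out = dict(params)
    if out["n_init"] == "auto":
        out["n_init"] = 1  # illustrative mapping only
    return out


def make_proxy(sklearn_cls, cuml_cls, **user_kwargs):
    # 1. Start from the defaults of the proxied scikit-learn constructor.
    params = constructor_defaults(sklearn_cls)
    # 2. Overlay the arguments the user actually passed.
    params.update(user_kwargs)
    # 3. Translate the full set, then initialise the cuml class with it.
    return cuml_cls(**translate(params))


est = make_proxy(SklearnLikeKMeans, CumlLikeKMeans, n_clusters=3)
```

In this sketch the user only passes `n_clusters`, yet the translator still receives the scikit-learn-style default for `n_init` and can map it, so the stand-in cuml class never falls back to its own default for that argument.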
Until now, we constructed the cuml class with its own default values plus the translated user-supplied arguments.
One place where the current approach doesn't work is if we have a deprecation in cuml (say the `n_init` parameter of `KMeans`). The user will get a warning about the deprecation. However, this is confusing because, from the user's point of view, the default value of `n_init` is the value that the warning tells them will be the new default.

This breaks out a change from #6142 so we can look at it independent of the changes to `KMeans`.