
UMAP fixes #6316

Open · wants to merge 1 commit into base: branch-25.04
Conversation

viclafargue (Contributor)

No description provided.

@viclafargue viclafargue requested review from a team as code owners February 13, 2025 15:34
@github-actions github-actions bot added Cython / Python Cython or Python issue CUDA/C++ labels Feb 13, 2025
@viclafargue viclafargue added bug Something isn't working non-breaking Non-breaking change labels Feb 13, 2025
viclafargue (Contributor, Author) commented Feb 13, 2025

I also reviewed smooth_knn_dist_kernel and noticed that one of its loops doesn't account for the first element. Is this because the first distance is expected to be zero (self) in a pairwise search? If so, is this safe for transformations? Despite this, the membership strength values seem reasonably consistent with the kNN distances. However, it might be worth addressing if this turns out to be a genuine issue.

@@ -315,8 +313,6 @@ CUML_KERNEL void optimize_batch_kernel(T const* head_embedding,
auto grad_d = T(0.0);
if (repulsive_grad_coeff > T(0.0))
grad_d = clip<T>(repulsive_grad_coeff * (current[d] - negative_sample[d]), T(-4.0), T(4.0));
else
Member:
I'm still curious about this change; I want to make sure we're not sacrificing numerical stability by changing it based on limited evidence. What is the reason for this change?

Contributor Author:

This is a clear mistake. Here is the reference implementation: https://github.com/lmcinnes/umap/blob/a012b9d8751d98b94935ca21f278a54b3c3e1b7f/umap/layouts.py#L181
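For context, here is a minimal sketch of what that reference implementation does in the zero-coefficient repulsive case, transcribed into the kernel's C++/CUDA style. The clip<T> helper and the apply_repulsive_grad wrapper are illustrative assumptions, not the literal cuML change:

// Hypothetical sketch of the reference UMAP repulsive-gradient update
// (umap/layouts.py), expressed with this kernel's naming conventions.
template <typename T>
__host__ __device__ T clip(T val, T lo, T hi)
{
  return val < lo ? lo : (val > hi ? hi : val);
}

template <typename T>
__host__ __device__ void apply_repulsive_grad(T* current,
                                               T const* negative_sample,
                                               T repulsive_grad_coeff,
                                               T alpha,
                                               int d)
{
  T grad_d;
  if (repulsive_grad_coeff > T(0.0)) {
    // Clip the per-dimension repulsive gradient to [-4, 4] for numerical stability.
    grad_d = clip<T>(repulsive_grad_coeff * (current[d] - negative_sample[d]), T(-4.0), T(4.0));
  } else {
    // The reference implementation falls back to a fixed positive gradient of 4
    // when the coefficient is zero (e.g. coincident points), rather than 0.
    grad_d = T(4.0);
  }
  current[d] += grad_d * alpha;
}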

cjnolet (Member) commented Feb 13, 2025

> Is this because the first distance is expected to be zero (self) in a pairwise search? If so, is this safe for transformations?

Yes, that's exactly why. I've verified this many times in the past and it should be equivalent w/ this logic. We don't remove the self-loop until later to save memory.
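For readers following the thread, a minimal sketch of the pattern being described, using hypothetical names (mean_nonself_dist, knn_dists, n_neighbors) rather than the actual smooth_knn_dist_kernel code: the loop starts at index 1 because, in a pairwise search, column 0 of each row holds the point itself with distance 0.

// Hypothetical sketch: mean of a row's neighbor distances, skipping index 0,
// which holds the self-distance (always 0 in a pairwise search).
// The self-loop itself is only removed later in the pipeline to save memory.
template <typename T>
__host__ __device__ T mean_nonself_dist(T const* knn_dists, int row, int n_neighbors)
{
  T sum     = T(0.0);
  int count = 0;
  for (int j = 1; j < n_neighbors; ++j) {  // j = 0 is the self-distance
    T d = knn_dists[row * n_neighbors + j];
    if (d > T(0.0)) {
      sum += d;
      ++count;
    }
  }
  return count > 0 ? sum / T(count) : T(0.0);
}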

codecov-commenter commented
Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (branch-25.04@9c0166a). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-25.04    #6316   +/-   ##
===============================================
  Coverage                ?   67.07%           
===============================================
  Files                   ?      202           
  Lines                   ?    13076           
  Branches                ?        0           
===============================================
  Hits                    ?     8771           
  Misses                  ?     4305           
  Partials                ?        0           

☔ View full report in Codecov by Sentry.
