Java: Fixing the broken pre-filtering for Brute Force #706

chatman · 2025-02-18T00:30:42Z

Changed the signature from long[] prefilter to BitSet[] prefilters.
Added a very simple test.
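For readers unfamiliar with the bitmap layout, here is a minimal sketch of how per-query `BitSet` filters could be flattened into one row-major bitmap. It rests on several assumptions that the PR does not state: that bit `q * nRows + r` refers to row `r` for query `q`, that a set bit means the row is eligible, and that the class and method names (`PrefilterLayout`, `flatten`, `words32`) are hypothetical and not part of the cuVS Java API. The 32-bit word count mirrors the `(n_queries * n_rows + 31) / 32` expression in the removed C line quoted further down.

```java
import java.util.BitSet;

public class PrefilterLayout {

    // Flatten one BitSet per query into a single row-major bitmap.
    // Assumed layout: bit (q * nRows + r) describes row r for query q.
    // Bits at or beyond nRows in an individual filter are ignored.
    // BitSet is int-indexed, so this sketch only covers nQueries * nRows < 2^31.
    public static BitSet flatten(BitSet[] prefilters, int nRows) {
        BitSet flat = new BitSet(Math.multiplyExact(prefilters.length, nRows));
        for (int q = 0; q < prefilters.length; q++) {
            int base = Math.multiplyExact(q, nRows);
            BitSet filter = prefilters[q];
            for (int r = filter.nextSetBit(0); r >= 0 && r < nRows; r = filter.nextSetBit(r + 1)) {
                flat.set(base + r);
            }
        }
        return flat;
    }

    // Number of 32-bit words needed to hold nQueries * nRows bits (rounded up).
    public static long words32(long nQueries, long nRows) {
        return (nQueries * nRows + 31) / 32;
    }

    public static void main(String[] args) {
        BitSet q0 = new BitSet();
        q0.set(0);
        q0.set(3);
        BitSet q1 = new BitSet();
        q1.set(1);

        BitSet flat = flatten(new BitSet[] {q0, q1}, 4);
        System.out.println(flat);           // {0, 3, 5}
        System.out.println(words32(2, 4));  // 1
    }
}
```

With two queries over four rows, query 1's row 1 lands at bit 5, and all eight bits fit in a single 32-bit word.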

copy-pr-bot · 2025-02-18T00:30:45Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

chatman · 2025-02-18T00:31:25Z

FYI @ChrisHegarty @narangvivek10.

ChrisHegarty · 2025-02-18T10:58:02Z

relates #697

ChrisHegarty · 2025-02-18T11:07:05Z

java/internal/src/cuvs_java.c

-    int64_t prefilter_shape[1] = {(n_queries * n_rows + 31) / 32};
-    DLManagedTensor prefilter_tensor = prepare_tensor(prefilter_data_d, prefilter_shape, kDLUInt, 32, 1, kDLCUDA);
+    // Parse the filters data
+    uint32_t *prefilters = (uint32_t*) malloc(prefilter_data_length);


I don't understand why it is necessary to malloc and copy the prefilter data. Is it not possible to access it directly from the passed pointer?

chatman · 2025-02-18

I had to change the byte ordering to make it work. Just casting it to uint32_t * didn't work. I'll explore whether an in-place transformation, without allocating a new array, is feasible.

ChrisHegarty · 2025-02-18

It should be possible to write the expected byte order at the Java level.

lowener · 2025-02-19

On the C++ side, I added an option last month to specify the byte order used when the prefilter data was created: https://github.com/rapidsai/raft/blob/branch-25.04/cpp/include/raft/core/bitmap.hpp#L64.
I think in your case it was created with uint8_t? If so, specifying original_nbits = sizeof(uint8_t) should spare you that extra allocation/copy. It is not present in the C API yet, though.

chatman · 2025-02-19

@ChrisHegarty Fixed; changing the byte order was unnecessary. I was doing things wrong.
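To illustrate the point about writing the expected byte order on the Java side, the sketch below writes a `BitSet`'s 64-bit words into a little-endian buffer and then tests bits the way a consumer reading 32-bit little-endian words would. It is a host-side illustration only: the class and method names are hypothetical, and the assumption that the native side reads the buffer as little-endian `uint32_t` words is inferred from the `kDLUInt, 32` tensor in the diff above, not confirmed by the PR.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.BitSet;

public class PrefilterByteOrder {

    // Serialize the BitSet's 64-bit words into a little-endian byte buffer.
    public static ByteBuffer toLittleEndianBytes(BitSet bits) {
        long[] words = bits.toLongArray();
        ByteBuffer buf = ByteBuffer.allocate(words.length * Long.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (long w : words) {
            buf.putLong(w);
        }
        buf.flip();
        return buf;
    }

    // Test bit i as a reader of 32-bit little-endian words would:
    // word index i / 32, bit position i % 32 within that word.
    public static boolean testBitAsUint32(ByteBuffer buf, int i) {
        int word = buf.getInt((i / 32) * Integer.BYTES);
        return ((word >>> (i % 32)) & 1) != 0;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(1);
        bits.set(33);
        bits.set(70);

        ByteBuffer buf = toLittleEndianBytes(bits);
        boolean agree = true;
        for (int i = 0; i < buf.limit() * 8; i++) {
            agree &= (testBitAsUint32(buf, i) == bits.get(i));
        }
        System.out.println("layouts agree: " + agree);  // layouts agree: true
    }
}
```

In little-endian order the low 32 bits of each `long` come first in memory, so bit `i` of the `BitSet` is bit `i % 32` of 32-bit word `i / 32`. Under that assumption no reordering or extra copy is needed; with a big-endian buffer the two halves of each `long` would be swapped.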

rhdong · 2025-02-18T23:57:34Z

/ok to test

rhdong · 2025-02-19T00:04:40Z

Hi @chatman, the style check seems to be failing. Could you take a look when you have time? Thanks!

chatman · 2025-02-19T07:32:02Z

FYI, for larger dataset sizes, this fix is incorrect. Working on the proper fix.

rhdong · 2025-02-19T17:39:23Z

/ok to test

rhdong · 2025-02-19T17:43:53Z

/ok to test

chatman · 2025-02-19T17:51:24Z

I think this is ready for merge. Please review @rhdong @ChrisHegarty @narangvivek10.

rhdong · 2025-02-19T18:03:29Z

/ok to test

chatman · 2025-02-20T19:34:21Z

@rhdong Upon testing, I'm finding that this does not work on some systems. On those systems, the search call returns a non-success value with an invalid memory access. When I copied the prefilter data into a device array and passed that array, it worked again. I wonder why it works on one system with just the host array but not on others. I'll continue investigating and fix this by tomorrow (hopefully before the freeze for the next release). FYI @cjnolet.

rhdong · 2025-02-21T00:50:27Z

Hi @chatman, the API always treats the filter memory pointer as device-accessible, so it is a little odd that a malloc'd buffer works at all. 🤔 Could you share your reproduction code or a snippet, or the test configuration (dataset size, query size, etc.)? Many thanks!

chatman changed the base branch from branch-25.04 to branch-25.02 February 18, 2025 00:31

ChrisHegarty reviewed Feb 18, 2025

cjnolet assigned ChrisHegarty Feb 18, 2025

cjnolet added the bug and non-breaking labels Feb 18, 2025

cjnolet approved these changes Feb 18, 2025

chatman changed the base branch from branch-25.02 to branch-25.04 February 19, 2025 17:45

Ishan Chattopadhyaya added 4 commits February 19, 2025 23:15

Java: Fixing the broken pre-filtering for Brute Force

9a476b4

Formatting fixes

90b1c46

Avoiding extra copy, adding randomized testing for Brute Force with prefiltering

b59eb5a

Avoiding extra copy, adding randomized testing for Brute Force with prefiltering

b6a7661

chatman force-pushed the ishan/prefiltering-fix branch from a53dbf2 to b6a7661 Compare February 19, 2025 17:45

Formatting fixes for Java

3380479

rhdong mentioned this pull request Feb 19, 2025

[Fix] Various fixes for 25.02.01 point release #695

Open
