Description
Index builders accept the input vector dataset (the vectors to index) as a float[][]. This requires all vectors to be materialized in the Java heap in their entirety, which can be expensive and may exhaust the heap. Usage example:
float[][] dataset = ...
CagraIndex.newBuilder(resources)
    .withDataset(dataset)
For the Lucene use case, we write the raw vectors to disk and mmap them. To index them, we then need to copy them back into the Java heap as a float[][]. This is not ideal. It would be better to allow the index builder to accept the dataset as an opaque pointer + length.
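To illustrate the cost, here is a minimal sketch using only java.nio (no cuVS calls). The class and method names (MappedVectors, writeVectors, mapVectors, copyToHeap) are hypothetical and exist only for this example. It maps a file of raw floats off-heap, then shows the extra heap copy that a float[][] dataset currently forces. A builder that accepted a pointer + length could presumably consume the mapped region directly and skip copyToHeap, but that overload is an assumption and not an existing API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedVectors {

    /** Writes the vectors row-major, as raw floats in native byte order. */
    public static void writeVectors(Path file, float[][] vectors) throws IOException {
        int dims = vectors[0].length;
        ByteBuffer buf = ByteBuffer.allocate(vectors.length * dims * Float.BYTES)
                .order(ByteOrder.nativeOrder());
        for (float[] vector : vectors) {
            for (float value : vector) {
                buf.putFloat(value);
            }
        }
        Files.write(file, buf.array());
    }

    /** Maps the whole file read-only; the data stays outside the Java heap. */
    public static MappedByteBuffer mapVectors(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            mapped.order(ByteOrder.nativeOrder());
            return mapped; // the mapping remains valid after the channel is closed
        }
    }

    /** The copy a float[][] dataset requires: every vector is duplicated on the heap. */
    public static float[][] copyToHeap(ByteBuffer mapped, int rows, int dims) {
        FloatBuffer floats = mapped.asFloatBuffer();
        float[][] dataset = new float[rows][dims];
        for (int r = 0; r < rows; r++) {
            floats.get(dataset[r]); // bulk-copies one row out of the mapping
        }
        return dataset;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("vectors", ".bin");
        file.toFile().deleteOnExit();
        writeVectors(file, new float[][] {{1f, 2f}, {3f, 4f}});

        MappedByteBuffer mapped = mapVectors(file);
        // Off-heap view: this region (address + byte length) is what a
        // pointer-based builder method could take as input.
        System.out.println("direct=" + mapped.isDirect() + " bytes=" + mapped.capacity());

        // Today's path: a second, on-heap copy of the same data.
        float[][] dataset = copyToHeap(mapped, 2, 2);
        System.out.println("rows on heap=" + dataset.length);
    }
}
```

With a pointer + length overload, memory use on the Java side would no longer scale with the dataset size, since only the mapping itself would be held.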