
cuvs-java: Support providing indexing data off-heap #698

Open
@ChrisHegarty

Description

Index builders allow setting the input vector dataset (the vectors to index) as a float[][]. This requires all vectors to be materialized on the Java heap at the same time, one float[] per vector. This can be expensive and may exhaust the Java heap. Usage example:

float[][] dataset = ...;

CagraIndex index = CagraIndex.newBuilder(resources)
  .withDataset(dataset)
  .build();
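To make the cost concrete, here is a minimal sketch of the copy a float[][]-only API forces on a caller whose vectors already live in a file. It assumes the file holds raw little-endian float32 vectors, row-major, `dims` floats per vector; the class `HeapCopy` and method `readToHeap` are hypothetical illustrations, not part of cuvs-java or Lucene. Note also that `FileChannel.map` returning a `MappedByteBuffer` is capped at `Integer.MAX_VALUE` bytes per mapping, another limit a pointer-based API would avoid.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class HeapCopy {
    /**
     * Hypothetical helper: maps a file of raw little-endian float32 vectors
     * and copies every vector onto the Java heap as one float[] per row.
     */
    static float[][] readToHeap(Path file, int dims) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long bytes = ch.size();
            long rowBytes = (long) dims * Float.BYTES;
            if (bytes % rowBytes != 0) {
                throw new IllegalArgumentException("file size is not a multiple of the row size");
            }
            int rows = Math.toIntExact(bytes / rowBytes);
            // The mapping itself is off-heap; this call fails for files larger than Integer.MAX_VALUE bytes.
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, bytes);
            FloatBuffer floats = mapped.order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
            // Heap cost: rows * dims * 4 bytes of data, plus a header per row array.
            float[][] dataset = new float[rows][dims];
            for (float[] row : dataset) {
                floats.get(row); // copies dims floats and advances the buffer position
            }
            return dataset;
        }
    }
}
```

Every vector is copied once from the page cache into the heap, so peak heap usage grows linearly with the dataset even though the data was already addressable in memory.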

For the Lucene use case, we write the raw vectors to disk and mmap them. To index them, we then have to copy them back onto the Java heap as a float[][], which is not ideal. It would be better to let the index builder accept the dataset as an opaque pointer plus a length.
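One way to picture the "pointer plus length" idea in Java is with the Foreign Function & Memory API (final in Java 22), mapping the vector file straight into a `MemorySegment` whose `address()` and `byteSize()` are exactly that pair. This is a sketch under stated assumptions: `OffHeapDataset` and its methods are hypothetical and are not an existing cuvs-java API, and the file is assumed to hold raw little-endian float32 vectors, row-major.

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical off-heap dataset view: native memory plus shape, never copied to the heap. */
public record OffHeapDataset(MemorySegment segment, long rows, int dims) {
    private static final ValueLayout.OfFloat FLOAT_LE =
            ValueLayout.JAVA_FLOAT.withOrder(ByteOrder.LITTLE_ENDIAN);

    /** Maps the whole file; the mapping stays valid until {@code arena} is closed. */
    static OffHeapDataset map(Path file, int dims, Arena arena) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long bytes = ch.size();
            long rowBytes = (long) dims * Float.BYTES;
            if (bytes % rowBytes != 0) {
                throw new IllegalArgumentException("file size is not a multiple of the row size");
            }
            // Unlike MappedByteBuffer, a mapped MemorySegment is not limited to 2 GiB.
            MemorySegment seg = ch.map(FileChannel.MapMode.READ_ONLY, 0, bytes, arena);
            return new OffHeapDataset(seg, bytes / rowBytes, dims);
        }
    }

    /** Native address of the first float: the "pointer" a native index builder would receive. */
    long address() { return segment.address(); }

    /** Length in bytes: the "length" half of the pair. */
    long byteSize() { return segment.byteSize(); }

    /** Reads one element directly from the mapping, for inspection only. */
    float get(long row, int col) { return segment.getAtIndex(FLOAT_LE, row * dims + col); }
}
```

Usage would look like `try (Arena arena = Arena.ofConfined()) { OffHeapDataset ds = OffHeapDataset.map(path, dims, arena); ... }`, with a builder method (hypothetical here) taking `ds.segment()` or the address/length pair. Tying the mapping's lifetime to an `Arena` makes it explicit that the caller must keep the memory alive until indexing has finished.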
