Add create_vector_index for IVF_PQ, IVF_HNSW, IVF_SQ vector indices #25

Open
beinan wants to merge 1 commit into daft-engine:main from beinan:feat/vector-index-creation

Conversation

beinan commented Jun 7, 2026

Contributor

Summary

  • Add create_vector_index() public API that wraps lance.LanceDataset.create_index() with daft-lance's standard dataset construction pattern (URI, io_config, storage_options, version, etc.)
  • Supports IVF_PQ, IVF_HNSW_PQ, IVF_HNSW_SQ, IVF_FLAT, IVF_SQ index types
  • Configurable distance metric (L2, cosine, dot), num_partitions, num_sub_vectors, and GPU accelerator
  • Validates that the column exists and has a vector (fixed-size list) type before building the index
  • Forwards **kwargs for advanced lance params (ivf_centroids, pq_codebook, target_partition_size, etc.)
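
The validation and passthrough behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `build_index_args`, the `vector_dims` mapping (column name to fixed-size list dimension, or `None` for non-vector columns), and the `IVF_PQ` default are all assumptions made for the example. The real implementation inspects the Lance dataset schema and calls `lance.LanceDataset.create_index()`.

```python
from typing import Any, Dict, Optional

# Index types and metrics as listed in the PR summary.
SUPPORTED_INDEX_TYPES = frozenset(
    {"IVF_PQ", "IVF_HNSW_PQ", "IVF_HNSW_SQ", "IVF_FLAT", "IVF_SQ"}
)
SUPPORTED_METRICS = frozenset({"l2", "cosine", "dot"})


def build_index_args(
    vector_dims: Dict[str, Optional[int]],  # hypothetical stand-in for the schema
    column: str,
    index_type: str = "IVF_PQ",  # assumed default, not confirmed by the PR text
    metric: str = "L2",
    **kwargs: Any,
) -> Dict[str, Any]:
    """Validate inputs and return the arguments a wrapper might forward."""
    # The column must exist before anything else is checked.
    if column not in vector_dims:
        raise ValueError(f"Column {column!r} not found in dataset schema")
    # A vector column is modeled here as one with a known fixed dimension.
    if vector_dims[column] is None:
        raise TypeError(f"Column {column!r} is not a fixed-size list (vector) column")
    # Index type matching is case-insensitive.
    normalized_type = index_type.upper()
    if normalized_type not in SUPPORTED_INDEX_TYPES:
        raise ValueError(f"Unsupported index type: {index_type!r}")
    if metric.lower() not in SUPPORTED_METRICS:
        raise ValueError(f"Unsupported metric: {metric!r}")
    # Extra keyword arguments pass through untouched.
    return {"column": column, "index_type": normalized_type, "metric": metric, **kwargs}
```

For example, `build_index_args({"vector": 128, "id": None}, "vector", index_type="ivf_pq", num_partitions=4)` returns a dictionary with `index_type` normalized to `"IVF_PQ"` and `num_partitions` carried along unchanged, while passing `"id"` as the column raises `TypeError`.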

Test plan

  • 12 new tests covering: IVF_PQ creation, default type, cosine metric, custom name, replace=True/False, invalid column/type errors, vector search after index, case-insensitive type, kwargs forwarding
  • All 290 existing tests pass (5 skipped)
  • Pre-commit hooks pass
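
As a rough idea of what the case-insensitive type and kwargs forwarding tests could look like, here is a sketch that uses a mock in place of a real Lance dataset. `create_vector_index_sketch` is a hypothetical stand-in written for this example; it is not the function added by this PR, and the actual tests presumably run against real datasets.

```python
from typing import Any
from unittest.mock import MagicMock


def create_vector_index_sketch(
    dataset: Any, column: str, index_type: str = "IVF_PQ", **kwargs: Any
) -> None:
    """Hypothetical wrapper: normalize the type, then delegate to the dataset."""
    dataset.create_index(column, index_type=index_type.upper(), **kwargs)


def test_forwards_kwargs_and_normalizes_type() -> None:
    # MagicMock records calls, so no index is actually built.
    dataset = MagicMock()
    create_vector_index_sketch(
        dataset, "vector", index_type="ivf_pq", target_partition_size=1000
    )
    # The lowercase type is upper-cased and the extra kwarg arrives unchanged.
    dataset.create_index.assert_called_once_with(
        "vector", index_type="IVF_PQ", target_partition_size=1000
    )
```

Mocking keeps the check fast and independent of index build time, at the cost of not exercising Lance itself.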

🤖 Generated with Claude Code

Adds create_vector_index() as a passthrough to lance.LanceDataset.create_index()
with consistent dataset construction patterns. Supports IVF_PQ, IVF_HNSW_PQ,
IVF_HNSW_SQ, IVF_FLAT, IVF_SQ with configurable metric, partitions, and GPU
acceleration.

Co-Authored-By: Beinan Wang <beinanwang@microsoft.com>