Multi-Vector Distance Benchmarks #1027
Conversation
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new diskann-benchmark-multi-vector crate to benchmark and regression-check multi-vector distance ops (Chamfer, MaxSim) across f32/f16 and optimized/reference implementations, integrated with diskann-benchmark-runner.
Changes:
- Introduces benchmark library with dispatcher registration, benchmark execution, and regression checking logic.
- Adds `benchmark-multi-vector` CLI + integration tests and example job/tolerance configs used by the runner workflow.
- Updates workspace to include the new crate.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| diskann-benchmark-multi-vector/src/lib.rs | Core benchmark/regression logic, dispatch rules, fixtures, and unit tests. |
| diskann-benchmark-multi-vector/src/bin.rs | CLI entrypoint wiring registry + basic integration tests. |
| diskann-benchmark-multi-vector/examples/tolerance.json | Default tolerance config consumed by check verify/run. |
| diskann-benchmark-multi-vector/examples/test.json | Small smoke input for integration tests. |
| diskann-benchmark-multi-vector/examples/multi-vector.json | Full benchmark matrix input example. |
| diskann-benchmark-multi-vector/README.md | Documents kernels, metric normalization, and runner workflows. |
| diskann-benchmark-multi-vector/Cargo.toml | New crate manifest + deps/bin target. |
| Cargo.toml | Adds new crate to workspace members. |
```rust
#[derive(Debug, Error)]
#[error("this kernel handles a different implementation than {0}")]
pub(crate) struct ImplementationMismatch(Implementation);
```
```rust
match change {
    Ok(change) => {
        row.insert(format!("{:.3} %", change * 100.0), 6);
        if change > c.tolerance.min_time_regression.get() {
            row.insert("FAIL", 7);
        }
    }
    Err(err) => {
        row.insert("invalid", 6);
        row.insert(err, 7);
    }
}
```
```rust
match relative_change(before_min, after_min) {
    Ok(change) => {
        if change > tolerance.min_time_regression.get() {
            passed = false;
        }
    }
    Err(_) => passed = false,
};
```
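For context, a minimal sketch of the `relative_change` helper both snippets above rely on. The signature here is an assumption for illustration; the crate's actual helper may differ in error type and edge-case handling:

```rust
// Hypothetical sketch: relative change of a latency measurement.
// Returns (after - before) / before, erroring on a non-positive baseline
// so the regression check can flag the comparison as invalid.
fn relative_change(before: f64, after: f64) -> Result<f64, &'static str> {
    if !before.is_finite() || before <= 0.0 {
        return Err("baseline must be a positive, finite latency");
    }
    Ok((after - before) / before)
}

fn main() {
    // A 10% slowdown against a 5% tolerance would fail the check.
    let change = relative_change(100.0, 110.0).unwrap();
    let tolerance = 0.05;
    println!("change = {change:.3}, regression = {}", change > tolerance);
}
```
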
```rust
let min_latency = r
    .latencies
    .iter()
    .min()
    .copied()
    .unwrap_or(MicroSeconds::new(u64::MAX));
```
```rust
fn run_loops<F>(run: &Run, mut body: F) -> RunResult
where
    F: FnMut(),
{
    let mut latencies = Vec::with_capacity(run.num_measurements.get());

    for _ in 0..run.num_measurements.get() {
        let start = std::time::Instant::now();
        for _ in 0..run.loops_per_measurement.get() {
            body();
        }
        latencies.push(start.elapsed().into());
    }

    let percentiles = percentiles::compute_percentiles(&mut latencies).unwrap();
    RunResult {
        run: run.clone(),
        latencies,
        percentiles,
    }
}
```
```rust
    .join("tolerance.json");

let stdout = run_check_test(&input_path, &tolerance_path);
println!("stdout = {}", stdout);
```
Codecov Report
❌ Patch coverage is 80.03% with 116 lines in your changes missing coverage.
Additional details and impacted files

```
@@ Coverage Diff @@
##             main    #1027      +/-   ##
==========================================
- Coverage   89.51%   89.45%   -0.07%
==========================================
  Files         460      463       +3
  Lines       85466    86047     +581
==========================================
+ Hits        76508    76973     +465
- Misses       8958     9074     +116
```

Flags with carried forward coverage won't be shown.
hildebrandmw left a comment:
Thanks Suryansh, this will be useful for performance optimization. One big question I had is whether this needs to be its own crate, or whether it can be folded into `diskann-benchmark` (which will need some work to put the normal indexing code behind a feature gate to keep compile times low) or shipped as a binary in `diskann-quantization` directly.

The pattern established by `diskann-benchmark-simd` is a little unfortunate, so now is the time to think about our approach before we have to deal with crate proliferation. We might also be able to move all benchmark-related binary crates into a subdirectory to keep the top level organized.

In either case, I don't think we should publish this to crates.io, and we should probably stop publishing `diskann-benchmark-simd` as well.
```rust
    ($f:ident, $field:tt, $($expr:tt)*) => {
        writeln!($f, "{:>18}: {}", $field, $($expr)*)
    }
}
```
Since #996, manual alignment shouldn't be needed any more.
```rust
macro_rules! register {
    ($impl:ident, $t:ty, $tag:literal) => {
        dispatcher.register_regression($tag, Kernel::<$impl, $t>::new());
    };
```
Nit: The macro is offering only a moderate benefit in terseness. Might be better off without it?
```rust
}

// Optimized (architecture-dispatched QueryComputer).
register!(Optimized, f32, "multi-vector-op-f32-optimized");
```
We should probably provide a way of overriding the runtime architecture to test both AVX512 and AVX2 kernels on the same machine. Otherwise, we have no way to compare such implementations against each other ...
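A hedged sketch of what such an override could look like: an explicit knob that forces a kernel regardless of what host detection reports. All names here are hypothetical and do not correspond to diskann APIs:

```rust
// Hypothetical architecture-override knob for benchmarking; the real
// dispatch machinery in diskann lives elsewhere and may look different.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ArchOverride {
    Auto,   // use whatever the host CPU supports
    Avx2,   // force the AVX2 kernel even if AVX-512 is available
    Avx512, // force the AVX-512 kernel (errors on unsupported hosts)
}

fn select_kernel(host_has_avx512: bool, over: ArchOverride) -> Result<&'static str, &'static str> {
    match over {
        ArchOverride::Auto => Ok(if host_has_avx512 { "avx512" } else { "avx2" }),
        ArchOverride::Avx2 => Ok("avx2"),
        ArchOverride::Avx512 if host_has_avx512 => Ok("avx512"),
        ArchOverride::Avx512 => Err("host does not support AVX-512"),
    }
}

fn main() {
    // On an AVX-512 machine, forcing AVX2 lets both kernels be benchmarked.
    assert_eq!(select_kernel(true, ArchOverride::Avx2), Ok("avx2"));
    assert_eq!(select_kernel(true, ArchOverride::Auto), Ok("avx512"));
}
```

A job-file field or CLI flag mapping onto such an enum would let a single AVX-512 host compare both code paths.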
```rust
        }
    }
};
}
```
It might be cleaner to get rid of `DispatchRule` and use an entirely bespoke, smaller trait, like:

```rust
impl TryFrom<Implementation> for Optimized {
    type Error = FailureScore;

    fn try_from(i: Implementation) -> Result<Self, Self::Error> {
        match i {
            Implementation::Optimized => Ok(Self),
            _ => Err(FailureScore(1)),
        }
    }
}
```

and write matching via `TryFrom`.
```rust
const RNG_SEED: u64 = 0x12345;

struct Data<T> {
    query_data: Box<[T]>,
```
We spend so much time writing matrix abstractions. Why not use them? If the constructors are insufficient, that's a sign we have work to do!
```rust
        results.push(result);
    }
    Ok(results)
}
```
These two run loops are nearly identical. Perhaps a tweak to factor out the common parts? Maybe consider keeping the loop nest the same, but taking a trait object (to reduce compile times) for the distance implementation?
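As a rough illustration of this suggestion (all names here are invented for the sketch and do not match the crate), the shared loop nest could accept a trait object so the distance kernels stop being monomorphised into every copy of the run loop:

```rust
// Sketch: an object-safe distance trait lets one run loop serve all
// implementations via dynamic dispatch, trading a virtual call per
// evaluation for fewer monomorphised copies and faster compiles.
trait Distance<T> {
    fn eval(&self, query: &[T], doc: &[T]) -> f32;
}

struct Reference;
impl Distance<f32> for Reference {
    fn eval(&self, query: &[f32], doc: &[f32]) -> f32 {
        // Illustrative inner product standing in for a real kernel.
        query.iter().zip(doc).map(|(a, b)| a * b).sum()
    }
}

// The common loop takes `&dyn Distance<T>`, so it is instantiated once
// per element type rather than once per (implementation, operation) pair.
fn run_with_distance<T>(query: &[T], doc: &[T], dist: &dyn Distance<T>) -> f32 {
    dist.eval(query, doc)
}

fn main() {
    let q = [1.0_f32, 2.0, 3.0];
    let d = [4.0_f32, 5.0, 6.0];
    assert_eq!(run_with_distance(&q, &d, &Reference), 32.0);
}
```

Since the trait-object call sits outside the hot inner loop being timed, the dispatch overhead should not distort measurements.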
```rust
}

impl_kernel_for!(f32);
impl_kernel_for!(f16);
```
Since the abstractions these macros are stamping were also introduced in this PR, consider whether or not there's a simpler way to express things via generics that doesn't require macros.
```
| `multi-vector-op-f16-reference` | `f16` | `Chamfer` / `MaxSim` |

The **optimized** path constructs a `QueryComputer` once per shape (which
internally selects the best available SIMD kernel for the host) and calls
```
If we're using this as a development tool ... it's probably a good idea to allow an override for the architecture.
````
bench run --input-file examples/multi-vector.json --output-file before.json
```

### Regression check workflow
````
The section on regression checks seems to be just a copy+paste of how diskann-benchmark-runner works. Does this need to be spelled out in the README like this?
```
The crate registers four kernels — one per `(element_type, implementation)`
pair:

| Tag | Element | Implementation |
```
While it's fine to have a list like this, the chances of it getting updated as kernels are added are nearly zero. Maybe instead rely on the inputs behavior of the runner app and ensure that the returned descriptions are sufficient?
```toml
    "diskann-benchmark-runner",
    "diskann-benchmark-core",
    "diskann-benchmark-simd",
    "diskann-benchmark-multi-vector",
```
Do we want to make this a high-level crate? I think it would be good if we had this added as a 'benchmark' under `diskann-benchmark`.
How many top-level crates do we actually want to have here? If we end up with 10–20 diskann-benchmark-* crates, is that going to be a problem?
As an alternative, could we keep benchmarks inside each crate: e.g., add a bench subfolder and use [[bench]] with harness = false, pulling in something like diskann-benchmark-runner as a dev-dependency? That feels more idiomatic in Rust.
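For illustration, a hedged sketch of that idiomatic layout; the section names are standard Cargo, but the bench name and dependency path are assumptions for this example:

```toml
# Hypothetical layout: keep the benchmark inside the crate it exercises
# (e.g. diskann-quantization) as a custom-harness bench target, pulling the
# runner in only as a dev-dependency so it never taxes normal builds.
[[bench]]
name = "multi_vector"   # runs benches/multi_vector.rs
harness = false          # opt out of libtest; use the runner's own main()

[dev-dependencies]
diskann-benchmark-runner = { path = "../diskann-benchmark-runner" }
```

With `harness = false`, `cargo bench --bench multi_vector` simply executes the bench binary's `main`, which could register kernels with the runner exactly as the standalone crate does today.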
arkrishn94 left a comment:
Thanks Suryansh, I left some relatively minor comments. Can we run this benchmark and post some example numbers in the PR description?
```
Per-measurement latency is normalized to **nanoseconds per inner-product
call**, abbreviated `ns/IP`:
```
Maybe I'm overthinking this but some CPU cycles are spent on reduction here too for Chamfer.
Why not just report the total time here? Users can profile themselves if they want to.
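For reference, the normalization under discussion can be sketched from the formula given in the PR description (`ns/IP = min_latency_µs * 1000 / (Q * D * loops)`); as the comment above notes, for Chamfer the reduction time is folded into this per-IP figure too:

```rust
// Sketch of the ns/IP normalization from the PR description. For Chamfer,
// the min-reduction cost is absorbed into the per-inner-product number.
fn ns_per_ip(min_latency_us: f64, q: u64, d: u64, loops: u64) -> f64 {
    (min_latency_us * 1_000.0) / (q * d * loops) as f64
}

fn main() {
    // 2_000 µs for 32 query vectors x 128 doc vectors x 10 loops per measurement.
    let v = ns_per_ip(2_000.0, 32, 128, 10);
    println!("{v:.4} ns/IP"); // 2_000_000 ns / 40_960 IPs
}
```

Reporting raw total time alongside this would cost nothing and sidestep the normalization debate entirely.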
```rust
#[derive(Debug, Serialize)]
struct Comparison {
    run: Run,
    tolerance: MultiVectorTolerance,
```
nit: Tolerance here should do? The path should allow us to distinguish?
```rust
struct Comparison {
    run: Run,
    tolerance: MultiVectorTolerance,
    before_min: f64,
```
```rust
if datatype::Type::<T>::try_match(&from.element_type).is_err() {
    *failscore.get_or_insert(0) += 10;
}
```
I forget - how did we come up with a failure score of 10 for this? Should this be something large?
```rust
// initialized by it is non-null and properly aligned to the underlying type.
unsafe impl<T, F> NewOwned<Init<F>> for Standard<T>
where
    T: Copy,
    F: FnMut() -> T,
{
    type Error = crate::error::Infallible;
```
Does it make sense to fold the default initialization into this in the spirit of minimizing the number of constructor paths we expose to users?
```rust
pub(crate) enum Operation {
    Chamfer,
    MaxSim,
}
```
I'm wondering if we should just expose a single operation here? I'm not sure folks will use MaxSim much and it might simplify some of your code here?
Adds Chamfer / MaxSim benchmarks from `diskann-quantization::multi_vector` to `diskann-benchmark`, behind a new `multi-vector` Cargo feature (off by default).

Four registered regression kernels — `(f32, f16) × (optimized, reference)`:

- Optimized: `QueryComputer` (architecture-dispatched SIMD).
- Reference: the `Chamfer`/`MaxSim` fallback used by the `multi_vector` unit tests — numerical ground truth and a baseline for measuring SIMD speedups.

With the feature off, a `multi-vector-op` job emits the standard `Requires the "multi-vector" feature` diagnostic via `stub_impl!`.

Design

- `OptimizedDistance<T>` / `ReferenceDistance<'a, T>` implement an object-safe `Distance<T>` trait; a shared `run_with_distance<T>(run, doc, &dyn Distance<T>)` carries the loop nest. `run_loops<F: FnMut()>` monomorphises over `(T, Operation)` only.
- `QueryComputer::new` is hoisted out of the timed loop; the reference path doesn't need a parallel hoist (`MaxSim::new` is a non-empty check + pointer wrap).
- No `DispatchRule`; the implementation axis is matched via `TryFrom<Implementation>` plus a tiny `ImplementationMatcher` trait for diagnostic consts.
- Fixtures are `(Mat<Standard<T>>, Mat<Standard<T>>)`, built through a new `NewOwned<Init<F>>` constructor (closure-per-element fill) added to `diskann-quantization::multi_vector::matrix` alongside the existing `NewOwned<T>` / `NewOwned<Defaulted>` variants.
- `ns/IP @ Dim = min_latency_µs * 1000 / (Q * D * loops)`. Invariant under `Q`, `D`, and loop budget; approximately linear in `Dim`. Per-shape regression detection is robust because the `Dim` factor cancels.
- `NewFromMatRef` impls cover the inherent `QueryComputer::<T>::new`; everything else is generic.

Tests

6 unit tests in `backend::multi_vector::imp::tests` (regression-check edges) + 2 integration tests in `main.rs` (`run` and `check verify`). 1 unit test + 1 doctest in `diskann-quantization::multi_vector::matrix` cover the new `NewOwned<Init<F>>`.

Follow-up PRs (deferred)

- Reduce `diskann-benchmark` compile times by feature-gating the existing indexing code.
- Fold `diskann-benchmark-simd` into `diskann-benchmark` similarly.
- Stop publishing benchmark crates (`publish = false`).
- Move benchmark crates into a `benchmarks/` workspace subdirectory.