Skip to content

[ENH] Wire maxscore reader in search#6899

Open
Sicheng-Pan wants to merge 2 commits intohammad/maxscore_segment_wiringfrom
hammad/maxscore_operators
Open

[ENH] Wire maxscore reader in search#6899
Sicheng-Pan wants to merge 2 commits intohammad/maxscore_segment_wiringfrom
hammad/maxscore_operators

Conversation

@Sicheng-Pan
Copy link
Copy Markdown
Contributor

@Sicheng-Pan Sicheng-Pan commented Apr 13, 2026

Description of changes

This is PR 8 in the MaxScore stack — the final PR that connects the query pipeline. With this change, collections with algorithm: "max_score" in their schema use the MaxScore index for both sparse KNN search and BM25 IDF scoring.

  • SparseIndexKnn operator (sparse_index_knn.rs): Added dual-path in run() — checks maxscore_index_reader first, falls back to sparse_index_reader. Both paths produce the same RecordMeasure output (1.0 - score.score). Added MaxScoreError variant to the error enum.
  • Idf operator (idf.rs): Added dual-path for computing document frequency per dimension. MaxScore path uses MaxScoreReader::count_postings() instead of SparseReader::get_dimension_offset_rank(), and skips the WAND-specific load_offset_values() prefetch. Added MaxScoreError variant to the error enum.
  • No orchestrator changes: The two readers are mutually exclusive (SPARSE_POSTING vs SPARSE_MAX in file_path), so the existing SparseIndexKnn and Idf operators handle both index types internally. The orchestrator dispatches the same operators regardless of which index is in use.

Test plan

  • All existing sparse KNN and IDF tests pass unchanged (WAND path exercised by default).
  • The MaxScore path is exercised end-to-end by the segment-level consistency test in PR 7, which verifies write → commit → flush → fork → read → query correctness across multiple iterations with 100% recall against brute force.
  • Manual verification against a live Tilt server with Python, JS, and Rust clients confirmed correct behavior (PR 6).

Migration plan

No migration needed. This PR adds no new persistent state. The operator behavior is determined entirely by which reader the segment layer provides, which in turn is determined by the blockfile keys written during compaction (controlled by the schema's algorithm field, gated per-tenant in PR 6).

Observability plan

No new instrumentation. The existing MaxScoreReader::query() has tracing spans from PRs 2–3. Operator-level tracing is inherited from the chroma_system::Operator framework.

Documentation Changes

Added inline comments in both operators noting that MaxScore and WAND readers are mutually exclusive.

Copy link
Copy Markdown
Contributor Author

Sicheng-Pan commented Apr 13, 2026

@Sicheng-Pan Sicheng-Pan changed the title Wire MaxScore reader into SparseIndexKnn and Idf operators [ENH ]Wire maxscore reader in search Apr 13, 2026
@Sicheng-Pan Sicheng-Pan marked this pull request as ready for review April 13, 2026 18:20
@propel-code-bot
Copy link
Copy Markdown
Contributor

propel-code-bot Bot commented Apr 13, 2026

Wire generic sparse index reader paths for SparseIndexKnn and Idf operators

This PR updates the sparse query execution path to use a unified sparse_index_reader interface in both SparseIndexKnn and Idf, enabling the same operators to work with either WAND or MaxScore-backed sparse indexes without orchestrator changes. The main behavioral shift is replacing WAND-specific calls with reader-abstracted methods (knn_query and dimension_counts) while preserving output semantics such as converting scores via 1.0 - score.score and sorting KNN results for downstream merge behavior.

Error handling was also generalized in both operators by moving from concrete sparse-reader error variants to Chroma-wrapped dynamic errors, allowing both index-reader implementations to propagate through the same operator error types. Net diff is small and focused (2 files, 10 additions, 44 deletions), indicating simplification and interface consolidation rather than broad logic expansion.

This summary was automatically generated by @propel-code-bot

Copy link
Copy Markdown
Contributor

@propel-code-bot propel-code-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review found no issues; the dual-path MaxScore integration appears sound and low risk.

Status: No Issues Found | Risk: Low

Review Details

📁 2 files reviewed | 💬 0 comments

@github-actions
Copy link
Copy Markdown

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@Sicheng-Pan Sicheng-Pan changed the title [ENH ]Wire maxscore reader in search [ENH] Wire maxscore reader in search Apr 13, 2026
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_operators branch from 4412753 to 46cdd46 Compare April 14, 2026 22:04
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_segment_wiring branch from 0d36e48 to 644f6bb Compare April 14, 2026 22:04
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_operators branch from 46cdd46 to 813d628 Compare April 14, 2026 23:12
@blacksmith-sh

This comment has been minimized.

@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_segment_wiring branch from d9001bc to e3bdd50 Compare April 24, 2026 00:02
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_operators branch from 813d628 to 657596c Compare April 24, 2026 00:02
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_segment_wiring branch from e3bdd50 to 2dd4274 Compare April 24, 2026 02:05
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_operators branch from 657596c to 23284cc Compare April 24, 2026 02:05
records: Vec::new(),
});
let mut records = match metadata_segement_reader.sparse_index_reader {
Some(chroma_segment::blockfile_metadata::SparseIndexReader::MaxScore(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be inside the enum impl

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_operators branch from 23284cc to b3b7448 Compare April 24, 2026 05:29
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_segment_wiring branch from 2dd4274 to 1c74032 Compare April 24, 2026 05:29
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_operators branch from b3b7448 to a0577fd Compare April 24, 2026 17:22
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_segment_wiring branch 2 times, most recently from 08d7415 to a568734 Compare April 24, 2026 18:03
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_operators branch from a0577fd to 013a547 Compare April 24, 2026 18:03
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_segment_wiring branch from a568734 to c986d63 Compare April 24, 2026 20:57
@Sicheng-Pan Sicheng-Pan force-pushed the hammad/maxscore_operators branch from 013a547 to da25969 Compare April 24, 2026 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants