[ENH] Wire maxscore reader in search #6899
Sicheng-Pan wants to merge 2 commits into hammad/maxscore_segment_wiring
Wire generic sparse index reader paths for […]
This PR updates the sparse query execution path to use a unified […]. Error handling was also generalized in both operators by moving from concrete sparse-reader error variants to […].
This summary was automatically generated by @propel-code-bot
        records: Vec::new(),
    });
    let mut records = match metadata_segement_reader.sparse_index_reader {
        Some(chroma_segment::blockfile_metadata::SparseIndexReader::MaxScore(
this should be inside the enum impl
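
One way to read this suggestion is sketched below: the `match` over reader variants moves into a method on the enum, so the operator only handles the "no reader" case. This is a minimal sketch under stated assumptions, not the code in this PR. Only `SparseIndexReader::MaxScore` appears in the diff above; the `Wand` variant name, the `Score` struct, the `scores` fields, the synchronous `query(k)` signature, and the `run` helper are all hypothetical stand-ins (the real readers are presumably async and backed by blockfiles).

```rust
// Hypothetical stand-in for a scored posting; not the real chroma type.
#[derive(Debug, Clone, PartialEq)]
pub struct Score {
    pub offset_id: u32,
    pub score: f32,
}

// Return the k highest-scoring entries, best first.
fn top_k(scores: &[Score], k: usize) -> Vec<Score> {
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| b.score.total_cmp(&a.score));
    sorted.truncate(k);
    sorted
}

// Simplified in-memory readers (assumption: real readers are async).
pub struct MaxScoreReader {
    pub scores: Vec<Score>,
}

pub struct WandReader {
    pub scores: Vec<Score>,
}

impl MaxScoreReader {
    pub fn query(&self, k: usize) -> Vec<Score> {
        top_k(&self.scores, k)
    }
}

impl WandReader {
    pub fn query(&self, k: usize) -> Vec<Score> {
        top_k(&self.scores, k)
    }
}

// `MaxScore` is named in the diff; `Wand` is an assumed variant name.
pub enum SparseIndexReader {
    MaxScore(MaxScoreReader),
    Wand(WandReader),
}

impl SparseIndexReader {
    // The per-variant dispatch lives here, inside the enum impl,
    // rather than in each operator's `run()`.
    pub fn query(&self, k: usize) -> Vec<Score> {
        match self {
            Self::MaxScore(reader) => reader.query(k),
            Self::Wand(reader) => reader.query(k),
        }
    }
}

// Operator-side code: an absent reader yields no records.
pub fn run(reader: Option<&SparseIndexReader>, k: usize) -> Vec<Score> {
    match reader {
        Some(reader) => reader.query(k),
        None => Vec::new(),
    }
}
```

With this shape, adding a third index type would touch the enum and its `impl` but not the operators that call `query`.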

Description of changes
This is PR 8 in the MaxScore stack, the final PR that connects the query pipeline. With this change, collections with `algorithm: "max_score"` in their schema use the MaxScore index for both sparse KNN search and BM25 IDF scoring.

- `SparseIndexKnn` operator (`sparse_index_knn.rs`): Added a dual path in `run()` that checks `maxscore_index_reader` first and falls back to `sparse_index_reader`. Both paths produce the same `RecordMeasure` output (`1.0 - score.score`). Added a `MaxScoreError` variant to the error enum.
- `Idf` operator (`idf.rs`): Added a dual path for computing document frequency per dimension. The MaxScore path uses `MaxScoreReader::count_postings()` instead of `SparseReader::get_dimension_offset_rank()`, and skips the WAND-specific `load_offset_values()` prefetch. Added a `MaxScoreError` variant to the error enum.
- […] `SPARSE_POSTING` vs `SPARSE_MAX` in `file_path`), so the existing `SparseIndexKnn` and `Idf` operators handle both index types internally. The orchestrator dispatches the same operators regardless of which index is in use.

Test plan
Migration plan
No migration needed. This PR adds no new persistent state. The operator behavior is determined entirely by which reader the segment layer provides, which in turn is determined by the blockfile keys written during compaction (controlled by the schema's `algorithm` field, gated per-tenant in PR 6).

Observability plan
No new instrumentation. The existing `MaxScoreReader::query()` has tracing spans from PRs 2–3. Operator-level tracing is inherited from the `chroma_system::Operator` framework.

Documentation Changes
Added inline comments in both operators noting that MaxScore and WAND readers are mutually exclusive.
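
To illustrate the dual path described under "Description of changes", here is a rough sketch of how the `Idf` operator might pick a document frequency source, plus the score-to-measure conversion shared by both KNN paths. It is a sketch under assumptions, not the code in this PR: the reader structs are in-memory stand-ins, the method signatures are simplified (the real `count_postings()` and `get_dimension_offset_rank()` signatures are not shown in this PR page), `SegmentReaders`, `doc_frequency`, `bm25_idf`, and `to_measure` are hypothetical names, and the IDF formula is the common smoothed BM25 variant rather than one confirmed by the source.

```rust
use std::collections::HashMap;

// Stand-in MaxScore reader: dimension id -> offset ids with a posting.
pub struct MaxScoreReader {
    pub postings: HashMap<u32, Vec<u32>>,
}

impl MaxScoreReader {
    // Simplified signature (assumption): number of postings for a dimension.
    pub fn count_postings(&self, dim: u32) -> usize {
        self.postings.get(&dim).map_or(0, |p| p.len())
    }
}

// Stand-in WAND-style reader: dimension id -> precomputed posting count.
pub struct SparseReader {
    pub ranks: HashMap<u32, usize>,
}

impl SparseReader {
    // Simplified signature (assumption); the real method may take more arguments.
    pub fn get_dimension_offset_rank(&self, dim: u32) -> usize {
        self.ranks.get(&dim).copied().unwrap_or(0)
    }
}

// The two readers are mutually exclusive: at most one is expected to be Some.
pub struct SegmentReaders {
    pub maxscore_index_reader: Option<MaxScoreReader>,
    pub sparse_index_reader: Option<SparseReader>,
}

// Dual path: check the MaxScore reader first, fall back to the WAND reader.
pub fn doc_frequency(readers: &SegmentReaders, dim: u32) -> Option<usize> {
    if let Some(reader) = &readers.maxscore_index_reader {
        Some(reader.count_postings(dim))
    } else {
        readers
            .sparse_index_reader
            .as_ref()
            .map(|reader| reader.get_dimension_offset_rank(dim))
    }
}

// Smoothed BM25 IDF (assumed variant): ln((N - df + 0.5) / (df + 0.5) + 1).
pub fn bm25_idf(total_docs: usize, doc_freq: usize) -> f32 {
    let n = total_docs as f32;
    let df = doc_freq as f32;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}

// Both KNN paths report a distance-like measure: 1.0 - score.
pub fn to_measure(score: f32) -> f32 {
    1.0 - score
}
```

Because both branches return the same type, the caller never needs to know which index produced the count, which is consistent with the description's point that the orchestrator dispatches the same operators for either index.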