refactor(fuse): abstract runtime filter into PartitionRuntimeFilter, IndexRuntimeFilter, and RowRuntimeFilter traits#19728
Conversation
…IndexRuntimeFilter, and RowRuntimeFilter traits
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ac50ce6726
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // This filter requires async IO for bloom index loading. | ||
| // The sync prune path cannot be used alone — use prune_async instead. | ||
| // Return false (don't prune) as a safe fallback. | ||
| let _ = part; | ||
| Ok(false) |
There was a problem hiding this comment.
Wire inlist bloom pruning into index filter
ReadDataTransform::read_parts now relies on IndexRuntimeFilter::prune, but InlistBloomIndexFilter::prune unconditionally returns Ok(false) and the real pruning logic lives in the unused prune_async helper. That means runtime IN-list bloom-index pruning is effectively disabled after this refactor, so partitions that were previously skipped by bloom index checks are always read.
Useful? React with 👍 / 👎.
| }) | ||
| .collect(); | ||
| // Use new trait-based API for row-level filtering | ||
| let row_runtime_filters = ctx.get_row_runtime_filters(scan_id); |
There was a problem hiding this comment.
Keep bloom row filters available for new hash join
ReadState now pulls runtime bloom filters only from ctx.get_row_runtime_filters(scan_id), but this commit only populates that side-channel in the legacy hash-join runtime-filter path; the new_hash_join globalization path still only calls set_runtime_filter. For plans using new hash join, row_runtime_filters stays empty and row-level bloom runtime filtering is skipped entirely.
Useful? React with 👍 / 👎.
| let index = filter.load_index(&part, &operator).await?; | ||
| let index_ref = index.as_ref().map(|b| b.as_ref() as &dyn std::any::Any); | ||
| if filter.prune(&part, index_ref)? { |
There was a problem hiding this comment.
Avoid loading spatial index before stats pruning
The new index-filter loop loads index data before calling prune, so SpatialIndexFilter::load_index runs even when the fast bounding-box stats check would immediately reject a partition. Previously the spatial pruner checked stats first and only read index files when needed; this change adds avoidable index IO per partition and can significantly regress spatial runtime-filter performance on large scans.
Useful? React with 👍 / 👎.
🤖 CI Job Analysis (Retry 1)
📊 Summary
❌ NO RETRY NEEDEDAll failures appear to be code/test issues requiring manual fixes. 🔍 Job Details
🤖 AboutAutomated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed). |
- Add PartitionRuntimeFilters/IndexRuntimeFilters/RowRuntimeFilters type aliases - Change RowRuntimeFilter::apply to take Column directly instead of &DataBlock - Add column_name() to RowRuntimeFilter trait for schema-independent resolution - BloomRowFilter::create returns Arc<dyn RowRuntimeFilter> directly - ReadState resolves column indices via column_name() at init time Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nsform Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
89a48f7 to
ecc59d9
Compare
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
refactor(fuse): abstract runtime filter into PartitionRuntimeFilter, IndexRuntimeFilter, and RowRuntimeFilter traits
Tests
Type of change
This change is