feat: add fragment enumeration and fragment-scoped scanning APIs #3

Merged
jja725 merged 3 commits into lance-format:main from jja725:feat/fragment-apis
Mar 26, 2026

@jja725 jja725 commented Mar 26, 2026

Summary

Add three new C API functions for fragment-level access, enabling split-based parallelism for query engines like Velox.

  • lance_dataset_fragment_count() — returns number of fragments in the dataset
  • lance_dataset_fragment_ids() — fills caller-allocated array with fragment IDs
  • lance_scanner_set_fragment_ids() — restricts scan to specific fragment IDs
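The first two functions suggest a count-then-fill calling pattern: ask for the fragment count, allocate an array of that size, then have the library fill it. The sketch below illustrates that pattern only. The real signatures live in include/lance.h and are not reproduced in this description, so `MockDataset`, `mock_fragment_count`, `mock_fragment_ids`, and `collect_fragment_ids` are hypothetical stand-ins, and the `uint64_t` ID type and the return conventions are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dataset handle. The real handle is
 * presumably opaque; this struct exists only to make the sketch runnable. */
typedef struct {
    const uint64_t *ids;
    size_t count;
} MockDataset;

/* Stand-in for lance_dataset_fragment_count(); the real signature may differ. */
static size_t mock_fragment_count(const MockDataset *ds)
{
    return ds->count;
}

/* Stand-in for lance_dataset_fragment_ids(): copies at most `capacity` IDs
 * into the caller-allocated `out` and returns how many were written. */
static size_t mock_fragment_ids(const MockDataset *ds, uint64_t *out,
                                size_t capacity)
{
    size_t n = ds->count < capacity ? ds->count : capacity;
    for (size_t i = 0; i < n; i++) {
        out[i] = ds->ids[i];
    }
    return n;
}

/* Count-then-fill: query the count, allocate, then fill the buffer.
 * Returns a malloc'd array the caller must free, or NULL if the dataset
 * has no fragments or allocation fails. *out_len receives the length. */
static uint64_t *collect_fragment_ids(const MockDataset *ds, size_t *out_len)
{
    assert(ds != NULL && out_len != NULL);
    *out_len = 0;

    size_t n = mock_fragment_count(ds);
    if (n == 0) {
        return NULL;
    }
    uint64_t *buf = malloc(n * sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    *out_len = mock_fragment_ids(ds, buf, n);
    return buf;
}
```

Keeping allocation on the caller's side means the library never hands out memory that has to be freed through a matching library call, which is a common choice for C APIs consumed from C++.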

Motivation

Query engines like Velox distribute work across workers by assigning each worker a subset of fragments (splits). Without fragment-level APIs, the connector cannot partition the scan work.
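As a rough illustration of the split assignment described above, the sketch below divides an array of fragment IDs into contiguous, near-equal ranges, one per worker. `split_range` is a hypothetical helper and not part of this PR or the Lance C API; how a real connector such as Velox forms its splits is not specified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Lance C API. Computes the half-open
 * index range [*begin, *end) into a fragment-ID array that worker `worker`
 * (0-based) out of `num_workers` should scan. Ranges are contiguous, cover
 * every index exactly once, and differ in size by at most one. */
static void split_range(size_t num_fragments, size_t num_workers,
                        size_t worker, size_t *begin, size_t *end)
{
    assert(num_workers > 0 && worker < num_workers);
    assert(begin != NULL && end != NULL);

    size_t base = num_fragments / num_workers;  /* minimum fragments per worker */
    size_t extra = num_fragments % num_workers; /* first `extra` workers get one more */

    *begin = worker * base + (worker < extra ? worker : extra);
    *end = *begin + base + (worker < extra ? 1 : 0);
}
```

Each worker would then pass its slice of the ID array (starting at index `*begin`, with `*end - *begin` entries) to lance_scanner_set_fragment_ids(). When there are more workers than fragments, the trailing workers receive an empty range and can skip the scan.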

Changes

| File | Change |
|---|---|
| src/dataset.rs | Added lance_dataset_fragment_count, lance_dataset_fragment_ids |
| src/scanner.rs | Added lance_scanner_set_fragment_ids, fragment_ids field, fragment filtering in materialize_stream and build_scanner |
| include/lance.h | Added C declarations |
| include/lance.hpp | Added C++ wrappers (Dataset::fragment_count(), Dataset::fragment_ids(), Scanner::fragment_ids()) |
| tests/c_api_test.rs | Added test_fragment_count, test_fragment_ids, test_scanner_with_fragment_ids |

Test plan

  • cargo test — 40 tests pass (3 new fragment tests)
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo fmt -- --check — clean

🤖 Generated with Claude Code

jja725 and others added 3 commits March 24, 2026 13:08
lance-encoding's build script requires protoc. Install it via
apt-get on Linux and brew on macOS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add three new C API functions for fragment-level access:
- lance_dataset_fragment_count() — returns number of fragments
- lance_dataset_fragment_ids() — fills caller-allocated array with fragment IDs
- lance_scanner_set_fragment_ids() — restricts scan to specific fragments

This enables split-based parallelism for query engines like Velox, where each
worker scans a subset of fragments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jja725 jja725 merged commit 5edc555 into lance-format:main Mar 26, 2026
7 checks passed