Commit d6e49ea

jja725 and claude committed
docs: add AGENTS.md and CLAUDE.md for coding agent guidance
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 64370bb commit d6e49ea

2 files changed

Lines changed: 118 additions & 0 deletions

AGENTS.md

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
1+
# AGENTS.md
2+
3+
This file provides guidance to coding agents collaborating on this repository.
4+
5+
## Project Overview
6+
7+
`lance-c` provides C/C++ bindings for the [Lance](https://github.com/lance-format/lance) columnar data format. It exposes Lance's functionality through a stable C-ABI with opaque handle patterns and Arrow C Data Interface for zero-copy data exchange.
8+
9+
Target consumers include C++ query engines (Velox, DuckDB), ML frameworks, and any language with C FFI capabilities.
10+
11+
## Project Requirements
12+
13+
- Always use English in code, examples, and comments.
14+
- Features should be implemented concisely, maintainably, and efficiently.
15+
- Code is not just for execution, but also for readability.
16+
- Only add meaningful comments and tests.
17+
18+
## Architecture
19+
20+
```
include/
├── lance.h                  # C header (stable ABI): all extern "C" function declarations
└── lance.hpp                # C++ RAII wrappers (header-only)
src/
├── lib.rs                   # Module root, re-exports all extern "C" symbols
├── error.rs                 # Thread-local error handling (error codes + messages)
├── dataset.rs               # Dataset lifecycle: open, close, metadata, schema, take
├── scanner.rs               # Scanner builder + three scan modes (sync, async, poll)
├── batch.rs                 # LanceBatch: Arrow C Data Interface export
├── helpers.rs               # C string parsing utilities
├── runtime.rs               # Global Tokio runtime
└── async_dispatcher.rs      # Callback dispatcher for async scan
tests/
├── c_api_test.rs            # Rust integration tests calling C API directly
├── compile_and_run_test.rs  # Compiles and runs C/C++ test programs
└── cpp/
    ├── test_c_api.c         # C test program
    └── test_cpp_api.cpp     # C++ test program
test_data/                   # Historical datasets for backwards compatibility tests
```
41+
42+
## Key Design Patterns
43+
44+
1. **Opaque handles**: All Rust objects are exposed as `*mut T` opaque pointers with explicit `lance_*_open`/`lance_*_close` lifecycle functions. No struct layout leaks across the FFI boundary.
45+
46+
2. **Thread-local error handling**: Errors are stored in a thread-local `RefCell` and read back with `lance_last_error_code()` and `lance_last_error_message()`. The `ffi_try!` macro wraps function bodies so that a failure sets the thread-local error and returns null or -1.
47+
48+
3. **Arrow C Data Interface**: All data crosses the FFI boundary as `ArrowArray`/`ArrowSchema`/`ArrowArrayStream` structs — zero-copy, no custom serialization.
49+
50+
4. **Dual async model**: Three scan modes, one blocking and two asynchronous, for different consumers:
51+
- `lance_scanner_to_arrow_stream()` / `lance_scanner_next()` — blocking, for simple consumers
52+
- `lance_scanner_scan_async()` — callback-based, for event-driven systems
53+
- `lance_scanner_poll_next()` — poll + waker, for cooperative async runtimes (Velox/Folly)
54+
55+
5. **Panic safety**: Release profile uses `panic = "abort"` to prevent undefined behavior from Rust panics unwinding across FFI boundaries.
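The opaque-handle and thread-local error patterns above can be sketched together in a few lines of Rust. This is a minimal illustration, not the crate's actual code: `DemoDataset`, the `demo_*` function names, and the numeric error codes are all hypothetical, and the error handling that `ffi_try!` would generate is written out by hand.

```rust
use std::cell::RefCell;
use std::ffi::{c_char, CStr, CString};
use std::ptr;

// Hypothetical stand-in for a Rust object hidden behind an opaque handle.
pub struct DemoDataset {
    uri: String,
}

thread_local! {
    // Last error of the calling thread as (code, message); code 0 means "no error".
    static LAST_ERROR: RefCell<(i32, CString)> = RefCell::new((0, CString::default()));
}

fn set_last_error(code: i32, msg: &str) {
    let msg = CString::new(msg).unwrap_or_default();
    LAST_ERROR.with(|e| *e.borrow_mut() = (code, msg));
}

#[unsafe(no_mangle)]
pub extern "C" fn demo_last_error_code() -> i32 {
    LAST_ERROR.with(|e| e.borrow().0)
}

// The returned pointer stays valid until the next error is set on this thread.
#[unsafe(no_mangle)]
pub extern "C" fn demo_last_error_message() -> *const c_char {
    LAST_ERROR.with(|e| e.borrow().1.as_ptr())
}

// Returns an owning opaque handle, or null with the thread-local error set.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_dataset_open(uri: *const c_char) -> *mut DemoDataset {
    if uri.is_null() {
        set_last_error(1, "argument is null"); // hypothetical error code
        return ptr::null_mut();
    }
    let uri = match unsafe { CStr::from_ptr(uri) }.to_str() {
        Ok(s) => s.to_owned(),
        Err(_) => {
            set_last_error(2, "uri is not valid UTF-8"); // hypothetical error code
            return ptr::null_mut();
        }
    };
    set_last_error(0, "");
    Box::into_raw(Box::new(DemoDataset { uri }))
}

// Borrows the handle; returns -1 with the error set if the handle is null.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_dataset_uri_len(ds: *const DemoDataset) -> i64 {
    if ds.is_null() {
        set_last_error(1, "argument is null");
        return -1;
    }
    unsafe { (*ds).uri.len() as i64 }
}

// Takes ownership back and frees the object; null is accepted as a no-op.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_dataset_close(ds: *mut DemoDataset) {
    if !ds.is_null() {
        drop(unsafe { Box::from_raw(ds) });
    }
}
```

A C caller would only ever see `DemoDataset*` as an incomplete type, check the return value for null or -1, and then query the error code and message on the same thread that made the failing call.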
56+
57+
## Common Development Commands
58+
59+
* Check for build errors: `cargo check --all-targets`
60+
* Run tests: `cargo test`
61+
* Run C/C++ compilation tests: `cargo test --test compile_and_run_test -- --ignored`
62+
* Lint: `cargo clippy --all-targets -- -D warnings`
63+
* Format: `cargo fmt`
64+
65+
## Key Technical Details
66+
67+
1. **Tokio runtime**: A global multi-threaded Tokio runtime (`LazyLock<Runtime>`) is initialized on first use. Blocking APIs use `block_on()` to bridge async/sync. The poll API uses `RT.enter()` to provide reactor context for non-Tokio caller threads.
68+
69+
2. **C++ wrappers**: `lance.hpp` is a header-only library providing RAII handles (`Handle<T, Deleter>`), exception-based error handling, and builder pattern for Scanner.
70+
71+
3. **Dependencies**: This crate depends on stable releases from [lance-format/lance](https://github.com/lance-format/lance) on crates.io (e.g., `lance = "3.0.1"`). Arrow version must match what lance uses (currently arrow 57.x).
72+
73+
4. **Build requirements**: `protobuf-compiler` (`protoc`) is required because `lance-encoding` has a protobuf build script.
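The lazily initialized global runtime from item 1 can be illustrated with the standard library alone. In this sketch `DemoRuntime` is a hypothetical stand-in for `tokio::runtime::Runtime` so that the example compiles without Tokio; its `block_on` simply runs a closure, whereas the real one drives a future. `INIT_COUNT` and `demo_count_rows` are likewise invented for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Counts how often the initializer ran; exists only to make the behavior observable.
pub static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for `tokio::runtime::Runtime`.
pub struct DemoRuntime {
    worker_threads: usize,
}

impl DemoRuntime {
    // Stand-in for `Runtime::block_on`: runs the work to completion on the calling thread.
    pub fn block_on<T>(&self, work: impl FnOnce() -> T) -> T {
        work()
    }

    pub fn worker_threads(&self) -> usize {
        self.worker_threads
    }
}

// Built on first access, then shared by every later call from any thread.
pub static RT: LazyLock<DemoRuntime> = LazyLock::new(|| {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    DemoRuntime { worker_threads: 4 }
});

// A blocking C API entry point bridges into the runtime through `block_on`.
#[unsafe(no_mangle)]
pub extern "C" fn demo_count_rows() -> u64 {
    RT.block_on(|| 42u64)
}
```

Because `LazyLock` guarantees the initializer runs at most once, C callers need no explicit init function, at the cost of paying the startup time on whichever call happens to come first.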
74+
75+
## Development Tips
76+
77+
### FFI Safety
78+
79+
* Every `extern "C"` function must check for null pointers before dereferencing.
80+
* Use the `ffi_try!` macro to convert `Result` into error codes + thread-local error messages.
81+
* Never return Rust types by value across the FFI boundary. Transfer ownership to the caller with `Box::into_raw` and reclaim it with `Box::from_raw` in the matching close function.
82+
* When adding new `extern "C"` functions, also add them to `include/lance.h` and `include/lance.hpp`.
83+
84+
### Tests
85+
86+
* Prefer `tempfile::tempdir()` for test datasets.
87+
* For backwards compatibility tests, use the `test_data` directory with checked-in datasets and a `datagen.py` script.
88+
* The `compile_and_run_test.rs` tests build the cdylib, compile C/C++ programs against it, and run them. These require C/C++ compilers and are gated behind `#[ignore]`.
89+
90+
### Adding New APIs
91+
92+
1. Implement the Rust function in the appropriate module (e.g., `dataset.rs`, `scanner.rs`).
93+
2. Export it as `#[unsafe(no_mangle)] pub unsafe extern "C" fn lance_*()`.
94+
3. Add the declaration to `include/lance.h`.
95+
4. Add a C++ wrapper to `include/lance.hpp`.
96+
5. Add tests in `tests/c_api_test.rs` and optionally in `tests/cpp/`.
97+
98+
## Review Guidelines
99+
100+
Please note that the attention of contributors and maintainers is the MOST valuable resource. Less is more: focus on the most important aspects.
101+
102+
- Your review output SHOULD be concise and clear.
103+
- You SHOULD only highlight P0 and P1 level issues, such as severe bugs, performance degradation, or security concerns.
104+
- You MUST NOT reiterate detailed changes in your review.
105+
- You MUST NOT repeat aspects of the PR that are already well done.
106+
107+
### FFI-specific review concerns
108+
109+
* Ensure every pointer is null-checked before use.
110+
* Ensure every `Box::into_raw` has a corresponding `Box::from_raw` in a close/free function.
111+
* Ensure error paths don't leak memory (e.g., FFI stream pointers must be freed on failure).
112+
* Ensure `lance.h` and `lance.hpp` stay in sync with the Rust exports.
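One way to make the leak-related checks above easy to review is to keep the object in a `Box` until the last fallible step has passed, and only then call `Box::into_raw`. The sketch below assumes a hypothetical `DemoStream` type and `demo_stream_*` functions; the `fail_late` flag merely simulates a failure after allocation, and `LIVE_STREAMS` is a test-only counter, none of which exist in the repository.

```rust
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

// Number of DemoStream values currently alive; lets a test detect leaks.
pub static LIVE_STREAMS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical resource handed to C as an opaque handle.
pub struct DemoStream {
    _buffer: Vec<u8>,
}

impl DemoStream {
    fn new() -> Self {
        LIVE_STREAMS.fetch_add(1, Ordering::SeqCst);
        DemoStream { _buffer: vec![0; 1024] }
    }
}

impl Drop for DemoStream {
    fn drop(&mut self) {
        LIVE_STREAMS.fetch_sub(1, Ordering::SeqCst);
    }
}

// Writes the handle to `*out` and returns 0 on success; returns -1 on failure.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_stream_open(fail_late: bool, out: *mut *mut DemoStream) -> i32 {
    if out.is_null() {
        return -1;
    }
    // Leave a well-defined value behind even if we fail below.
    unsafe { *out = ptr::null_mut() };

    let stream = Box::new(DemoStream::new());
    if fail_late {
        // Still owned by the Box, so it is dropped here and nothing leaks.
        return -1;
    }
    // Ownership moves to the caller only after every fallible step has passed.
    unsafe { *out = Box::into_raw(stream) };
    0
}

// Pairs with the `Box::into_raw` above; null is accepted as a no-op.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_stream_close(stream: *mut DemoStream) {
    if !stream.is_null() {
        drop(unsafe { Box::from_raw(stream) });
    }
}
```

A counter driven by `Drop` like this gives tests a cheap way to assert that failed opens leave nothing allocated and that each successful open is balanced by exactly one close.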
113+
114+
### Testing
115+
116+
* Ensure all new C API functions have tests in `c_api_test.rs`.
117+
* Ensure that all bugfixes and features have corresponding tests. **We do not merge code without tests.**

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
AGENTS.md
