docs: add AGENTS.md and CLAUDE.md for coding agent guidance

jja725 · claude · jja725 · commit d6e49eab3965 · 2026-03-24T13:13:48.000-07:00
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,117 @@
+# AGENTS.md
+
+This file provides guidance to coding agents collaborating on this repository.
+
+## Project Overview
+
+`lance-c` provides C/C++ bindings for the [Lance](https://github.com/lance-format/lance) columnar data format. It exposes Lance's functionality through a stable C-ABI with opaque handle patterns and Arrow C Data Interface for zero-copy data exchange.
+
+Target consumers include C++ query engines (Velox, DuckDB), ML frameworks, and any language with C FFI capabilities.
+
+## Project Requirements
+
+- Always use English in code, examples, and comments.
+- Features should be implemented concisely, maintainably, and efficiently.
+- Code is not just for execution, but also for readability.
+- Only add meaningful comments and tests.
+
+## Architecture
+
+```
+include/
+├── lance.h         # C header (stable ABI) — all extern "C" function declarations
+└── lance.hpp       # C++ RAII wrappers (header-only)
+src/
+├── lib.rs          # Module root, re-exports all extern "C" symbols
+├── error.rs        # Thread-local error handling (error codes + messages)
+├── dataset.rs      # Dataset lifecycle: open, close, metadata, schema, take
+├── scanner.rs      # Scanner builder + three scan modes (sync, async, poll)
+├── batch.rs        # LanceBatch: Arrow C Data Interface export
+├── helpers.rs      # C string parsing utilities
+├── runtime.rs      # Global Tokio runtime
+└── async_dispatcher.rs  # Callback dispatcher for async scan
+tests/
+├── c_api_test.rs           # Rust integration tests calling C API directly
+├── compile_and_run_test.rs # Compiles and runs C/C++ test programs
+└── cpp/
+    ├── test_c_api.c        # C test program
+    └── test_cpp_api.cpp    # C++ test program
+test_data/                  # Historical datasets for backwards compatibility tests
+```
+
+## Key Design Patterns
+
+1. **Opaque handles**: All Rust objects are exposed as `*mut T` opaque pointers with explicit `lance_*_open`/`lance_*_close` lifecycle functions. No struct layout leaks across the FFI boundary.
+
+2. **Thread-local error handling**: Errors are stored in a thread-local `RefCell` and read back with `lance_last_error_code()` and `lance_last_error_message()`. The `ffi_try!` macro wraps function bodies to set errors and return null or -1.
+
+3. **Arrow C Data Interface**: All data crosses the FFI boundary as `ArrowArray`/`ArrowSchema`/`ArrowArrayStream` structs — zero-copy, no custom serialization.
+
+4. **Three scan modes**: The scanner offers one blocking and two asynchronous modes for different consumers:
+   - `lance_scanner_to_arrow_stream()` / `lance_scanner_next()` — blocking, for simple consumers
+   - `lance_scanner_scan_async()` — callback-based, for event-driven systems
+   - `lance_scanner_poll_next()` — poll + waker, for cooperative async runtimes (Velox/Folly)
+
+5. **Panic safety**: Release profile uses `panic = "abort"` to prevent undefined behavior from Rust panics unwinding across FFI boundaries.
+
+## Common Development Commands
+
+* Check for build errors: `cargo check --all-targets`
+* Run tests: `cargo test`
+* Run C/C++ compilation tests: `cargo test --test compile_and_run_test -- --ignored`
+* Lint: `cargo clippy --all-targets -- -D warnings`
+* Format: `cargo fmt`
+
+## Key Technical Details
+
+1. **Tokio runtime**: A global multi-threaded Tokio runtime (`LazyLock<Runtime>`) is initialized on first use. Blocking APIs use `block_on()` to bridge async/sync. The poll API uses `RT.enter()` to provide reactor context for non-Tokio caller threads.
+
+2. **C++ wrappers**: `lance.hpp` is a header-only library providing RAII handles (`Handle<T, Deleter>`), exception-based error handling, and a builder pattern for the scanner.
+
+3. **Dependencies**: This crate depends on stable releases of [lance-format/lance](https://github.com/lance-format/lance) from crates.io (e.g., `lance = "3.0.1"`). The Arrow version must match the one lance uses (currently arrow 57.x).
+
+4. **Build requirements**: `protobuf-compiler` (`protoc`) is required because `lance-encoding` has a protobuf build script.
+
+## Development Tips
+
+### FFI Safety
+
+* Every `extern "C"` function must check for null pointers before dereferencing.
+* Use the `ffi_try!` macro to convert `Result` into error codes + thread-local error messages.
+* Never return Rust types by value across the FFI boundary. Hand heap-allocated objects to the caller as raw pointers with `Box::into_raw` and reclaim them with `Box::from_raw` in the matching close/free function.
+* When adding new `extern "C"` functions, also add them to `include/lance.h` and `include/lance.hpp`.
+
+### Tests
+
+* Prefer `tempfile::tempdir()` for test datasets.
+* For backwards compatibility tests, use the `test_data` directory with checked-in datasets and a `datagen.py` script.
+* The `compile_and_run_test.rs` tests build the cdylib, compile C/C++ programs against it, and run them. These require C/C++ compilers and are gated behind `#[ignore]`.
+
+### Adding New APIs
+
+1. Implement the Rust function in the appropriate module (e.g., `dataset.rs`, `scanner.rs`).
+2. Export it as `#[unsafe(no_mangle)] pub unsafe extern "C" fn lance_*()`.
+3. Add the declaration to `include/lance.h`.
+4. Add a C++ wrapper to `include/lance.hpp`.
+5. Add tests in `tests/c_api_test.rs` and optionally in `tests/cpp/`.
+
+## Review Guidelines
+
+Please note that the attention of contributors and maintainers is the MOST valuable resource. Less is more: focus on the most important aspects.
+
+- Your review output SHOULD be concise and clear.
+- You SHOULD only highlight P0 and P1 level issues, such as severe bugs, performance degradation, or security concerns.
+- You MUST NOT reiterate detailed changes in your review.
+- You MUST NOT repeat aspects of the PR that are already well done.
+
+### FFI-specific review concerns
+
+* Ensure every pointer is null-checked before use.
+* Ensure every `Box::into_raw` has a corresponding `Box::from_raw` in a close/free function.
+* Ensure error paths don't leak memory (e.g., FFI stream pointers must be freed on failure).
+* Ensure `lance.h` and `lance.hpp` stay in sync with the Rust exports.
+
+### Testing
+
+* Ensure all new C API functions have tests in `c_api_test.rs`.
+* Ensure that all bugfixes and features have corresponding tests. **We do not merge code without tests.**
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md