This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
# Build the entire project
mvn clean install
# Build without running tests
mvn clean install -DskipTestsDo not attempt to run all tests at once - be specific, as the entire test suite can take a long time to complete. When running tests, these commands should be performed relative to the astra subdirectory.
# Run a specific test
mvn test -Dtest=AstraTest
# Run a specific test method
mvn test -Dtest=AstraTest#testDistributedQueryOneIndexerOneQueryNode# Format code using Google's style guide
mvn fmt:format# Run all benchmarks
cd benchmarks
./jmh.sh
# Run specific benchmark
./jmh.sh IndexingBenchmark
# Run specific benchmark method
./jmh.sh IndexingBenchmark.measureIndexingAsKafkaSerializedDocumentAstra is designed to be run with multiple components working together. The easiest way to set up a development environment is using the provided Docker Compose file:
# Start all services defined in docker-compose.yml
docker build -t slackhq/astra .
docker-compose upThis will start all required dependencies (Zookeeper, Kafka, S3Mock, OpenZipkin) and the Astra services (Preprocessor, Index, Manager, Query, Cache, Recovery).
Configuration is managed through YAML files and environment variables. The main configuration file is in config/config.yaml. All settings can be overridden by environment variables, as shown in the Docker Compose file.
To run Astra with a specific config:
java -jar astra/target/astra.jar /path/to/config.yamlAstra is a cloud-native search and analytics engine for log, trace, and audit data, built on Apache Lucene. The system consists of multiple services that work together:
- Preprocessor (port 8086): Handles data ingestion, schema validation, and writes to Kafka
- Indexer (port 8080): Consumes data from Kafka and builds Lucene indexes
- Manager (port 8083): Coordinates between components and handles metadata
- Query (port 8081): Provides API endpoints for searching data
- Cache (port 8082): Manages cached replicas for faster queries
- Recovery (port 8085): Handles data recovery operations
Each component can be run separately using the NODE_ROLES configuration option, which makes Astra horizontally scalable.
- Data is ingested through Preprocessor (or directly to Kafka)
- Indexer consumes data from Kafka and builds Lucene indexes
- Data is stored in chunks which can be persisted to S3
- Manager coordinates replica creation and assignment
- Query service handles search requests, distributing them across nodes
- Cache service provides faster access to frequently accessed data
- ChunkManager: Manages data chunks (IndexingChunkManager, CachingChunkManager, RecoveryChunkManager)
- MetadataStore: ZooKeeper-based metadata storage for datasets, replicas, schemas, etc.
- ArmeriaService: HTTP/gRPC server for all components
- BlobStore: Interface for S3 storage
- FieldRedaction: Manages field-level redaction for sensitive data
- Astra provides OpenSearch/Elasticsearch API compatibility for easy integration with tools like Grafana
- Zipkin API for tracing support
Astra has comprehensive unit tests and integration tests. The main classes involved in testing are:
- TestKafkaServer and TestingZKServer for local test dependencies
- AstraTestExecutionListener for JUnit test setup
- S3MockExtension for S3 mock testing