chore: fix strict clippy violations and apply rustfmt by anaslimem · Pull Request #8 · anaslimem/CortexaDB

anaslimem · 2026-03-02T16:58:37Z

Summary

remove unsupported Clippy config key
apply rustfmt across workspace
fix all strict clippy diagnostics instead of suppressing them
refactor combined index APIs to reduce argument count by grouping related parameters

Validation

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features -- -D warnings

Notes

work was done on branch per request

Copilot

Pull request overview

This PR focuses on bringing the workspace into strict clippy -D warnings compliance and consistent formatting, while also making a few behavior/API refinements (query intent anchors, chunking edge-cases, and combined index call ergonomics).

Changes:

Replace ad-hoc stderr printing with log in core runtime paths and adjust dependencies accordingly.
Improve/expand chunking logic and test coverage (Rust + Python wrapper edge cases).
Refactor/extend query/index APIs (intent anchors in QueryOptions, new builder for DB config, and grouping combined-index parameters).

Reviewed changes

Copilot reviewed 15 out of 19 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| crates/cortexadb-py/cortexadb/chunker.py | Adds legacy `chunk_text()` guards for empty/short inputs. |
| crates/cortexadb-core/tests/integration.rs | New end-to-end integration tests covering WAL recovery, checkpointing, deletion, graph edges, namespaces, metadata, and capacity eviction. |
| crates/cortexadb-core/src/store.rs | Switches diagnostics to `log` and adds intent-anchor-based option adjustments during query planning. |
| crates/cortexadb-core/src/query/mod.rs | Re-exports updated query types, including `IntentAnchors`. |
| crates/cortexadb-core/src/query/hybrid.rs | Adds `IntentAnchors`, relaxes `ScoreWeights` normalization, and expands tests. |
| crates/cortexadb-core/src/lib.rs | Reorders re-exports (rustfmt/clippy cleanup). |
| crates/cortexadb-core/src/index/vector.rs | Adds `Default` derive, minor API cleanups, and switches to `log` for HNSW fallback warnings. |
| crates/cortexadb-core/src/index/temporal.rs | Rustfmt-only formatting change. |
| crates/cortexadb-core/src/index/hnsw.rs | Derive defaults + minor type cleanup in result mapping. |
| crates/cortexadb-core/src/index/graph.rs | Small BFS optimization and assertion cleanup. |
| crates/cortexadb-core/src/index/combined.rs | Introduces `TimeRange`/`GraphScope` to reduce argument count in combined-index APIs. |
| crates/cortexadb-core/src/facade.rs | Removes config `Default`, adds `CortexaDBBuilder`, changes `open()` signature, and tweaks namespace query over-fetching. |
| crates/cortexadb-core/src/core/state_machine.rs | Minor clippy-driven simplifications (`or_default`). |
| crates/cortexadb-core/src/chunker.rs | Substantial fixed chunking refactor + significantly expanded tests. |
| crates/cortexadb-core/src/bin/sync_bench.rs | Formatting / minor clippy cleanups. |
| crates/cortexadb-core/src/bin/startup_bench.rs | Import ordering (rustfmt). |
| crates/cortexadb-core/src/bin/monkey_writer.rs | Switches progress output to `log`. |
| crates/cortexadb-core/src/bin/monkey_verify.rs | Formatting / minor clippy cleanups. |
| crates/cortexadb-core/src/bin/manual_store.rs | Formatting / minor clippy cleanups. |
| crates/cortexadb-core/benches/storage_bench.rs | Import ordering (rustfmt). |
| crates/cortexadb-core/Cargo.toml | Adds `log` dependency and `serial_test` dev-dependency. |
| clippy.toml | Removes unsupported clippy config key. |
| Cargo.lock | Updates lockfile for new deps and version bumps. |
| .gitignore | Ignores `skills/` directory. |
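
For readers unfamiliar with the parameter-grouping refactor mentioned for `combined.rs`, here is a minimal sketch of the general pattern. Only the type names `TimeRange` and `GraphScope` come from the PR; the fields and both functions are hypothetical and exist purely for illustration.

```rust
/// Hypothetical shape for a time window; the PR's `TimeRange` fields may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TimeRange {
    start: u64,
    end: u64,
}

/// Hypothetical shape for a traversal scope; the PR's `GraphScope` fields may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GraphScope {
    root: u64,
    max_hops: usize,
}

/// Before: loose positional arguments that are easy to pass in the wrong order.
fn describe_loose(start: u64, end: u64, root: u64, max_hops: usize) -> String {
    format!("t=[{},{}] root={} hops={}", start, end, root, max_hops)
}

/// After: two named groups, so each call site documents itself.
fn describe_grouped(range: TimeRange, scope: GraphScope) -> String {
    describe_loose(range.start, range.end, scope.root, scope.max_hops)
}
```

Grouping like this shortens signatures without changing behavior, which is typically how an argument-count lint is resolved rather than suppressed.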

Comments suppressed due to low confidence (7)

crates/cortexadb-core/src/store.rs:586

query_with_snapshot applies intent adjustments by calling embedder.embed(query_text) when options.intent_anchors is set, but the query execution path embeds the query again later. For expensive embedders this adds avoidable latency. Consider moving the intent_anchors adjustment to the executor (after the embedding is already computed) or passing a precomputed embedding through, so the query is embedded only once.

        let snapshot = self.snapshot();

        if let Some(anchors) = options.intent_anchors.take() {
            Self::apply_intent_adjustments(&mut options, &anchors, query_text, embedder);
        }
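
One way to read the "embed only once" suggestion is sketched below. The `Embedder` trait, `CountingEmbedder`, and the helpers `adjust_for_intent`, `execute`, and `query_once` are all hypothetical stand-ins, not CortexaDB's actual API; the point is only that the embedding is computed once and passed to both steps.

```rust
use std::cell::Cell;

/// Hypothetical embedder interface; the real trait in CortexaDB may differ.
trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Test double that counts how often `embed` is called.
struct CountingEmbedder {
    calls: Cell<usize>,
}

impl Embedder for CountingEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        self.calls.set(self.calls.get() + 1);
        // Toy embedding: byte length and word count of the text.
        vec![text.len() as f32, text.split_whitespace().count() as f32]
    }
}

/// Stand-in for intent adjustment; consumes a precomputed embedding.
fn adjust_for_intent(query_embedding: &[f32]) -> u8 {
    // Illustrative rule only: longer queries get more graph hops.
    if query_embedding[0] > 20.0 { 2 } else { 1 }
}

/// Stand-in for query execution; also consumes the precomputed embedding.
fn execute(query_embedding: &[f32], hops: u8) -> String {
    format!("dims={} hops={}", query_embedding.len(), hops)
}

/// Embeds the query exactly once and threads the vector through both steps.
fn query_once(embedder: &dyn Embedder, query_text: &str) -> String {
    let embedding = embedder.embed(query_text);
    let hops = adjust_for_intent(&embedding);
    execute(&embedding, hops)
}
```

With a counting embedder like this, a test can assert that a single query triggers exactly one `embed` call, which would catch a regression to double embedding.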

crates/cortexadb-core/src/store.rs:619

apply_intent_adjustments unconditionally overwrites options.score_weights when anchors are present. This differs from the existing intent auto-adjust behavior in the executor (which only adjusts when weights are still default) and can unexpectedly override caller-specified weights. Consider applying the same guard (only auto-adjust when weights are still default), or clearly documenting that providing intent_anchors implies weights will be overridden.

            ) {
                options.score_weights = adj.score_weights;
                if let Some(exp) = options.graph_expansion.as_mut() {
                    exp.hops = adj.graph_hops;
                }
            }
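
The guard proposed in this comment could look roughly like the sketch below. The struct is a simplified stand-in for the PR's `ScoreWeights`; the default values and the helper `apply_if_default` are assumptions made for illustration.

```rust
/// Simplified stand-in for the PR's `ScoreWeights`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ScoreWeights {
    similarity_pct: u8,
    importance_pct: u8,
    recency_pct: u8,
}

impl Default for ScoreWeights {
    fn default() -> Self {
        // Assumed defaults, chosen only so the example has something to compare against.
        Self { similarity_pct: 70, importance_pct: 20, recency_pct: 10 }
    }
}

/// Applies intent-derived weights only if the caller left the defaults untouched.
/// Returns `true` when the suggested weights were applied.
fn apply_if_default(current: &mut ScoreWeights, suggested: ScoreWeights) -> bool {
    if *current == ScoreWeights::default() {
        *current = suggested;
        true
    } else {
        false
    }
}
```

A limitation of this approach is that a caller who explicitly passes weights equal to the defaults is indistinguishable from one who passed nothing; an `Option<ScoreWeights>` field would make that intent explicit.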

crates/cortexadb-core/src/facade.rs:550

This test config sets checkpoint_policy: Disabled in the struct literal and then sets config.checkpoint_policy = Disabled again immediately after. The second assignment is redundant and can be removed to keep the test setup minimal.

        let mut config = CortexaDBConfig {
            vector_dimension: 3,
            sync_policy: crate::engine::SyncPolicy::Strict,
            checkpoint_policy: crate::store::CheckpointPolicy::Disabled,
            capacity_policy: crate::engine::CapacityPolicy::new(None, None),
            index_mode: crate::index::IndexMode::Exact,
        };
        config.checkpoint_policy = crate::store::CheckpointPolicy::Disabled;

crates/cortexadb-core/src/facade.rs:366

This over-fetch implementation increases QueryOptions.top_k (4×) to get more candidates, which also increases the amount of ranking/sorting work inside the query engine (since results are truncated to options.top_k). If the goal is strictly to increase recall before namespace filtering while still returning top_k, consider over-fetching via candidate_multiplier / a dedicated “candidate_k” knob instead of inflating top_k, so downstream ranking work stays proportional to the requested result count.

        let mut options = QueryOptions::with_top_k(top_k.saturating_mul(4).max(top_k));
        options.namespace = Some(namespace.to_string());
        options.metadata_filter = metadata_filter;
        let execution = self.inner.query("", options, &embedder)?;
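
The separation between "how many candidates to consider" and "how many results to return" can be sketched as follows. The `Hit` type and `top_k_in_namespace` are hypothetical, and the input is assumed to be sorted by descending score already; this is not the facade's actual code path.

```rust
/// Hypothetical scored hit; field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    id: u64,
    namespace: String,
    score: f32,
}

/// Considers `top_k * multiplier` candidates, filters by namespace,
/// then returns at most `top_k` results.
fn top_k_in_namespace(
    ranked: &[Hit],
    namespace: &str,
    top_k: usize,
    multiplier: usize,
) -> Vec<Hit> {
    // saturating_mul avoids overflow for very large top_k values.
    let candidate_k = top_k.saturating_mul(multiplier).max(top_k);
    ranked
        .iter()
        .take(candidate_k)
        .filter(|h| h.namespace == namespace)
        .take(top_k)
        .cloned()
        .collect()
}
```

Here the multiplier only widens the candidate window; the final result count stays bounded by the `top_k` the caller asked for.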

crates/cortexadb-core/src/chunker.rs:55

chunk_fixed treats overlap as a character budget (see comment “overlap, measured in chars”), but apply_overlap (used by recursive/semantic/markdown) interprets overlap as a word count. This makes overlap semantics inconsistent across strategies and also affects chunk_recursive’s fixed fallback (it passes the same overlap value into chunk_fixed). Consider standardizing on one unit (words or chars) across all strategies, and updating the calculation accordingly to avoid surprising behavior for callers.

fn chunk_fixed(text: &str, chunk_size: usize, overlap: usize) -> Vec<ChunkResult> {
    let text = text.trim();
    if text.is_empty() || chunk_size == 0 {
        return vec![];
    }

    // overlap >= chunk_size → step would be 0 → infinite loop guard
    if overlap >= chunk_size {
        return vec![];
    }
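
To make the character-based overlap semantics concrete, here is a small self-contained sketch of fixed-size chunking where both `chunk_size` and `overlap` are counted in characters. It is a simplified stand-in that returns plain strings, not the PR's `chunk_fixed`, and the name `chunk_fixed_chars` is made up for this example.

```rust
/// Fixed-size chunking where `chunk_size` and `overlap` are both measured in characters.
fn chunk_fixed_chars(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    let chars: Vec<char> = text.trim().chars().collect();
    // Same guards as the excerpt: empty input, zero size, or a step that would be zero.
    if chars.is_empty() || chunk_size == 0 || overlap >= chunk_size {
        return Vec::new();
    }
    let step = chunk_size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With `chunk_size = 4` and `overlap = 1`, each chunk repeats the final character of the previous one. A word-based overlap of 1 would instead repeat a whole word, which is the unit mismatch this comment points out.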

crates/cortexadb-core/src/chunker.rs:216

In the !split_done fallback, chunk_recursive calls chunk_fixed(text, chunk_size, overlap) and then later calls apply_overlap(chunks, overlap) on the whole result. This effectively applies overlap twice (once inside chunk_fixed, once in apply_overlap) and with potentially different units. To keep overlap behavior predictable, consider generating non-overlapping fixed chunks here (e.g., pass overlap=0) and relying on apply_overlap, or alternatively skip apply_overlap when the fallback path is used.

        if !split_done {
            // Fall back to fixed chunking when no separator works.
            let fixed_chunks = chunk_fixed(text, chunk_size, overlap);
            for chunk in fixed_chunks {
                chunks.push(ChunkResult { text: chunk.text, index: *index, metadata: None });
                *index += 1;
            }
        }
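
The first option in this comment (produce non-overlapping chunks, then apply overlap in a single pass) might look like the sketch below. `apply_overlap_words` is a simplified, hypothetical stand-in for the `apply_overlap` helper named above and works on plain strings rather than `ChunkResult`.

```rust
/// Prefixes each chunk (except the first) with the last `overlap_words` words
/// of the previous *original* chunk, so overlap is applied exactly once.
fn apply_overlap_words(chunks: &[String], overlap_words: usize) -> Vec<String> {
    let mut out = Vec::with_capacity(chunks.len());
    for (i, chunk) in chunks.iter().enumerate() {
        if i == 0 || overlap_words == 0 {
            out.push(chunk.clone());
            continue;
        }
        let prev: Vec<&str> = chunks[i - 1].split_whitespace().collect();
        let tail_start = prev.len().saturating_sub(overlap_words);
        let tail = prev[tail_start..].join(" ");
        if tail.is_empty() {
            out.push(chunk.clone());
        } else {
            out.push(format!("{} {}", tail, chunk));
        }
    }
    out
}
```

Because the tail is always taken from the unmodified previous chunk, feeding this function chunks that already overlap would duplicate text. That is why the fallback path would need to pass `overlap = 0` to the fixed chunker under this design.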

crates/cortexadb-core/src/query/hybrid.rs:82

ScoreWeights::normalized() now allows totals in 98..=102 and normalizes by the actual total, but other execution paths still enforce total == 100 (e.g. query/executor.rs re-derives weights and errors when total != 100). This makes the public ScoreWeights behavior inconsistent depending on which query engine is used. Consider centralizing weight validation/normalization (reuse ScoreWeights::normalized() everywhere) so the tolerance change applies consistently and doesn’t reintroduce errors in the planner/executor path.

    fn normalized(self) -> Result<(f32, f32, f32)> {
        let total =
            self.similarity_pct as u16 + self.importance_pct as u16 + self.recency_pct as u16;
        if !(98..=102).contains(&total) {
            return Err(HybridQueryError::InvalidScoreWeights {
                similarity_pct: self.similarity_pct,
                importance_pct: self.importance_pct,
                recency_pct: self.recency_pct,
            });
        }
        // Divide by actual total so the three floats always sum to exactly 1.0.
        let t = total as f32;
        Ok((
            self.similarity_pct as f32 / t,
            self.importance_pct as f32 / t,
            self.recency_pct as f32 / t,
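
A single shared helper is one way to keep validation consistent across code paths. The sketch below mirrors the `98..=102` tolerance from the excerpt but is otherwise hypothetical: it is a free function named `normalize_weights` and returns `Option` instead of the crate's error type.

```rust
/// Validates that the percentages sum to roughly 100 and returns fractions
/// computed against the actual total. Returns `None` outside the tolerance.
fn normalize_weights(
    similarity_pct: u8,
    importance_pct: u8,
    recency_pct: u8,
) -> Option<(f32, f32, f32)> {
    // Widen to u16 so three u8 values cannot overflow when summed.
    let total = similarity_pct as u16 + importance_pct as u16 + recency_pct as u16;
    if !(98..=102).contains(&total) {
        return None;
    }
    let t = total as f32;
    Some((
        similarity_pct as f32 / t,
        importance_pct as f32 / t,
        recency_pct as f32 / t,
    ))
}
```

If every execution path called one function like this, a weight set such as 33/33/33 would be accepted everywhere instead of passing in one engine and failing a strict `total == 100` check in another.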

anaslimem added 7 commits March 1, 2026 16:44

Updated Test Coverage

610e00d

Fixed the test

67351f7

Implement intent-based query adjustments and add tolerance to score w…

24ac915

…eight normalization

Updated features

dbcffa0

Updated the chunker

f8feaff

Fixed the error in the CI

07b13df

Resolved clippy warnings and apply rustfmt across workspace

bcc82c1

Copilot AI review requested due to automatic review settings March 2, 2026 16:58

Copilot started reviewing on behalf of anaslimem March 2, 2026 16:59

anaslimem and others added 2 commits March 2, 2026 18:01

Merge branch 'main' into features

117011c

fix: remove duplicate markdown chunker tests causing E0428

e584543

Copilot AI reviewed Mar 2, 2026

anaslimem merged commit 3b5ed07 into main Mar 2, 2026
1 check passed

anaslimem deleted the features branch March 7, 2026 23:42

anaslimem merged 9 commits into main from features