Diagnose and debug your Retrieval Augmented Generation pipelines.
RAG Doctor is an open-source CLI tool and embeddable engine that analyzes RAG execution traces and produces actionable diagnostic findings explaining why a RAG answer might fail.
RAG pipelines fail in subtle ways:
- ๐ Low retrieval scores โ the embedding model is misaligned with your domain
- ๐ Duplicate chunks โ near-identical text dilutes your context window
- ๐ฆ Oversized chunks โ individual documents are too long for the model to reason over
- ๐ Context overload โ too many retrieved documents bury the relevant signal
RAG Doctor gives you a structured, automated way to detect and fix these issues before they reach production โ as a one-shot CLI command, a CI gate, or an embeddable TypeScript library.
# Global install
npm install -g rag-doctor
# Or run without installing
npx rag-doctor analyze trace.json# Analyze a trace file
rag-doctor analyze trace.json
# Output results as JSON (for CI integration)
rag-doctor analyze trace.json --json
# Show help
rag-doctor --help1. Create a trace file (trace.json):
{
"query": "How do I reset my password?",
"retrievedChunks": [
{
"id": "chunk-1",
"text": "To reset your password, go to account settings and click 'Forgot password'.",
"score": 0.82,
"source": "help-center.md"
},
{
"id": "chunk-2",
"text": "Password resets expire after 24 hours. Request a new link if yours has expired.",
"score": 0.75,
"source": "faq.md"
}
],
"finalAnswer": "Go to account settings and click 'Forgot password'.",
"metadata": { "model": "gpt-4o" }
}2. Run the analyzer:
rag-doctor analyze trace.json3. Read the report:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
RAG Doctor Report
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Total findings: 0
High: 0
Medium: 0
Low: 0
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ No issues detected
A clean bill of health. Now try the included example that triggers findings:
npx rag-doctor analyze examples/low-score-trace.jsonA trace file is a JSON snapshot of a single RAG pipeline execution โ the query, the retrieved chunks, the generated answer, and any metadata your system provides.
{
"query": "How do I reset my password?",
"retrievedChunks": []
}{
"query": "How do I reset my password?",
"retrievedChunks": [
{
"id": "1",
"text": "To reset your password, go to account settings...",
"score": 0.82,
"source": "help-center.md"
}
],
"finalAnswer": "Go to account settings and click reset password.",
"metadata": {
"model": "gpt-4o",
"timestamp": "2024-01-15T10:23:00Z"
}
}| Field | Type | Required | Description |
|---|---|---|---|
query |
string | โ | The original user query โ must be non-empty |
retrievedChunks |
array | โ | Chunks retrieved from the vector store |
retrievedChunks[].id |
string | โ | Unique chunk identifier โ must be non-empty |
retrievedChunks[].text |
string | โ | Raw text content |
retrievedChunks[].score |
number | โ | Relevance score (typically 0โ1, must be finite) |
retrievedChunks[].source |
string | โ | Source document reference |
finalAnswer |
string | โ | The generated LLM answer |
metadata |
object | โ | Optional metadata (model names, timestamps, etc.) |
Tip:
scoreis optional but recommended. Rules likelow-retrieval-scoreare skipped entirely if no chunks have scores โ there are no false positives on unscored traces.
Every trace goes through the shared @rag-doctor/ingestion pipeline before analysis:
- Schema validation โ all required fields are present and have the correct types. All issues are collected in a single pass so you see every problem at once.
- Normalization โ the validated object is converted into a canonical
NormalizedTrace: query whitespace is trimmed, optional arrays default to[], and unknown extra fields are silently ignored.
If validation fails, RAG Doctor exits with a structured error:
Error: Invalid trace format: Trace validation failed
โข retrievedChunks[1].score: expected number, got string
โข retrievedChunks[2].id: expected non-empty string, got missing
In --json mode, the error is a machine-readable payload:
{
"code": "INVALID_TRACE_SCHEMA",
"message": "Trace validation failed",
"issues": [
{
"path": "retrievedChunks[1].score",
"expected": "number",
"received": "string"
}
]
}| Error | Cause |
|---|---|
query: expected non-empty string, got missing |
The query field is absent or null |
query: expected non-empty string, got empty string |
The query is whitespace-only |
retrievedChunks: expected array, got missing |
The retrievedChunks field is absent |
retrievedChunks[N].id: expected non-empty string, got missing |
A chunk is missing its id |
retrievedChunks[N].score: expected number, got string |
A score was provided as a quoted string instead of a number |
retrievedChunks[N].score: expected finite number, got Infinity |
A score is Infinity or NaN |
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
RAG Doctor Report
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Total findings: 2
High: 1
Medium: 1
Low: 0
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Findings:
[HIGH] Average retrieval score is 0.220
โ Check your embedding model alignment with your domain.
[MEDIUM] Found 1 near-duplicate chunk pair(s).
โ Implement deduplication in your chunking pipeline.
rag-doctor analyze trace.json --json{
"findings": [
{
"ruleId": "low-retrieval-score",
"ruleName": "Low Retrieval Score",
"severity": "high",
"message": "Average retrieval score is 0.220",
"recommendation": "Check your embedding model alignment with your domain.",
"details": {
"averageScore": 0.22,
"threshold": 0.5,
"chunksEvaluated": 5,
"lowestChunks": [...]
}
}
],
"summary": {
"high": 1,
"medium": 0,
"low": 0
}
}CI gate: In terminal mode, the CLI exits with code
1when anyhigh-severity finding is present โ making it a natural CI gate. JSON mode always exits0so downstream scripts can process results programmatically.
The diagnose command goes one step further than analyze โ it infers the most likely root cause(s) of your RAG pipeline's problems and suggests concrete fixes.
rag-doctor diagnose examples/low-score-trace.jsonExample output:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
RAG Doctor Diagnosis
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Primary root cause:
[HIGH CONFIDENCE] Retrieval Quality Degradation
The trace shows weak retrieval relevance signals, suggesting the retriever
returned low-value context for the query.
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Evidence:
[HIGH] Average retrieval score is 0.220
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Recommendations:
โ Check embedding model quality and ensure it is aligned with your domain
โ Verify retriever relevance by inspecting returned chunk content
โ Consider adding a reranker to promote the most relevant results
When multiple rules fire, the most severe finding becomes the primary cause and the rest are listed as contributing causes. Use --json to get machine-readable output for programmatic processing:
rag-doctor diagnose trace.json --json{
"primaryCause": {
"id": "retrieval-quality-degradation",
"title": "Retrieval Quality Degradation",
"confidence": "high",
"summary": "The trace shows weak retrieval relevance signals..."
},
"contributingCauses": [],
"evidence": [
{
"findingRuleId": "low-retrieval-score",
"findingMessage": "Average retrieval score is 0.220",
"severity": "high"
}
],
"recommendations": [
"Check embedding model quality and ensure it is aligned with your domain",
"Verify retriever relevance by inspecting returned chunk content",
"Consider adding a reranker to promote the most relevant results"
]
}| Cause ID | Triggered By | Description |
|---|---|---|
retrieval-quality-degradation |
low-retrieval-score |
Retriever returned low-relevance chunks |
duplicate-context-pollution |
duplicate-chunks |
Near-duplicate chunks dilute context quality |
oversized-chunking-strategy |
oversized-chunk |
Chunks are too large, inflating token usage |
excessive-context-volume |
context-overload |
Too many chunks increase noise in the prompt |
| Rule ID | Severity | Default Threshold | Description |
|---|---|---|---|
low-retrieval-score |
๐ด high | avg score < 0.5 | Flags traces where chunks have poor relevance scores |
duplicate-chunks |
๐ก medium | Jaccard similarity โฅ 0.8 | Detects near-duplicate retrieved chunks |
context-overload |
๐ก medium | > 10 chunks | Flags traces with too many retrieved documents |
oversized-chunk |
๐ข low | text length > 1200 chars | Flags individual chunks that are too long |
All thresholds are configurable. See Configuring Rules below.
RAG Doctor supports configurable rule thresholds and reusable rule packs.
| Pack | Description | Key differences from defaults |
|---|---|---|
recommended |
All rules with balanced defaults | Same as default behavior |
strict |
All rules with tighter thresholds | similarityThreshold: 0.7, averageScoreThreshold: 0.6, maxChunkLength: 1000, maxChunkCount: 8 |
Create a rag-doctor.config.json file:
{
"packs": ["recommended"],
"ruleOptions": {
"low-retrieval-score": {
"averageScoreThreshold": 0.6
},
"context-overload": {
"maxChunkCount": 8
}
}
}Pass it to any command with --config:
rag-doctor analyze trace.json --config rag-doctor.config.json
rag-doctor diagnose trace.json --config rag-doctor.config.json| Rule ID | Option | Type | Default | Constraint |
|---|---|---|---|---|
duplicate-chunks |
similarityThreshold |
number |
0.8 |
> 0 and <= 1 |
low-retrieval-score |
averageScoreThreshold |
number |
0.5 |
>= 0 and <= 1 |
oversized-chunk |
maxChunkLength |
integer |
1200 |
positive integer |
context-overload |
maxChunkCount |
integer |
10 |
positive integer |
Use the built-in strict pack for more demanding quality gates:
# Create rag-doctor.config.json with strict pack
echo '{ "packs": ["strict"] }' > rag-doctor.config.json
rag-doctor analyze trace.json --config rag-doctor.config.jsonimport { analyzeTrace } from "@rag-doctor/core";
import { ingestTrace } from "@rag-doctor/ingestion";
const trace = ingestTrace(rawJson);
// Use the strict pack
const result = analyzeTrace(trace, { packs: ["strict"] });
// Use recommended pack with custom overrides
const result2 = analyzeTrace(trace, {
packs: ["recommended"],
ruleOptions: {
"low-retrieval-score": { averageScoreThreshold: 0.6 },
"context-overload": { maxChunkCount: 8 },
},
});Each built-in rule can also be instantiated directly with typed options:
import {
createLowRetrievalScoreRule,
createContextOverloadRule,
RuleConfigurationError,
} from "@rag-doctor/rules";
try {
const strictScoreRule = createLowRetrievalScoreRule({ averageScoreThreshold: 0.7 });
const tightOverloadRule = createContextOverloadRule({ maxChunkCount: 5 });
const result = analyzeTrace(trace, { rules: [strictScoreRule, tightOverloadRule] });
} catch (err) {
if (err instanceof RuleConfigurationError) {
console.error(`Rule "${err.ruleId}": bad option "${err.optionKey}" โ ${err.constraint}`);
}
}RAG Doctor supports multiple trace formats. The adapter layer auto-detects the format, or you can specify it explicitly with --format.
RAG Doctor's native format. Auto-detected when the input has query and retrievedChunks.
{
"query": "What is RAG?",
"retrievedChunks": [{ "id": "c1", "text": "RAG is...", "score": 0.91 }],
"finalAnswer": "RAG is..."
}A generic event-based RAG trace. Auto-detected when the input has an events array.
{
"events": [
{ "type": "query.received", "query": "What is RAG?" },
{ "type": "retrieval.completed", "chunks": [
{ "id": "c1", "text": "RAG is...", "score": 0.91, "source": "wiki" }
]},
{ "type": "answer.generated", "answer": "RAG combines retrieval with generation." }
],
"metadata": { "pipeline": "custom-rag" }
}A simplified LangChain-style trace. Auto-detected when the input has input and retrieverOutput.
{
"input": "How does chunking affect retrieval?",
"retrieverOutput": [
{ "pageContent": "Smaller chunks improve precision.", "metadata": { "source": "doc-1" }, "score": 0.72 }
],
"output": "Chunking strongly influences quality."
}A simplified LangSmith-inspired trace. Auto-detected when the input has run_type, inputs, and outputs.
{
"run_type": "chain",
"inputs": { "question": "Why do duplicate chunks hurt RAG?" },
"outputs": { "answer": "Duplicate chunks waste context budget." },
"retrieval": {
"documents": [
{ "id": "doc-a", "content": "Duplicates repeat context.", "score": 0.64, "source": "guide" }
]
},
"extra": { "project": "rag-eval" }
}# Auto-detect format (recommended)
rag-doctor analyze trace.json
# Explicit format
rag-doctor analyze langchain-trace.json --format langchain
rag-doctor diagnose langsmith-trace.json --format langsmith
# Combine with other flags
rag-doctor analyze trace.json --format event-trace --json
rag-doctor analyze trace.json --format langchain --config rag-doctor.config.jsonimport { adaptTrace, detectTraceFormat } from "@rag-doctor/adapters";
import { ingestTrace } from "@rag-doctor/ingestion";
import { analyzeTrace } from "@rag-doctor/core";
const rawJson = JSON.parse(fs.readFileSync("langchain-trace.json", "utf-8"));
// Auto-detect and adapt
const adapted = adaptTrace(rawJson);
console.log(adapted.format); // "langchain"
console.log(adapted.warnings); // ["Generated deterministic IDs for 2 chunk(s)..."]
// Ingest and analyze
const trace = ingestTrace(adapted.trace);
const result = analyzeTrace(trace);Low retrieval scores mean your vector store is returning chunks that aren't very relevant to the query. This is one of the most common RAG failure modes.
Step 1 โ Run the example low-score trace:
rag-doctor analyze examples/low-score-trace.jsonYou'll see a [HIGH] finding with the average score and the three worst-scoring chunks listed in details.
Step 2 โ Get the raw details as JSON to inspect which chunks are the culprits:
rag-doctor analyze examples/low-score-trace.json --json | jq '.findings[0].details'Step 3 โ Use the chunk IDs from lowestChunks to trace back to your vector store and inspect why those documents ranked so low.
Common fixes:
- ๐ Re-embed your documents with a domain-specific embedding model
- ๐ท๏ธ Add metadata filters so the retriever only queries relevant subsets
- ๐ Use a reranker (e.g. Cohere Rerank, BGE Reranker) as a second-pass filter
Duplicate chunks waste your context window and can cause the LLM to over-weight certain information.
Step 1 โ Inspect the duplicate-chunks example:
rag-doctor analyze tests/fixtures/broken-duplicate-trace.jsonStep 2 โ Check which chunk pairs triggered the rule:
rag-doctor analyze tests/fixtures/broken-duplicate-trace.json --json | jq '.findings[0].details.pairs'Step 3 โ Implement deduplication at ingestion time using the chunk IDs.
Common fixes:
- ๐๏ธ Deduplicate at index time using hash-based or embedding-based similarity
- ๐ง Add a post-retrieval deduplication step before passing chunks to the LLM
- ๐ Use MMR (Maximal Marginal Relevance) retrieval to promote diversity
Add RAG Doctor as an automated quality gate in your pipeline:
GitHub Actions:
name: RAG Quality Gate
on: [push, pull_request]
jobs:
rag-quality:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install Node.js
uses: actions/setup-node@v4
with:
node-version: 20
- name: Run RAG Doctor
run: npx rag-doctor analyze traces/latest-trace.json
# Exits with code 1 if any HIGH severity finding is detectedPre-commit hook (.git/hooks/pre-commit):
#!/bin/sh
if [ -f trace.json ]; then
npx rag-doctor analyze trace.json
fiThe CLI exits
0on a clean report and1on anyhigh-severity finding โ no extra configuration needed.
RAG Doctor's core engine has zero I/O dependencies and can be embedded in any environment โ API servers, VS Code extensions, browser apps, or Next.js routes.
import { analyzeTrace } from "@rag-doctor/core";
import { normalizeTrace, ParseError } from "@rag-doctor/parser";
// In an Express API route
app.post("/api/analyze", (req, res) => {
let trace;
try {
trace = normalizeTrace(req.body);
} catch (err) {
if (err instanceof ParseError) {
return res.status(400).json({ error: err.message, field: err.field });
}
throw err;
}
const result = analyzeTrace(trace);
// Gate on high-severity findings
const status = result.summary.high > 0 ? 422 : 200;
return res.status(status).json(result);
});Install only what you need:
# Just the engine (browser-safe, zero dependencies)
npm install @rag-doctor/core @rag-doctor/types
# Add the parser for input validation
npm install @rag-doctor/parser
# Add the terminal reporter for Node.js outputs
npm install @rag-doctor/reportersAny object implementing DiagnosticRule works as a rule. Here's a complete example:
// rules/empty-answer.rule.ts
import type { DiagnosticRule, DiagnosticFinding, NormalizedTrace } from "@rag-doctor/types";
export const EmptyAnswerRule: DiagnosticRule = {
id: "empty-answer",
name: "Empty Final Answer",
run(trace: NormalizedTrace): DiagnosticFinding[] {
if (trace.finalAnswer && trace.finalAnswer.trim().length > 0) {
return [];
}
return [
{
ruleId: this.id,
ruleName: this.name,
severity: "high",
message: "The pipeline produced no final answer.",
recommendation:
"Check that your LLM call is completing and that the response is captured correctly.",
details: {
hadFinalAnswer: !!trace.finalAnswer,
query: trace.query,
},
},
];
},
};Pass it alongside (or instead of) the built-in rules:
import { analyzeTrace } from "@rag-doctor/core";
import { defaultRules } from "@rag-doctor/rules";
import { EmptyAnswerRule } from "./rules/empty-answer.rule.js";
const result = analyzeTrace(trace, {
rules: [...defaultRules, EmptyAnswerRule],
});Rule authoring tips:
- โ
run()must be a pure function โ same input always produces same output - โ
Return an empty array (not
null/undefined) when the rule doesn't fire - โ
Put machine-readable data in
detailsso programmatic consumers don't need to parsemessage - โ
Use
recommendationto suggest a concrete fix, not just describe the problem
The core engine has zero CLI dependencies and can be embedded anywhere.
import { ingestTrace, TraceValidationError } from "@rag-doctor/ingestion";
import { analyzeTrace } from "@rag-doctor/core";
import fs from "fs";
const rawJson = JSON.parse(fs.readFileSync("trace.json", "utf-8"));
try {
const trace = ingestTrace(rawJson); // validates + normalizes
const result = analyzeTrace(trace);
console.log(result.summary); // { high: 1, medium: 0, low: 0 }
console.log(result.findings); // DiagnosticFinding[]
} catch (err) {
if (err instanceof TraceValidationError) {
// Structured, field-level error payload
console.error(JSON.stringify(err.toPayload(), null, 2));
// {
// "code": "INVALID_TRACE_SCHEMA",
// "message": "Trace validation failed",
// "issues": [{ "path": "retrievedChunks[0].score", "expected": "number", "received": "string" }]
// }
}
}import { ingestTrace } from "@rag-doctor/ingestion";
import { analyzeTrace, RuleConfigurationError, UnknownPackError } from "@rag-doctor/core";
import fs from "fs";
const trace = ingestTrace(JSON.parse(fs.readFileSync("trace.json", "utf-8")));
try {
const result = analyzeTrace(trace, {
packs: ["strict"], // use the strict built-in pack
ruleOptions: {
"low-retrieval-score": { averageScoreThreshold: 0.7 }, // override one threshold
},
});
console.log(result.summary);
} catch (err) {
if (err instanceof UnknownPackError) {
console.error(`Unknown pack: ${err.packName}`);
} else if (err instanceof RuleConfigurationError) {
console.error(`Bad config for rule "${err.ruleId}": ${err.message}`);
}
}import { analyzeTrace } from "@rag-doctor/core";
import { normalizeTrace } from "@rag-doctor/parser";
import { printTerminalReport } from "@rag-doctor/reporters";
import fs from "fs";
const rawJson = JSON.parse(fs.readFileSync("trace.json", "utf-8"));
const trace = normalizeTrace(rawJson);
const result = analyzeTrace(trace);
// Pretty-print to terminal
printTerminalReport(result);
// Or use structured data directly
console.log(result.summary); // { high: 1, medium: 0, low: 0 }
console.log(result.findings); // DiagnosticFinding[]Capture reporter output (no stdout side effects):
const lines: string[] = [];
printTerminalReport(result, {
write: (line) => lines.push(line),
});
// lines now contains the full formatted reportrag-doctor/
โโโ apps/
โ โโโ cli/ # ๐ CLI entry point (published as `rag-doctor`)
โโโ packages/
โ โโโ types/ # ๐ Shared TypeScript interfaces
โ โโโ adapters/ # ๐ External trace format adapters (canonical, event-trace, langchain, langsmith)
โ โโโ ingestion/ # ๐ Shared trace ingestion pipeline (validate + normalize)
โ โโโ parser/ # ๐ Trace normalizer & validator (legacy, used by ingestion)
โ โโโ rules/ # ๐ Built-in diagnostic rules
โ โโโ core/ # โ๏ธ Analysis engine (zero I/O dependencies)
โ โโโ diagnostics/ # ๐ง Root cause analysis engine
โ โโโ reporters/ # ๐จ๏ธ Terminal and future reporters
โโโ tests/
โ โโโ fixtures/ # ๐งช Shared JSON fixtures for tests
โโโ examples/
โ โโโ basic-trace.json
โ โโโ low-score-trace.json
โ โโโ context-overload-trace.json
โโโ docs/
โโโ ARCHITECTURE.md
โโโ CONTRIBUTING.md
RAG Doctor has comprehensive test coverage across all packages.
# Run all tests across the monorepo
pnpm test
# Watch mode (re-runs on file change)
pnpm test:watch
# Run tests for a single package
pnpm --filter @rag-doctor/parser test
pnpm --filter @rag-doctor/rules test
pnpm --filter @rag-doctor/core test
pnpm --filter @rag-doctor/reporters test
pnpm --filter rag-doctor test| Package | Tests | What's Covered |
|---|---|---|
@rag-doctor/adapters |
69 | Format detection (canonical, event-trace, langchain, langsmith, unknown, priority), each adapter (valid input, missing fields, generated IDs, warnings, errors), integration with ingestion |
@rag-doctor/parser |
50 | Valid traces, optional fields, all ParseError paths, chunk-level validation, score edge cases (Infinity, NaN, 0, 1), metadata passthrough |
@rag-doctor/rules |
114 | Each rule: fires / does not fire, threshold boundaries, structured details fields, edge cases; rule factories, custom thresholds, RuleConfigurationError, packs (recommended, strict), ruleOptions overrides |
@rag-doctor/core |
43 | Return shape, summary accuracy, custom rule injection, multi-rule aggregation, real rule scenarios; pack resolution, ruleOptions overrides, UnknownPackError, backward compatibility |
@rag-doctor/ingestion |
77 | Valid/invalid traces, schema validation, normalization, structured errors |
@rag-doctor/reporters |
52 | Header/structure, zero-findings message, severity labels, sort order, injected write |
rag-doctor (CLI) |
141 | Argument parsing, help display, error messages, analyze/diagnose commands, --json flag, exit codes, --config/--format flags, adapter integration, auto-detection, pack resolution, regression |
| Total | 525+ |
Fixture JSON files live in tests/fixtures/:
| File | Category | Triggers |
|---|---|---|
valid-clean-trace.json |
Valid | Nothing โ minimal clean trace (2 high-score chunks) |
valid-basic-trace.json |
Valid | Nothing โ clean full trace with all optional fields |
valid-minimal-trace.json |
Valid | Nothing โ one chunk, no optional fields |
valid-low-score-trace.json |
Valid | low-retrieval-score (HIGH) โ valid trace with low scores |
valid-medium-score-trace.json |
Valid | Nothing under default thresholds; low-retrieval-score (HIGH) under strict/tight config |
broken-low-score-trace.json |
Valid | low-retrieval-score (HIGH) โ avg score โ 0.22 |
broken-duplicate-trace.json |
Valid | duplicate-chunks (MEDIUM) โ 3 identical chunks |
context-overload-trace.json |
Valid | context-overload (MEDIUM) โ 12 retrieved chunks |
oversized-chunk-trace.json |
Valid | oversized-chunk (LOW) โ one chunk > 1200 chars |
multi-rule-trace.json |
Valid | low-retrieval-score + duplicate-chunks + context-overload simultaneously |
invalid-json.txt |
Invalid | Parse error โ not valid JSON |
invalid-schema.json |
Invalid | Schema error โ valid JSON but wrong field names |
invalid-missing-fields.json |
Invalid | Schema error โ missing query and retrievedChunks |
invalid-bad-score-type.json |
Invalid | Schema error โ scores provided as strings instead of numbers |
invalid-malformed-chunks.json |
Invalid | Schema error โ chunks array contains primitives, nulls, nested arrays |
config-recommended.json |
Config | { "packs": ["recommended"] } |
config-strict.json |
Config | { "packs": ["strict"] } |
config-tight-thresholds.json |
Config | recommended pack + tight thresholds (triggers on medium-score trace) |
config-unknown-pack.json |
Config | Invalid โ references nonexistent pack |
config-invalid-option.json |
Config | Invalid โ maxChunkCount: 0 (must be positive) |
config-invalid-json.json |
Config | Invalid โ not valid JSON |
config-not-object.json |
Config | Invalid โ root value is an array, not an object |
config-packs-not-array.json |
Config | Invalid โ packs is a string instead of an array |
event-trace-valid.json |
Adapter | Valid event-trace format (auto-detected) |
langchain-valid.json |
Adapter | Valid LangChain format (auto-detected) |
langsmith-valid.json |
Adapter | Valid LangSmith format (auto-detected) |
unknown-format.json |
Adapter | Invalid โ unrecognized format, falls through to ingestion validation |
malformed-langchain.json |
Adapter | Invalid โ LangChain-shaped but with wrong field types |
malformed-event-trace.json |
Adapter | Invalid โ event-trace shaped but events is a string |
CLI tests use two complementary layers:
- In-process unit tests (
cli.test.ts) โ inject aCliIOinterface to capture stdout/stderr and catch exit codes without spawning a subprocess. Fast, runs in milliseconds. - Subprocess integration tests (
cli.integration.test.ts) โ spawn the compileddist/bin.jsbinary viaspawnSync. Verifies real process exit codes, stdout/stderr streams, and shebang execution. Skipped automatically ifdist/bin.jshasn't been built yet.
- Node.js 18+
- pnpm 9+
git clone https://github.com/your-org/rag-doctor.git
cd rag-doctor
pnpm install
pnpm buildpnpm build # Build all packages
pnpm test # Run all tests
pnpm dev # Watch mode for all packages
pnpm lint # Lint all packages
pnpm typecheck # TypeScript type checkpnpm build
node apps/cli/dist/bin.js analyze examples/basic-trace.json
node apps/cli/dist/bin.js analyze examples/low-score-trace.json
node apps/cli/dist/bin.js analyze examples/context-overload-trace.json --jsonRAG Doctor's modular design supports these planned integrations:
| Integration | Description |
|---|---|
| ๐งฉ VS Code Extension | Inline diagnostics while editing trace files in the editor |
| โ๏ธ GitHub Action | Fail CI when high-severity issues are detected |
| โ๏ธ Cloud Dashboard | Aggregate diagnostics across production traces over time |
| ๐ฃ Custom Reporters | Output findings to Slack, PagerDuty, or Datadog |
| ๐ง LangSmith / LlamaIndex adapters | Ingest traces from popular RAG frameworks without manual conversion |
Contributions are welcome! See docs/CONTRIBUTING.md for guidelines.
MIT ยฉ RAG Doctor Contributors
