NeuroForgeLabs/rag-doctor

🩺 RAG Doctor

Diagnose and debug your Retrieval Augmented Generation pipelines.

RAG Doctor is an open-source CLI tool and embeddable engine that analyzes RAG execution traces and produces actionable diagnostic findings explaining why a RAG answer might fail.


🤔 Why RAG Doctor?

RAG pipelines fail in subtle ways:

  • 📉 Low retrieval scores: the embedding model is misaligned with your domain
  • 🔁 Duplicate chunks: near-identical text dilutes your context window
  • 📦 Oversized chunks: individual documents are too long for the model to reason over
  • 🌊 Context overload: too many retrieved documents bury the relevant signal

RAG Doctor gives you a structured, automated way to detect and fix these issues before they reach production, whether as a one-shot CLI command, a CI gate, or an embeddable TypeScript library.


📦 Installation

# Global install
npm install -g rag-doctor

# Or run without installing
npx rag-doctor analyze trace.json

🚀 Quick Start

# Analyze a trace file
rag-doctor analyze trace.json

# Output results as JSON (for CI integration)
rag-doctor analyze trace.json --json

# Show help
rag-doctor --help

In 60 seconds

1. Create a trace file (trace.json):

{
  "query": "How do I reset my password?",
  "retrievedChunks": [
    {
      "id": "chunk-1",
      "text": "To reset your password, go to account settings and click 'Forgot password'.",
      "score": 0.82,
      "source": "help-center.md"
    },
    {
      "id": "chunk-2",
      "text": "Password resets expire after 24 hours. Request a new link if yours has expired.",
      "score": 0.75,
      "source": "faq.md"
    }
  ],
  "finalAnswer": "Go to account settings and click 'Forgot password'.",
  "metadata": { "model": "gpt-4o" }
}

2. Run the analyzer:

rag-doctor analyze trace.json

3. Read the report:

──────────────────────────────────────────────────
  RAG Doctor Report
──────────────────────────────────────────────────

  Total findings:  0
  High:            0
  Medium:          0
  Low:             0

──────────────────────────────────────────────────

  ✓ No issues detected

A clean bill of health. Now try the included example that triggers findings:

npx rag-doctor analyze examples/low-score-trace.json

📋 Trace Format

A trace file is a JSON snapshot of a single RAG pipeline execution: the query, the retrieved chunks, the generated answer, and any metadata your system provides.

Minimal valid trace

{
  "query": "How do I reset my password?",
  "retrievedChunks": []
}

Full trace with all optional fields

{
  "query": "How do I reset my password?",
  "retrievedChunks": [
    {
      "id": "1",
      "text": "To reset your password, go to account settings...",
      "score": 0.82,
      "source": "help-center.md"
    }
  ],
  "finalAnswer": "Go to account settings and click reset password.",
  "metadata": {
    "model": "gpt-4o",
    "timestamp": "2024-01-15T10:23:00Z"
  }
}

Field reference

| Field | Type | Required | Description |
|---|---|---|---|
| query | string | yes | The original user query; must be non-empty |
| retrievedChunks | array | yes | Chunks retrieved from the vector store |
| retrievedChunks[].id | string | yes | Unique chunk identifier; must be non-empty |
| retrievedChunks[].text | string | yes | Raw text content |
| retrievedChunks[].score | number | no | Relevance score (typically 0 to 1, must be finite) |
| retrievedChunks[].source | string | no | Source document reference |
| finalAnswer | string | no | The generated LLM answer |
| metadata | object | no | Optional metadata (model names, timestamps, etc.) |
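
The field reference can also be read as a pair of TypeScript interfaces. This is a sketch only: the names `RetrievedChunk` and `RagTrace` are hypothetical and are not taken from `@rag-doctor/types`; only the field names and their optionality come from the table.

```typescript
// Hypothetical type names; field names and optionality follow the field reference.
interface RetrievedChunk {
  id: string;       // required, non-empty
  text: string;     // required
  score?: number;   // optional, must be finite when present
  source?: string;  // optional
}

interface RagTrace {
  query: string;                      // required, non-empty
  retrievedChunks: RetrievedChunk[];  // required, may be empty
  finalAnswer?: string;               // optional
  metadata?: Record<string, unknown>; // optional, free-form
}

// The minimal valid trace: only the two required fields.
const minimalTrace: RagTrace = {
  query: "How do I reset my password?",
  retrievedChunks: [],
};

// A trace that uses every optional field.
const fullTrace: RagTrace = {
  query: "How do I reset my password?",
  retrievedChunks: [
    { id: "1", text: "To reset your password, go to account settings...", score: 0.82, source: "help-center.md" },
  ],
  finalAnswer: "Go to account settings and click reset password.",
  metadata: { model: "gpt-4o", timestamp: "2024-01-15T10:23:00Z" },
};
```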

Tip: score is optional but recommended. Rules like low-retrieval-score are skipped entirely if no chunks have scores, so there are no false positives on unscored traces.
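
To make the "skipped on unscored traces" behavior concrete, here is a minimal sketch of an average-score check. It is not the actual rule implementation: the function name `checkAverageScore` is hypothetical, and averaging over only the chunks that carry a score is an assumption. The 0.5 default matches the documented threshold.

```typescript
interface ScoredChunk {
  id: string;
  score?: number;
}

interface ScoreCheck {
  fired: boolean;
  averageScore?: number;
  chunksEvaluated: number;
}

// Hypothetical helper: fires when the mean score falls below the threshold.
function checkAverageScore(chunks: ScoredChunk[], threshold = 0.5): ScoreCheck {
  // Assumption: chunks without a score are left out of the average.
  const scores: number[] = [];
  for (const chunk of chunks) {
    if (typeof chunk.score === "number") scores.push(chunk.score);
  }
  // No scores at all: skip the check instead of treating the trace as failing.
  if (scores.length === 0) {
    return { fired: false, chunksEvaluated: 0 };
  }
  const averageScore = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return { fired: averageScore < threshold, averageScore, chunksEvaluated: scores.length };
}
```

With the two chunks from the quick-start trace (0.82 and 0.75) the check does not fire; with no scores present it is skipped rather than reported.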

How validation works

Every trace goes through the shared @rag-doctor/ingestion pipeline before analysis:

  1. Schema validation: all required fields are present and have the correct types. All issues are collected in a single pass so you see every problem at once.
  2. Normalization: the validated object is converted into a canonical NormalizedTrace. Query whitespace is trimmed, optional arrays default to [], and unknown extra fields are silently ignored.

If validation fails, RAG Doctor exits with a structured error:

Error: Invalid trace format: Trace validation failed
  • retrievedChunks[1].score: expected number, got string
  • retrievedChunks[2].id: expected non-empty string, got missing

In --json mode, the error is a machine-readable payload:

{
  "code": "INVALID_TRACE_SCHEMA",
  "message": "Trace validation failed",
  "issues": [
    {
      "path": "retrievedChunks[1].score",
      "expected": "number",
      "received": "string"
    }
  ]
}
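
The single-pass behavior described above (collect every issue, then report them together) can be sketched as follows. This is an illustration and not the `@rag-doctor/ingestion` source: `validateTrace` and `typeLabel` are hypothetical names, and only a subset of the checks is shown. The issue shape (`path`, `expected`, `received`) mirrors the JSON payload.

```typescript
interface ValidationIssue {
  path: string;
  expected: string;
  received: string;
}

// Hypothetical helper: a short label for the value that was actually received.
function typeLabel(value: unknown): string {
  if (value === undefined || value === null) return "missing";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

function asRecord(value: unknown): Record<string, unknown> {
  return typeof value === "object" && value !== null ? (value as Record<string, unknown>) : {};
}

// Collects every issue instead of stopping at the first one.
function validateTrace(input: unknown): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  const trace = asRecord(input);

  const query = trace["query"];
  if (typeof query !== "string") {
    issues.push({ path: "query", expected: "non-empty string", received: typeLabel(query) });
  } else if (query.trim() === "") {
    issues.push({ path: "query", expected: "non-empty string", received: "empty string" });
  }

  const chunks = trace["retrievedChunks"];
  if (!Array.isArray(chunks)) {
    issues.push({ path: "retrievedChunks", expected: "array", received: typeLabel(chunks) });
    return issues;
  }

  chunks.forEach((chunk: unknown, i: number) => {
    const c = asRecord(chunk);
    const id = c["id"];
    const score = c["score"];
    if (typeof id !== "string" || id === "") {
      issues.push({ path: `retrievedChunks[${i}].id`, expected: "non-empty string", received: typeLabel(id) });
    }
    if (score !== undefined && typeof score !== "number") {
      issues.push({ path: `retrievedChunks[${i}].score`, expected: "number", received: typeLabel(score) });
    }
  });
  return issues;
}
```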

Common validation errors

| Error | Cause |
|---|---|
| query: expected non-empty string, got missing | The query field is absent or null |
| query: expected non-empty string, got empty string | The query is empty or whitespace-only |
| retrievedChunks: expected array, got missing | The retrievedChunks field is absent |
| retrievedChunks[N].id: expected non-empty string, got missing | A chunk is missing its id |
| retrievedChunks[N].score: expected number, got string | A score was provided as a quoted string instead of a number |
| retrievedChunks[N].score: expected finite number, got Infinity | A score is Infinity or NaN |

📊 Example Output

Terminal report (default)

──────────────────────────────────────────────────
  RAG Doctor Report
──────────────────────────────────────────────────

  Total findings:  2
  High:            1
  Medium:          1
  Low:             0

──────────────────────────────────────────────────

  Findings:

  [HIGH] Average retrieval score is 0.220
  → Check your embedding model alignment with your domain.

  [MEDIUM] Found 1 near-duplicate chunk pair(s).
  → Implement deduplication in your chunking pipeline.

JSON output (--json)

rag-doctor analyze trace.json --json
{
  "findings": [
    {
      "ruleId": "low-retrieval-score",
      "ruleName": "Low Retrieval Score",
      "severity": "high",
      "message": "Average retrieval score is 0.220",
      "recommendation": "Check your embedding model alignment with your domain.",
      "details": {
        "averageScore": 0.22,
        "threshold": 0.5,
        "chunksEvaluated": 5,
        "lowestChunks": [...]
      }
    }
  ],
  "summary": {
    "high": 1,
    "medium": 0,
    "low": 0
  }
}

CI gate: In terminal mode, the CLI exits with code 1 when any high-severity finding is present, which makes it a natural CI gate. JSON mode always exits 0 so downstream scripts can process results programmatically.
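
Because JSON mode always exits 0, a downstream script has to decide for itself what counts as a failure. The sketch below shows one way to do that from the `summary` object in the JSON output. `gateExitCode` and its `failAt` parameter are hypothetical names, not part of RAG Doctor.

```typescript
type Severity = "high" | "medium" | "low";

interface ReportSummary {
  high: number;
  medium: number;
  low: number;
}

// Severities ordered from least to most severe.
const SEVERITY_ORDER: Severity[] = ["low", "medium", "high"];

// Hypothetical gate: returns 1 when any finding at or above `failAt` is present.
function gateExitCode(reportJson: string, failAt: Severity = "high"): number {
  const report = JSON.parse(reportJson) as { summary: ReportSummary };
  const failing = SEVERITY_ORDER.slice(SEVERITY_ORDER.indexOf(failAt));
  return failing.some((severity) => report.summary[severity] > 0) ? 1 : 0;
}
```

A script could read the output of `rag-doctor analyze trace.json --json`, pass it to a function like this, and call `process.exit` with the result, for example to fail a build on medium findings as well.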


๐Ÿ” Diagnosis

The diagnose command goes one step further than analyze โ€” it infers the most likely root cause(s) of your RAG pipeline's problems and suggests concrete fixes.

rag-doctor diagnose examples/low-score-trace.json

Example output:

──────────────────────────────────────────────────
  RAG Doctor Diagnosis
──────────────────────────────────────────────────

Primary root cause:
  [HIGH CONFIDENCE] Retrieval Quality Degradation

  The trace shows weak retrieval relevance signals, suggesting the retriever
  returned low-value context for the query.

──────────────────────────────────────────────────

Evidence:
  [HIGH] Average retrieval score is 0.220

──────────────────────────────────────────────────

Recommendations:
  → Check embedding model quality and ensure it is aligned with your domain
  → Verify retriever relevance by inspecting returned chunk content
  → Consider adding a reranker to promote the most relevant results

When multiple rules fire, the most severe finding becomes the primary cause and the rest are listed as contributing causes. Use --json to get machine-readable output for programmatic processing:

rag-doctor diagnose trace.json --json
{
  "primaryCause": {
    "id": "retrieval-quality-degradation",
    "title": "Retrieval Quality Degradation",
    "confidence": "high",
    "summary": "The trace shows weak retrieval relevance signals..."
  },
  "contributingCauses": [],
  "evidence": [
    {
      "findingRuleId": "low-retrieval-score",
      "findingMessage": "Average retrieval score is 0.220",
      "severity": "high"
    }
  ],
  "recommendations": [
    "Check embedding model quality and ensure it is aligned with your domain",
    "Verify retriever relevance by inspecting returned chunk content",
    "Consider adding a reranker to promote the most relevant results"
  ]
}

๐Ÿ—‚๏ธ Root Cause Categories

Cause ID Triggered By Description
retrieval-quality-degradation low-retrieval-score Retriever returned low-relevance chunks
duplicate-context-pollution duplicate-chunks Near-duplicate chunks dilute context quality
oversized-chunking-strategy oversized-chunk Chunks are too large, inflating token usage
excessive-context-volume context-overload Too many chunks increase noise in the prompt
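
The mapping from rules to root causes, together with the "most severe finding becomes the primary cause" behavior, can be sketched as below. This is not the `@rag-doctor/diagnostics` source: `inferCauses` is a hypothetical name, and breaking ties by input order is an assumption. The rule and cause IDs are the ones listed in the table.

```typescript
type Severity = "high" | "medium" | "low";

interface Finding {
  ruleId: string;
  severity: Severity;
}

// Rule ID to cause ID, as listed in the root cause categories.
const CAUSE_BY_RULE = new Map<string, string>([
  ["low-retrieval-score", "retrieval-quality-degradation"],
  ["duplicate-chunks", "duplicate-context-pollution"],
  ["oversized-chunk", "oversized-chunking-strategy"],
  ["context-overload", "excessive-context-volume"],
]);

const SEVERITY_RANK: Record<Severity, number> = { high: 3, medium: 2, low: 1 };

// Hypothetical helper: the most severe finding decides the primary cause.
function inferCauses(findings: Finding[]): { primaryCause: string | null; contributingCauses: string[] } {
  // Assumption: findings of equal severity keep their input order.
  const ranked = [...findings].sort((a, b) => SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity]);
  const causes: string[] = [];
  for (const finding of ranked) {
    const cause = CAUSE_BY_RULE.get(finding.ruleId);
    if (cause !== undefined && causes.indexOf(cause) === -1) causes.push(cause);
  }
  return {
    primaryCause: causes.length > 0 ? causes[0] : null,
    contributingCauses: causes.slice(1),
  };
}
```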

๐Ÿ›ก๏ธ Built-in Diagnostic Rules

Rule ID Severity Default Threshold Description
low-retrieval-score ๐Ÿ”ด high avg score < 0.5 Flags traces where chunks have poor relevance scores
duplicate-chunks ๐ŸŸก medium Jaccard similarity โ‰ฅ 0.8 Detects near-duplicate retrieved chunks
context-overload ๐ŸŸก medium > 10 chunks Flags traces with too many retrieved documents
oversized-chunk ๐ŸŸข low text length > 1200 chars Flags individual chunks that are too long

All thresholds are configurable. See Configuring Rules below.
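
For readers unfamiliar with Jaccard similarity, the duplicate-chunks threshold can be illustrated with a short sketch. The README does not say how chunk text is tokenized, so the lowercase word-set tokenizer here is an assumption, and `jaccard` and `findDuplicatePairs` are hypothetical names.

```typescript
// Assumption: chunks are compared as sets of lowercase word tokens.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((token) => token.length > 0));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  let intersection = 0;
  setA.forEach((token) => {
    if (setB.has(token)) intersection += 1;
  });
  return intersection / (setA.size + setB.size - intersection);
}

// Returns the ID pairs whose similarity meets the threshold (default 0.8).
function findDuplicatePairs(
  chunks: { id: string; text: string }[],
  threshold = 0.8,
): [string, string][] {
  const pairs: [string, string][] = [];
  for (let i = 0; i < chunks.length; i++) {
    for (let j = i + 1; j < chunks.length; j++) {
      if (jaccard(chunks[i].text, chunks[j].text) >= threshold) {
        pairs.push([chunks[i].id, chunks[j].id]);
      }
    }
  }
  return pairs;
}
```

The pairwise comparison is quadratic in the number of chunks, which is usually acceptable for the handful of chunks in a single trace.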


โš™๏ธ Configuring Rules

RAG Doctor supports configurable rule thresholds and reusable rule packs.

Built-in packs

| Pack | Description | Key differences from defaults |
|---|---|---|
| recommended | All rules with balanced defaults | Same as default behavior |
| strict | All rules with tighter thresholds | similarityThreshold: 0.7, averageScoreThreshold: 0.6, maxChunkLength: 1000, maxChunkCount: 8 |

Config file

Create a rag-doctor.config.json file:

{
  "packs": ["recommended"],
  "ruleOptions": {
    "low-retrieval-score": {
      "averageScoreThreshold": 0.6
    },
    "context-overload": {
      "maxChunkCount": 8
    }
  }
}

Pass it to any command with --config:

rag-doctor analyze trace.json --config rag-doctor.config.json
rag-doctor diagnose trace.json --config rag-doctor.config.json
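
One way to picture how packs and ruleOptions combine is the sketch below. It is not RAG Doctor's actual resolver: `resolveOptions` is a hypothetical name, and the precedence shown (later packs override earlier ones, ruleOptions override packs) is an assumption. The threshold values are the documented defaults and strict-pack values.

```typescript
type RuleOptionMap = Record<string, Record<string, number>>;

// Documented thresholds for the two built-in packs.
const PACKS: Record<string, RuleOptionMap> = {
  recommended: {
    "duplicate-chunks": { similarityThreshold: 0.8 },
    "low-retrieval-score": { averageScoreThreshold: 0.5 },
    "oversized-chunk": { maxChunkLength: 1200 },
    "context-overload": { maxChunkCount: 10 },
  },
  strict: {
    "duplicate-chunks": { similarityThreshold: 0.7 },
    "low-retrieval-score": { averageScoreThreshold: 0.6 },
    "oversized-chunk": { maxChunkLength: 1000 },
    "context-overload": { maxChunkCount: 8 },
  },
};

interface Config {
  packs?: string[];
  ruleOptions?: RuleOptionMap;
}

// Hypothetical resolver: merge pack values first, then per-rule overrides.
function resolveOptions(config: Config): RuleOptionMap {
  const resolved: RuleOptionMap = {};
  for (const packName of config.packs ?? ["recommended"]) {
    if (!Object.prototype.hasOwnProperty.call(PACKS, packName)) {
      throw new Error(`Unknown pack: ${packName}`);
    }
    const pack = PACKS[packName];
    for (const ruleId of Object.keys(pack)) {
      resolved[ruleId] = { ...resolved[ruleId], ...pack[ruleId] };
    }
  }
  const overrides = config.ruleOptions ?? {};
  for (const ruleId of Object.keys(overrides)) {
    resolved[ruleId] = { ...resolved[ruleId], ...overrides[ruleId] };
  }
  return resolved;
}
```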

Per-rule configurable options

| Rule ID | Option | Type | Default | Constraint |
|---|---|---|---|---|
| duplicate-chunks | similarityThreshold | number | 0.8 | > 0 and <= 1 |
| low-retrieval-score | averageScoreThreshold | number | 0.5 | >= 0 and <= 1 |
| oversized-chunk | maxChunkLength | integer | 1200 | positive integer |
| context-overload | maxChunkCount | integer | 10 | positive integer |
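
The constraints in this table translate directly into small predicate checks. The sketch below is illustrative only: `OptionConstraintError` and `checkOption` are hypothetical stand-ins and do not reproduce the real `RuleConfigurationError` class, although the error carries the same three pieces of information (rule ID, option key, constraint).

```typescript
// Hypothetical error type, modeled loosely on the fields shown for RuleConfigurationError.
class OptionConstraintError extends Error {
  constructor(
    readonly ruleId: string,
    readonly optionKey: string,
    readonly constraint: string,
  ) {
    super(`Rule "${ruleId}": option "${optionKey}" must be ${constraint}`);
    this.name = "OptionConstraintError";
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, OptionConstraintError.prototype);
  }
}

interface OptionSpec {
  ruleId: string;
  constraint: string;
  isValid: (value: number) => boolean;
}

const isPositiveInteger = (value: number): boolean => Number.isInteger(value) && value > 0;

// One entry per configurable option, with the documented constraint.
const OPTION_SPECS: Record<string, OptionSpec> = {
  similarityThreshold: { ruleId: "duplicate-chunks", constraint: "> 0 and <= 1", isValid: (v) => v > 0 && v <= 1 },
  averageScoreThreshold: { ruleId: "low-retrieval-score", constraint: ">= 0 and <= 1", isValid: (v) => v >= 0 && v <= 1 },
  maxChunkLength: { ruleId: "oversized-chunk", constraint: "a positive integer", isValid: isPositiveInteger },
  maxChunkCount: { ruleId: "context-overload", constraint: "a positive integer", isValid: isPositiveInteger },
};

// Throws when the value violates the documented constraint for that option.
function checkOption(optionKey: string, value: number): void {
  if (!Object.prototype.hasOwnProperty.call(OPTION_SPECS, optionKey)) {
    throw new Error(`Unknown option: ${optionKey}`);
  }
  const spec = OPTION_SPECS[optionKey];
  if (!Number.isFinite(value) || !spec.isValid(value)) {
    throw new OptionConstraintError(spec.ruleId, optionKey, spec.constraint);
  }
}
```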

Strict pack example

Use the built-in strict pack for more demanding quality gates:

# Create rag-doctor.config.json with strict pack
echo '{ "packs": ["strict"] }' > rag-doctor.config.json
rag-doctor analyze trace.json --config rag-doctor.config.json

Programmatic usage with packs

import { analyzeTrace } from "@rag-doctor/core";
import { ingestTrace } from "@rag-doctor/ingestion";

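// rawJson is the parsed contents of a trace file,
// e.g. JSON.parse(fs.readFileSync("trace.json", "utf-8"))
const trace = ingestTrace(rawJson);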

// Use the strict pack
const result = analyzeTrace(trace, { packs: ["strict"] });

// Use recommended pack with custom overrides
const result2 = analyzeTrace(trace, {
  packs: ["recommended"],
  ruleOptions: {
    "low-retrieval-score": { averageScoreThreshold: 0.6 },
    "context-overload": { maxChunkCount: 8 },
  },
});

Rule factories

Each built-in rule can also be instantiated directly with typed options:

import { analyzeTrace } from "@rag-doctor/core";
import {
  createLowRetrievalScoreRule,
  createContextOverloadRule,
  RuleConfigurationError,
} from "@rag-doctor/rules";

// `trace` is a normalized trace, e.g. the result of ingestTrace(rawJson).
try {
  const strictScoreRule = createLowRetrievalScoreRule({ averageScoreThreshold: 0.7 });
  const tightOverloadRule = createContextOverloadRule({ maxChunkCount: 5 });
  const result = analyzeTrace(trace, { rules: [strictScoreRule, tightOverloadRule] });
} catch (err) {
  if (err instanceof RuleConfigurationError) {
    console.error(`Rule "${err.ruleId}": bad option "${err.optionKey}": ${err.constraint}`);
  }
}

๐ŸŒ Supported Trace Formats

RAG Doctor supports multiple trace formats. The adapter layer auto-detects the format, or you can specify it explicitly with --format.

Canonical

RAG Doctor's native format. Auto-detected when the input has query and retrievedChunks.

{
  "query": "What is RAG?",
  "retrievedChunks": [{ "id": "c1", "text": "RAG is...", "score": 0.91 }],
  "finalAnswer": "RAG is..."
}

Event-trace

A generic event-based RAG trace. Auto-detected when the input has an events array.

{
  "events": [
    { "type": "query.received", "query": "What is RAG?" },
    { "type": "retrieval.completed", "chunks": [
      { "id": "c1", "text": "RAG is...", "score": 0.91, "source": "wiki" }
    ]},
    { "type": "answer.generated", "answer": "RAG combines retrieval with generation." }
  ],
  "metadata": { "pipeline": "custom-rag" }
}

LangChain

A simplified LangChain-style trace. Auto-detected when the input has input and retrieverOutput.

{
  "input": "How does chunking affect retrieval?",
  "retrieverOutput": [
    { "pageContent": "Smaller chunks improve precision.", "metadata": { "source": "doc-1" }, "score": 0.72 }
  ],
  "output": "Chunking strongly influences quality."
}

LangSmith

A simplified LangSmith-inspired trace. Auto-detected when the input has run_type, inputs, and outputs.

{
  "run_type": "chain",
  "inputs": { "question": "Why do duplicate chunks hurt RAG?" },
  "outputs": { "answer": "Duplicate chunks waste context budget." },
  "retrieval": {
    "documents": [
      { "id": "doc-a", "content": "Duplicates repeat context.", "score": 0.64, "source": "guide" }
    ]
  },
  "extra": { "project": "rag-eval" }
}
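
The auto-detection criteria given for the four formats can be summarized as a chain of key checks. This sketch is not the `@rag-doctor/adapters` implementation: `detectFormat` is a hypothetical name, the order in which the formats are tried is an assumption, and it only checks that the keys are present, not their types.

```typescript
type TraceFormat = "canonical" | "event-trace" | "langchain" | "langsmith" | "unknown";

// Hypothetical detector based on the top-level keys named for each format.
function detectFormat(input: unknown): TraceFormat {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return "unknown";
  }
  const has = (key: string): boolean => Object.prototype.hasOwnProperty.call(input, key);

  // Assumption: formats are tried in this order and the first match wins.
  if (has("query") && has("retrievedChunks")) return "canonical";
  if (has("events")) return "event-trace";
  if (has("input") && has("retrieverOutput")) return "langchain";
  if (has("run_type") && has("inputs") && has("outputs")) return "langsmith";
  return "unknown";
}
```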

CLI usage

# Auto-detect format (recommended)
rag-doctor analyze trace.json

# Explicit format
rag-doctor analyze langchain-trace.json --format langchain
rag-doctor diagnose langsmith-trace.json --format langsmith

# Combine with other flags
rag-doctor analyze trace.json --format event-trace --json
rag-doctor analyze trace.json --format langchain --config rag-doctor.config.json

Programmatic usage

import fs from "fs";
import { adaptTrace, detectTraceFormat } from "@rag-doctor/adapters";
import { ingestTrace } from "@rag-doctor/ingestion";
import { analyzeTrace } from "@rag-doctor/core";

const rawJson = JSON.parse(fs.readFileSync("langchain-trace.json", "utf-8"));

// Auto-detect and adapt
const adapted = adaptTrace(rawJson);
console.log(adapted.format);   // "langchain"
console.log(adapted.warnings); // ["Generated deterministic IDs for 2 chunk(s)..."]

// Ingest and analyze
const trace = ingestTrace(adapted.trace);
const result = analyzeTrace(trace);
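
To show what an adapter does conceptually, here is a minimal sketch that converts the simplified LangChain-style shape into the canonical one. It is not the real adapter: `adaptLangChain` is a hypothetical name, and the index-based ID scheme is an assumption chosen only because it is deterministic.

```typescript
interface LangChainDoc {
  pageContent: string;
  metadata?: { source?: string };
  score?: number;
}

interface LangChainTrace {
  input: string;
  retrieverOutput: LangChainDoc[];
  output?: string;
}

interface CanonicalChunk {
  id: string;
  text: string;
  score?: number;
  source?: string;
}

interface AdaptedTrace {
  format: "langchain";
  trace: { query: string; retrievedChunks: CanonicalChunk[]; finalAnswer?: string };
  warnings: string[];
}

// Hypothetical adapter: maps LangChain-style fields onto canonical ones.
function adaptLangChain(raw: LangChainTrace): AdaptedTrace {
  const retrievedChunks = raw.retrieverOutput.map((doc, index): CanonicalChunk => {
    // Assumption: IDs are derived from position, so the same input gives the same IDs.
    const chunk: CanonicalChunk = { id: `langchain-chunk-${index}`, text: doc.pageContent };
    if (doc.score !== undefined) chunk.score = doc.score;
    if (doc.metadata?.source !== undefined) chunk.source = doc.metadata.source;
    return chunk;
  });
  const warnings =
    retrievedChunks.length > 0
      ? [`Generated deterministic IDs for ${retrievedChunks.length} chunk(s)`]
      : [];
  const trace: AdaptedTrace["trace"] = { query: raw.input, retrievedChunks };
  if (raw.output !== undefined) trace.finalAnswer = raw.output;
  return { format: "langchain", trace, warnings };
}
```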

🎓 Tutorials

Tutorial 1: Diagnosing low retrieval scores

Low retrieval scores mean your vector store is returning chunks that aren't very relevant to the query. This is one of the most common RAG failure modes.

Step 1: Run the example low-score trace:

rag-doctor analyze examples/low-score-trace.json

You'll see a [HIGH] finding with the average score and the three worst-scoring chunks listed in details.

Step 2: Get the raw details as JSON to inspect which chunks are the culprits:

rag-doctor analyze examples/low-score-trace.json --json | jq '.findings[0].details'

Step 3: Use the chunk IDs from lowestChunks to trace back to your vector store and inspect why those documents ranked so low.

Common fixes:

  • 🔄 Re-embed your documents with a domain-specific embedding model
  • 🏷️ Add metadata filters so the retriever only queries relevant subsets
  • 📐 Use a reranker (e.g. Cohere Rerank, BGE Reranker) as a second-pass filter

Tutorial 2: Eliminating duplicate chunks

Duplicate chunks waste your context window and can cause the LLM to over-weight certain information.

Step 1: Inspect the duplicate-chunks example:

rag-doctor analyze tests/fixtures/broken-duplicate-trace.json

Step 2: Check which chunk pairs triggered the rule:

rag-doctor analyze tests/fixtures/broken-duplicate-trace.json --json | jq '.findings[0].details.pairs'

Step 3: Implement deduplication at ingestion time using the chunk IDs.

Common fixes:

  • 🗑️ Deduplicate at index time using hash-based or embedding-based similarity
  • 🔧 Add a post-retrieval deduplication step before passing chunks to the LLM
  • 📊 Use MMR (Maximal Marginal Relevance) retrieval to promote diversity
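
As an illustration of the post-retrieval deduplication fix, the sketch below drops chunks whose text is identical after normalization and keeps the highest-scoring copy. The names `normalizeText` and `dedupeChunks` are hypothetical, and this simple approach only catches exact duplicates after whitespace and case normalization; near-duplicates would need a similarity measure such as Jaccard or MMR.

```typescript
interface Chunk {
  id: string;
  text: string;
  score?: number;
}

// Lowercase and collapse whitespace so trivially different copies compare equal.
function normalizeText(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Keeps one chunk per normalized text, preferring the highest score.
function dedupeChunks(chunks: Chunk[]): Chunk[] {
  const bestByText = new Map<string, Chunk>();
  for (const chunk of chunks) {
    const key = normalizeText(chunk.text);
    const existing = bestByText.get(key);
    // Assumption: a missing score is treated as 0 when comparing copies.
    if (existing === undefined || (chunk.score ?? 0) > (existing.score ?? 0)) {
      bestByText.set(key, chunk);
    }
  }
  return Array.from(bestByText.values());
}
```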

Tutorial 3: Using RAG Doctor in CI

Add RAG Doctor as an automated quality gate in your pipeline:

GitHub Actions:

name: RAG Quality Gate

on: [push, pull_request]

jobs:
  rag-quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Run RAG Doctor
        run: npx rag-doctor analyze traces/latest-trace.json
        # Exits with code 1 if any HIGH severity finding is detected

Pre-commit hook (.git/hooks/pre-commit):

#!/bin/sh
if [ -f trace.json ]; then
  npx rag-doctor analyze trace.json
fi

The CLI exits 0 on a clean report and 1 on any high-severity finding, so no extra configuration is needed.


Tutorial 4: Embedding the engine in your application

RAG Doctor's core engine has zero I/O dependencies and can be embedded in any environment: API servers, VS Code extensions, browser apps, or Next.js routes.

import express from "express";
import { analyzeTrace } from "@rag-doctor/core";
import { normalizeTrace, ParseError } from "@rag-doctor/parser";

const app = express();
app.use(express.json()); // parse JSON request bodies into req.body

// In an Express API route
app.post("/api/analyze", (req, res) => {
  let trace;
  try {
    trace = normalizeTrace(req.body);
  } catch (err) {
    if (err instanceof ParseError) {
      return res.status(400).json({ error: err.message, field: err.field });
    }
    throw err;
  }

  const result = analyzeTrace(trace);

  // Gate on high-severity findings
  const status = result.summary.high > 0 ? 422 : 200;
  return res.status(status).json(result);
});

Install only what you need:

# Just the engine (browser-safe, zero dependencies)
npm install @rag-doctor/core @rag-doctor/types

# Add the parser for input validation
npm install @rag-doctor/parser

# Add the terminal reporter for Node.js outputs
npm install @rag-doctor/reporters

Tutorial 5: Writing a custom rule

Any object implementing DiagnosticRule works as a rule. Here's a complete example:

// rules/empty-answer.rule.ts
import type { DiagnosticRule, DiagnosticFinding, NormalizedTrace } from "@rag-doctor/types";

export const EmptyAnswerRule: DiagnosticRule = {
  id: "empty-answer",
  name: "Empty Final Answer",

  run(trace: NormalizedTrace): DiagnosticFinding[] {
    if (trace.finalAnswer && trace.finalAnswer.trim().length > 0) {
      return [];
    }
    return [
      {
        ruleId: this.id,
        ruleName: this.name,
        severity: "high",
        message: "The pipeline produced no final answer.",
        recommendation:
          "Check that your LLM call is completing and that the response is captured correctly.",
        details: {
          hadFinalAnswer: !!trace.finalAnswer,
          query: trace.query,
        },
      },
    ];
  },
};

Pass it alongside (or instead of) the built-in rules:

import { analyzeTrace } from "@rag-doctor/core";
import { defaultRules } from "@rag-doctor/rules";
import { EmptyAnswerRule } from "./rules/empty-answer.rule.js";

const result = analyzeTrace(trace, {
  rules: [...defaultRules, EmptyAnswerRule],
});

Rule authoring tips:

  • ✅ run() must be a pure function: the same input always produces the same output
  • ✅ Return an empty array (not null/undefined) when the rule doesn't fire
  • ✅ Put machine-readable data in details so programmatic consumers don't need to parse message
  • ✅ Use recommendation to suggest a concrete fix, not just describe the problem
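
These tips make more sense with a picture of how an engine might consume rules. The sketch below is a simplified stand-in for `analyzeTrace`, not its real implementation: `runRules` and the trimmed-down types are hypothetical. It shows why returning an empty array matters, since findings from all rules are simply concatenated and counted by severity.

```typescript
type Severity = "high" | "medium" | "low";

interface Finding {
  ruleId: string;
  ruleName: string;
  severity: Severity;
  message: string;
}

interface Trace {
  query: string;
  retrievedChunks: { id: string; text: string }[];
  finalAnswer?: string;
}

interface Rule {
  id: string;
  name: string;
  run(trace: Trace): Finding[];
}

// Hypothetical engine: run every rule, collect findings, count by severity.
function runRules(trace: Trace, rules: Rule[]) {
  const findings: Finding[] = [];
  for (const rule of rules) {
    findings.push(...rule.run(trace)); // an empty array contributes nothing
  }
  const summary = { high: 0, medium: 0, low: 0 };
  for (const finding of findings) {
    summary[finding.severity] += 1;
  }
  return { findings, summary };
}

// A pure rule: its output depends only on the trace passed in.
const emptyAnswerRule: Rule = {
  id: "empty-answer",
  name: "Empty Final Answer",
  run(trace) {
    if (trace.finalAnswer !== undefined && trace.finalAnswer.trim().length > 0) {
      return [];
    }
    return [
      {
        ruleId: "empty-answer",
        ruleName: "Empty Final Answer",
        severity: "high",
        message: "The pipeline produced no final answer.",
      },
    ];
  },
};
```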

🔌 Programmatic Usage

The core engine has zero CLI dependencies and can be embedded anywhere.

Using the shared ingestion pipeline (recommended)

import { ingestTrace, TraceValidationError } from "@rag-doctor/ingestion";
import { analyzeTrace } from "@rag-doctor/core";
import fs from "fs";

const rawJson = JSON.parse(fs.readFileSync("trace.json", "utf-8"));

try {
  const trace = ingestTrace(rawJson);  // validates + normalizes
  const result = analyzeTrace(trace);
  console.log(result.summary);   // { high: 1, medium: 0, low: 0 }
  console.log(result.findings);  // DiagnosticFinding[]
} catch (err) {
  if (err instanceof TraceValidationError) {
    // Structured, field-level error payload
    console.error(JSON.stringify(err.toPayload(), null, 2));
    // {
    //   "code": "INVALID_TRACE_SCHEMA",
    //   "message": "Trace validation failed",
    //   "issues": [{ "path": "retrievedChunks[0].score", "expected": "number", "received": "string" }]
    // }
  }
}

Using rule packs (Phase 3)

import { ingestTrace } from "@rag-doctor/ingestion";
import { analyzeTrace, RuleConfigurationError, UnknownPackError } from "@rag-doctor/core";
import fs from "fs";

const trace = ingestTrace(JSON.parse(fs.readFileSync("trace.json", "utf-8")));

try {
  const result = analyzeTrace(trace, {
    packs: ["strict"],                 // use the strict built-in pack
    ruleOptions: {
      "low-retrieval-score": { averageScoreThreshold: 0.7 },  // override one threshold
    },
  });
  console.log(result.summary);
} catch (err) {
  if (err instanceof UnknownPackError) {
    console.error(`Unknown pack: ${err.packName}`);
  } else if (err instanceof RuleConfigurationError) {
    console.error(`Bad config for rule "${err.ruleId}": ${err.message}`);
  }
}

Using lower-level packages

import { analyzeTrace } from "@rag-doctor/core";
import { normalizeTrace } from "@rag-doctor/parser";
import { printTerminalReport } from "@rag-doctor/reporters";
import fs from "fs";

const rawJson = JSON.parse(fs.readFileSync("trace.json", "utf-8"));
const trace = normalizeTrace(rawJson);
const result = analyzeTrace(trace);

// Pretty-print to terminal
printTerminalReport(result);

// Or use structured data directly
console.log(result.summary);     // { high: 1, medium: 0, low: 0 }
console.log(result.findings);    // DiagnosticFinding[]

Capture reporter output (no stdout side effects):

const lines: string[] = [];
printTerminalReport(result, {
  write: (line) => lines.push(line),
});
// lines now contains the full formatted report
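
The injected `write` option is a small dependency-injection pattern. The sketch below shows how a reporter could be written to support it. `printSummary` is a hypothetical function, not the real `printTerminalReport`, and it prints only the summary lines.

```typescript
interface Summary {
  high: number;
  medium: number;
  low: number;
}

type Write = (line: string) => void;

// Hypothetical reporter: all output goes through `write`, which defaults to console.log.
function printSummary(summary: Summary, options: { write?: Write } = {}): void {
  const write: Write = options.write ?? ((line) => console.log(line));
  const total = summary.high + summary.medium + summary.low;
  write(`Total findings:  ${total}`);
  write(`High:            ${summary.high}`);
  write(`Medium:          ${summary.medium}`);
  write(`Low:             ${summary.low}`);
  if (total === 0) {
    write("✓ No issues detected");
  }
}
```

Because the function never touches stdout directly, a test can pass an array-backed `write` and assert on the captured lines.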

๐Ÿ—๏ธ Monorepo Structure

rag-doctor/
โ”œโ”€โ”€ apps/
โ”‚   โ””โ”€โ”€ cli/              # ๐Ÿ“Ÿ CLI entry point (published as `rag-doctor`)
โ”œโ”€โ”€ packages/
โ”‚   โ”œโ”€โ”€ types/            # ๐Ÿ“ Shared TypeScript interfaces
โ”‚   โ”œโ”€โ”€ adapters/         # ๐Ÿ”Œ External trace format adapters (canonical, event-trace, langchain, langsmith)
โ”‚   โ”œโ”€โ”€ ingestion/        # ๐Ÿ”’ Shared trace ingestion pipeline (validate + normalize)
โ”‚   โ”œโ”€โ”€ parser/           # ๐Ÿ” Trace normalizer & validator (legacy, used by ingestion)
โ”‚   โ”œโ”€โ”€ rules/            # ๐Ÿ“ Built-in diagnostic rules
โ”‚   โ”œโ”€โ”€ core/             # โš™๏ธ  Analysis engine (zero I/O dependencies)
โ”‚   โ”œโ”€โ”€ diagnostics/      # ๐Ÿง  Root cause analysis engine
โ”‚   โ””โ”€โ”€ reporters/        # ๐Ÿ–จ๏ธ  Terminal and future reporters
โ”œโ”€โ”€ tests/
โ”‚   โ””โ”€โ”€ fixtures/         # ๐Ÿงช Shared JSON fixtures for tests
โ”œโ”€โ”€ examples/
โ”‚   โ”œโ”€โ”€ basic-trace.json
โ”‚   โ”œโ”€โ”€ low-score-trace.json
โ”‚   โ””โ”€โ”€ context-overload-trace.json
โ””โ”€โ”€ docs/
    โ”œโ”€โ”€ ARCHITECTURE.md
    โ””โ”€โ”€ CONTRIBUTING.md

๐Ÿงช Testing

RAG Doctor has comprehensive test coverage across all packages.

Running tests

# Run all tests across the monorepo
pnpm test

# Watch mode (re-runs on file change)
pnpm test:watch

# Run tests for a single package
pnpm --filter @rag-doctor/parser test
pnpm --filter @rag-doctor/rules test
pnpm --filter @rag-doctor/core test
pnpm --filter @rag-doctor/reporters test
pnpm --filter rag-doctor test

Test coverage

| Package | Tests | What's Covered |
|---|---|---|
| @rag-doctor/adapters | 69 | Format detection (canonical, event-trace, langchain, langsmith, unknown, priority), each adapter (valid input, missing fields, generated IDs, warnings, errors), integration with ingestion |
| @rag-doctor/parser | 50 | Valid traces, optional fields, all ParseError paths, chunk-level validation, score edge cases (Infinity, NaN, 0, 1), metadata passthrough |
| @rag-doctor/rules | 114 | Each rule: fires / does not fire, threshold boundaries, structured details fields, edge cases; rule factories, custom thresholds, RuleConfigurationError, packs (recommended, strict), ruleOptions overrides |
| @rag-doctor/core | 43 | Return shape, summary accuracy, custom rule injection, multi-rule aggregation, real rule scenarios; pack resolution, ruleOptions overrides, UnknownPackError, backward compatibility |
| @rag-doctor/ingestion | 77 | Valid/invalid traces, schema validation, normalization, structured errors |
| @rag-doctor/reporters | 52 | Header/structure, zero-findings message, severity labels, sort order, injected write |
| rag-doctor (CLI) | 141 | Argument parsing, help display, error messages, analyze/diagnose commands, --json flag, exit codes, --config/--format flags, adapter integration, auto-detection, pack resolution, regression |
| Total | 525+ | |

Test fixtures

Fixture JSON files live in tests/fixtures/:

| File | Category | Triggers |
|---|---|---|
| valid-clean-trace.json | Valid | Nothing; minimal clean trace (2 high-score chunks) |
| valid-basic-trace.json | Valid | Nothing; clean full trace with all optional fields |
| valid-minimal-trace.json | Valid | Nothing; one chunk, no optional fields |
| valid-low-score-trace.json | Valid | low-retrieval-score (HIGH); valid trace with low scores |
| valid-medium-score-trace.json | Valid | Nothing under default thresholds; low-retrieval-score (HIGH) under strict/tight config |
| broken-low-score-trace.json | Valid | low-retrieval-score (HIGH); avg score ≈ 0.22 |
| broken-duplicate-trace.json | Valid | duplicate-chunks (MEDIUM); 3 identical chunks |
| context-overload-trace.json | Valid | context-overload (MEDIUM); 12 retrieved chunks |
| oversized-chunk-trace.json | Valid | oversized-chunk (LOW); one chunk > 1200 chars |
| multi-rule-trace.json | Valid | low-retrieval-score + duplicate-chunks + context-overload simultaneously |
| invalid-json.txt | Invalid | Parse error: not valid JSON |
| invalid-schema.json | Invalid | Schema error: valid JSON but wrong field names |
| invalid-missing-fields.json | Invalid | Schema error: missing query and retrievedChunks |
| invalid-bad-score-type.json | Invalid | Schema error: scores provided as strings instead of numbers |
| invalid-malformed-chunks.json | Invalid | Schema error: chunks array contains primitives, nulls, nested arrays |
| config-recommended.json | Config | { "packs": ["recommended"] } |
| config-strict.json | Config | { "packs": ["strict"] } |
| config-tight-thresholds.json | Config | recommended pack + tight thresholds (triggers on medium-score trace) |
| config-unknown-pack.json | Config | Invalid: references nonexistent pack |
| config-invalid-option.json | Config | Invalid: maxChunkCount: 0 (must be positive) |
| config-invalid-json.json | Config | Invalid: not valid JSON |
| config-not-object.json | Config | Invalid: root value is an array, not an object |
| config-packs-not-array.json | Config | Invalid: packs is a string instead of an array |
| event-trace-valid.json | Adapter | Valid event-trace format (auto-detected) |
| langchain-valid.json | Adapter | Valid LangChain format (auto-detected) |
| langsmith-valid.json | Adapter | Valid LangSmith format (auto-detected) |
| unknown-format.json | Adapter | Invalid: unrecognized format, falls through to ingestion validation |
| malformed-langchain.json | Adapter | Invalid: LangChain-shaped but with wrong field types |
| malformed-event-trace.json | Adapter | Invalid: event-trace shaped but events is a string |

CLI test architecture

CLI tests use two complementary layers:

  1. In-process unit tests (cli.test.ts): inject a CliIO interface to capture stdout/stderr and catch exit codes without spawning a subprocess. Fast, runs in milliseconds.
  2. Subprocess integration tests (cli.integration.test.ts): spawn the compiled dist/bin.js binary via spawnSync. Verifies real process exit codes, stdout/stderr streams, and shebang execution. Skipped automatically if dist/bin.js hasn't been built yet.
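
The in-process layer relies on the CLI never calling `console.log` or `process.exit` directly. The README names the `CliIO` interface but does not show its shape, so the members below are assumptions, and `runCli` is a toy stand-in for the real entry point with made-up messages.

```typescript
// Assumed shape of an injectable I/O interface; the real CliIO may differ.
interface CliIO {
  stdout(line: string): void;
  stderr(line: string): void;
  exit(code: number): void;
}

// Toy command dispatcher used only to demonstrate the testing pattern.
function runCli(args: string[], io: CliIO): void {
  const command = args[0];
  if (command !== "analyze" && command !== "diagnose") {
    io.stderr(`Unknown command: ${String(command)}`);
    io.exit(1);
    return;
  }
  io.stdout(`Running ${command}`);
  io.exit(0);
}

// Fake IO that records everything instead of touching the real process.
function createFakeIO(): { io: CliIO; out: string[]; err: string[]; exitCodes: number[] } {
  const out: string[] = [];
  const err: string[] = [];
  const exitCodes: number[] = [];
  const io: CliIO = {
    stdout: (line) => { out.push(line); },
    stderr: (line) => { err.push(line); },
    exit: (code) => { exitCodes.push(code); },
  };
  return { io, out, err, exitCodes };
}
```

A unit test creates a fake, calls the CLI function, and asserts on the recorded output and exit codes, all without spawning a process.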

๐Ÿ› ๏ธ Development

Prerequisites

  • Node.js 18+
  • pnpm 9+

Setup

git clone https://github.com/NeuroForgeLabs/rag-doctor.git
cd rag-doctor
pnpm install
pnpm build

Commands

pnpm build        # Build all packages
pnpm test         # Run all tests
pnpm dev          # Watch mode for all packages
pnpm lint         # Lint all packages
pnpm typecheck    # TypeScript type check

Running the CLI locally

pnpm build
node apps/cli/dist/bin.js analyze examples/basic-trace.json
node apps/cli/dist/bin.js analyze examples/low-score-trace.json
node apps/cli/dist/bin.js analyze examples/context-overload-trace.json --json

🔮 Future Integrations

RAG Doctor's modular design supports these planned integrations:

| Integration | Description |
|---|---|
| 🧩 VS Code Extension | Inline diagnostics while editing trace files in the editor |
| ⚙️ GitHub Action | Fail CI when high-severity issues are detected |
| ☁️ Cloud Dashboard | Aggregate diagnostics across production traces over time |
| 📣 Custom Reporters | Output findings to Slack, PagerDuty, or Datadog |
| 🧠 LangSmith / LlamaIndex adapters | Ingest traces from popular RAG frameworks without manual conversion |

๐Ÿค Contributing

Contributions are welcome! See docs/CONTRIBUTING.md for guidelines.


📄 License

MIT © RAG Doctor Contributors
