Zero-Token Pre-Analysis Layer — give any LLM instant codebase understanding
- The Problem
- The Solution
- Quick Start
- What It Extracts (4 Layers)
- Supported Languages
- CLI Usage
- MCP Integration
- Configuration
- Programmatic API
- Example Output
- Contributing
- License
LLMs waste 50,000–200,000 tokens exploring unfamiliar codebases. Typical workflows involve asking the model to read file trees, open individual files, trace imports, and re-derive architecture facts it will forget next session. Context packers ship raw source code. Knowledge graphs need infrastructure.
The result: slow, expensive, and inconsistent onboarding every time a new LLM session touches your codebase.
code-dna runs static analysis in under 5 seconds and produces a compact 5–10k token "DNA file" that gives any LLM architectural understanding — without reading source files.
The DNA file captures:
- The project's module structure and symbol inventory
- Architectural style, detected framework, and layer organisation
- Coding conventions derived from the actual codebase
- Hot files, risk scores, and dependency centrality
- Git churn data and ownership information
Give any LLM the DNA file as its first context document and it hits the ground running.
# Run once, output to stdout
npx code-dna analyze
# Save to a file (recommended)
npx code-dna analyze --output CODEBASE-DNA.md
# YAML output for programmatic consumption
npx code-dna analyze --format yaml --output CODEBASE-DNA.yaml
# Analyse a specific directory
npx code-dna analyze /path/to/project --output CODEBASE-DNA.md

code-dna runs four analysis layers. Layers 1 and 2 execute in parallel; Layers 3 and 4 then run in sequence:
Layer 1 discovers all source files, parses them with Tree-sitter AST grammars, and builds:
- File tree: every file annotated with its language and role (controller, service, model, etc.)
- Module map: hierarchical directory structure with per-file symbol inventories
- Dependency graph: import/export edges with fan-in/fan-out metrics and circular dependency detection
- Symbol index: every exported function, class, interface, type, and variable
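To make the dependency graph metrics concrete, here is a minimal sketch of how fan-in, fan-out, and circular imports could be derived from a list of import edges. The `Edge` type and both function names are assumptions made for illustration; they are not taken from code-dna's source.

```typescript
// Hypothetical edge shape: `from` imports `to`.
type Edge = { from: string; to: string };
type FanMetrics = { fanIn: number; fanOut: number };

// Count how many files import each file (fan-in) and how many it imports (fan-out).
function fanMetrics(edges: Edge[]): Record<string, FanMetrics> {
  const metrics: Record<string, FanMetrics> = {};
  const entry = (file: string): FanMetrics =>
    (metrics[file] = metrics[file] || { fanIn: 0, fanOut: 0 });
  for (const { from, to } of edges) {
    entry(from).fanOut += 1;
    entry(to).fanIn += 1;
  }
  return metrics;
}

// Depth-first search that returns the first import cycle found, or null.
function findCycle(edges: Edge[]): string[] | null {
  const adjacency: Record<string, string[]> = {};
  for (const { from, to } of edges) {
    (adjacency[from] = adjacency[from] || []).push(to);
  }
  const state: Record<string, "visiting" | "done"> = {};
  const stack: string[] = [];

  const visit = (node: string): string[] | null => {
    state[node] = "visiting";
    stack.push(node);
    for (const next of adjacency[node] || []) {
      if (state[next] === "visiting") {
        // `next` is already on the current path, so the path closes a cycle.
        return stack.slice(stack.indexOf(next)).concat(next);
      }
      if (state[next] === undefined) {
        const cycle = visit(next);
        if (cycle) return cycle;
      }
    }
    stack.pop();
    state[node] = "done";
    return null;
  };

  for (const node of Object.keys(adjacency)) {
    if (state[node] === undefined) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}

const edges: Edge[] = [
  { from: "a.ts", to: "b.ts" },
  { from: "b.ts", to: "c.ts" },
  { from: "c.ts", to: "a.ts" },
  { from: "d.ts", to: "a.ts" },
];
const metrics = fanMetrics(edges);
const cycle = findCycle(edges);
```

In this sample, `a.ts` is imported by two files and takes part in a three-file cycle, which is the kind of file a centrality-based ranking would surface first.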
Layer 2 queries the local git history to surface temporal patterns:
- Commit heatmap — files ranked by total commits
- Ownership map — primary author per file
- Co-change coupling — files that change together frequently (configurable window)
- Hot files — churn hotspots with commit counts and last-modified timestamps
This layer is skipped gracefully when no git history is available.
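As an illustration of co-change coupling, the sketch below counts how often each pair of files appears in the same commit. The `Commit` shape and `coChangeCounts` are hypothetical, and the sketch ignores the configurable time window for simplicity; code-dna's actual implementation may differ.

```typescript
// Hypothetical commit shape: the list of files touched by one commit.
type Commit = { files: string[] };

// Count, for every pair of files, the number of commits that touched both.
function coChangeCounts(commits: Commit[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const { files } of commits) {
    // De-duplicate and sort so each pair maps to one canonical key.
    const unique = files.filter((f, i) => files.indexOf(f) === i).sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = unique[i] + " <-> " + unique[j];
        counts[key] = (counts[key] || 0) + 1;
      }
    }
  }
  return counts;
}

const commits: Commit[] = [
  { files: ["src/api.ts", "src/types.ts"] },
  { files: ["src/types.ts", "src/api.ts", "README.md"] },
  { files: ["README.md"] },
];
const coupling = coChangeCounts(commits);
```

Pairs with a high count relative to each file's total commits are candidates for hidden coupling, even when no import edge connects them.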
Layer 3 uses Layer 1 results to infer higher-level patterns without configuration:
- Framework detection — identifies Next.js, Express, FastAPI, Spring Boot, NestJS, and more from dependency manifests and file markers
- Architecture style — classifies projects as MVC, hexagonal, layered, event-driven, or monolith
- Naming conventions — detects camelCase, PascalCase, snake_case, kebab-case across files, functions, classes, and variables
- File organisation — by-feature, by-layer, by-type, or hybrid
- Import and export style — relative vs. aliased paths, named vs. default exports
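Naming convention detection can be pictured as classifying each identifier and reporting the majority style. The following is a rough sketch under that assumption; `classifyName` and `dominantStyle` are illustrative names, and the regular expressions are a simplification rather than code-dna's actual rules.

```typescript
type NamingStyle = "camelCase" | "PascalCase" | "snake_case" | "kebab-case" | "unknown";

// Classify a single identifier or file stem by its shape.
// Note: a single lowercase word such as "engine" is ambiguous and falls under camelCase here.
function classifyName(name: string): NamingStyle {
  if (/^[a-z][a-z0-9]*(_[a-z0-9]+)+$/.test(name)) return "snake_case";
  if (/^[a-z][a-z0-9]*(-[a-z0-9]+)+$/.test(name)) return "kebab-case";
  if (/^[A-Z][a-zA-Z0-9]*$/.test(name)) return "PascalCase";
  if (/^[a-z][a-zA-Z0-9]*$/.test(name)) return "camelCase";
  return "unknown";
}

// Return the most frequent recognised style across a set of names.
function dominantStyle(names: string[]): NamingStyle {
  const tally: Record<string, number> = {};
  for (const name of names) {
    const style = classifyName(name);
    tally[style] = (tally[style] || 0) + 1;
  }
  let best: NamingStyle = "unknown";
  let bestCount = 0;
  for (const style of Object.keys(tally) as NamingStyle[]) {
    if (style !== "unknown" && tally[style] > bestCount) {
      best = style;
      bestCount = tally[style];
    }
  }
  return best;
}

const fileStyle = dominantStyle(["parser-engine", "token-budget", "Engine"]);
```

Running the classifier separately over file names, function names, and class names would give one convention per category, matching how the conventions are reported.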
Layer 4 combines all previous layers to produce a risk-ranked file list:
- Centrality score — files with the highest in-degree (most imported)
- Churn score — correlation between frequency of change and dependency weight
- Coverage proxy — estimated test coverage based on co-located test files
- Composite risk score — 0–100 rank with per-factor breakdowns
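One way to read the composite score is as a weighted sum of normalised factors, scaled to 0–100, with the per-factor contributions kept for the breakdown. The sketch below assumes each factor is already normalised to the range 0..1, and the weights are invented for illustration; the real weighting used by code-dna is not documented here.

```typescript
// Each factor is assumed to be normalised to 0..1 before scoring.
type RiskFactors = { centrality: number; churn: number; coverage: number };

// Hypothetical weights, chosen only for this example. They sum to 1.
const WEIGHTS = { centrality: 0.4, churn: 0.4, coverageGap: 0.2 };

function compositeRisk(factors: RiskFactors) {
  const clamp = (x: number) => Math.min(1, Math.max(0, x));
  // Low coverage raises risk, so the coverage factor is inverted into a gap.
  const breakdown = {
    centrality: WEIGHTS.centrality * clamp(factors.centrality) * 100,
    churn: WEIGHTS.churn * clamp(factors.churn) * 100,
    coverageGap: WEIGHTS.coverageGap * (1 - clamp(factors.coverage)) * 100,
  };
  const score = Math.round(breakdown.centrality + breakdown.churn + breakdown.coverageGap);
  return { score, breakdown };
}

const moderate = compositeRisk({ centrality: 0.5, churn: 0.5, coverage: 1 });
const worstCase = compositeRisk({ centrality: 1, churn: 1, coverage: 0 });
```

Keeping the breakdown next to the score lets a reader see whether a file ranks high because many files import it, because it changes often, or because it lacks nearby tests.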
| Language | Extensions | Support Tier |
|---|---|---|
| TypeScript | .ts, .tsx | Full AST parsing |
| JavaScript | .js, .jsx, .mjs, .cjs | Full AST parsing |
| Python | .py, .pyi | Full AST parsing |
| Go | .go | File discovery + framework detection |
| Rust | .rs | File discovery + framework detection |
| Java | .java | File discovery + framework detection |
| Vue | .vue | File discovery + framework detection |
| C# | .cs | File discovery + framework detection |
| Ruby | .rb | File discovery + framework detection |
| Kotlin | .kt, .kts | File discovery + framework detection |
| Swift | .swift | File discovery + framework detection |
| PHP | .php | File discovery + framework detection |
| C / C++ | .c, .h, .cpp, .cc, .cxx, .hpp | File discovery + framework detection |
| Solidity | .sol | Discovery only |
Run code-dna info to verify the languages and tiers detected by your installed version.
code-dna analyze runs the full analysis pipeline and outputs the DNA file.
code-dna analyze [path] [options]

Arguments:
| Argument | Description | Default |
|---|---|---|
| path | Directory to analyse | Current working directory |
Options:
| Flag | Description | Default |
|---|---|---|
| -f, --format <format> | Output format: md or yaml | md |
| -o, --output <file> | Write output to file instead of stdout | stdout |
| -l, --layers <layers> | Comma-separated layers to run | 1,2,3,4 |
| --languages <langs> | Language filter, e.g. ts,py,go | all languages |
| --scope <dir> | Scope analysis to a subdirectory | none |
| --token-budget <n> | Target token count for Markdown output | 8000 |
| --git-depth <n> | Maximum git commits to traverse | 1000 |
| --no-git | Skip git archaeology (disables Layer 2) | false |
| -q, --quiet | Suppress progress output | false |
Examples:
# Full analysis, Markdown output to stdout
code-dna analyze
# Save to file with YAML format
code-dna analyze . --format yaml --output CODEBASE-DNA.yaml
# Only structural skeleton, no git or risk analysis
code-dna analyze --layers 1,3
# Analyse only TypeScript and Python files
code-dna analyze --languages ts,py
# Scope to a single service in a monorepo
code-dna analyze --scope services/api --output services/api/DNA.md
# Large repo with tight token budget
code-dna analyze --token-budget 5000 --git-depth 500

code-dna diff compares two DNA YAML snapshots and produces a Markdown diff report.
code-dna diff before.yaml after.yaml
code-dna diff before.yaml after.yaml --output diff-report.md

The diff report covers: files added/removed/modified, symbols added/removed, dependency graph changes, risk score movements, convention and framework shifts.
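To show the kind of comparison a snapshot diff involves, here is a small sketch that reports added files, removed files, and risk score movements between two snapshots. The `Snapshot` shape is a hypothetical stand-in; the real DNA YAML schema is not described in this document.

```typescript
// Hypothetical, simplified snapshot: file list plus a risk score per file.
type Snapshot = { files: string[]; risk: Record<string, number> };
type RiskMove = { file: string; delta: number };

function diffSnapshots(before: Snapshot, after: Snapshot) {
  const added = after.files.filter((f) => before.files.indexOf(f) === -1).sort();
  const removed = before.files.filter((f) => after.files.indexOf(f) === -1).sort();
  // Report score changes only for files present in both snapshots.
  const riskMoves: RiskMove[] = [];
  for (const file of Object.keys(after.risk)) {
    const previous = before.risk[file];
    if (previous !== undefined && previous !== after.risk[file]) {
      riskMoves.push({ file, delta: after.risk[file] - previous });
    }
  }
  return { added, removed, riskMoves };
}

const beforeSnap: Snapshot = {
  files: ["src/a.ts", "src/b.ts"],
  risk: { "src/a.ts": 40, "src/b.ts": 10 },
};
const afterSnap: Snapshot = {
  files: ["src/a.ts", "src/c.ts"],
  risk: { "src/a.ts": 55, "src/c.ts": 5 },
};
const snapshotDiff = diffSnapshots(beforeSnap, afterSnap);
```

The same set-difference approach extends naturally to symbols and dependency edges.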
code-dna mcp starts the code-dna MCP server over stdio for use with MCP-compatible clients.
code-dna mcp
code-dna mcp --path /path/to/project
code-dna mcp --path /path/to/project --watch

See MCP Integration for client configuration details.
code-dna info shows the code-dna version, Node.js version, platform, and supported languages with their tiers.
code-dna info

code-dna exposes its analysis pipeline as an MCP server, allowing LLM clients to query codebase DNA directly without running CLI commands.
# Start against current directory
code-dna mcp
# Start against a specific project
code-dna mcp --path /path/to/project
# Watch mode: auto-refresh cache on file changes
code-dna mcp --path /path/to/project --watch

Add code-dna to your .mcp.json (project-scoped) or your global Claude Code settings:
{
  "mcpServers": {
    "code-dna": {
      "command": "npx",
      "args": ["code-dna", "mcp", "--path", "/absolute/path/to/project", "--watch"]
    }
  }
}

In Cursor settings, add a new MCP server:
{
  "mcp": {
    "servers": {
      "code-dna": {
        "command": "npx",
        "args": ["code-dna", "mcp", "--path", "${workspaceFolder}", "--watch"]
      }
    }
  }
}

Once connected, clients can read these resources:
| URI | Content |
|---|---|
| codedna://full | Complete DNA Markdown output |
| codedna://skeleton | Architecture and Module Map sections |
| codedna://dependencies | Dependencies section |
| codedna://conventions | Conventions section |
| codedna://risks | Risk Surface and Hot Files sections |
| codedna://hotfiles | Hot Files section only |

The server also exposes these tools:
| Tool | Description |
|---|---|
| analyze | Run analysis on a directory, update the cache, return full DNA |
| diff | Compute a structural diff between two DNA Markdown strings |
See docs/MCP.md for the full MCP reference including tool parameter schemas.
Create a .codedna.yaml file in your project root to customise analysis:
# Additional glob patterns to ignore (built-in ignores always apply)
ignore:
  - "generated/**"
  - "vendor/**"
  - "*.pb.go"

# Toggle individual analysis layers
layers:
  skeleton: true
  git: true
  patterns: true
  risk: true

# Git archaeology settings
git:
  max_commits: 1000
  max_blame_files: 50
  coupling_window: 30 # days

# Per-language overrides
languages:
  python:
    enabled: true
    framework: "fastapi" # override auto-detection
  solidity:
    enabled: false # skip entirely

# Output preferences
output:
  format: md
  token_budget: 8000
  filename: CODEBASE-DNA.md
  sections:
    architecture: 15
    module_map: 25
    dependencies: 15
    conventions: 15
    hot_files: 10
    risk_surface: 10
    api_surface: 5

# Monorepo: include/exclude sub-directories
scope:
  include:
    - "services/api"
    - "packages/shared"
  exclude:
    - "packages/legacy"

All fields are optional and fall back to sensible defaults.
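The numbers under output.sections are not explained in this document. If they are read as relative weights for dividing the token budget between sections (an assumption, not a documented behaviour), the split could be sketched as follows. `allocateBudget` is a hypothetical helper, not part of code-dna's API.

```typescript
// Split a total token budget across sections in proportion to their weights.
// Assumption: section values are relative weights; they need not sum to 100.
function allocateBudget(total: number, weights: Record<string, number>): Record<string, number> {
  const names = Object.keys(weights);
  const weightSum = names.reduce((sum, name) => sum + weights[name], 0);
  const allocation: Record<string, number> = {};
  for (const name of names) {
    // Round down so the allocations never exceed the total budget.
    allocation[name] = weightSum > 0 ? Math.floor((total * weights[name]) / weightSum) : 0;
  }
  return allocation;
}

// Section values copied from the sample configuration.
const sections = {
  architecture: 15,
  module_map: 25,
  dependencies: 15,
  conventions: 15,
  hot_files: 10,
  risk_surface: 10,
  api_surface: 5,
};
const allocation = allocateBudget(8000, sections);
const allocatedTotal = Object.keys(allocation).reduce((sum, name) => sum + allocation[name], 0);
```

Normalising by the sum of the weights keeps the split proportional even though the sample values add up to 95 rather than 100.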
code-dna can be used as a library from TypeScript or JavaScript:
npm install code-dna

import { analyze, formatMarkdown, formatYaml } from 'code-dna/lib';

// Token budget shared by the analysis and the Markdown renderer
const budget = 8000;

// Run the full 4-layer analysis
const dna = await analyze('/path/to/project', {
  layers: [1, 2, 3, 4],
  tokenBudget: budget,
});

// Render as Markdown (token-budget aware)
const markdown = formatMarkdown(dna, budget);

// Render as YAML (full data, no truncation)
const yaml = formatYaml(dna);

See docs/API.md for the complete programmatic API reference.
The following is a truncated excerpt from code-dna analysing itself:
# Codebase DNA -- code-dna
> Generated by code-dna v0.1.0 on 2026-03-26.
> Languages: typescript (99%), javascript (1%) | Files: 101 | LOC: 35,864
## Architecture
**Style:** layered (85% confidence)
**Framework:** Node.js / Commander CLI
### Layers
- **cli** (3 files): entry point, MCP command
- **core** (8 files): engine, types, diff engine, token budget
- **analyzers** (6 files): git, framework, architecture, conventions, risk
- **parsers** (19 files): Tree-sitter extractors for 14 languages
- **output** (3 files): Markdown and YAML formatters
- **mcp** (2 files): MCP server
## Conventions
- **Files:** kebab-case
- **Functions:** camelCase
- **Classes:** PascalCase
- **Exports:** named
- **Imports:** external-first, relative paths
- **Tests:** co-located
## Risk Surface
| File | Score | Factors |
|------|-------|---------|
| src/core/engine.ts | 82 | high-centrality, high-churn |
| src/core/types.ts | 74 | high-centrality |
| src/parsers/parser-engine.ts | 65 | high-centrality |

- Clone the repository and install dependencies: npm install
- Build: npm run build
- Run all tests: npm test (1199 tests, Node.js 20+ required)
- Lint: npm run lint
- Typecheck: npm run typecheck
All code changes require tests written first (TDD). Commits follow Conventional Commits (feat(scope):, fix(scope):).
MIT