code-dna

Zero-Token Pre-Analysis Layer — give any LLM instant codebase understanding

The Problem

LLMs waste 50,000–200,000 tokens exploring unfamiliar codebases. Typical workflows involve asking the model to read file trees, open individual files, trace imports, and re-derive architecture facts it will forget next session. Context packers ship raw source code. Knowledge graphs need infrastructure.

The result: slow, expensive, and inconsistent onboarding every time a new LLM session touches your codebase.

The Solution

code-dna runs static analysis in under 5 seconds and produces a compact 5–10k token "DNA file" that gives any LLM architectural understanding — without reading source files.

The DNA file captures:

The project's module structure and symbol inventory
Architectural style, detected framework, and layer organisation
Coding conventions derived from the actual codebase
Hot files, risk scores, and dependency centrality
Git churn data and ownership information

Give any LLM the DNA file as its first context document and it hits the ground running.

Quick Start

# Run once, output to stdout
npx code-dna analyze

# Save to a file (recommended)
npx code-dna analyze --output CODEBASE-DNA.md

# YAML output for programmatic consumption
npx code-dna analyze --format yaml --output CODEBASE-DNA.yaml

# Analyse a specific directory
npx code-dna analyze /path/to/project --output CODEBASE-DNA.md

What It Extracts (4 Layers)

code-dna runs four analysis layers. Layers 1 and 2 execute in parallel; Layers 3 and 4 then run in sequence, because each builds on the results of earlier layers:

Layer 1: Structural Skeleton

Discovers all source files, parses them with Tree-sitter AST grammars, and builds:

File tree with language and role annotations (controller, service, model, etc.)
Module map — hierarchical directory structure with per-file symbol inventories
Dependency graph — import/export edges with fan-in/fan-out metrics and circular dependency detection
Symbol index — every exported function, class, interface, type, and variable
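To make the dependency-graph metrics concrete, here is a minimal sketch of fan-in/fan-out counting and circular dependency detection over a list of import edges. The `ImportEdge` type and both function names are hypothetical and chosen for illustration; they are not code-dna's internal API, and the real implementation may work differently.

```typescript
// Hypothetical edge shape: "from" imports "to". Not code-dna's actual types.
type ImportEdge = { from: string; to: string };
type FanMetrics = { fanIn: number; fanOut: number };

// Count fan-in (files importing a file) and fan-out (files it imports).
function fanMetrics(edges: ImportEdge[]): Map<string, FanMetrics> {
  const metrics = new Map<string, FanMetrics>();
  const entry = (file: string): FanMetrics => {
    let m = metrics.get(file);
    if (!m) {
      m = { fanIn: 0, fanOut: 0 };
      metrics.set(file, m);
    }
    return m;
  };
  for (const edge of edges) {
    entry(edge.from).fanOut += 1;
    entry(edge.to).fanIn += 1;
  }
  return metrics;
}

// Depth-first search that returns the first import cycle found, or null.
function findImportCycle(edges: ImportEdge[]): string[] | null {
  const adjacency = new Map<string, string[]>();
  for (const edge of edges) {
    adjacency.set(edge.from, [...(adjacency.get(edge.from) ?? []), edge.to]);
  }
  const visiting = new Set<string>(); // files on the current DFS path
  const done = new Set<string>(); // files already fully explored
  const path: string[] = [];

  const visit = (file: string): string[] | null => {
    if (done.has(file)) return null;
    if (visiting.has(file)) {
      // Back edge: the cycle is the path from the first visit of this file.
      return [...path.slice(path.indexOf(file)), file];
    }
    visiting.add(file);
    path.push(file);
    for (const next of adjacency.get(file) ?? []) {
      const cycle = visit(next);
      if (cycle) return cycle;
    }
    path.pop();
    visiting.delete(file);
    done.add(file);
    return null;
  };

  for (const file of Array.from(adjacency.keys())) {
    const cycle = visit(file);
    if (cycle) return cycle;
  }
  return null;
}
```

A file with high fan-in is imported by many others, which is the same signal Layer 4 later uses for its centrality score.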

Layer 2: Git Archaeology

Queries the local git history to surface temporal patterns:

Commit heatmap — files ranked by total commits
Ownership map — primary author per file
Co-change coupling — files that change together frequently (configurable window)
Hot files — churn hotspots with commit counts and last-modified timestamps

This layer is skipped gracefully when no git history is available.
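As an illustration of co-change coupling, the sketch below counts how often each pair of files appears in the same commit. The input shape (one array of file paths per commit) and the function names are assumptions made for this example. It also ignores the configurable time window and simply treats each commit as one change set, so it is a simplification rather than code-dna's actual algorithm.

```typescript
// Hypothetical input: one entry per commit, listing the files it touched.
type CommitFiles = string[];

// Count how often each unordered pair of files appears in the same commit.
function coChangeCounts(commits: CommitFiles[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const files of commits) {
    // De-duplicate and sort so that (a, b) and (b, a) share one key.
    const unique = Array.from(new Set(files)).sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]} <-> ${unique[j]}`;
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Keep pairs that changed together at least minCount times, strongest first.
function topCoupledPairs(commits: CommitFiles[], minCount: number): Array<[string, number]> {
  return Array.from(coChangeCounts(commits).entries())
    .filter((pair) => pair[1] >= minCount)
    .sort((a, b) => b[1] - a[1]);
}
```

Pairs that rank highly here but have no import edge between them are often the interesting ones, since the coupling is invisible in the dependency graph.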

Layer 3: Pattern Inference

Uses Layer 1 results to infer higher-level patterns without configuration:

Framework detection — identifies Next.js, Express, FastAPI, Spring Boot, NestJS, and more from dependency manifests and file markers
Architecture style — classifies projects as MVC, hexagonal, layered, event-driven, or monolith
Naming conventions — detects camelCase, PascalCase, snake_case, kebab-case across files, functions, classes, and variables
File organisation — by-feature, by-layer, by-type, or hybrid
Import and export style — relative vs. aliased paths, named vs. default exports
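Naming-convention detection can be pictured as classifying each identifier and reporting the most common style. The following is a rough sketch under that assumption. The regular expressions, the `NamingStyle` type, and the function names are illustrative only, and single lowercase words such as `engine` are deliberately treated as ambiguous.

```typescript
type NamingStyle = "camelCase" | "PascalCase" | "snake_case" | "kebab-case" | "unknown";

// Classify a single identifier or file stem by its shape.
function classifyName(identifier: string): NamingStyle {
  if (/^[a-z][a-z0-9]*(_[a-z0-9]+)+$/.test(identifier)) return "snake_case";
  if (/^[a-z][a-z0-9]*(-[a-z0-9]+)+$/.test(identifier)) return "kebab-case";
  if (/^[A-Z][a-zA-Z0-9]*$/.test(identifier)) return "PascalCase";
  if (/^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$/.test(identifier)) return "camelCase";
  return "unknown"; // e.g. a single lowercase word, which fits several styles
}

// Report the most common style and its share among classifiable names.
function dominantStyle(identifiers: string[]): { style: NamingStyle; share: number } {
  const tally = new Map<NamingStyle, number>();
  for (const id of identifiers) {
    const style = classifyName(id);
    if (style !== "unknown") tally.set(style, (tally.get(style) ?? 0) + 1);
  }
  let best: NamingStyle = "unknown";
  let bestCount = 0;
  let total = 0;
  for (const [style, count] of Array.from(tally.entries())) {
    total += count;
    if (count > bestCount) {
      best = style;
      bestCount = count;
    }
  }
  return { style: best, share: total === 0 ? 0 : bestCount / total };
}
```

Running the classifier separately over file names, function names, and class names would yield per-category results like those in the Conventions section of the example output.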

Layer 4: Risk Surface

Combines all previous layers to produce a risk-ranked file list:

Centrality score — files with the highest in-degree (most imported)
Churn score — correlation between frequency of change and dependency weight
Coverage proxy — estimated test coverage based on co-located test files
Composite risk score — 0–100 rank with per-factor breakdowns
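One way to picture the composite score is as a weighted sum of normalised factors scaled to 0–100. The sketch below assumes each factor is already normalised to the range 0 to 1 and uses made-up weights (0.4, 0.4, 0.2). Both the weights and the `RiskFactors` shape are hypothetical, and code-dna's real formula may differ.

```typescript
// Hypothetical per-file factors, each normalised to the range 0..1.
type RiskFactors = { centrality: number; churn: number; coverage: number };

// Assumed weights for illustration only; they sum to 1.
const RISK_WEIGHTS = { centrality: 0.4, churn: 0.4, missingCoverage: 0.2 };

// Combine the factors into a 0-100 score with a per-factor breakdown.
function compositeRisk(f: RiskFactors): { score: number; breakdown: Record<string, number> } {
  const clamp = (x: number): number => Math.min(1, Math.max(0, x));
  const breakdown = {
    centrality: RISK_WEIGHTS.centrality * clamp(f.centrality) * 100,
    churn: RISK_WEIGHTS.churn * clamp(f.churn) * 100,
    // Low estimated coverage raises risk, so the factor is inverted.
    missingCoverage: RISK_WEIGHTS.missingCoverage * (1 - clamp(f.coverage)) * 100,
  };
  const score = Math.round(breakdown.centrality + breakdown.churn + breakdown.missingCoverage);
  return { score, breakdown };
}
```

Keeping the breakdown alongside the score is what allows labels such as `high-centrality` or `high-churn` to be reported next to each file.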

Supported Languages

Language	Extensions	Support Tier
TypeScript	`.ts`, `.tsx`	Full AST parsing
JavaScript	`.js`, `.jsx`, `.mjs`, `.cjs`	Full AST parsing
Python	`.py`, `.pyi`	Full AST parsing
Go	`.go`	File discovery + framework detection
Rust	`.rs`	File discovery + framework detection
Java	`.java`	File discovery + framework detection
Vue	`.vue`	File discovery + framework detection
C#	`.cs`	File discovery + framework detection
Ruby	`.rb`	File discovery + framework detection
Kotlin	`.kt`, `.kts`	File discovery + framework detection
Swift	`.swift`	File discovery + framework detection
PHP	`.php`	File discovery + framework detection
C / C++	`.c`, `.h`, `.cpp`, `.cc`, `.cxx`, `.hpp`	File discovery + framework detection
Solidity	`.sol`	Discovery only

Run code-dna info to verify the languages and tiers detected by your installed version.

CLI Usage

`analyze [path]`

Run the full analysis pipeline and output DNA.

code-dna analyze [path] [options]

Arguments:

Argument	Description	Default
`path`	Directory to analyse	Current working directory

Options:

Flag	Description	Default
`-f, --format <format>`	Output format: `md` or `yaml`	`md`
`-o, --output <file>`	Write output to file instead of stdout	stdout
`-l, --layers <layers>`	Comma-separated layers to run	`1,2,3,4`
`--languages <langs>`	Language filter, e.g. `ts,py,go`	all languages
`--scope <dir>`	Scope analysis to a subdirectory	none
`--token-budget <n>`	Target token count for Markdown output	`8000`
`--git-depth <n>`	Maximum git commits to traverse	`1000`
`--no-git`	Skip git archaeology (disables Layer 2)	false
`-q, --quiet`	Suppress progress output	false

Examples:

# Full analysis, Markdown output to stdout
code-dna analyze

# Save to file with YAML format
code-dna analyze . --format yaml --output CODEBASE-DNA.yaml

# Structural skeleton and pattern inference only; no git or risk analysis
code-dna analyze --layers 1,3

# Analyse only TypeScript and Python files
code-dna analyze --languages ts,py

# Scope to a single service in a monorepo
code-dna analyze --scope services/api --output services/api/DNA.md

# Large repo with tight token budget
code-dna analyze --token-budget 5000 --git-depth 500

`diff <dna-a> <dna-b>`

Compare two DNA YAML snapshots and produce a Markdown diff report.

code-dna diff before.yaml after.yaml
code-dna diff before.yaml after.yaml --output diff-report.md

The diff report covers files added, removed, and modified; symbols added and removed; dependency graph changes; risk score movements; and convention and framework shifts.
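To give a feel for what "risk score movements" means, here is a small sketch that compares two maps of per-file risk scores. The `RiskSnapshot` shape (file path to score) is an assumption for this example and is not the actual layout of the DNA YAML snapshot.

```typescript
// Hypothetical snapshot shape: file path -> composite risk score (0-100).
type RiskSnapshot = Record<string, number>;
type RiskMovement = { file: string; before: number | null; after: number | null; delta: number };

// List files whose score changed, appeared, or disappeared, largest swing first.
function riskMovements(before: RiskSnapshot, after: RiskSnapshot): RiskMovement[] {
  const files = Array.from(new Set([...Object.keys(before), ...Object.keys(after)]));
  const movements: RiskMovement[] = [];
  for (const file of files) {
    const b = before[file] ?? null; // null means the file is absent from "before"
    const a = after[file] ?? null; // null means the file is absent from "after"
    const delta = (a ?? 0) - (b ?? 0);
    if (a !== b) movements.push({ file, before: b, after: a, delta });
  }
  return movements.sort((x, y) => Math.abs(y.delta) - Math.abs(x.delta));
}
```

Sorting by absolute change puts the files that moved the most at the top, whichever direction they moved in.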

`mcp`

Start the code-dna MCP server over stdio for use with MCP-compatible clients.

code-dna mcp
code-dna mcp --path /path/to/project
code-dna mcp --path /path/to/project --watch

See MCP Integration for client configuration details.

`info`

Show version, Node.js version, platform, and supported languages with their tiers.

code-dna info

MCP Integration

code-dna exposes its analysis pipeline as an MCP server, allowing LLM clients to query codebase DNA directly without running CLI commands.

Starting the Server

# Start against current directory
code-dna mcp

# Start against a specific project
code-dna mcp --path /path/to/project

# Watch mode: auto-refresh cache on file changes
code-dna mcp --path /path/to/project --watch

Claude Code Configuration

Add code-dna to your .mcp.json (project-scoped) or your global Claude Code settings:

{
  "mcpServers": {
    "code-dna": {
      "command": "npx",
      "args": ["code-dna", "mcp", "--path", "/absolute/path/to/project", "--watch"]
    }
  }
}

Cursor Configuration

In Cursor settings, add a new MCP server:

{
  "mcp": {
    "servers": {
      "code-dna": {
        "command": "npx",
        "args": ["code-dna", "mcp", "--path", "${workspaceFolder}", "--watch"]
      }
    }
  }
}

Available MCP Resources

Once connected, clients can read these resources:

URI	Content
`codedna://full`	Complete DNA Markdown output
`codedna://skeleton`	Architecture and Module Map sections
`codedna://dependencies`	Dependencies section
`codedna://conventions`	Conventions section
`codedna://risks`	Risk Surface and Hot Files sections
`codedna://hotfiles`	Hot Files section only
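The relationship between resource URIs and sections of the Markdown output can be sketched as a simple heading filter. This is only a conceptual illustration: the `RESOURCE_SECTIONS` mapping is inferred from the table above, it assumes each section starts with a level-2 `## ` heading as in the example output, and `extractResource` is a hypothetical helper, not part of the server.

```typescript
// Assumed mapping from resource URI to level-2 section headings.
const RESOURCE_SECTIONS: Record<string, string[]> = {
  "codedna://skeleton": ["Architecture", "Module Map"],
  "codedna://dependencies": ["Dependencies"],
  "codedna://conventions": ["Conventions"],
  "codedna://risks": ["Risk Surface", "Hot Files"],
  "codedna://hotfiles": ["Hot Files"],
};

// Return only the sections of the full DNA Markdown that a URI refers to.
function extractResource(fullDna: string, uri: string): string {
  if (uri === "codedna://full") return fullDna;
  const wanted = RESOURCE_SECTIONS[uri];
  if (!wanted) throw new Error(`Unknown resource: ${uri}`);
  const kept: string[] = [];
  let keeping = false;
  for (const line of fullDna.split("\n")) {
    // A new level-2 heading switches capture on or off; "###" lines do not match.
    const heading = /^## (.+)$/.exec(line);
    if (heading) keeping = wanted.indexOf((heading[1] ?? "").trim()) !== -1;
    if (keeping) kept.push(line);
  }
  return kept.join("\n").trim();
}
```

Requesting a narrow resource such as `codedna://conventions` instead of `codedna://full` keeps the context small when only one aspect of the codebase matters.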

Available MCP Tools

Tool	Description
`analyze`	Run analysis on a directory, update the cache, return full DNA
`diff`	Compute a structural diff between two DNA Markdown strings

See docs/MCP.md for the full MCP reference including tool parameter schemas.

Configuration

Create a .codedna.yaml file in your project root to customise analysis:

# Additional glob patterns to ignore (built-in ignores always apply)
ignore:
  - "generated/**"
  - "vendor/**"
  - "*.pb.go"

# Toggle individual analysis layers
layers:
  skeleton: true
  git: true
  patterns: true
  risk: true

# Git archaeology settings
git:
  max_commits: 1000
  max_blame_files: 50
  coupling_window: 30   # days

# Per-language overrides
languages:
  python:
    enabled: true
    framework: "fastapi"   # override auto-detection
  solidity:
    enabled: false         # skip entirely

# Output preferences
output:
  format: md
  token_budget: 8000
  filename: CODEBASE-DNA.md
  sections:
    architecture: 15
    module_map: 25
    dependencies: 15
    conventions: 15
    hot_files: 10
    risk_surface: 10
    api_surface: 5

# Monorepo: include/exclude sub-directories
scope:
  include:
    - "services/api"
    - "packages/shared"
  exclude:
    - "packages/legacy"

All fields are optional and fall back to sensible defaults.
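This README does not state a unit for the `sections` values in the sample config. One plausible reading, which is an assumption here, is that they are relative weights for splitting `token_budget` across sections. The sketch below shows that interpretation; `allocateTokens` is a hypothetical helper and not part of code-dna.

```typescript
// Section weights copied from the sample .codedna.yaml; treating them as
// relative weights is an assumption, not documented behaviour.
const SECTION_WEIGHTS: Record<string, number> = {
  architecture: 15,
  module_map: 25,
  dependencies: 15,
  conventions: 15,
  hot_files: 10,
  risk_surface: 10,
  api_surface: 5,
};

// Split a total token budget across sections in proportion to their weights.
function allocateTokens(totalBudget: number, weights: Record<string, number>): Record<string, number> {
  const sections = Object.keys(weights);
  const weightSum = sections.reduce((sum, s) => sum + (weights[s] ?? 0), 0);
  const allocation: Record<string, number> = {};
  for (const section of sections) {
    // Round down so the allocations never add up to more than the budget.
    allocation[section] =
      weightSum === 0 ? 0 : Math.floor((totalBudget * (weights[section] ?? 0)) / weightSum);
  }
  return allocation;
}
```

Because the sample weights sum to 95 rather than 100, normalising by the sum keeps the split proportional whatever the weights add up to.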

Programmatic API

code-dna can be used as a library from TypeScript or JavaScript:

npm install code-dna

import { analyze, formatMarkdown, formatYaml } from 'code-dna/lib';

// Run the full 4-layer analysis
const dna = await analyze('/path/to/project', {
  layers: [1, 2, 3, 4],
  tokenBudget: 8000,
});

// Render as Markdown (token-budget aware); 8000 matches the default budget
const markdown = formatMarkdown(dna, 8000);

// Render as YAML (full data, no truncation)
const yaml = formatYaml(dna);

See docs/API.md for the complete programmatic API reference.

Example Output

The following is a truncated excerpt from code-dna analysing itself:

# Codebase DNA -- code-dna

> Generated by code-dna v0.1.0 on 2026-03-26.
> Languages: typescript (99%), javascript (1%) | Files: 101 | LOC: 35,864

## Architecture

**Style:** layered (85% confidence)
**Framework:** Node.js / Commander CLI

### Layers
- **cli** (3 files): entry point, MCP command
- **core** (8 files): engine, types, diff engine, token budget
- **analyzers** (6 files): git, framework, architecture, conventions, risk
- **parsers** (19 files): Tree-sitter extractors for 14 languages
- **output** (3 files): Markdown and YAML formatters
- **mcp** (2 files): MCP server

## Conventions

- **Files:** kebab-case
- **Functions:** camelCase
- **Classes:** PascalCase
- **Exports:** named
- **Imports:** external-first, relative paths
- **Tests:** co-located

## Risk Surface

| File | Score | Factors |
|------|-------|---------|
| src/core/engine.ts | 82 | high-centrality, high-churn |
| src/core/types.ts | 74 | high-centrality |
| src/parsers/parser-engine.ts | 65 | high-centrality |

Contributing

Clone the repository and install dependencies: npm install
Build: npm run build
Run all tests: npm test (1199 tests, Node.js 20+ required)
Lint: npm run lint
Typecheck: npm run typecheck

All code changes require tests written first (TDD). Commits follow Conventional Commits (feat(scope):, fix(scope):).

License

MIT
