Universal AI assistance logging with transparency.
A flexible framework for capturing, analyzing, and sharing AI agent interactions while maintaining privacy and auditability. Supporting multiple AI agents and workflows, this tool helps organizations track and understand their AI usage patterns.
- Why this exists
- Architecture Overview
- Current capabilities
- Getting started
- CLI Commands
- Documentation
- Troubleshooting
- Roadmap snapshot
- Contributing
Codex (and other AI coding tools) produce rich session logs, but they're hard to read and even harder to share responsibly. The AI Log Trail tool ingests `*.json` or `*.jsonl` session data, normalizes it into SQLite, and generates human-friendly reports that highlight:
- User prompts and the agent responses/actions.
- Token usage and cost indicators.
- Function calls, reasoning trails, and decision history.
- Redactions applied to sensitive content so transparency doesn't compromise privacy.
The goal is a workflow where AI-assisted coding can be audited, explained, and optionally published in repositories or release notes.
The AI Log Trail follows a simple 5-stage pipeline:
```
Session Files (JSONL)
        ↓
    [Parser]     ← discover, load, validate, group by user message
        ↓
    [Ingest]     ← normalize, sanitize, persist to SQLite
        ↓
   [Redaction]   ← apply rules & manual redactions
        ↓
    [Export]     ← generate reports, output redacted data
```
Key components:
- **Parser** (`src/parsers/`) — Discovers nested session directories, loads JSONL events, groups by user prompt
- **Ingest** (`src/services/ingest.py`) — Validates, sanitizes, applies rule-based redactions, persists to SQLite in a transaction
- **Database** (`src/services/database.py`) — 12 normalized tables with cascading foreign keys, audit trail via `raw_json` columns
- **Redactions** (`src/services/redactions.py`, `src/services/redaction_rules.py`) — Rule library, application tracking, deduplication via fingerprints
- **CLI** (`cli/`) — Entry points: ingest, group (display), export, rules management, migration
Learn more: `docs/architecture.md` — full system design, data flows, error handling.
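To make the parser stage concrete, here is a minimal sketch of grouping JSONL events under each user message. The event shape (a `type` field with a `user_message` value) is a simplifying assumption for illustration; the real schema lives in `src/parsers/`.

```python
import json


def group_by_prompt(jsonl_lines):
    """Group parsed events under the most recent user message.

    Assumes each line is a JSON object with a 'type' field and that
    user prompts carry type == 'user_message' (hypothetical schema).
    """
    groups = []
    current = None
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL stream
        event = json.loads(line)
        if event.get("type") == "user_message":
            # A new prompt opens a new group
            current = {"prompt": event, "events": []}
            groups.append(current)
        elif current is not None:
            # Everything else attaches to the active prompt
            current["events"].append(event)
    return groups


lines = [
    '{"type": "user_message", "text": "fix the bug"}',
    '{"type": "function_call", "name": "read_file"}',
    '{"type": "agent_reasoning", "text": "checking the stack trace"}',
]
groups = group_by_prompt(lines)
```

Events that arrive before any user message are dropped here; the real parser may handle that case differently.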
- **Structured ingest** — Parse Codex session directories into tables (`files`, `sessions`, `prompts`, `token_messages`, `turn_context_messages`, `agent_reasoning_messages`, `function_plan_messages`, `function_calls`) with raw JSON preserved.
- **Redaction storage** — The `redactions` table tracks prompt/field/global scopes with replacement text, actor, reason, and timestamps for provenance.
- **Rule-based redaction** — YAML/JSON rule file (`user/redactions.yml`, seeded with defaults for emails, tokens, paths, troubleshooting snippets, and `[redact ...]` markers) applied in file order with per-rule counts in summaries; manual DB redactions still take precedence.
- **CLI utilities**
  - `python -m cli.group_session` groups events under each prompt for quick console or file review and writes to `[outputs].reports_dir` by default.
  - `python -m cli.ingest_session` ingests one or many sessions into SQLite with `--limit`, `--debug`, and `--verbose` modes using the configured database path.
- **Governance docs** — `AGENTS.md` sets behavioral guardrails; `ROADMAP.md` tracks milestones through v1.0.0 and beyond.
- **Config scaffolding** — `user/config.example.toml` seeds per-user setup; actual secrets stay local via `.gitignore`.
- **Migration docs** — `docs/migration.md` explains SQLite to Postgres migration, dry-run, and rollback steps.
- **Tests** — Organized under `tests/` by area:
  - `tests/services/` (config, redactions, rules, DB helpers)
  - `tests/parsers/` (session parsing, DB handlers)
  - `tests/cli/` (ingest, CLI scripts, migration)
  - `tests/core/` (models, base types, agent config)
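The rule-based redaction behavior described above (file-order application, per-rule counts, fingerprint deduplication) can be sketched roughly as follows. This is an illustration only; the actual rule format is documented in `docs/redaction_rules.md`, and the `pattern`/`replacement` shape and example patterns here are assumptions.

```python
import hashlib
import re

# Hypothetical rules, in file order (real defaults live in user/redactions.yml)
rules = [
    {"name": "email", "pattern": r"[\w.+-]+@[\w-]+\.[\w.]+", "replacement": "[redacted-email]"},
    {"name": "token", "pattern": r"sk-[A-Za-z0-9]{8,}", "replacement": "[redacted-token]"},
]


def apply_rules(text, rules):
    """Apply rules in file order, counting hits per rule and
    fingerprinting each match so repeated values deduplicate."""
    counts = {}
    fingerprints = set()
    for rule in rules:
        def replace(match):
            # Fingerprint the matched value, not the replacement,
            # so the same secret always maps to the same fingerprint
            fp = hashlib.sha256(match.group(0).encode()).hexdigest()[:16]
            fingerprints.add(fp)
            counts[rule["name"]] = counts.get(rule["name"], 0) + 1
            return rule["replacement"]
        text = re.sub(rule["pattern"], replace, text)
    return text, counts, fingerprints


redacted, counts, fingerprints = apply_rules(
    "contact alice@example.com or use sk-abcdef123456", rules
)
```

Running rules in file order matters: an earlier rule's replacement text is visible to later rules, which is why replacement markers should not themselves match any pattern.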
Requires Python 3.12+
- **Clone & configure**

  ```bash
  git clone <repo-url>
  cd AI-Log-Trail
  cp user/config.example.toml user/config.toml
  # edit user/config.toml to set:
  #   [sessions].root       -> Codex/Copilot logs directory
  #   [ingest].db_path      -> SQLite destination
  #   [outputs].reports_dir -> where grouped reports should be written
  ```

  Optional tuning: set `[ingest].batch_size` in `user/config.toml` if you want a larger or smaller event batch during ingest (default is 1000).

- **Ingest a sample**

  ```bash
  python -m cli.ingest_session --debug
  ```

  This ingests the first two sessions, logs verbose output, and writes to `[ingest].db_path`.

- **Explore prompts**

  ```bash
  python -m cli.group_session
  ```

  Generates a grouped text report for the earliest session, stored under `[outputs].reports_dir` (override with `-o`).
The tool provides five main CLI commands. See `docs/cli.md` for the full reference and examples.
| Command | Purpose | Example |
|---|---|---|
| `ingest_session` | Load session logs into SQLite | `python -m cli.ingest_session --debug` |
| `group_session` | Display prompts & events | `python -m cli.group_session --list` |
| `export_session` | Export redacted session data | `python -m cli.export_session --format csv` |
| `redaction_rules` | Manage redaction library | `python -m cli.redaction_rules list --source yaml` |
| `migrate_sqlite_to_postgres` | Scale to PostgreSQL | `python -m cli.migrate_sqlite_to_postgres --dry-run` |
| Document | Purpose |
|---|---|
| `docs/architecture.md` | System design: components, data flow, pipelines, algorithms |
| `docs/schema.md` | Database schema: table definitions, relationships, indexes |
| `docs/cli.md` | CLI reference: all commands, options, examples, workflows |
| `docs/schema_changes.md` | Migration history: schema evolution, backward compatibility |
| `docs/redaction_rules.md` | Redaction authoring: rule syntax, ordering, examples |
| `docs/migration.md` | SQLite → Postgres migration: steps, dry-run, rollback |
| `AGENTS.md` | Development: coding standards, testing, security, versioning |
| `ROADMAP.md` | Feature roadmap: phases, milestones, priorities |
- Ensure `[sessions].root` in `user/config.toml` points to an existing directory.
- See `docs/cli.md#troubleshooting` for detailed steps.
- Verify Codex logs are in the configured directory under the `YYYY/MM/DD/` structure.
- Run `ls -la /path/to/sessions/2025/` to check.
- If a session JSONL file is malformed, inspect it with `python -m json.tool`.
- Try a different session file or repair the JSONL before ingesting.
- Increase `[ingest].batch_size` in `user/config.toml` (default 1000).
- Run with `--limit 1` to process fewer files at once.
For more details, see docs/cli.md#troubleshooting.
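For the malformed-JSONL case, a short standalone check (an illustration, not one of the tool's CLI commands) can report the first bad line instead of piping the whole file through `python -m json.tool`:

```python
import json


def find_first_bad_line(lines):
    """Return (line_number, error_message) for the first line that is
    not valid JSON, or None if every non-empty line parses."""
    for n, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are tolerated, but still numbered
        try:
            json.loads(stripped)
        except json.JSONDecodeError as err:
            return n, str(err)
    return None


# Example: check a session file before ingesting
# with open("session.jsonl", encoding="utf-8") as f:
#     print(find_first_bad_line(f))
```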
Refer to `ROADMAP.md` for the full plan. Highlights for v1.0.0:
- Schema migrations and automated tests.
- Redaction system (manual + rule-based) with CLI and UI controls.
- Streamlit review app for prompt/action browsing and export.
- Markdown/CSV transparency reports filtered by repo, date, or session.
- Pipx-friendly packaging and documentation.
Beyond v1.0.0 we're targeting tagging, audit trails, API integrations, VS Code extensions, and compliance-ready exports.
- **Architecture** — Components are wired manually; no dependency-injection framework is in place yet. Larger deployments should plan for DI or service registries before extending the tool.
- **Session paths** — Ingest expects Codex logs under `~/.codex/sessions/<year>/<month>/<day>/file.jsonl` (or the Windows equivalent). Symlinks and junctions must preserve this structure and point to readable directories; atypical mount points are not traversed automatically.
- **Memory profile** — JSON payloads are read as-is with no maximum size enforced. Very large sessions can exhaust memory; split oversized logs before ingesting or ingest them incrementally.
- **Concurrency** — SQLite writes run in a single process and rely on SQLite's default locking. Running multiple ingests against the same database concurrently is unsupported and may deadlock.
- **Encoding** — All file I/O assumes UTF-8. Convert logs encoded differently before processing.
- **Timestamps** — Session timestamps are stored verbatim. Downstream analytics should normalize timezones explicitly (e.g., convert to UTC) to avoid skew.
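For the timestamp caveat, a minimal normalization sketch using the standard library. It assumes ISO-8601 strings and treats naive values (no offset) as already-UTC, which is an assumption to verify against how your sessions were recorded:

```python
from datetime import datetime, timezone


def to_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp string and convert it to UTC.

    Naive timestamps are assumed to already be UTC; adjust this
    if your logs were recorded in local time.
    """
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

Doing the conversion once at analysis time, rather than mutating stored values, preserves the verbatim timestamps that serve as the audit record.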
These constraints will be revisited as part of resilience and scaling work.
This is an "AI-assisted" project — experiments will happen — but the mandate is transparency:
- Every commit notes AI assistance.
- Raw logs remain user-owned; ingest only reads from configured paths.
- Redactions are first-class citizens with provenance.
Ideas, bug reports, and questions are welcome. Please review `CONTRIBUTING.md` for expectations before contributing.
This project is licensed under the terms of the MIT License.