Last updated: 2026-02-21 (v0.2.6)
This document is a comprehensive reference for AI assistants working on the TokenHub codebase. It covers architecture, build system, data flow, debugging, and known gotchas.
TokenHub is an LLM routing gateway written in Go. It sits between clients and multiple LLM providers (OpenAI, Anthropic, vLLM, etc.), providing:
- Intelligent model selection via a multi-objective scoring engine
- Provider failover with health tracking and cooldown
- OpenAI-compatible `/v1/chat/completions` API (translates Anthropic responses)
- Encrypted credential vault (AES-256-GCM + Argon2id)
- API key management with scoping, budgets, and rotation enforcement
- Thompson Sampling (contextual bandit) for adaptive routing
- Multi-model orchestration (adversarial, vote, refine modes)
- Full observability: Prometheus, embedded TSDB, SSE event stream, request logs
- Admin dashboard (single-file HTML with Cytoscape.js + D3.js)
- Optional Temporal workflow engine for durable orchestration
tokenhub/
├── cmd/
│ ├── tokenhub/main.go # Server binary entry point
│ └── tokenhubctl/main.go # CLI admin tool
├── internal/
│ ├── app/
│ │ ├── config.go # Env-var config loading (TOKENHUB_*)
│ │ └── server.go # Server init, provider registration, background loops
│ ├── apikey/
│ │ ├── manager.go # API key CRUD (bcrypt hashing, prefix matching)
│ │ ├── budget.go # Monthly spend budget enforcement
│ │ └── middleware.go # HTTP auth middleware (Bearer token validation)
│ ├── circuitbreaker/
│ │ └── breaker.go # Circuit breaker for Temporal dispatch
│ ├── events/
│ │ └── bus.go # In-memory pub/sub for SSE streaming
│ ├── health/
│ │ ├── tracker.go # Per-provider health state machine (healthy/degraded/down)
│ │ └── prober.go # Background HTTP health probe goroutine
│ ├── httpapi/
│ │ ├── routes.go # Chi router mount, admin auth middleware, Dependencies struct
│ │ ├── observe.go # recordObservability() — central sink for all 6 o13y backends
│ │ ├── handlers_chat.go # POST /v1/chat (native TokenHub format)
│ │ ├── handlers_openai.go # POST /v1/chat/completions (OpenAI-compatible)
│ │ ├── handlers_plan.go # POST /v1/plan (multi-model orchestration)
│ │ ├── handlers_admin.go # CRUD for providers, models, vault, routing config
│ │ ├── handlers_apikeys.go # CRUD for API keys
│ │ ├── handlers_events.go # GET /admin/v1/events (SSE stream)
│ │ ├── handlers_stats.go # GET /admin/v1/stats (rolling window aggregates)
│ │ ├── handlers_tsdb.go # TSDB query/metrics/prune/retention endpoints
│ │ ├── handlers_workflows.go # Temporal workflow visibility endpoints
│ │ └── handlers_extended.go # Simulate, discover, rewards, audit, logs handlers
│ ├── idempotency/
│ │ ├── cache.go # In-memory TTL cache for idempotency keys
│ │ └── middleware.go # X-Idempotency-Key middleware
│ ├── logging/
│ │ └── logging.go # slog setup with dynamic level
│ ├── metrics/
│ │ └── metrics.go # Prometheus counters/histograms registry
│ ├── providers/
│ │ ├── http.go # DoRequest/DoStreamRequest shared HTTP helpers
│ │ ├── contract.go # StatusError type
│ │ ├── context.go # Request ID context propagation
│ │ ├── openai/adapter.go # OpenAI adapter (Sender + StreamSender + Describer)
│ │ ├── anthropic/adapter.go # Anthropic adapter (translates response format)
│ │ └── vllm/adapter.go # vLLM adapter (round-robin endpoints, optional auth)
│ ├── ratelimit/
│ │ └── ratelimit.go # Per-IP token bucket rate limiter
│ ├── router/
│ │ ├── engine.go # Core routing engine (1121 lines) — model selection, failover
│ │ ├── types.go # Request, Message, Policy, Decision, Model, etc.
│ │ ├── directives.go # In-band routing directive parsing from messages
│ │ ├── thompson.go # Thompson Sampling bandit policy
│ │ ├── thompson_refresh.go # Background refresh loop for bandit parameters
│ │ ├── rewards.go # Reward computation for bandit feedback
│ │ ├── format.go # Output format shaping (JSON, markdown, strip think)
│ │ └── schema.go # JSON Schema extraction/validation
│ ├── stats/
│ │ └── collector.go # In-memory rolling-window stats (1m, 5m, 1h, 24h)
│ ├── store/
│ │ ├── store.go # Store interface + domain types (ModelRecord, etc.)
│ │ └── sqlite.go # SQLite implementation (modernc.org/sqlite, pure Go)
│ ├── temporal/
│ │ ├── types.go # Workflow/activity input/output structs
│ │ ├── workflows.go # ChatWorkflow, OrchestrationWorkflow, StreamLogWorkflow
│ │ ├── activities.go # SendToProvider, LogResult, ClassifyAndEscalate, etc.
│ │ └── manager.go # Temporal client/worker lifecycle
│ ├── tracing/
│ │ └── tracing.go # OpenTelemetry OTLP/HTTP setup
│ ├── tsdb/
│ │ └── tsdb.go # Embedded SQLite-backed time-series DB
│ └── vault/
│ └── vault.go # AES-256-GCM encrypted key-value store
├── web/
│ ├── index.html # Admin dashboard (~1240 lines, self-contained)
│ ├── cytoscape.min.js # Topology graph library
│ └── d3.min.js # Charting library
├── web.go # go:embed web/* for the admin UI
├── config/config.example.yaml # Example config
├── scripts/
│ ├── release.sh # Semantic version tagging + GHCR push
│ ├── setup-docker.sh # Fix macOS Docker symlink issues
│ └── backup.sh # SQLite backup helper
├── tests/
│ ├── integration.sh # Integration test runner
│ ├── e2e-temporal.sh # Temporal end-to-end test
│ └── mock-provider.conf # nginx config for mock vLLM
├── k8s/ # Kubernetes manifests
├── deploy/prometheus-alerts.yml # Prometheus alerting rules
├── docs/src/ # mdBook documentation source
├── Makefile # Primary build interface
├── Dockerfile # Multi-stage production image (Alpine 3.21)
├── Dockerfile.dev # Builder image with Go + mdbook + golangci-lint
├── docker-compose.yaml # Local dev stack (tokenhub + Temporal + mock vLLM)
├── .gitignore # Includes bootstrap.local (legacy, superseded by ~/.tokenhub/credentials)
└── CLAUDE.md # AI assistant workflow instructions
All interactions go through the Makefile. The build runs inside a Docker container (tokenhub-builder) to ensure reproducibility.
| Command | What It Does |
|---|---|
| `make build` | Compiles `bin/tokenhub` and `bin/tokenhubctl` inside the builder container |
| `make install` | Builds natively on the host and installs to `~/.local/bin` (requires Go 1.24+) |
| `make package` | Builds the production Docker image (`tokenhub:$(VERSION)` + `tokenhub:latest`) |
| `make run` | Builds image, starts compose stack, tails logs |
| `make start` | Starts the service (no rebuild) |
| `make stop` | Stops the service |
| `make restart` | Stops and starts the service |
| `make logs` | Tails service logs |
| `make test` | Runs `go test ./...` inside the builder container |
| `make test-race` | Tests with race detector |
| `make test-coverage` | Tests with coverage profile |
| `make lint` | Runs `golangci-lint` |
| `make docs` | Builds mdBook documentation in `docs/book/` |
| `make release` | Bumps patch version, tags, builds, pushes to GHCR |
| `make release-minor` | Bumps minor version |
| `make clean` | Removes `bin/`, `docs/book/`, `coverage.out` |
`VERSION` is derived from `git describe --tags --always --dirty`. Release tags follow semver (`v0.2.6`). The `scripts/release.sh` script enforces a clean working tree.
- Build stage: `golang:1.24-alpine` — compiles the binary, builds docs with mdbook
- Runtime stage: `alpine:3.21` — runs as non-root `tokenhub` user
- Uses `modernc.org/sqlite` (pure Go, no CGO), so `CGO_ENABLED=0` works
- Published to `ghcr.io/jordanhubbard/tokenhub:{version}` and `:latest`
All config is via environment variables (`TOKENHUB_*`). See `internal/app/config.go` for the full list with defaults.
| Variable | Default | Purpose |
|---|---|---|
| `TOKENHUB_LISTEN_ADDR` | `:8080` | HTTP listen address |
| `TOKENHUB_DB_DSN` | `file:/data/tokenhub.sqlite` | SQLite database path |
| `TOKENHUB_ADMIN_TOKEN` | (auto-generated if empty) | Bearer token for `/admin/v1/*` endpoints |
| `TOKENHUB_VAULT_ENABLED` | `true` | Enable encrypted credential vault |
| `TOKENHUB_VAULT_PASSWORD` | — | Auto-unlock vault at startup (headless/automated mode) |
| `TOKENHUB_CREDENTIALS_FILE` | `~/.tokenhub/credentials` | Declarative JSON file with providers/models; persisted to DB on load (must be mode 0600) |
| `TOKENHUB_TEMPORAL_ENABLED` | `false` | Enable Temporal workflow engine |
| `TOKENHUB_OTEL_ENABLED` | `false` | Enable OpenTelemetry tracing |
| `TOKENHUB_RATE_LIMIT_RPS` | `60` | Per-IP rate limit (applied to `/v1/*` only) |
| `TOKENHUB_PROVIDER_TIMEOUT_SECS` | `30` | HTTP timeout for provider requests |
Send SIGHUP to reload rate limits, routing defaults, and log level without restart.
Client
│
├─ POST /v1/chat/completions (OpenAI-compatible)
├─ POST /v1/chat (native TokenHub format)
└─ POST /v1/plan (multi-model orchestration)
│
├─ API key auth middleware (apikey.AuthMiddleware)
├─ Rate limiting (ratelimit.Limiter.Middleware)
├─ Idempotency check (idempotency.Middleware)
│
├─ [Optional] Temporal dispatch (if enabled + circuit breaker closed)
│ └─ ChatWorkflow → SendToProvider activity → LogResult activity
│
└─ Direct engine path:
├─ engine.RouteAndSend() or engine.RouteAndStream()
│ ├─ Parse in-band directives from messages
│ ├─ Score eligible models (cost, weight, latency, failure rate)
│ ├─ Thompson Sampling (if mode=thompson)
│ ├─ Select top model, call adapter.Send()
│ ├─ On failure: classify error, failover/escalate/retry
│ └─ Return Decision + ProviderResponse
│
├─ extractUsage(resp) → parse actual tokens from provider response
├─ computeActualCost() → replace estimate with real token-based cost
│
└─ recordObservability() → fans out to 6 sinks:
├─ Prometheus (tokenhub_requests_total, _latency_ms, _cost_usd_total, _tokens_total)
├─ Store (request_logs table + reward_logs table)
├─ EventBus → SSE stream (route_success/route_error events)
├─ Stats collector (in-memory rolling windows: 1m, 5m, 1h, 24h)
├─ TSDB (embedded SQLite: latency, cost, tokens time-series)
└─ Budget cache invalidation
The central function is recordObservability() in internal/httpapi/observe.go. Every successful or failed request passes through it.
| Field | Source | Sinks |
|---|---|---|
| `InputTokens` / `OutputTokens` | Parsed from provider response via `extractUsage()` | All 6 sinks |
| `CostUSD` | `computeActualCost()` using actual tokens + per-1K rates | All 6 sinks |
| `LatencyMs` | Wall-clock `time.Since(start)` | All 6 sinks |
| `ModelID` / `ProviderID` | From `router.Decision` | All 6 sinks |
| `Success` / `ErrorClass` | Determined by handler | All 6 sinks |
| `RequestID` / `APIKeyID` | From middleware context | Store only |
`extractUsage()` in observe.go handles two response formats:
- OpenAI: `usage.prompt_tokens` / `usage.completion_tokens`
- Anthropic: `usage.input_tokens` / `usage.output_tokens`
Streaming responses don't include usage blocks, so tokens remain 0 for streamed requests. Cost falls back to the pre-flight estimate in that case.
The TSDB buffers writes and flushes every 30 seconds (or when the buffer hits 100 points). Newly written metrics won't appear in /admin/v1/tsdb/query until the next flush.
SQLite via modernc.org/sqlite (pure Go, no CGO). Schema in internal/store/sqlite.go Migrate().
- `models` — Model configuration (id, provider_id, weight, pricing, enabled)
- `providers` — Provider configuration (id, type, base_url, cred_store)
- `request_logs` — Per-request audit trail (tokens, cost, latency, status)
- `reward_logs` — Thompson Sampling feedback data
- `audit_logs` — Admin mutation audit trail
- `api_keys` — API key records (bcrypt hashes, scopes, budgets)
- `vault_blob` — Encrypted vault data (single row, id=1)
- `routing_config` — Persisted routing policy defaults (single row, id=1)
- `tsdb_points` — Time-series data (ts, metric, model_id, provider_id, value)
New columns are added via idempotent `ALTER TABLE` migrations in `Migrate()`. Each migration checks `pragma_table_info()` before adding the column. To add a new column:
```go
alterMigrations := []struct{ table, column, ddl string }{
    {"request_logs", "new_column", "ALTER TABLE request_logs ADD COLUMN new_column TYPE NOT NULL DEFAULT value"},
}
```

Single HTML file at `web/index.html` (~1,240 lines). Embedded via `go:embed web/*` in `web.go` and served at `/admin/`.
- Topology graph (Cytoscape.js): Shows provider→model edges, animated on SSE events
- Trend charts (D3.js): Cost, latency, and tokens over time from TSDB data
- Overview stat cards: Requests, Tokens, Cost, Avg Latency, Errors — seeded from `/admin/v1/stats` on load, then incremented by SSE events
- Model Leaderboard: Rolling-window stats per model (prefers 24h > 1h > 5m > 1m)
- Setup Wizard: Multi-step flow for adding providers (type → endpoint → credentials → test → discover models)
- Edit modals: For providers and models (PATCH endpoints)
- What-If Simulator: Tests routing decisions without sending requests
- SSE Decision Feed: Real-time event stream with latency, cost, tokens, reason
- Request Log: Paginated historical request log from the database
- Vault controls: Setup/unlock/lock/rotate UI
HTML is served with `no-cache, must-revalidate`. Static assets (`cytoscape.min.js`, `d3.min.js`) get `?v={hash}` query params computed from the HTML content hash.
All adapters implement router.Sender (and optionally router.StreamSender, router.Describer).
| Adapter | Package | Auth | Health Endpoint |
|---|---|---|---|
| OpenAI | `providers/openai` | `Authorization: Bearer {key}` | `{base}/v1/models` |
| Anthropic | `providers/anthropic` | `x-api-key: {key}` + `anthropic-version` | `{base}/v1/messages` (405 = healthy) |
| vLLM | `providers/vllm` | Optional Bearer key | `{endpoint}/health` |
registerProviderAdapter() in handlers_admin.go constructs and registers adapters at runtime when providers are created/updated via the API. The engine's RegisterAdapter() is idempotent (replaces existing adapter with same ID).
Caveat: The health prober is initialized once at startup with the adapter list at that time. Dynamically registered adapters won't be probed until restart. The health tracker still receives success/failure signals from actual request routing, so it's not completely blind.
internal/router/engine.go (1121 lines) is the core.
- Filter eligible models (enabled, adapter exists, not in cooldown, within context/budget limits)
- For each model, compute a multi-objective score:
  - `normCost` — normalized estimated cost (lower is better)
  - `normWeight` — normalized configured weight (higher is better)
  - `normLatency` — normalized historical avg latency (lower is better)
  - `normFailure` — normalized failure rate (lower is better)
- Apply mode-specific weights to produce final score:
  - `cheap`: cost=0.7, weight=0.1, latency=0.1, failure=0.1
  - `normal`: cost=0.25, weight=0.25, latency=0.25, failure=0.25
  - `high_confidence`: cost=0.05, weight=0.7, latency=0.1, failure=0.15
  - `planning`: cost=0.1, weight=0.6, latency=0.1, failure=0.2
  - `thompson`: Uses Thompson Sampling (Beta distribution draws)
- Sort by score descending, attempt top model first
- On failure: classify error (rate_limited, transient, context_overflow, fatal), failover to next model
- adversarial: Send to primary model, then critique/refine with review model
- vote: Send to N models in parallel, use a judge model to pick best response
- refine: Iterative self-refinement with the same model
```go
// Provider adapter (must implement)
type Sender interface {
    ID() string
    Send(ctx context.Context, model string, req Request) (ProviderResponse, error)
    ClassifyError(err error) *ClassifiedError
}

// Optional: streaming support
type StreamSender interface {
    Sender
    SendStream(ctx context.Context, model string, req Request) (io.ReadCloser, error)
}

// Optional: metadata for admin UI
type Describer interface {
    HealthEndpoint() string
}

// Persistence layer
type Store interface {
    // Models, Providers, RequestLogs, Vault, Routing, Audit, Rewards, API Keys, Log retention
    Migrate(ctx context.Context) error
    Close() error
}
```

| Method | Path | Purpose |
|---|---|---|
| POST | `/v1/chat` | Native TokenHub chat (policy hints, orchestration) |
| POST | `/v1/chat/completions` | OpenAI-compatible chat completions |
| GET | `/v1/models` | OpenAI-compatible model listing |
| POST | `/v1/plan` | Multi-model orchestration |
| Method | Path | Purpose |
|---|---|---|
| GET/POST | `/admin/v1/providers` | List / upsert providers |
| PATCH/DELETE | `/admin/v1/providers/{id}` | Update / delete provider |
| GET/POST | `/admin/v1/models` | List / upsert models |
| PATCH/DELETE | `/admin/v1/models/*` | Update / delete model (wildcard for IDs with `/`) |
| POST | `/admin/v1/vault/unlock` | Unlock vault with master password |
| POST | `/admin/v1/vault/lock` | Lock vault |
| POST | `/admin/v1/vault/rotate` | Rotate vault master password |
| GET/PUT | `/admin/v1/routing-config` | Get / set routing policy defaults |
| GET | `/admin/v1/health` | Provider health status |
| GET | `/admin/v1/stats` | Rolling-window stats (1m/5m/1h/24h) |
| GET | `/admin/v1/logs` | Request log (paginated) |
| GET | `/admin/v1/audit` | Audit log (paginated) |
| GET | `/admin/v1/rewards` | Reward log (paginated) |
| GET | `/admin/v1/engine/models` | Runtime engine state (models + adapter_info) |
| POST | `/admin/v1/routing/simulate` | What-If routing simulation |
| GET | `/admin/v1/providers/{id}/discover` | Discover models from provider endpoint |
| GET | `/admin/v1/events` | SSE event stream |
| GET/POST/PUT | `/admin/v1/tsdb/*` | TSDB query, metrics list, prune, retention |
| GET/POST/PATCH/DELETE | `/admin/v1/apikeys[/{id}]` | API key CRUD |
| GET | `/admin/v1/workflows[/{id}]` | Temporal workflow visibility |
| Method | Path | Purpose |
|---|---|---|
| GET | `/healthz` | Health check (checks adapter + model count) |
| GET | `/metrics` | Prometheus metrics |
| GET | `/admin/` | Admin dashboard UI |
| GET | `/docs/` | mdBook documentation |
| GET | `/` | Redirects to `/admin/` |
Model IDs can contain slashes (e.g., `Qwen/Qwen2.5-Coder-32B-Instruct`). The PATCH and DELETE model routes use Chi's wildcard `*` parameter (not `{id}`) to capture the full path. The `wildcardID()` helper in `handlers_admin.go` extracts the ID by trimming the leading `/`. Do not use `encodeURIComponent()` for model IDs in URLs — send the literal slashes.
```sh
# Create an API key
curl -X POST http://localhost:8080/admin/v1/apikeys \
  -H 'Content-Type: application/json' \
  -d '{"name":"test","scopes":"[\"chat\"]"}'

# Send a request (OpenAI-compatible)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
```

```sh
# Request log (most recent)
curl http://localhost:8080/admin/v1/logs?limit=1

# Stats (all rolling windows, includes token counts)
curl http://localhost:8080/admin/v1/stats

# Prometheus metrics (check token counters)
curl http://localhost:8080/metrics | grep tokenhub_tokens

# TSDB metrics list
curl http://localhost:8080/admin/v1/tsdb/metrics

# SSE event stream (live)
curl -N http://localhost:8080/admin/v1/events

# Health status
curl http://localhost:8080/admin/v1/health
```
"no eligible models registered": Either no adapters are registered (check
~/.tokenhub/credentialsand/admin/v1/providers), or the only adapter's provider is in health cooldown. Check/admin/v1/healthfor cooldown state. -
Health prober uses stale endpoint: The prober is initialized once at startup. If you PATCH a provider's
base_url, the adapter is re-registered but the prober keeps probing the old endpoint. Restart the container to fix. -
Tokens show 0 for streaming requests: Streaming responses (
stream: true) don't include ausageblock in most providers. Token counts will be 0 for streamed requests. -
TSDB query returns empty right after requests: The TSDB write buffer flushes every 30 seconds. Wait or call the flush endpoint if one exists.
-
Model IDs with slashes fail on PATCH/DELETE: Use the wildcard routes (
/admin/v1/models/*) and do NOT URL-encode the slashes. The UI sends literal/characters. -
Cost is $0 for all requests: Check that models have
input_per_1kandoutput_per_1kset to non-zero values. Models registered via the credentials file or admin API may have pricing set to 0 if not explicitly provided. -
Vault is locked after restart: The vault salt is persisted in the database, but the master password is not stored anywhere. Set
TOKENHUB_VAULT_PASSWORDfor automatic unlock, or unlock manually via the UI or API after each restart.
```sh
make test             # All unit tests (24 packages, ~33 test files)
make test-race        # With race detector
make test-coverage    # With coverage report
make test-integration # Integration tests against running container
make test-e2e         # End-to-end Temporal tests
```

Test files are co-located with source (`*_test.go`). Key test files:
- `internal/httpapi/handlers_test.go` (1547 lines) — comprehensive HTTP handler tests
- `internal/httpapi/handlers_extended_test.go` (1625 lines) — extended handler tests
- `internal/temporal/workflows_test.go` — Temporal workflow tests with mock environments
- ~25,200 lines of Go across 52 source files and 33 test files
- ~1,240 lines of HTML/JS in the admin dashboard
- 24 Go packages under `internal/`
- 3 provider adapters (OpenAI, Anthropic, vLLM)
- 6 observability sinks (Prometheus, Store, EventBus/SSE, Stats, TSDB, Budget)
| Dependency | Purpose |
|---|---|
| `github.com/go-chi/chi/v5` | HTTP router |
| `modernc.org/sqlite` | Pure-Go SQLite driver (no CGO) |
| `golang.org/x/crypto` | Argon2id for vault key derivation |
| `github.com/prometheus/client_golang` | Prometheus metrics |
| `go.temporal.io/sdk` | Temporal workflow SDK |
| `go.opentelemetry.io/otel` | OpenTelemetry tracing |
```sh
make release        # Patch bump (v0.2.5 → v0.2.6)
make release-minor  # Minor bump (v0.2.6 → v0.3.0)
make release-major  # Major bump (v0.3.0 → v1.0.0)
```

The `scripts/release.sh` script:
- Ensures clean working tree
- Bumps the version tag
- Builds the Docker image
- Tags for GHCR
- Pushes images to `ghcr.io/jordanhubbard/tokenhub`
- Creates a git tag and pushes it
```sh
# First-time setup
make setup            # Fix Docker CLI issues (macOS)
cp .env.example .env  # Edit with your tokens

# Create credentials file
mkdir -p ~/.tokenhub && chmod 700 ~/.tokenhub
# Edit ~/.tokenhub/credentials with your providers and API keys (see docs)
chmod 600 ~/.tokenhub/credentials

# Build and run
make run      # Builds image, starts stack, tails logs

# After code changes, rebuild
make package  # Rebuild image
docker compose down tokenhub && docker compose up -d tokenhub

# Or use the Makefile build for faster iteration (creates bin/tokenhub)
make build
```
- **`extractUsage` false positive on OpenAI format**: The OpenAI JSON parser would succeed even when the `usage` block had zero-valued `prompt_tokens`/`completion_tokens` fields, preventing fallthrough to the Anthropic parser. Fixed by requiring at least one token count to be non-zero before accepting a parse. Same fix applied to the duplicate `extractProviderUsage` in `internal/temporal/activities.go`.
- **Stats collector lock gap**: `Summary()`, `Global()`, and `SummaryByProvider()` called `Prune()` (write lock), released it, then acquired a read lock — creating a window where data could change. Fixed with `snapshotsAfterPrune()`, which atomically prunes and copies the snapshot slice under a single write lock.
- **Missing `/v1/models` endpoint**: OpenAI SDK clients expect `GET /v1/models`. The scope mapping existed in `routeToScope()`, but no handler was mounted. Added `ModelsListPublicHandler` returning an OpenAI-compatible model list.
- **Missing `Content-Type` on `/healthz`**: The health endpoint returned JSON without the `application/json` Content-Type header.
- **Timestamp precision loss**: `ListRequestLogs` and other SQLite read paths used `time.Parse(time.RFC3339, ...)`, which truncates sub-second precision. Go's `time.Now()` produces nanosecond timestamps stored as RFC3339Nano strings. Added a `parseTime()` helper that tries RFC3339Nano first.
- **docker-compose vLLM endpoint mismatch**: The compose file had a mock vLLM endpoint while the real endpoint was different. Resolved by removing provider env vars entirely — providers are now loaded from `~/.tokenhub/credentials` at startup or registered at runtime via the admin API, `tokenhubctl`, or the UI.
- **Persisted providers not restored on restart**: Providers registered via the admin API had their DB records preserved, but no runtime adapters were created at startup. Added `loadPersistedProviders()` in `server.go`, which reads provider records from the DB and creates adapters before the health prober starts, so persisted providers survive restarts.
- **Provider upsert defaults `enabled` to `false`**: The `ProvidersUpsertHandler` decoded the JSON request into a `ProviderUpsertRequest` struct. When the JSON omitted the `enabled` field, Go's zero-value `false` was stored in the DB, overwriting previously-enabled providers. Fixed by defaulting `req.Enabled = true` before JSON decode.
- **Credentials file was runtime-only**: The `loadCredentialsFile()` function only created runtime adapters, without persisting to the database or storing API keys in the vault. Fixed by enhancing it to upsert providers/models to the DB and store API keys in the vault when unlocked. This makes the credentials file a complete bootstrap mechanism — providers persist across restarts after first load.
- Pure-Go SQLite (`modernc.org/sqlite`) avoids CGO, enabling static builds and scratch/Alpine containers
- Single-binary server — no sidecar processes needed; Temporal and OTel are opt-in
- Embedded UI via `go:embed` — no separate frontend build step or asset server
- Provider response treated as opaque `json.RawMessage` — the router doesn't parse responses; the handler layer extracts usage data for observability
- Health prober includes persisted providers — `loadPersistedProviders()` runs before the prober starts, so both credentials-file and DB-stored providers are probed from boot
- Thompson Sampling parameters are refreshed from the `reward_logs` table every 5 minutes by a background goroutine
- Idempotency is enforced via an in-memory cache with a 5-minute TTL and 10k max entries