Koiiichi/symphony

Symphony

LLM-first web reliability orchestrator.


Symphony runs goal-driven browser flows against your web application, evaluates results with a first-class reliability gate, and drives an evidence-based patch cycle until the application passes or the pass limit is reached.

It replaces brittle keyword heuristics and hardcoded interaction patterns with a structured plan emitted by an LLM, executed by a deterministic browser layer, and evaluated against explicit assertions.

How it works

Plan. You provide a goal in plain language. An LLM reads minimal project context (your dependency manifest and top-level file listing) and emits a typed TaskGraph: an ordered sequence of nodes connected by explicit dependency edges. If the LLM call fails or returns invalid output, Symphony falls back to a minimal heuristic graph (detect → start → test → finalize).
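
The planning step can be pictured with a small sketch. This is not Symphony's actual data model: the `Node`, `TaskGraph`, `fallback_graph`, and `plan` names, their fields, and the `call_llm` callable are all hypothetical, chosen only to illustrate a typed graph with dependency edges and the heuristic fallback.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One unit of work in the graph (hypothetical shape)."""
    id: str
    type: str                          # e.g. "stack_detect", "web_flow_test"
    depends_on: tuple[str, ...] = ()   # ids of nodes that must finish first


@dataclass
class TaskGraph:
    nodes: list[Node] = field(default_factory=list)

    def validate(self) -> None:
        """Reject duplicate ids and edges that point at unknown nodes."""
        ids = [n.id for n in self.nodes]
        if len(ids) != len(set(ids)):
            raise ValueError("duplicate node id")
        known = set(ids)
        for node in self.nodes:
            missing = set(node.depends_on) - known
            if missing:
                raise ValueError(f"{node.id} depends on unknown nodes: {sorted(missing)}")


def fallback_graph() -> TaskGraph:
    """The minimal detect -> start -> test -> finalize chain."""
    return TaskGraph([
        Node("detect", "stack_detect"),
        Node("start", "service_start", depends_on=("detect",)),
        Node("test", "web_flow_test", depends_on=("start",)),
        Node("finalize", "finalize", depends_on=("test",)),
    ])


def plan(goal: str, call_llm) -> TaskGraph:
    """Use the LLM's graph when it is valid, otherwise fall back."""
    try:
        graph = call_llm(goal)   # assumed to return a TaskGraph or raise
        graph.validate()
        return graph
    except Exception:
        return fallback_graph()
```

Validating before accepting the graph means a malformed plan degrades to the heuristic chain instead of failing the run.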

Execute. Nodes run in topological order. Stack detection scans for language and framework markers. Service startup launches your backend and frontend as subprocesses. Browser flow nodes drive a real Chrome instance using a validated Flow DSL — the LLM cannot run arbitrary scripts, only emit structured actions that are validated before execution. API check nodes hit endpoints directly. Code patch nodes apply search-replace diffs to your source files. Screenshots are captured at every step.
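
Topological ordering can be illustrated with Python's standard `graphlib` module. Whether Symphony uses `graphlib` internally is not stated here; the `execution_order` helper and the dependency mapping below are illustrative assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return node ids so that every node comes after its dependencies.

    `dependencies` maps a node id to the ids it depends on.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the cycle
        raise ValueError(f"task graph has a cycle: {exc.args[1]}") from exc


deps = {
    "detect": set(),
    "start": {"detect"},
    "test": {"start"},
    "finalize": {"test"},
}
print(execution_order(deps))  # ['detect', 'start', 'test', 'finalize']
```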

Evaluate. A Reliability Evaluator checks every flow assertion, HTTP expectation, and accessibility requirement. It produces a machine-readable report with typed failure reasons and artifact pointers. Exit code is 0 on pass, 1 on failure.
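
As a rough sketch of the gate, the function below folds assertion results into a report and an exit code. The `AssertionResult` class and `evaluate` function are hypothetical names, and marking every failed assertion as `critical` is a simplifying assumption; how Symphony assigns severity is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssertionResult:
    assertion_id: str
    passed: bool
    message: str = ""


def evaluate(results: list[AssertionResult]) -> tuple[dict, int]:
    """Build a report dict and the matching process exit code."""
    failing = [
        # Assumption: every failed assertion counts as critical.
        {"id": r.assertion_id, "severity": "critical", "message": r.message}
        for r in results
        if not r.passed
    ]
    report = {
        "status": "fail" if failing else "pass",
        "failing_reasons": failing,
        "assertion_results": [
            {"assertion_id": r.assertion_id, "passed": r.passed, "message": r.message}
            for r in results
        ],
    }
    return report, (1 if failing else 0)
```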

Patch. If --passes is greater than 1 and the run failed, Symphony feeds the failure evidence back to the LLM, which proposes a targeted source patch. The graph reruns. This repeats until the application passes or the pass limit is reached.
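
The patch cycle amounts to a bounded retry loop. In the sketch below, `run_once` and `propose_patch` are hypothetical callables standing in for the graph run and the LLM patch step, and requiring exactly one match for a search-replace edit is an assumption made for safety, not documented Symphony behaviour.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one search-replace edit; refuse missing or ambiguous matches."""
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return source.replace(search, replace, 1)


def run_with_passes(run_once, propose_patch, passes: int) -> bool:
    """Run up to `passes` times, patching between failed attempts."""
    for attempt in range(1, passes + 1):
        report = run_once()
        if report["status"] == "pass":
            return True
        if attempt < passes:
            # Feed the failure evidence back before the next attempt.
            propose_patch(report["failing_reasons"])
    return False
```

With `passes=1` the loop runs once and never patches, which matches the default behaviour described above.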

Comparison

Symphony is a runtime reliability tool, not a coding assistant. The distinction matters.

Tools like GitHub Copilot and Cursor operate on source code. Their loop is: read code → suggest a change → you apply it. There is no running application involved, no browser, and no way to know whether the change actually worked without manually testing it.

Symphony operates on a running application. Its loop is: run the app → drive a real browser → observe what happens → patch if needed → retest automatically. Code edits are a side effect, not the primary action.

| | Coding assistants | Symphony |
| --- | --- | --- |
| Primary input | Source code | A goal and a running application |
| Evidence | Static analysis | Screenshots, HTTP traces, DOM snapshots |
| Loop | Edit → you test | Run → observe → patch → retest |
| Pass/fail signal | None | Typed assertions, machine-readable report |
| CI integration | No | Yes (exit code 0/1) |

The tools are complementary. A coding assistant writes the feature; Symphony verifies it works in a real browser before it ships.

Requirements

  • Python 3.12 or later
  • Chrome and ChromeDriver (for browser execution)
  • An OpenAI or Google Gemini API key

Installation

# With OpenAI support
uv sync --extra openai

# With Gemini support
uv sync --extra gemini

# With token budget enforcement
uv sync --extra openai --extra budget

Configuration

Set one of the following environment variables:

export OPENAI_API_KEY=sk-...
# or
export GEMINI_API_KEY=AI...

The provider is auto-detected from whichever key is present. If both are set, pass --provider explicitly.

Usage

Run a reliability check:

symphony run \
  --project ./my-app \
  --goal "Submit the contact form and verify the success message" \
  --passes 3

Inspect the plan without executing:

symphony plan \
  --goal "Verify login with valid and invalid credentials" \
  --emit-taskgraph

Replay a previous run:

symphony replay --run-id run_20260419_150159

Options

| Flag | Description |
| --- | --- |
| --project | Path to the project directory |
| --goal | Goal in plain language |
| --profile | Execution profile: web (default) or api |
| --passes | Number of fix-retest cycles (default: 1) |
| --provider | openai or gemini (auto-detected if unset) |
| --model | Model name (defaults to provider default) |
| --token-budget | Max prompt tokens (unlimited by default) |
| --artifact-dir | Directory for screenshots, traces, and reports |

Provider defaults

| Provider | Default model |
| --- | --- |
| OpenAI | gpt-5.4-mini |
| Gemini | gemini-3-flash-preview |

Pass --model to override.

Output

Every run writes to the artifact directory:

artifacts/run_<timestamp>/
  taskgraph.json     TaskGraph the planner produced
  report.json        Machine-readable pass/fail report
  <node_id>/         Per-node screenshots and DOM snapshots
    step_001_navigate.png
    step_002_click.png
    ...

report.json schema:

{
  "status": "pass | fail",
  "failing_reasons": [
    { "id": "...", "severity": "critical | warning", "message": "..." }
  ],
  "assertion_results": [
    { "assertion_id": "...", "passed": true, "message": "..." }
  ],
  "artifacts": [
    { "type": "screenshot", "path": "...", "node_id": "..." }
  ],
  "token_usage": { "total": 1240 },
  "planner_confidence": 0.9
}

Exit code is 0 on pass and 1 on failure, suitable for use in CI.
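
A CI step that wants more than the exit code could read report.json directly. The sketch below relies only on the fields shown in the schema above; the `critical_failures` and `exit_code` helpers and the sample values are illustrative, not part of Symphony.

```python
import json


def critical_failures(report: dict) -> list[str]:
    """List the critical failing reasons as "id: message" strings."""
    return [
        f"{reason['id']}: {reason['message']}"
        for reason in report.get("failing_reasons", [])
        if reason.get("severity") == "critical"
    ]


def exit_code(report: dict) -> int:
    """Mirror the CLI convention: 0 on pass, 1 otherwise."""
    return 0 if report.get("status") == "pass" else 1


# Made-up sample data in the shape of the schema above.
sample = json.loads("""
{
  "status": "fail",
  "failing_reasons": [
    {"id": "banner", "severity": "critical", "message": "success banner not found"},
    {"id": "contrast", "severity": "warning", "message": "low contrast on submit button"}
  ]
}
""")

for line in critical_failures(sample):
    print(line)
```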

Flow DSL

Symphony's browser actions are structured and validated before execution:

| Action | Required fields |
| --- | --- |
| navigate | value (URL) |
| scroll | params.direction, params.pixels |
| click | selector |
| fill | selector, value |
| press | value (key name) |
| wait_for | selector |
| assert_text | selector, value |
| assert_http_status | value (status code) |
| assert_banner | value (expected text) |
| other | other_action_type (description) |

The other action type accepts any task that doesn't fit the standard set. It is logged and treated as a no-op in the executor, keeping execution strictly deterministic.
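
Validation against the table above can be sketched as a lookup of required fields per action. The step layout used here (a dict with an `action` key and dotted paths into `params`) and the `validate_step` helper are assumptions for illustration; Symphony's real schema may differ.

```python
REQUIRED_FIELDS = {
    "navigate": ("value",),
    "scroll": ("params.direction", "params.pixels"),
    "click": ("selector",),
    "fill": ("selector", "value"),
    "press": ("value",),
    "wait_for": ("selector",),
    "assert_text": ("selector", "value"),
    "assert_http_status": ("value",),
    "assert_banner": ("value",),
    "other": ("other_action_type",),
}


def lookup(step: dict, dotted: str):
    """Resolve a dotted path such as "params.pixels"; None if absent."""
    current = step
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def validate_step(step: dict) -> list[str]:
    """Return a list of problems; an empty list means the step is valid."""
    action = step.get("action")
    if action not in REQUIRED_FIELDS:
        return [f"unknown action: {action!r}"]
    return [
        f"{action}: missing {name}"
        for name in REQUIRED_FIELDS[action]
        if lookup(step, name) in (None, "")
    ]
```

Because unknown actions are rejected before anything runs, a malformed plan fails at validation time rather than partway through a browser session.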

TaskGraph node types

| Type | Purpose |
| --- | --- |
| stack_detect | Identify project language and framework |
| service_start | Start backend or frontend servers |
| ui_discovery | Explore the UI before testing |
| web_flow_test | Execute a browser flow with assertions |
| api_check | Verify an API endpoint directly |
| code_patch | Modify application source to fix a failure |
| retest | Re-run a prior test node after a patch |
| finalize | Cleanup and report generation |
| other | Any task outside the standard types |

Development

python -m venv .venv && source .venv/bin/activate
pip install -e ".[openai,budget,dev]"
python -m pytest

The default test suite runs without a live browser or API key. Browser-dependent tests are the exception: they require Chrome and a valid API key.

License

MIT
