LLM-first web reliability orchestrator.
Symphony runs goal-driven browser flows against your web application, evaluates results with a first-class reliability gate, and drives an evidence-based patch cycle until the application passes or the pass limit is reached.
It replaces brittle keyword heuristics and hardcoded interaction patterns with a structured plan emitted by an LLM, executed by a deterministic browser layer, and evaluated against explicit assertions.
**Plan.** You provide a goal in plain language. An LLM reads minimal project context — your dependency manifest, top-level file listing — and emits a typed TaskGraph: an ordered sequence of nodes connected by explicit dependency edges. If the LLM call fails or returns invalid output, a minimal heuristic graph (detect → start → test → finalize) is used instead.
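The heuristic fallback chain can be sketched as typed data. This is only an illustration: the `TaskNode` and `TaskGraph` class names and fields below are assumptions, not Symphony's actual types.

```python
from dataclasses import dataclass, field

# Hypothetical shapes -- Symphony's real TaskGraph types may differ.
@dataclass
class TaskNode:
    id: str
    type: str                      # e.g. "stack_detect", "web_flow_test"
    depends_on: list[str] = field(default_factory=list)

@dataclass
class TaskGraph:
    nodes: list[TaskNode]

# The minimal heuristic fallback described above: a straight chain
# with explicit dependency edges.
fallback = TaskGraph(nodes=[
    TaskNode("detect",   "stack_detect"),
    TaskNode("start",    "service_start", depends_on=["detect"]),
    TaskNode("test",     "web_flow_test", depends_on=["start"]),
    TaskNode("finalize", "finalize",      depends_on=["test"]),
])

print([n.id for n in fallback.nodes])  # execution order of the fallback chain
```

Because each node names its dependencies explicitly, richer LLM-emitted graphs can branch and rejoin while still admitting a deterministic topological ordering.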
**Execute.** Nodes run in topological order. Stack detection scans for language and framework markers. Service startup launches your backend and frontend as subprocesses. Browser flow nodes drive a real Chrome instance using a validated Flow DSL — the LLM cannot run arbitrary scripts, only emit structured actions that are validated before execution. API check nodes hit endpoints directly. Code patch nodes apply search-replace diffs to your source files. Screenshots are captured at every step.
**Evaluate.** A Reliability Evaluator checks every flow assertion, HTTP expectation, and accessibility requirement. It produces a machine-readable report with typed failure reasons and artifact pointers. Exit code is 0 on pass, 1 on failure.
**Patch.** If `--passes` is greater than 1 and the run failed, Symphony feeds the failure evidence back to the LLM, which proposes a targeted source patch. The graph reruns. This repeats until the application passes or the pass limit is reached.
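The patch cycle amounts to a small loop. A minimal sketch, assuming hypothetical callbacks (`run_and_evaluate`, `propose_and_apply_patch`) in place of Symphony's internals:

```python
# Hypothetical sketch of the --passes loop; the two callbacks stand in
# for Symphony's planner/executor/evaluator and its LLM patch step.
def reliability_cycle(run_and_evaluate, propose_and_apply_patch, passes: int) -> int:
    """Return 0 on pass, 1 when the pass limit is exhausted."""
    for attempt in range(1, passes + 1):
        report = run_and_evaluate()
        if report["status"] == "pass":
            return 0
        if attempt < passes:
            propose_and_apply_patch(report)  # feed failure evidence back
    return 1

# Demo: an app that starts broken and is fixed by the first patch.
state = {"patched": False}

def run_and_evaluate():
    if state["patched"]:
        return {"status": "pass", "failing_reasons": []}
    return {"status": "fail",
            "failing_reasons": [{"id": "assert_text", "severity": "critical",
                                 "message": "success banner not found"}]}

def propose_and_apply_patch(report):
    state["patched"] = True

print(reliability_cycle(run_and_evaluate, propose_and_apply_patch, passes=3))  # 0 -- passes on the second attempt
```

The loop never patches on the final attempt, so the last run is always a clean measurement of the application as it stands.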
Symphony is a runtime reliability tool, not a coding assistant. The distinction matters.
Tools like GitHub Copilot and Cursor operate on source code. Their loop is: read code → suggest a change → you apply it. There is no running application involved, no browser, and no way to know whether the change actually worked without manually testing it.
Symphony operates on a running application. Its loop is: run the app → drive a real browser → observe what happens → patch if needed → retest automatically. Code edits are a side effect, not the primary action.
| | Coding assistants | Symphony |
|---|---|---|
| Primary input | Source code | A goal and a running application |
| Evidence | Static analysis | Screenshots, HTTP traces, DOM snapshots |
| Loop | Edit → you test | Run → observe → patch → retest |
| Pass/fail signal | None | Typed assertions, machine-readable report |
| CI integration | No | Yes (exit code 0/1) |
The tools are complementary. A coding assistant writes the feature; Symphony verifies it works in a real browser before it ships.
- Python 3.12 or later
- Chrome and ChromeDriver (for browser execution)
- An OpenAI or Google Gemini API key
```bash
# With OpenAI support
uv sync --extra openai

# With Gemini support
uv sync --extra gemini

# With token budget enforcement
uv sync --extra openai --extra budget
```

Set one of the following environment variables:
```bash
export OPENAI_API_KEY=sk-...
# or
export GEMINI_API_KEY=AI...
```

The provider is auto-detected from whichever key is present. If both are set, pass `--provider` explicitly.
Run a reliability check:
```bash
symphony run \
  --project ./my-app \
  --goal "Submit the contact form and verify the success message" \
  --passes 3
```

Inspect the plan without executing:
```bash
symphony plan \
  --goal "Verify login with valid and invalid credentials" \
  --emit-taskgraph
```

Replay a previous run:
```bash
symphony replay --run-id run_20260419_150159
```

| Flag | Description |
|---|---|
| `--project` | Path to the project directory |
| `--goal` | Goal in plain language |
| `--profile` | Execution profile: `web` (default) or `api` |
| `--passes` | Number of fix-retest cycles (default: 1) |
| `--provider` | `openai` or `gemini` (auto-detected if unset) |
| `--model` | Model name (defaults to provider default) |
| `--token-budget` | Max prompt tokens (unlimited by default) |
| `--artifact-dir` | Directory for screenshots, traces, and reports |
| Provider | Default model |
|---|---|
| OpenAI | gpt-5.4-mini |
| Gemini | gemini-3-flash-preview |
Pass `--model` to override.
Every run writes to the artifact directory:
```
artifacts/run_<timestamp>/
  taskgraph.json       TaskGraph the planner produced
  report.json          Machine-readable pass/fail report
  <node_id>/           Per-node screenshots and DOM snapshots
    step_001_navigate.png
    step_002_click.png
    ...
```
`report.json` schema:

```json
{
  "status": "pass | fail",
  "failing_reasons": [
    { "id": "...", "severity": "critical | warning", "message": "..." }
  ],
  "assertion_results": [
    { "assertion_id": "...", "passed": true, "message": "..." }
  ],
  "artifacts": [
    { "type": "screenshot", "path": "...", "node_id": "..." }
  ],
  "token_usage": { "total": 1240 },
  "planner_confidence": 0.9
}
```

Exit code is 0 on pass and 1 on failure, suitable for use in CI.
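CI pipelines that want more than the exit code can post-process the report directly. A minimal sketch in Python; the `gate` helper and the temp-file demo are illustrative, not part of Symphony:

```python
import json
import pathlib
import sys
import tempfile

def gate(report_path: str) -> int:
    """Mirror Symphony's exit-code convention from a saved report.json."""
    report = json.loads(pathlib.Path(report_path).read_text())
    # Surface typed failure reasons in the CI log before gating.
    for reason in report.get("failing_reasons", []):
        print(f'{reason["severity"]}: {reason["message"]}', file=sys.stderr)
    return 0 if report["status"] == "pass" else 1

# Demo with a minimal failing report.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"status": "fail",
               "failing_reasons": [{"id": "a1", "severity": "critical",
                                    "message": "success banner not found"}]}, f)

print(gate(f.name))  # prints 1
```

Because the report is plain JSON with typed fields, the same approach works for dashboards or for filtering on severity (e.g. gating only on `critical` reasons).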
Symphony's browser actions are structured and validated before execution:
| Action | Required fields |
|---|---|
| `navigate` | `value` (URL) |
| `scroll` | `params.direction`, `params.pixels` |
| `click` | `selector` |
| `fill` | `selector`, `value` |
| `press` | `value` (key name) |
| `wait_for` | `selector` |
| `assert_text` | `selector`, `value` |
| `assert_http_status` | `value` (status code) |
| `assert_banner` | `value` (expected text) |
| `other` | `other_action_type` (description) |
The `other` action type accepts any task that doesn't fit the standard set. It is logged and treated as a no-op in the executor, keeping execution strictly deterministic.
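The required-fields table translates directly into a pre-execution validation step. A sketch of what that might look like; the `action` key name and the `validate_action` helper are assumptions, not Symphony's actual DSL schema:

```python
# Hypothetical required-fields map mirroring the table above.
REQUIRED = {
    "navigate": ["value"],
    "scroll": ["params.direction", "params.pixels"],
    "click": ["selector"],
    "fill": ["selector", "value"],
    "press": ["value"],
    "wait_for": ["selector"],
    "assert_text": ["selector", "value"],
    "assert_http_status": ["value"],
    "assert_banner": ["value"],
    "other": ["other_action_type"],
}

def validate_action(action: dict) -> list[str]:
    """Return a list of missing fields; an empty list means the action is valid."""
    kind = action.get("action")
    if kind not in REQUIRED:
        return [f"unknown action: {kind!r}"]
    missing = []
    for path in REQUIRED[kind]:
        obj = action
        for key in path.split("."):      # walk dotted paths like params.direction
            obj = obj.get(key) if isinstance(obj, dict) else None
        if obj is None:
            missing.append(path)
    return missing

print(validate_action({"action": "fill", "selector": "#email"}))  # missing: ['value']
```

Rejecting malformed actions before the browser layer ever sees them is what lets the LLM propose flows freely while execution stays deterministic.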
| Type | Purpose |
|---|---|
| `stack_detect` | Identify project language and framework |
| `service_start` | Start backend or frontend servers |
| `ui_discovery` | Explore the UI before testing |
| `web_flow_test` | Execute a browser flow with assertions |
| `api_check` | Verify an API endpoint directly |
| `code_patch` | Modify application source to fix a failure |
| `retest` | Re-run a prior test node after a patch |
| `finalize` | Cleanup and report generation |
| `other` | Any task outside the standard types |
```bash
python -m venv .venv && source .venv/bin/activate
pip install -e ".[openai,budget,dev]"
python -m pytest
```

Unit tests run without a live browser or API key; browser-dependent tests require Chrome and a valid API key.
MIT