Phase 4.5/4.6: WebP screenshots + navigate body dedup #747

Merged
bicced merged 4 commits into main from feat/phase4-screenshots-body
Apr 25, 2026


bicced commented Apr 25, 2026

Summary

§7.5 + §7.6 of the browser-automation roadmap — token-efficiency wins on the browser tool surface.

  • §7.5 WebP screenshots. browser_screenshot defaults to WebP (quality=75, scale=1.0) with new format/quality/scale params. Encodes via Pillow post-capture so the page renderer is untouched (no fingerprint signal). Pillow missing or encode failure → graceful PNG fallback so agents always get a usable image. Service-side response now includes a bytes field for dashboard observability.
  • §7.6 Navigate body dedup. navigate(snapshot_after=true) caps the body preview at 1000 chars instead of 5000 because the snapshot already carries the element tree. Without snapshot_after the historical 5000-char cap stays.
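The §7.5 encode path described above could look roughly like the sketch below. It is not the code from this PR: `encode_screenshot` and `make_png` are hypothetical names, and the sketch assumes the capture arrives as PNG bytes and that the format string is already normalized. Only standard Pillow calls (`Image.open`, `Image.resize`, `Image.save`) are used.

```python
import io
import struct
import zlib


def encode_screenshot(png_bytes, fmt="webp", quality=75, scale=1.0):
    """Return (image_bytes, media_type) for a PNG capture (illustrative sketch)."""
    if fmt not in ("webp", "png"):
        raise ValueError(f"unsupported screenshot format: {fmt!r}")
    # Fast pass-through: PNG at full scale never touches Pillow.
    if fmt == "png" and scale == 1.0:
        return png_bytes, "image/png"
    try:
        from PIL import Image  # imported lazily so a missing Pillow is survivable

        img = Image.open(io.BytesIO(png_bytes))
        if scale != 1.0:
            size = (max(1, round(img.width * scale)),
                    max(1, round(img.height * scale)))
            img = img.resize(size)
        out = io.BytesIO()
        if fmt == "webp":
            img.save(out, format="WEBP", quality=quality)
        else:
            img.save(out, format="PNG")
        # media_type reflects what was actually encoded.
        return out.getvalue(), f"image/{fmt}"
    except Exception:
        # Pillow missing or encode failure: hand back the original PNG.
        return png_bytes, "image/png"


def make_png(width, height):
    """Build a tiny solid-red RGB PNG with the stdlib only (demo helper)."""
    def chunk(tag, data):
        body = tag + data
        crc = zlib.crc32(body) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + body + struct.pack(">I", crc)

    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    rows = b"".join(b"\x00" + b"\xff\x00\x00" * width for _ in range(height))
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(rows)) + chunk(b"IEND", b""))
```

Calling `encode_screenshot(make_png(4, 2))` yields WebP bytes when Pillow is importable and the untouched PNG otherwise, so the caller can always build a data URI from the returned media type.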

Risk and rollout

  • WebP encoding costs roughly 10–20 ms of CPU per ~1080p image, which is negligible against the hundreds of milliseconds a real screenshot already takes.
  • Multimodal LLMs (Claude, GPT-4 vision) accept image/webp data URIs; media_type is set from the actual encoded format, not the requested one.
  • Pillow added to pyproject.toml dependencies and to Dockerfile.browser pip install. WebP encoder is bundled in Pillow's manylinux wheel — no apt-level libwebp needed.
  • format='png' with scale=1.0 is a fast pass-through — zero Pillow round-trip.

Test plan

  • WebP encode produces RIFF/WEBP-magic bytes and round-trips through Pillow.
  • Explicit format='png' + scale=1.0 returns Playwright bytes verbatim.
  • scale=0.5 halves dimensions.
  • Unknown format rejected with clear error.
  • Pillow missing → PNG fallback (monkeypatch of __import__).
  • Body cap = 1000 with snapshot_after=true; 5000 without.
  • Existing screenshot/navigate tests still pass (490/490 in test_browser_service + test_builtins).
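For reference, the magic-byte check in the first test item can be expressed with a small helper like the one below. `sniff_image_format` is a hypothetical name, not something this PR adds; the byte layouts are the standard PNG signature and the RIFF/WEBP container header.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_image_format(data):
    """Return "webp", "png", or None based on the leading magic bytes."""
    # WebP: "RIFF", a 4-byte little-endian size, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data.startswith(PNG_SIGNATURE):
        return "png"
    return None
```

A test can then assert `sniff_image_format(payload) == "webp"` on the default path and `"png"` on the fallback path without decoding the image.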

bicced added 4 commits April 25, 2026 20:09
§7.5 — browser_screenshot now defaults to WebP (quality=75) with
optional scale=0.5–1.0 downscale, gated by Pillow availability.
WebP yields ~5–10× smaller payloads than PNG for typical web pages,
which compounds heavily across multi-step browsing tasks. Pillow
missing or encode failure → graceful PNG fallback (existing tests
keep passing; agents always get a usable image). Service-side adds
a 'bytes' field so dashboards can chart payload sizes.

§7.6 — navigate(snapshot_after=true) caps body preview at 1000
chars instead of 5000 because the snapshot already carries the
full element tree. Without snapshot_after the agent depends on
body for content and the historical 5000-char cap stays.

Pillow added to pyproject.toml dependencies and Dockerfile.browser
pip install. WebP encoding uses Pillow's bundled libwebp on the
manylinux wheel, no apt-level dep needed.

- format input now strips whitespace before lowercasing (matches the
  agent skill's documented contract; LLMs frequently pass " WEBP ").
- Empty/None format consults BROWSER_SCREENSHOT_FORMAT (per §2.1)
  before falling back to "webp", giving operators a globally
  enforceable default without rebuilding agent prompts.
- Pillow encode now runs through asyncio.to_thread so a 10-20 ms
  encode doesn't block the event loop. Pillow releases the GIL
  during its C-level encode, so concurrent agents parallelize.
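A minimal sketch of the three points above, under stated assumptions: `resolve_format` and `encode_off_loop` are hypothetical names, and rejecting an invalid operator default with the same error as an invalid request is a guess at the behavior rather than something the commit states. `asyncio.to_thread` requires Python 3.9+.

```python
import asyncio
import os

ALLOWED_FORMATS = ("webp", "png")


def resolve_format(requested):
    """Normalize the requested format, falling back to the operator default."""
    value = (requested or "").strip().lower()
    if not value:
        # Empty/None: consult BROWSER_SCREENSHOT_FORMAT, then default to webp.
        env = os.environ.get("BROWSER_SCREENSHOT_FORMAT", "")
        value = env.strip().lower() or "webp"
    if value not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported screenshot format: {value!r}")
    return value


async def encode_off_loop(encode, *args):
    """Run a blocking encode callable in a worker thread, off the event loop."""
    return await asyncio.to_thread(encode, *args)
```

With this shape, `resolve_format(" WEBP ")` normalizes to `"webp"`, and setting `BROWSER_SCREENSHOT_FORMAT=png` changes what `resolve_format(None)` returns without touching agent prompts.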

Adds two regression tests: format=None hits the operator default,
format=' WEBP ' normalizes to webp.

Pre-fix: when snapshot_after=True but the snapshot itself failed or
returned empty, the agent received body truncated to 1000 chars AND
an empty/{} snapshot — strictly worse than the snapshot_after=False
path. The 1000-char trim was applied at body-extraction time (before
the snapshot ran), so we couldn't recover.

Fix: extract body at the historical 5000-char cap up front, then
trim to 1000 ONLY when the snapshot succeeded. Failure path keeps
the full body so the agent still has usable page text alongside
the empty snapshot.

Adds test_body_falls_back_to_5000_when_snapshot_fails covering this
regression.
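The ordering described in this fix can be sketched as follows. This is an illustration, not the PR's code: `build_navigate_result`, the `take_snapshot` callable, and the result dictionary shape are all assumptions made for the example.

```python
FULL_BODY_CAP = 5000   # historical cap
DEDUP_BODY_CAP = 1000  # cap when a snapshot already carries the element tree


def build_navigate_result(body_text, snapshot_after, take_snapshot):
    """Extract the body at the full cap, trimming only if the snapshot succeeded."""
    body = body_text[:FULL_BODY_CAP]
    result = {"body": body}
    if not snapshot_after:
        return result
    try:
        snapshot = take_snapshot()
    except Exception:
        snapshot = None
    if snapshot:
        # Snapshot succeeded and is non-empty: the short preview is enough.
        result["snapshot"] = snapshot
        result["body"] = body[:DEDUP_BODY_CAP]
    else:
        # Snapshot failed or came back empty: keep the full 5000-char body.
        result["snapshot"] = {}
    return result
```

Because the trim happens after the snapshot attempt, the failure path is never worse than calling navigate with `snapshot_after=False`.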
bicced merged commit 407702e into main Apr 25, 2026
3 of 4 checks passed