Phase 4.5/4.6: WebP screenshots + navigate body dedup #747
Merged
§7.5: browser_screenshot now defaults to WebP (quality=75) with optional scale=0.5–1.0 downscale, gated by Pillow availability. WebP yields ~5–10× smaller payloads than PNG for typical web pages, which compounds heavily across multi-step browsing tasks. Pillow missing or encode failure → graceful PNG fallback (existing tests keep passing; agents always get a usable image). Service-side adds a 'bytes' field so dashboards can chart payload sizes.
§7.6: navigate(snapshot_after=true) caps body preview at 1000 chars instead of 5000 because the snapshot already carries the full element tree. Without snapshot_after the agent depends on body for content and the historical 5000-char cap stays.
Pillow added to pyproject.toml dependencies and Dockerfile.browser pip install. WebP encoding uses Pillow's bundled libwebp on the manylinux wheel, so no apt-level dep is needed.
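A minimal sketch of the encode-with-fallback behaviour described above, not the code from this PR. The function name encode_screenshot and its signature are hypothetical; only the Pillow calls (Image.open, Image.resize, Image.save) are real library API. It illustrates the PNG pass-through, the optional downscale, and how the returned media type follows the bytes actually produced rather than the requested format.

```python
import io

try:
    from PIL import Image  # optional dependency; absence triggers the PNG fallback
except ImportError:
    Image = None


def encode_screenshot(png_bytes, fmt="webp", quality=75, scale=1.0):
    """Hypothetical helper: return (data, media_type) for a captured PNG.

    Any failure (Pillow missing, no WebP support, undecodable input)
    returns the original bytes labelled as image/png.
    """
    fmt = (fmt or "webp").strip().lower()

    # Fast pass-through: nothing to re-encode, so Pillow is never touched.
    if fmt == "png" and scale == 1.0:
        return png_bytes, "image/png"

    if Image is None:
        return png_bytes, "image/png"

    try:
        img = Image.open(io.BytesIO(png_bytes))
        if scale != 1.0:
            size = (
                max(1, round(img.width * scale)),
                max(1, round(img.height * scale)),
            )
            img = img.resize(size)

        out = io.BytesIO()
        if fmt == "webp":
            img.save(out, format="WEBP", quality=quality)
            return out.getvalue(), "image/webp"

        # Any other requested format is served as PNG in this sketch.
        img.save(out, format="PNG")
        return out.getvalue(), "image/png"
    except Exception:
        # Encode failure: hand back the untouched capture.
        return png_bytes, "image/png"
```

Because the media type is derived from the branch that actually produced the bytes, a caller building a data URI never labels PNG bytes as image/webp after a fallback.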
- format input now strips whitespace before lowercasing (matches the agent skill's documented contract; LLMs frequently pass " WEBP ").
- Empty/None format consults BROWSER_SCREENSHOT_FORMAT (per §2.1) before falling back to "webp", giving operators a globally enforceable default without rebuilding agent prompts.
- Pillow encode now runs through asyncio.to_thread so a 10–20 ms encode doesn't block the event loop. Pillow releases the GIL during its C-level encode, so concurrent agents parallelize.
Adds two regression tests: format=None hits the operator default, format=' WEBP ' normalizes to webp.
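The resolution order above could be sketched roughly as follows. The helper names resolve_format and encode_off_loop are hypothetical, and normalizing the environment value the same way as the argument is an assumption; BROWSER_SCREENSHOT_FORMAT and asyncio.to_thread come from the commit message.

```python
import asyncio
import os


def resolve_format(fmt):
    """Hypothetical helper: normalize the requested format.

    Order: explicit argument, then the operator default from
    BROWSER_SCREENSHOT_FORMAT, then "webp".
    """
    fmt = (fmt or "").strip().lower()
    if not fmt:
        # Assumption: the env value is normalized like the argument.
        fmt = os.environ.get("BROWSER_SCREENSHOT_FORMAT", "").strip().lower()
    return fmt or "webp"


async def encode_off_loop(encode, *args):
    """Run a blocking encode callable in a worker thread (Python 3.9+)."""
    return await asyncio.to_thread(encode, *args)
```

With this shape, an agent passing " WEBP " and an agent passing nothing both end up on a well-defined format, and the event loop stays free while the encode runs.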
Pre-fix: when snapshot_after=True but the snapshot itself failed or
returned empty, the agent received body truncated to 1000 chars AND
an empty/{} snapshot — strictly worse than the snapshot_after=False
path. The 1000-char trim was applied at body-extraction time (before
the snapshot ran), so we couldn't recover.
Fix: extract body at the historical 5000-char cap up front, then
trim to 1000 ONLY when the snapshot succeeded. Failure path keeps
the full body so the agent still has usable page text alongside
the empty snapshot.
Adds test_body_falls_back_to_5000_when_snapshot_fails covering this
regression.
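A rough sketch of the fixed ordering, under stated assumptions: navigate_result, take_snapshot, and the two constant names are hypothetical and stand in for the real handler. The point it illustrates is that the body is extracted at the 5000-char cap first and trimmed to 1000 only after a non-empty snapshot is in hand.

```python
FULL_BODY_CAP = 5000   # historical cap, used when the body is the only content
DEDUP_BODY_CAP = 1000  # cap applied once a usable snapshot exists


def navigate_result(page_text, snapshot_after, take_snapshot=None):
    """Hypothetical sketch: build the navigate response body and snapshot."""
    # Extract at the full cap up front so the failure path can keep it.
    body = page_text[:FULL_BODY_CAP]
    result = {"body": body}

    if snapshot_after:
        try:
            snapshot = take_snapshot() or {}
        except Exception:
            snapshot = {}
        result["snapshot"] = snapshot

        # Trim only when the snapshot actually carries content.
        if snapshot:
            result["body"] = body[:DEDUP_BODY_CAP]

    return result
```

A failed or empty snapshot therefore leaves the agent with the same 5000-char body it would have received with snapshot_after=False.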
Summary
§7.5 + §7.6 of the browser-automation roadmap — token-efficiency wins on the browser tool surface.
- browser_screenshot defaults to WebP (quality=75, scale=1.0) with new format/quality/scale params. Encodes via Pillow post-capture so the page renderer is untouched (no fingerprint signal). Pillow missing or encode failure → graceful PNG fallback so agents always get a usable image. Service-side response now includes a bytes field for dashboard observability.
- navigate(snapshot_after=true) caps the body preview at 1000 chars instead of 5000 because the snapshot already carries the element tree. Without snapshot_after the historical 5000-char cap stays.
Risk and rollout
- image/webp data URIs; media_type is set from the actual encoded format, not the requested one.
- Pillow added to pyproject.toml dependencies and to Dockerfile.browser pip install. WebP encoder is bundled in Pillow's manylinux wheel, so no apt-level libwebp is needed.
- format='png' with scale=1.0 is a fast pass-through: zero Pillow round-trip.
Test plan
- format='png' + scale=1.0 returns Playwright bytes verbatim.
- scale=0.5 halves dimensions.
- __import__).
- Body preview capped at 1000 chars with snapshot_after=true; 5000 without.