Mark web pages for use with vision-language models
Updated Mar 8, 2026 - TypeScript
Set-of-Mark detection pipeline for macOS using Apple Vision, YOLO11, and a VLM on MLX. Transforms screenshots into numbered element maps and structured JSON manifests.
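The core Set-of-Mark step, turning raw element detections into numbered marks plus a JSON manifest, can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Detection` and `MarkEntry` shapes, the `buildManifest` helper, the 0.5 confidence cutoff, and the 20 px row-bucketing rule for reading order are all hypothetical assumptions.

```typescript
// Hypothetical shape of one detector output (e.g. from an OCR or object-detection model).
interface Detection {
  label: string; // element type, e.g. "button" or "text"
  score: number; // detector confidence in [0, 1]
  box: { x: number; y: number; width: number; height: number }; // screenshot pixels
}

// Hypothetical manifest entry: the detection plus the number drawn on the screenshot.
interface MarkEntry extends Detection {
  mark: number;
}

// Drop low-confidence detections, order the rest top-to-bottom then left-to-right,
// and number them from 1. Boxes whose top edges fall in the same `rowBucketPx` band
// are treated as one row so small vertical misalignments do not break reading order.
function buildManifest(
  detections: Detection[],
  minScore = 0.5,
  rowBucketPx = 20,
): MarkEntry[] {
  return detections
    .filter((d) => d.score >= minScore)
    .map((d) => ({ d, row: Math.floor(d.box.y / rowBucketPx) }))
    .sort((a, b) => a.row - b.row || a.d.box.x - b.d.box.x)
    .map(({ d }, i) => ({ ...d, mark: i + 1 }));
}

// Example: the low-confidence "text" box is dropped; the icon and button share a row.
const manifest = buildManifest([
  { label: "button", score: 0.9, box: { x: 200, y: 12, width: 80, height: 30 } },
  { label: "text", score: 0.3, box: { x: 0, y: 0, width: 50, height: 10 } },
  { label: "icon", score: 0.8, box: { x: 10, y: 15, width: 24, height: 24 } },
]);
console.log(JSON.stringify(manifest, null, 2));
```

Bucketing rows before sorting keeps the comparator consistent (row index, then x), which a pairwise "close enough in y" comparison would not guarantee.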
Temporal smoothing for UI element detection with OmniParser integration
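One common way to temporally smooth per-frame UI detections is to match boxes across frames by intersection-over-union (IoU), blend matched boxes with an exponential moving average, and only report elements that persist for a few frames. The sketch below shows that general technique under stated assumptions; it is not this project's method, and the `BoxSmoother` class, its greedy matching, and its default thresholds are hypothetical. OmniParser integration is not modeled.

```typescript
// Axis-aligned box in screenshot pixels.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// A detected element followed across frames (hypothetical structure).
interface Track {
  id: number;
  box: Box; // smoothed box
  hits: number; // frames in which the element was matched
  missed: number; // consecutive frames without a match
}

// Intersection-over-union of two boxes, in [0, 1].
function iou(a: Box, b: Box): number {
  const w = Math.max(0, Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x));
  const h = Math.max(0, Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y));
  const inter = w * h;
  const union = a.width * a.height + b.width * b.height - inter;
  return union > 0 ? inter / union : 0;
}

// Exponential moving average: `a` is the weight given to the new observation.
function blend(prev: Box, obs: Box, a: number): Box {
  return {
    x: a * obs.x + (1 - a) * prev.x,
    y: a * obs.y + (1 - a) * prev.y,
    width: a * obs.width + (1 - a) * prev.width,
    height: a * obs.height + (1 - a) * prev.height,
  };
}

class BoxSmoother {
  tracks: Track[] = [];
  nextId = 1;
  alpha: number;
  minIou: number;
  minHits: number;
  maxMissed: number;

  constructor(alpha = 0.5, minIou = 0.3, minHits = 2, maxMissed = 2) {
    this.alpha = alpha; // EMA weight of the newest detection
    this.minIou = minIou; // overlap needed to treat two boxes as the same element
    this.minHits = minHits; // frames an element must appear before it is reported
    this.maxMissed = maxMissed; // frames a track may go unseen before it is dropped
  }

  // Feed one frame of raw detections; returns the tracks considered stable.
  update(detections: Box[]): Track[] {
    const used: boolean[] = detections.map(() => false);
    for (const track of this.tracks) {
      // Greedy matching: pick the best unused detection above the IoU threshold.
      let best = -1;
      let bestIou = this.minIou;
      for (let i = 0; i < detections.length; i++) {
        if (used[i]) continue;
        const overlap = iou(track.box, detections[i]);
        if (overlap >= bestIou) {
          best = i;
          bestIou = overlap;
        }
      }
      if (best >= 0) {
        used[best] = true;
        track.box = blend(track.box, detections[best], this.alpha);
        track.hits += 1;
        track.missed = 0;
      } else {
        track.missed += 1; // keep the last smoothed box for now
      }
    }
    // Drop stale tracks, then start a new track for every unmatched detection.
    this.tracks = this.tracks.filter((t) => t.missed <= this.maxMissed);
    detections.forEach((d, i) => {
      if (!used[i]) this.tracks.push({ id: this.nextId++, box: { ...d }, hits: 1, missed: 0 });
    });
    // Report persistent elements; a briefly missed one still appears, suppressing flicker.
    return this.tracks.filter((t) => t.hits >= this.minHits);
  }
}

const smoother = new BoxSmoother();
const frame1 = smoother.update([{ x: 0, y: 0, width: 100, height: 40 }]);
const frame2 = smoother.update([{ x: 10, y: 0, width: 100, height: 40 }]);
const frame3 = smoother.update([]); // detector misses the element for one frame
```

With these defaults, a new element is hidden on its first frame, its box is averaged once it is matched again, and a single missed frame does not make it disappear.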