0.1.0: engine-agnostic Task abstraction, no_std-tiered features, HIR-anchored regex #1
Merged
First publishable cut of the crate. Renamed from `vlm-tasks` and rebuilt
around the principle that the prompt + grammar + parser shape is
engine-independent — the same `Task` should run against `lfm`
(llguidance) and `qwen` (mistralrs) without translation.
## Generic `Task` trait
Three associated types (`Output`, `Value`, `ParseError: core::error::Error`)
replace the prior fixed-JSON-schema, fixed-error-enum shape. Engines
bind `Value` to the schema type they consume directly
(`serde_json::Value` for JSON-schema-only engines, `SmolStr` for
Lark/Regex engines). No thread-safety bounds at the trait level —
`Send + Sync + 'static` is added at engine call sites where it's
actually needed, so non-`Send` Tasks compile without ceremony.
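
A minimal sketch of what a trait with these three associated types could look like. The method names (`prompt`, `schema`, `parse`), the `YesNo` task, and the `run_on_worker` helper are hypothetical and not taken from the crate; `String` stands in for `SmolStr`, and `std::error::Error` is used as the re-export of `core::error::Error`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical shape of the trait: no `Send`/`Sync`/`'static` bounds here.
pub trait Task {
    type Output;
    type Value;
    type ParseError: Error;

    fn prompt(&self) -> &str;
    /// The schema in whatever form the engine consumes directly.
    fn schema(&self) -> Self::Value;
    fn parse(&self, raw: &str) -> Result<Self::Output, Self::ParseError>;
}

/// Illustrative per-task error type (an assumption, not the crate's).
#[derive(Debug, PartialEq)]
pub struct NotYesNo(pub String);

impl fmt::Display for NotYesNo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "expected `yes` or `no`, got {:?}", self.0)
    }
}

impl Error for NotYesNo {}

/// Illustrative task constrained by a regex-style schema.
pub struct YesNo;

impl Task for YesNo {
    type Output = bool;
    type Value = String; // stand-in for `SmolStr`
    type ParseError = NotYesNo;

    fn prompt(&self) -> &str {
        "Is there a person in the image? Answer yes or no."
    }

    fn schema(&self) -> String {
        "yes|no".to_string()
    }

    fn parse(&self, raw: &str) -> Result<bool, NotYesNo> {
        match raw.trim() {
            "yes" => Ok(true),
            "no" => Ok(false),
            other => Err(NotYesNo(other.to_string())),
        }
    }
}

/// Engine-side call site: thread-safety bounds are added only here.
pub fn run_on_worker<T>(task: T, raw: String) -> Result<T::Output, T::ParseError>
where
    T: Task + Send + 'static,
    T::Output: Send + 'static,
    T::ParseError: Send + 'static,
{
    std::thread::spawn(move || task.parse(&raw))
        .join()
        .expect("worker thread panicked")
}
```

Because the bounds live on `run_on_worker` rather than on `Task`, a task holding an `Rc` would still implement the trait; it just could not be passed to this particular helper.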
## `Grammar` enum, `#[non_exhaustive]`
`JsonSchema` (behind `json`), `Lark`, `Regex` (behind `regex`).
Engines pattern-match and return `UnsupportedGrammar` for variants
they don't speak, so callers can route to a different backend.
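
The routing idea can be sketched as follows. This is an illustration under assumptions: the variant payloads are plain `String`s, the feature-gated `JsonSchema` variant is omitted, and `lark_only_engine`, `route`, and the shape of `UnsupportedGrammar` are hypothetical.

```rust
/// Simplified stand-in for the crate's enum (payload types are assumptions).
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub enum Grammar {
    Lark(String),
    Regex(String),
}

/// Hypothetical error carrying the name of the rejected variant.
#[derive(Debug, PartialEq)]
pub struct UnsupportedGrammar(pub &'static str);

/// An engine that only speaks Lark. In a downstream crate,
/// `#[non_exhaustive]` would also require a trailing `_ =>` arm here,
/// which is the natural place to return `UnsupportedGrammar`.
pub fn lark_only_engine(grammar: &Grammar) -> Result<String, UnsupportedGrammar> {
    match grammar {
        Grammar::Lark(src) => Ok(format!("lark:{src}")),
        Grammar::Regex(_) => Err(UnsupportedGrammar("regex")),
    }
}

/// Caller-side routing: fall back to another backend on rejection.
pub fn route(grammar: &Grammar) -> &'static str {
    match lark_only_engine(grammar) {
        Ok(_) => "lark-engine",
        Err(UnsupportedGrammar(_)) => "fallback-engine",
    }
}
```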
A bare `Grammar::Regex(regex::Regex)` would let callers smuggle in a
`RegexBuilder::case_insensitive(true)` regex whose `as_str()` returned
the plain pattern but whose `is_match` matched additional case-flipped
strings — silently diverging local validation from engine constraint.
The `RegexGrammar` private wrapper forces construction through
`Grammar::regex(&str)` (default options only).
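
The private-wrapper pattern itself can be shown without the `regex` crate. In this sketch the wrapper stores only the pattern text and the constructor is infallible; the real `Grammar::regex` presumably compiles the pattern and may return an error, which is not modeled here. The `pattern_of` helper is hypothetical.

```rust
pub mod grammar {
    /// The field is private, so code outside this module cannot build a
    /// `RegexGrammar` from a regex configured with non-default options.
    #[derive(Debug, Clone, PartialEq)]
    pub struct RegexGrammar {
        pattern: String,
    }

    impl RegexGrammar {
        pub fn as_str(&self) -> &str {
            &self.pattern
        }
    }

    #[derive(Debug, Clone, PartialEq)]
    pub enum Grammar {
        Lark(String),
        Regex(RegexGrammar),
    }

    impl Grammar {
        /// The only way to obtain `Grammar::Regex`: from a pattern string,
        /// so the text an engine sees is the whole story.
        pub fn regex(pattern: &str) -> Grammar {
            Grammar::Regex(RegexGrammar {
                pattern: pattern.to_string(),
            })
        }
    }
}

pub use grammar::Grammar;

/// Hypothetical helper: the pattern text an engine would receive.
pub fn pattern_of(grammar: &Grammar) -> Option<&str> {
    match grammar {
        Grammar::Regex(re) => Some(re.as_str()),
        Grammar::Lark(_) => None,
    }
}
```

Outside `grammar`, writing `RegexGrammar { pattern: .. }` is a compile error, so every regex grammar has passed through `Grammar::regex`.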
## HIR-anchored full-match validator
`Grammar::is_regex_full_match` / `RegexGrammar::is_full_match`. The
Rust `regex` crate is unanchored leftmost-first, but engines like
llguidance treat the supplied pattern as anchor-implicit / full-match.
Two simpler approaches don't work and are rejected on purpose:
- `Regex::new(&format!(r"\A(?:{p})\z"))` breaks verbose-mode patterns:
`(?x)[0-9]+ # comment` compiles on its own but fails to compile once
wrapped, because the trailing comment swallows the injected `)\z`.
- `find()` + span equality breaks prefix alternations: `a|ab` against
input `ab` returns the shorter `0..1` match for `a`, and the span
check fails — even though `ab` is in the language.
The HIR path (`Hir::concat([Look::Start, parsed_hir, Look::End])`)
puts the anchors in the regex grammar itself, so a match must reach
the end of the input and the matcher has to move past `a` to the
longer alternative. Pinned by regression tests for both failure modes.
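
Both failure modes can be reproduced with a toy model that does not depend on the `regex` crate. This is only an illustration: alternations are modeled as a slice of literal branches, and every function name below is hypothetical. It is not the crate's validator, which works on the parsed HIR.

```rust
/// Rejected approach 1: wrap the pattern text in anchors.
pub fn wrap_with_anchors(pattern: &str) -> String {
    format!(r"\A(?:{pattern})\z")
}

/// In verbose mode everything after `#` on a line is a comment, so this
/// tail (including the injected `)\z`) never reaches the parser.
pub fn verbose_comment_tail(wrapped: &str) -> Option<&str> {
    wrapped.split_once('#').map(|(_, tail)| tail)
}

/// Toy leftmost-first search over literal branches: at the earliest
/// position where any branch matches, the first branch in pattern order wins.
pub fn find_leftmost_first(branches: &[&str], input: &str) -> Option<(usize, usize)> {
    for start in 0..=input.len() {
        if !input.is_char_boundary(start) {
            continue;
        }
        for branch in branches {
            if input[start..].starts_with(branch) {
                return Some((start, start + branch.len()));
            }
        }
    }
    None
}

/// Rejected approach 2: unanchored find, then compare the span to the input.
pub fn full_match_via_span(branches: &[&str], input: &str) -> bool {
    find_leftmost_first(branches, input) == Some((0, input.len()))
}

/// Anchored semantics: a branch only counts if it consumes the whole
/// input, so a too-short branch is skipped instead of ending the search.
pub fn full_match_anchored(branches: &[&str], input: &str) -> bool {
    branches.iter().any(|branch| *branch == input)
}
```

With branches `["a", "ab"]` and input `ab`, the span check rejects the input because the search stops at `a`, while the anchored check accepts it.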
## `ImageAnalysis` canonical type
Single-image VLM output shape (scene/description/subjects/objects/
actions/mood/lighting/shot-type/tags). Detection-array fields are
flat `Vec<SmolStr>` — VLM self-reported confidence is poorly
calibrated, and a flat hardcoded confidence on every entry is a no-op
for both UX and search ranking. Per-detection scoring belongs in
search-time embedding similarity, not in the VLM output type.
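
As a rough picture of the shape described above, the struct might look like the following. The field names are derived from the list in the description, but which fields are scalar and which are arrays is an assumption, `String` stands in for `SmolStr`, and the `labels` helper is hypothetical.

```rust
/// Sketch of a single-image analysis result with flat label arrays.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ImageAnalysis {
    pub scene: String,
    pub description: String,
    pub subjects: Vec<String>,
    pub objects: Vec<String>,
    pub actions: Vec<String>,
    pub mood: String,
    pub lighting: String,
    pub shot_type: String,
    pub tags: Vec<String>,
}

impl ImageAnalysis {
    /// All detection labels in field order, with no per-entry confidence.
    pub fn labels(&self) -> impl Iterator<Item = &str> + '_ {
        self.subjects
            .iter()
            .chain(&self.objects)
            .chain(&self.actions)
            .chain(&self.tags)
            .map(String::as_str)
    }
}
```

A search layer could embed each label returned by `labels` and score it at query time, which keeps ranking out of the output type.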
## no_std/alloc/std feature tiers
Bare no_std builds against the `alloc` prelude via
`extern crate alloc as std;`. Tests that need `std::sync::OnceLock`
are gated on `feature = "std"` AND `any(feature = "json",
feature = "regex")` to avoid unused-import warnings under
`--features std` alone (the test module is empty when neither
test-bearing feature is on).
## Feature graph hardening
- All deps declared with `default-features = false` so consumers
don't get silent std linkage from `--no-default-features
--features alloc,…`.
- `std` re-enables dep-side `std`/`default` features via weak
dependency features (the `dep?/std` form), which do not activate an
optional dep on their own.
- `json` no longer transitively pulls the public `serde` feature —
`serde_json` brings serde-the-crate into the dep tree, but
`#[cfg(feature = "serde")]` derives only fire when the user opts
in explicitly.
- `regex` adds `regex-syntax` as a direct dep (already a transitive
dep of `regex`) for HIR access.
## Verification
- `cargo test --all-features` — 23 lib tests, 0 failures
- `cargo hack --feature-powerset test` — all 21 feature
combinations build and pass
- `cargo check --no-default-features` — bare no_std builds clean
- Codex adversarial review approved (rebuilt from 13 review rounds)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Diff scope
27 commits, +1590/-101 lines across 11 files. Crate rename, README rewrite in mediatime style, then 13 rounds of Codex adversarial review iterating on the feature graph, generic Task trait, error type, regex wrapper, and finally the HIR-anchored validator. The final cleanup commit removes round-number meta-commentary from doc comments.
## Test plan
- `cargo test --all-features`: 23 lib tests, 0 failures
- `cargo check --no-default-features`: bare no_std builds clean
- `cargo check --no-default-features --features alloc`: alloc-only API surface compiles
- `cargo check --no-default-features --features regex`: regex without std works (anchored validator + HIR path)
- `cargo check --no-default-features --features serde`: opt-in serde derive on `ImageAnalysis` works

… 21c2127)

🤖 Generated with Claude Code