0.1.0 — engine-agnostic Task abstraction, no_std-tiered features, HIR-anchored regex#1

Merged. uqio merged 1 commit into `0.1.0-base` from `0.1.0` on May 10, 2026.

@uqio (Collaborator) commented on May 10, 2026

First publishable cut of the crate. Renamed from `vlm-tasks` and rebuilt around the principle that the prompt + grammar + parser shape is engine-independent: the same `Task` should run against `lfm` (llguidance) and `qwen` (mistralrs) without translation.

PR base is `0.1.0-base` (a snapshot of the real local `main`, pushed as a fresh branch). The repo's actual `main` ref is an unrelated `template-rs` placeholder that has no shared history with this work.

What's in scope

Generic `Task` trait. Three associated types (`Output`, `Value`, `ParseError: core::error::Error`) replace the prior fixed-JSON-schema, fixed-error-enum shape. Engines bind `Value` to the schema type they consume directly (`serde_json::Value` for JSON-schema-only engines, `SmolStr` for Lark/Regex engines). There are no thread-safety bounds at the trait level: `Send + Sync + 'static` is added at engine call sites where it's actually needed, so non-`Send` Tasks compile without ceremony.
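A minimal sketch of what a trait of this shape could look like. Only the three associated type names and the `Error` bound come from the description above; the method names (`prompt`, `schema`, `parse`), the `YesNo` task, the `NotYesNo` error, and `check_threaded` are hypothetical, and `String` stands in for `SmolStr`. The sketch uses `std::error::Error` (the re-export of `core::error::Error` on current toolchains) so that it also builds on older compilers.

```rust
use std::error::Error;
use std::fmt;
use std::rc::Rc;

/// Hypothetical trait shape: three associated types, no `Send`/`Sync` bounds.
pub trait Task {
    type Output;
    type Value;
    type ParseError: Error;

    fn prompt(&self) -> &str; // assumed method
    fn schema(&self) -> Self::Value; // assumed method
    fn parse(&self, raw: &str) -> Result<Self::Output, Self::ParseError>; // assumed method
}

/// Task-specific error type instead of a fixed crate-wide enum.
#[derive(Debug, PartialEq)]
pub struct NotYesNo(pub String);

impl fmt::Display for NotYesNo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "expected `yes` or `no`, got {:?}", self.0)
    }
}

impl Error for NotYesNo {}

/// Holds an `Rc`, so it is not `Send`; it can still implement `Task`.
pub struct YesNo {
    prompt: Rc<str>,
}

impl Task for YesNo {
    type Output = bool;
    type Value = String; // stand-in for `SmolStr` (a regex pattern here)
    type ParseError = NotYesNo;

    fn prompt(&self) -> &str {
        &self.prompt
    }

    fn schema(&self) -> String {
        "yes|no".to_string()
    }

    fn parse(&self, raw: &str) -> Result<bool, NotYesNo> {
        match raw.trim() {
            "yes" => Ok(true),
            "no" => Ok(false),
            other => Err(NotYesNo(other.to_string())),
        }
    }
}

/// An engine entry point that really needs thread-safety adds the bounds itself.
pub fn check_threaded<T>(task: T, raw: String) -> bool
where
    T: Task + Send + 'static,
{
    std::thread::spawn(move || task.parse(&raw).is_ok())
        .join()
        .expect("worker thread panicked")
}
```

Passing `YesNo` to `check_threaded` would be rejected by the compiler because `Rc<str>` is not `Send`; the bound is enforced at that call site rather than on every implementor of the trait.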

`Grammar` enum, `#[non_exhaustive]`. `JsonSchema` (behind `json`), `Lark`, `Regex` (behind `regex`). Engines pattern-match and return `UnsupportedGrammar` for variants they don't speak, so callers can route to a different backend.
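The routing idea can be sketched as follows. The variant payloads, the two engine functions, and `route` are hypothetical stand-ins (the real enum also has a feature-gated `JsonSchema` variant and wraps its regex in `RegexGrammar`); only the enum name, the variant names, `#[non_exhaustive]`, and `UnsupportedGrammar` come from the description.

```rust
/// Hypothetical mirror of the enum; `String` payloads are stand-ins.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub enum Grammar {
    Lark(String),
    Regex(String),
}

#[derive(Debug, PartialEq)]
pub struct UnsupportedGrammar;

/// A Lark-only engine: accepts what it speaks, rejects everything else.
/// The wildcard arm also covers variants added in later releases.
pub fn lark_engine_constraint(g: &Grammar) -> Result<&str, UnsupportedGrammar> {
    match g {
        Grammar::Lark(src) => Ok(src.as_str()),
        _ => Err(UnsupportedGrammar),
    }
}

/// A regex-only engine, same pattern.
pub fn regex_engine_constraint(g: &Grammar) -> Result<&str, UnsupportedGrammar> {
    match g {
        Grammar::Regex(pattern) => Ok(pattern.as_str()),
        _ => Err(UnsupportedGrammar),
    }
}

/// Caller-side routing: pick the first backend that accepts the grammar.
pub fn route(g: &Grammar) -> Option<&'static str> {
    if lark_engine_constraint(g).is_ok() {
        Some("lark-engine")
    } else if regex_engine_constraint(g).is_ok() {
        Some("regex-engine")
    } else {
        None
    }
}
```

Because the enum is `#[non_exhaustive]`, downstream crates have to write the wildcard arm anyway, which makes returning `UnsupportedGrammar` the natural default for new variants.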

`RegexGrammar` private wrapper. A bare `Grammar::Regex(regex::Regex)` would let callers smuggle in a `RegexBuilder::case_insensitive(true)` regex whose `as_str()` returned the plain pattern but whose `is_match` matched additional case-flipped strings, silently diverging local validation from the engine constraint. The wrapper forces construction through `Grammar::regex(&str)` (default options only).

HIR-anchored full-match validator (`Grammar::is_regex_full_match` / `RegexGrammar::is_full_match`). The Rust `regex` crate searches unanchored with leftmost-first semantics, but engines like llguidance treat the supplied pattern as implicitly anchored (full-match). Two simpler approaches don't work and are rejected on purpose:

  • `Regex::new(&format!(r"\A(?:{p})\z"))` breaks verbose-mode patterns: `(?x)[0-9]+ # comment` compiles bare but fails to compile once wrapped, because the comment swallows the injected `)\z` and leaves the group unclosed.
  • `find()` + span equality breaks prefix alternations: `a|ab` against input `ab` returns the shorter `0..1` match for `a`, and the span check fails even though `ab` is in the language.

The HIR path (`Hir::concat([Look::Start, parsed_hir, Look::End])`) puts the anchors in the regex itself, so a shorter alternative that stops before the end of input is rejected and the longer one is tried. Both failure modes are pinned by regression tests.
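The prefix-alternation failure can be reproduced with a toy model that needs no regex engine. This is not the crate's validator and does not use `regex` or `regex-syntax`; it only models a pattern as an ordered list of literal alternatives (`a|ab` becomes `["a", "ab"]`), and all three function names are hypothetical.

```rust
use std::ops::Range;

/// Toy stand-in for an unanchored leftmost-first `find`: scan start
/// positions left to right and, at each one, take the FIRST alternative
/// that matches, even if a later alternative would match more input.
pub fn find_leftmost_first(alts: &[&str], input: &str) -> Option<Range<usize>> {
    let starts = input
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(input.len()));
    for start in starts {
        for &alt in alts {
            if input[start..].starts_with(alt) {
                return Some(start..start + alt.len());
            }
        }
    }
    None
}

/// Rejected approach: search first, then require the span to cover the input.
pub fn full_match_by_span(alts: &[&str], input: &str) -> bool {
    find_leftmost_first(alts, input) == Some(0..input.len())
}

/// Anchored approach: reaching the end of input is part of the match, so an
/// alternative that stops short is discarded and the next one is tried.
pub fn full_match_anchored(alts: &[&str], input: &str) -> bool {
    alts.iter()
        .any(|&alt| input.starts_with(alt) && alt.len() == input.len())
}
```

With `["a", "ab"]` and input `ab`, the search settles on `a` and the span check rejects a string that is in the language, while the anchored variant accepts it.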

`ImageAnalysis` canonical type. Single-image VLM output shape (scene/description/subjects/objects/actions/mood/lighting/shot-type/tags). Detection-array fields are flat `Vec<SmolStr>`: VLM self-reported confidence is poorly calibrated, and a flat hardcoded confidence on every entry is a no-op for both UX and search ranking. Per-detection scoring belongs in search-time embedding similarity, not in the VLM output type.
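As an illustration only, the struct might look roughly like this. The field names follow the list above, but which fields are scalars and which are lists is an assumption, `String` stands in for `SmolStr`, and `ScoredLabel` and `flatten` are hypothetical helpers that show the per-entry confidence shape the description argues against.

```rust
/// Hypothetical layout; scalar vs. list fields are guesses.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ImageAnalysis {
    pub scene: String,
    pub description: String,
    pub subjects: Vec<String>, // flat labels, no per-entry confidence
    pub objects: Vec<String>,
    pub actions: Vec<String>,
    pub mood: String,
    pub lighting: String,
    pub shot_type: String,
    pub tags: Vec<String>,
}

/// The alternative shape: a label paired with a self-reported score.
pub struct ScoredLabel {
    pub label: String,
    pub confidence: f32,
}

/// Dropping the score keeps the labels and their order. If every score is
/// the same hardcoded constant, no ranking information is lost.
pub fn flatten(scored: Vec<ScoredLabel>) -> Vec<String> {
    scored.into_iter().map(|s| s.label).collect()
}
```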

`no_std`/`alloc`/`std` feature tiers. Bare `no_std` builds against the `alloc` prelude via `extern crate alloc as std;`. Tests that need `std::sync::OnceLock` are gated on `feature = "std"`; the rest typecheck under `--no-default-features --features alloc`.

Feature graph hardening:

  • All deps declared with `default-features = false` so consumers don't get silent std linkage from `--no-default-features --features alloc,…`.
  • `std` re-enables dep-side `std`/`default` features via weak `?/std` features.
  • `json` no longer transitively pulls the public `serde` feature: `serde_json` brings serde-the-crate into the dep tree, but `#[cfg(feature = "serde")]` derives only fire when the user opts in explicitly.
  • `regex` adds `regex-syntax` as a direct dep (already a transitive dep of `regex`) for HIR access.

Diff scope

27 commits, +1590/-101 lines across 11 files. Crate rename, README rewrite in mediatime style, then 13 rounds of Codex adversarial review iterating on the feature graph, generic Task trait, error type, regex wrapper, and finally the HIR-anchored validator. Final cleanup commit removes round-number meta-commentary from doc comments.

Test plan

  • `cargo test --all-features`: 23 lib tests, 0 failures
  • `cargo check --no-default-features`: bare no_std builds clean
  • `cargo check --no-default-features --features alloc`: alloc-only API surface compiles
  • `cargo check --no-default-features --features regex`: regex without std works (anchored validator + HIR path)
  • `cargo check --no-default-features --features serde`: opt-in serde derive on `ImageAnalysis` works
  • Codex adversarial review approved at round 13 (21c2127)

🤖 Generated with Claude Code

…-anchored regex

First publishable cut of the crate. Renamed from `vlm-tasks` and rebuilt
around the principle that the prompt + grammar + parser shape is
engine-independent — the same `Task` should run against `lfm`
(llguidance) and `qwen` (mistralrs) without translation.

## Generic `Task` trait

Three associated types (`Output`, `Value`, `ParseError: core::error::Error`)
replace the prior fixed-JSON-schema, fixed-error-enum shape. Engines
bind `Value` to the schema type they consume directly
(`serde_json::Value` for JSON-schema-only engines, `SmolStr` for
Lark/Regex engines). No thread-safety bounds at the trait level —
`Send + Sync + 'static` is added at engine call sites where it's
actually needed, so non-`Send` Tasks compile without ceremony.

## `Grammar` enum, `#[non_exhaustive]`

`JsonSchema` (behind `json`), `Lark`, `Regex` (behind `regex`).
Engines pattern-match and return `UnsupportedGrammar` for variants
they don't speak, so callers can route to a different backend.

A bare `Grammar::Regex(regex::Regex)` would let callers smuggle in a
`RegexBuilder::case_insensitive(true)` regex whose `as_str()` returned
the plain pattern but whose `is_match` matched additional case-flipped
strings — silently diverging local validation from engine constraint.
The `RegexGrammar` private wrapper forces construction through
`Grammar::regex(&str)` (default options only).

## HIR-anchored full-match validator

`Grammar::is_regex_full_match` / `RegexGrammar::is_full_match`. The
Rust `regex` crate is unanchored leftmost-first, but engines like
llguidance treat the supplied pattern as anchor-implicit / full-match.
Two simpler approaches don't work and are rejected on purpose:

- `Regex::new(&format!(r"\A(?:{p})\z"))` breaks verbose-mode patterns:
  `(?x)[0-9]+ # comment` compiles bare but fails to compile once
  wrapped, because the comment swallows the injected `)\z` and leaves
  the group unclosed.
- `find()` + span equality breaks prefix alternations: `a|ab` against
  input `ab` returns the shorter `0..1` match for `a`, and the span
  check fails even though `ab` is in the language.

The HIR path (`Hir::concat([Look::Start, parsed_hir, Look::End])`)
puts the anchors in the regex itself, so a shorter alternative that
stops before the end of input is rejected and the longer one is
tried. Both failure modes are pinned by regression tests.

## `ImageAnalysis` canonical type

Single-image VLM output shape (scene/description/subjects/objects/
actions/mood/lighting/shot-type/tags). Detection-array fields are
flat `Vec<SmolStr>` — VLM self-reported confidence is poorly
calibrated, and a flat hardcoded confidence on every entry is a no-op
for both UX and search ranking. Per-detection scoring belongs in
search-time embedding similarity, not in the VLM output type.

## no_std/alloc/std feature tiers

Bare no_std builds against the `alloc` prelude via
`extern crate alloc as std;`. Tests that need `std::sync::OnceLock`
are gated on `feature = "std"` AND `any(feature = "json",
feature = "regex")` to avoid unused-import warnings under
`--features std` alone (the test module is empty when neither
test-bearing feature is on).

## Feature graph hardening

- All deps declared with `default-features = false` so consumers
  don't get silent std linkage from `--no-default-features
  --features alloc,…`.
- `std` re-enables dep-side `std`/`default` features via weak
  `?/std` features.
- `json` no longer transitively pulls the public `serde` feature —
  `serde_json` brings serde-the-crate into the dep tree, but
  `#[cfg(feature = "serde")]` derives only fire when the user opts
  in explicitly.
- `regex` adds `regex-syntax` as a direct dep (already a transitive
  dep of `regex`) for HIR access.

## Verification

- `cargo test --all-features` — 23 lib tests, 0 failures
- `cargo hack --feature-powerset test` — all 21 feature
  combinations build and pass
- `cargo check --no-default-features` — bare no_std builds clean
- Codex adversarial review approved (rebuilt from 13 review rounds)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@uqio merged commit 2a93592 into `0.1.0-base` on May 10, 2026. 28 checks passed.
@uqio deleted the `0.1.0` branch on May 10, 2026, 01:56.