Conversation
First publishable cut of the crate. Rust ONNX Runtime port of LiquidAI/LFM2.5-VL-450M, with schema-constrained sampling via llguidance and the full engine-agnostic `llmtask::Task` surface.

## Three layers

- **`lfm::Engine`** — sync, single-threaded; built on `ort` 2.0. `Engine::generate(messages, images, opts)` is the unconstrained free-form path. `Engine::run<T: llmtask::Task>(task, messages, images, opts)` is the constrained path: any `Task` whose `Grammar` is JSON Schema, Lark, or Regex routes through llguidance, and the result is decoded by the task's `parse` impl.
- **`lfm::ImageAnalysisTask`** — built-in image-analysis preset that produces the canonical `llmtask::ImageAnalysis` output type, sharing the schema and resilient parser with `qwen3-vl`.
- **`lfm::preproc`** — wasm-friendly image preprocessing surface: `Preprocessor`, `TileGrid`, EXIF-aware decode helpers. Compiles under `--no-default-features --features decoders` for use in contexts that don't need the inference runtime.

## llmtask-driven generic engine

The whole inference path takes `&impl Task<Value = Value>`, so a `Task` written once against the `llmtask` contract runs through `lfm` (llguidance) and `qwen3-vl` (mistralrs) without translation. Because lfm's backend is llguidance, all three `Grammar` variants (JSON Schema, Lark, Regex) are accepted; engines that only speak JSON Schema reject the others via `UnsupportedGrammar`, and the caller can route to lfm.
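To make the contract concrete, here is a minimal sketch of a custom `Task` on the constrained path. Only `Engine::run(task, messages, images, opts)`, the `Value` associated type, the three `Grammar` kinds, and the `parse` decode step come from the description above; every other name and signature is an assumption for illustration, not the crate's verbatim API.

```rust
use serde_json::{json, Value};

// Hypothetical task for this sketch: constrain the model to emit
// {"caption": "..."} and decode it.
struct CaptionTask;

impl llmtask::Task for CaptionTask {
    type Value = Value;

    // One of the three supported grammar kinds; a Lark or Regex grammar
    // would route through llguidance the same way. The method name and
    // variant constructor are assumed.
    fn grammar(&self) -> llmtask::Grammar {
        llmtask::Grammar::JsonSchema(json!({
            "type": "object",
            "properties": { "caption": { "type": "string" } },
            "required": ["caption"]
        }))
    }

    // llguidance masks sampling to the schema, so decoding is a plain
    // serde_json parse. The error type here is assumed.
    fn parse(&self, raw: &str) -> serde_json::Result<Value> {
        serde_json::from_str(raw)
    }
}

// Constrained path through the engine:
// let caption: Value = engine.run(&CaptionTask, &messages, &images, &opts)?;
```

Because nothing in the task is engine-specific, the same `CaptionTask` would run unchanged through `qwen3-vl`'s mistralrs backend.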
## Strict from_dir + bundled escape hatch

- `Engine::from_dir` byte-validates the supplied `tokenizer.json`, `chat_template.jinja`, `preprocessor_config.json`, and the `text_config.max_position_embeddings` field of `config.json` against the bundled blobs. A model directory whose ONNX shapes pass but whose tokenizer/template/preprocessor drifted would silently corrupt prompts; this fail-closed check forces the drift into a clear load-time error.
- `Engine::from_onnx_dir` (under the `bundled` feature) accepts an ONNX-only directory; the bundled tokenizer / chat template / configs are written to a temp file on first use.
- `Engine::from_paths` is the unchecked escape hatch for advanced callers pairing custom tokenizers with custom ONNX.

## Architecture (per-image vision encoding)

Per-image vision encoding → text+image embedding splice → hybrid KV/conv-state cache decoder loop → optional schema-constrained sampling. Three ONNX graphs:

- `vision_encoder.onnx` — SigLIP2 encoder, single image per call (multi-image batching produces silently-wrong embeddings).
- `embed_tokens.onnx` — token embedding lookup.
- `decoder_model_merged.onnx` — LFM2 hybrid LM (10 conv-state + 6 KV-attn layers at sparse indices). The `Decoder` manages the non-contiguous cache layout transparently.

## Compatibility with the current LFM2.5-VL-450M-ONNX exports

Two fixes were needed against the published HF repo:

1. The SigLIP2 NaFlex `pos_embed` Resize target is computed as `(max_h, max_w) = ReduceMax(spatial_shapes, axis=0)` per axis and reshaped to `[max_h * max_w, dim]`. So `pixel_values.shape[1]` must equal `max_h * max_w` (cross-axis product), not the per-entry `max(h * w)`. `flatten_to_patches` now pads accordingly; per-entry `spatial_shapes` and `pixel_attention_mask` still describe each entry's actual layout.
2. `ort 2.0`'s `Tensor::from_array` rejects any zero-dim shape with `Invalid dimension #N; all dimensions must be >= 1`. This broke `Decoder::new_cache` initialising the empty `[1, 8, 0, 64]` attn cache. Routed the empty cache through `Tensor::<f32>::new(allocator, shape)` (ONNX Runtime allocator path), which accepts zero-element shapes.

## Admission-control DoS guards

Bounded request-shape cap (max messages, max content parts), text-size cap, image-count lower bound from `min_image_tokens`, header-time decoded-buffer cap, and a special-token denylist seeded from the live tokenizer's `added_vocabulary`. All run BEFORE any image decode or template render.

## Hardware backends

ORT execution providers gated behind feature flags: `cuda`, `tensorrt`, `directml`, `rocm`, `coreml`. None are required for CPU inference; each requires its vendor SDK at build time.

## Verification

- `cargo test --lib --all-features` — 138 lib tests, 0 failures
- `cargo check --no-default-features` — bare wasm-friendly preproc surface compiles
- `cargo check --no-default-features --features decoders` — pure preprocessing build (no `ort`, no `tokenizers`)
- Integration suite vs `LiquidAI/LFM2.5-VL-450M-ONNX` (HEAD revision): 9/9 tests pass, including a cross-engine `ImageAnalysis` comparison against the airport thumbnails shared with `qwen3-vl`.
- Codex adversarial review approved (round 1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… mismatch
The Windows CI test job fails to link with:

    libesaxx_rs … error LNK2038: mismatch detected for 'RuntimeLibrary':
    value 'MT_StaticRelease' doesn't match value 'MD_DynamicRelease'
    in libort_sys
    fatal error LNK1319: 1 mismatches detected
`esaxx-rs` is the C++ suffix-array trainer for tokenizers' Unigram
training. It's pulled in by tokenizers' default `esaxx_fast` feature
and built with the static CRT `/MT`, which conflicts with `ort_sys`
built with the dynamic CRT `/MD`.
lfm only uses `tokenizers::Tokenizer::from_file` + encode/decode at
inference time — the trainer paths are dead code for us. Drop
default features (`progressbar`, `onig`, `esaxx_fast`) and re-enable
just `fancy-regex`, the pure-Rust regex backend that JSON-defined
BPE tokenizers need at runtime. This matches the feature set the
sibling `toktrie_hf_tokenizers` 1.7 already uses on tokenizers 0.21
transitively, so the dep tree stays consistent.
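For reference, a sketch of the resulting manifest entry (the version pin is illustrative; the feature selection is the one described above):

```toml
[dependencies]
# Drop tokenizers' default features (progressbar, onig, esaxx_fast) so the
# C++ esaxx suffix-array trainer never builds; keep only the pure-Rust
# regex backend that JSON-defined BPE tokenizers need at runtime.
tokenizers = { version = "0.21", default-features = false, features = ["fancy-regex"] }
```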
138/138 lib tests pass locally with the new feature set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>