
feat(Model Support): add Krea-2-Turbo model + LoRA support (WIP) #9304

Draft

Pfannkuchensack wants to merge 3 commits into invoke-ai:main from Pfannkuchensack:feat/krea2-turbo-support

@Pfannkuchensack (Collaborator)

Summary

Integrate Krea-2-Turbo (krea/Krea-2-Turbo) text-to-image per NEW_MODEL_INTEGRATION.md: Krea2Transformer2DModel (single-stream MMDiT, ~12B) + Qwen3-VL text encoder (12-layer hidden-state tap → 4D prompt_embeds) + reused Qwen-Image VAE + FlowMatchEulerDiscreteScheduler. Turbo is distilled (is_distilled=true → fixed mu=1.15, 8 steps, CFG off by default).
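The distilled "fixed mu" schedule can be illustrated with a minimal sketch. It assumes the exponential time shift that diffusers' flow-matching scheduler applies under dynamic shifting, plus evenly spaced base sigmas. The helpers `time_shift`, `linear_mu`, and `turbo_sigmas` are hypothetical names, and the `linear_mu` defaults are the Flux values from diffusers, not confirmed Krea-2 constants.

```python
import math


def time_shift(mu: float, sigma: float) -> float:
    """Exponential time shift: pushes sigmas toward 1.0 (the noisy end) for mu > 0."""
    return math.exp(mu) / (math.exp(mu) + (1.0 / sigma - 1.0))


def linear_mu(image_seq_len: int, base_seq_len: int = 256, max_seq_len: int = 4096,
              base_shift: float = 0.5, max_shift: float = 1.15) -> float:
    """Non-distilled case: interpolate mu linearly in the image token count."""
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    return base_shift + m * (image_seq_len - base_seq_len)


def turbo_sigmas(num_steps: int = 8, mu: float = 1.15) -> list[float]:
    """Distilled case: fixed mu. Base sigmas 1, 1-1/N, ..., 1/N, then a terminal 0."""
    base = [1.0 - i / num_steps for i in range(num_steps)]
    return [time_shift(mu, s) for s in base] + [0.0]
```

With mu = 1.15, the last non-zero sigma is about 0.31 instead of 0.125, so the 8 distilled steps are concentrated at higher noise levels.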

WIP: requires diffusers main (>= 0.39 dev) for Krea2Transformer2DModel; pyproject.toml temporarily pins diffusers to git main. Flip to the stable release containing Krea-2 once it ships, then un-draft.

Related Issues / Discussions

Depends on Krea2Transformer2DModel / Krea2Pipeline landing in a stable diffusers release (currently diffusers main only).

QA Instructions

  1. Dependency: install diffusers main into the venv with uv pip install "git+https://github.com/huggingface/diffusers.git", then confirm that python -c "from diffusers import Krea2Transformer2DModel, Krea2Pipeline" runs without error.
  2. Install model: point InvokeAI at a Krea-2-Turbo diffusers folder. Confirm it probes as main / diffusers / krea-2 / krea2_turbo (type / format / base / variant) and that the Qwen3-VL encoder probes as qwen3_vl_encoder.
  3. Generate (txt2img): 1024², 8 steps, CFG 1.0. On 24 GB cards, enable FP8 in the model's Default Settings. Confirm an image renders.
  4. LoRA: load a Krea-2 LoRA (diffusers PEFT format). Confirm it probes as lora.lycoris.krea-2 (not qwen-image) and applies, and that 1024² + LoRA + FP8 no longer OOMs with partial loading enabled.
  5. CFG: at cfg 1.0 confirm no negative prompt is run and metadata recall shows no negative prompt; at cfg > 1 confirm negative conditioning is used.
  6. Enhancers (Advanced Options under CFG Scale): with both off, output matches stock. Enable Conditioning Rebalance → stronger prompt adherence; enable Seed Variance → meaningfully different images across seeds. Confirm tooltips render.
  7. img2img / inpaint / outpaint: round-trip an init image.

Merge Plan

Draft until diffusers ships Krea-2 in a stable release. Before merging: flip the diffusers pin in pyproject.toml from git main to that release and update uv.lock. Isolate the pyproject.toml/uv.lock change so it's easy to review/revert. No DB schema changes.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

Integrate Krea-2-Turbo (krea/Krea-2-Turbo) text-to-image per
NEW_MODEL_INTEGRATION.md: Krea2Transformer2DModel (single-stream MMDiT)
+ Qwen3-VL text encoder (12-layer hidden-state tap, 4D prompt_embeds)
+ reused Qwen-Image VAE + FlowMatchEulerDiscrete scheduler.

Backend:
- taxonomy: BaseModelType.Krea2, ModelType/ModelFormat.Qwen3VLEncoder,
  Krea2VariantType (Turbo = "krea2_turbo" to avoid Z-Image collision)
- config probes: Main_Diffusers/Checkpoint_Krea2, Qwen3VLEncoder,
  LoRA_LyCORIS_Krea2 (text_fusion/time_mod_proj signature; excluded
  from the Qwen-Image probe to avoid double-match)
- loaders for the diffusers pipeline + standalone Qwen3-VL encoder,
  with runtime workarounds for the HF model's version mismatches
  (AutoTokenizer, extra_special_tokens={}, rope_parameters->rope_scaling)
- native sampling (pack/unpack, position_ids, linear-mu shift) and
  hand-written Euler denoise loop; reuses qwen_image l2i/i2l
- invocations: model_loader, text_encoder, denoise, lora_loader, plus
  two ecosystem enhancers (conditioning rebalance, seed variance)
- LoRA conversion for diffusers PEFT (lora_transformer- prefix)
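The LoRA probe and conversion bullets can be sketched in plain Python. This is a hypothetical illustration, not the PR's code. It assumes PEFT keys of the form `transformer.<module>.lora_A.weight` / `.lora_B.weight` and treats any key containing `text_fusion` or `time_mod_proj` as the Krea-2 signature; this is the same check a Qwen-Image probe would need to reject to avoid a double match. It also assumes converted layer names keep their dots after the `lora_transformer-` prefix.

```python
from collections import defaultdict
from typing import Any, Iterable

# Key fragments named in the commit message as the Krea-2 LoRA signature.
KREA2_LORA_MARKERS = ("text_fusion", "time_mod_proj")
LORA_TRANSFORMER_PREFIX = "lora_transformer-"


def looks_like_krea2_lora(keys: Iterable[str]) -> bool:
    """True if any key contains a Krea-2-specific module name."""
    return any(marker in key for key in keys for marker in KREA2_LORA_MARKERS)


def group_peft_lora(state_dict: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Group PEFT tensors per target layer, renamed with the transformer prefix."""
    layers: dict[str, dict[str, Any]] = defaultdict(dict)
    for key, value in state_dict.items():
        if ".lora_" not in key:
            continue  # this sketch ignores non-LoRA keys (e.g. alphas)
        module, param = key.rsplit(".lora_", 1)  # "...text_fusion", "A.weight"
        module = module.removeprefix("transformer.")
        layers[LORA_TRANSFORMER_PREFIX + module]["lora_" + param] = value
    return dict(layers)
```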
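The native-sampling bullets describe a packed-latent Euler loop. Here is a minimal NumPy sketch under several assumptions: Flux/Qwen-Image-style 2×2 patch packing, (row, col) position ids on the packed grid, and a velocity-predicting transformer represented by a hypothetical `velocity_fn` callable. The real Krea-2 layout and call signature may differ. The sketch also reproduces the CFG behaviour from the QA notes: at a scale of 1.0, the negative pass is skipped entirely.

```python
from typing import Callable, Sequence

import numpy as np


def pack_latents(x: np.ndarray) -> np.ndarray:
    """(B, C, H, W) -> (B, (H/2)*(W/2), C*4): one token per 2x2 latent patch."""
    b, c, h, w = x.shape
    x = x.reshape(b, c, h // 2, 2, w // 2, 2).transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(b, (h // 2) * (w // 2), c * 4)


def unpack_latents(x: np.ndarray, h: int, w: int) -> np.ndarray:
    """Inverse of pack_latents."""
    b, _, d = x.shape
    c = d // 4
    x = x.reshape(b, h // 2, w // 2, c, 2, 2).transpose(0, 3, 1, 4, 2, 5)
    return x.reshape(b, c, h, w)


def image_position_ids(h: int, w: int) -> np.ndarray:
    """(row, col) id for each packed token, in the same order as pack_latents."""
    rows, cols = np.meshgrid(np.arange(h // 2), np.arange(w // 2), indexing="ij")
    return np.stack([rows, cols], axis=-1).reshape(-1, 2)


VelocityFn = Callable[[np.ndarray, float, bool], np.ndarray]


def euler_sample(velocity_fn: VelocityFn, x: np.ndarray, sigmas: Sequence[float],
                 cfg_scale: float = 1.0) -> np.ndarray:
    """Flow-matching Euler: x <- x + (sigma_next - sigma) * v."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma, True)  # positive conditioning
        if cfg_scale != 1.0:  # at cfg 1.0 the negative pass never runs
            v_neg = velocity_fn(x, sigma, False)
            v = v_neg + cfg_scale * (v - v_neg)
        x = x + (sigma_next - sigma) * v
    return x
```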

Frontend:
- 'krea-2' base + qwen3_vl_encoder type/format across model maps,
  buildKrea2Graph, addKrea2LoRAs, graph-builder denoise/base lists,
  optimal dimension 1024, regenerated schema.ts

Fixes:
- estimate transformer working memory in krea2_denoise so the model cache
  reserves activation headroom and offloads more of the model under partial
  loading; fixes fp8 + LoRA OOM at 1024² (the model was placed before the LoRA
  patches were applied, leaving no room for their activations)
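The commit message does not give the estimate itself, so the following is a hypothetical back-of-the-envelope version. It assumes the Qwen-Image VAE's 8× spatial downsampling and 2×2 packing, with `hidden_size` and `activation_factor` as placeholder parameters rather than Krea-2's real values. It shows why the reservation matters: whatever is reserved for activations is subtracted from the budget for resident weights.

```python
def image_tokens(height: int, width: int, vae_scale: int = 8, patch: int = 2) -> int:
    """Packed image tokens: one per (vae_scale * patch)-pixel square."""
    step = vae_scale * patch
    return (height // step) * (width // step)


def estimate_working_memory(height: int, width: int, text_tokens: int, hidden_size: int,
                            bytes_per_elem: int = 2, activation_factor: float = 10.0) -> int:
    """Rough activation estimate: joint sequence length x hidden size x dtype size x factor."""
    seq_len = image_tokens(height, width) + text_tokens  # single-stream: image + text tokens
    return int(seq_len * hidden_size * bytes_per_elem * activation_factor)


def resident_weight_budget(free_vram: int, working_memory: int) -> int:
    """Bytes left for weights after reserving activation headroom; the rest is offloaded."""
    return max(0, free_vram - working_memory)
```

A 1024² generation yields 4,096 packed image tokens under these assumptions. The point is ordering: subtract the activation estimate before deciding how much of the transformer stays on the GPU, rather than placing the model first.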

WIP: requires diffusers main (>=0.39 dev) for Krea2Transformer2DModel;
pyproject.toml temporarily pins diffusers to git main.

github-actions added the labels api, python, Root, invocations, backend, services, frontend, and python-deps (Jun 25, 2026).
lstein (Collaborator) commented Jun 26, 2026:

Amazing! I was just thinking of working on this myself and you did it for me!

Allow non-diffusers Krea-2 transformers (GGUF/fp8) to run with standalone
single-file VAE + Qwen3-VL encoder, fixing several blockers found in testing.

- buildKrea2Graph: drop the hard "requires Diffusers-format" assert; instead
  require both a VAE and a Qwen3-VL encoder to be selected when the transformer
  is not diffusers (mirrors readiness.ts).
- Qwen3-VL encoder remap: handle both single-file key conventions — implicit
  (model.layers.*) and explicit (model.language_model.*). The old blind
  model.* -> language_model.* turned the bf16 file's keys into
  language_model.language_model.* (398 meta tensors -> "Cannot copy out of meta
  tensor" crash). Both files now load 0 missing / 0 unexpected / 0 meta.
- Qwen3-VL tokenizer/config: broaden the offline-cache fallback from OSError to
  Exception so a partial HF cache (config present, vocab missing) re-fetches
  instead of dying with TypeError.
- Qwen3-VL encoder fp8: keep an fp8 source checkpoint fp8-resident with
  per-layer upcast (storage float8_e4m3fn, compute bf16) instead of dequantizing
  to bf16. Halves resident VRAM (~8.9GB -> ~4.4GB), avoiding partial-load
  thrashing alongside a large transformer. Auto-enabled for fp8 sources on CUDA;
  bf16 files stay bf16.
- Qwen-Image VAE: a native-layout qwen_image_vae single file is classified with
  the Anima base and loaded as AutoencoderKLWan, but the qwen l2i/i2l nodes need
  AutoencoderKLQwenImage. Add backend/krea2/vae_compat.py::as_qwen_image_vae to
  reinterpret a Wan VAE as AutoencoderKLQwenImage (state dicts are identical,
  194/194 keys); both qwen VAE nodes use it. Idempotent for real QwenImage VAEs.
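Here is a minimal sketch of convention-aware key remapping. As the commit message implies, it assumes the target layout is `language_model.*` and that a file uses the explicit convention if any key starts with `model.language_model.`. `remap_qwen3vl_keys` is a hypothetical name. Detecting the convention once per file, instead of rewriting every `model.*` key blindly, keeps explicit keys from turning into `language_model.language_model.*`.

```python
from typing import Any


def remap_qwen3vl_keys(state_dict: dict[str, Any]) -> dict[str, Any]:
    # Explicit files already name the text tower: model.language_model.*
    explicit = any(k.startswith("model.language_model.") for k in state_dict)
    remapped: dict[str, Any] = {}
    for key, value in state_dict.items():
        if key.startswith("model."):
            rest = key[len("model."):]
            # explicit: model.language_model.x -> language_model.x
            # implicit: model.layers.x         -> language_model.layers.x
            key = rest if explicit else "language_model." + rest
        remapped[key] = value
    return remapped
```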
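The tokenizer/config fallback change amounts to widening an except clause. A generic sketch follows, where `load` is any loader that accepts a `local_files_only` keyword (for example transformers' `AutoTokenizer.from_pretrained` or `AutoConfig.from_pretrained`). The wrapper name `load_with_offline_fallback` is hypothetical.

```python
from typing import Any, Callable


def load_with_offline_fallback(load: Callable[..., Any], repo_id: str, **kwargs: Any) -> Any:
    """Try the local HF cache first; on any failure, allow a network fetch."""
    try:
        return load(repo_id, local_files_only=True, **kwargs)
    except Exception:
        # Catching only OSError missed partial caches (config present, vocab
        # missing), which can fail later with TypeError instead.
        return load(repo_id, local_files_only=False, **kwargs)
```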
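The VAE shim relies on the two classes having identical state dicts. The following generic PyTorch sketch illustrates that idea; it is not the PR's `as_qwen_image_vae`, whose construction of the target model is not shown. The sketch builds the target class, loads the source's state dict with `strict=True` so any layout mismatch fails loudly, and returns the input unchanged when it is already the target type. `reinterpret_as`, `WanLike`, and `QwenLike` are hypothetical stand-ins.

```python
from typing import Callable, TypeVar

import torch.nn as nn

M = TypeVar("M", bound=nn.Module)


def reinterpret_as(src: nn.Module, target_cls: type[M], build: Callable[[], M]) -> M:
    """Return src as target_cls, copying weights; idempotent for target_cls inputs."""
    if isinstance(src, target_cls):
        return src
    dst = build()
    dst.load_state_dict(src.state_dict(), strict=True)  # raises if any key/shape differs
    param = next(src.parameters())
    return dst.to(device=param.device, dtype=param.dtype)


class WanLike(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.encoder = nn.Conv2d(3, 16, 3, padding=1)
        self.decoder = nn.Conv2d(16, 3, 3, padding=1)


class QwenLike(nn.Module):
    """Distinct class with the same parameter layout as WanLike."""

    def __init__(self) -> None:
        super().__init__()
        self.encoder = nn.Conv2d(3, 16, 3, padding=1)
        self.decoder = nn.Conv2d(16, 3, 3, padding=1)
```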

Labels: api, backend, DO NOT MERGE, frontend, invocations, python, python-deps, Root, services
