perf: keep chunk-K residency engaged with runtime LoRA #1598

Merged
leejet merged 2 commits into leejet:master from fszontagh:feature/streaming-lora-chunk-k on Jun 3, 2026

@fszontagh
Contributor

Summary

Re-enable chunk-K residency for the runtime LoRA path. Two related fixes in compute_streaming_segments / resolve_graph_cut_plan:

  • Drop the weight_adapter != nullptr bypass. Runtime LoRA composes weight + diff in the compute graph via ggml_add; the resident weight is never mutated, so the cached GPU copy stays valid across sampling steps.
  • Add the resident buffer size back to free_vram before clamping the streaming budget. Otherwise chunk-K's own allocation is read as "taken by someone else", the budget shrinks step-to-step, and the resident set rebuilds every step instead of every generation.
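The second fix can be illustrated with a toy model. This is a minimal sketch, not the code from this PR: the struct, the function names, the MiB figures, and the rule that the resident set takes half the budget are all hypothetical, chosen only to show why counting the resident buffer as "taken" makes the budget change on every step.

```cpp
#include <algorithm>
#include <cstdint>

// Toy model only. Every name and number here is hypothetical and is not
// taken from compute_streaming_segments / resolve_graph_cut_plan.
struct Device {
    int64_t total_mib;     // total VRAM on the device
    int64_t other_mib;     // VRAM used by everything except our resident set
    int64_t max_vram_mib;  // user-imposed cap on the streaming budget
};

// Streaming budget for one step. With add_back == true, the resident buffer
// we already own is added back to the free figure before clamping.
int64_t streaming_budget(const Device& d, int64_t resident_mib, bool add_back) {
    int64_t free_mib = d.total_mib - d.other_mib - resident_mib;
    if (add_back) {
        free_mib += resident_mib;  // our own allocation is not "someone else's"
    }
    return std::min(free_mib, d.max_vram_mib);
}

// Count how often the resident set is rebuilt over `steps` sampling steps.
// Assumption: the set is rebuilt whenever the budget differs from the one it
// was built for, and a rebuilt set occupies half of the new budget.
int count_rebuilds(const Device& d, int steps, bool add_back) {
    int64_t resident_mib = 0;
    int64_t last_budget  = -1;
    int rebuilds         = 0;
    for (int i = 0; i < steps; ++i) {
        int64_t budget = streaming_budget(d, resident_mib, add_back);
        if (budget != last_budget) {
            resident_mib = budget / 2;
            last_budget  = budget;
            ++rebuilds;
        }
    }
    return rebuilds;
}
```

In this model, with a 12288 MiB device, 5120 MiB used elsewhere, and an 8192 MiB cap, eight steps trigger eight rebuilds without the add-back and a single rebuild with it, which mirrors the "every step instead of every generation" behavior described above.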

Related Issue / Discussion

Follow-up to #1576 (--stream-layers). Closes the LoRA-path perf gap left there.

Additional Information

Z-Image bf16, 512x512, 8 steps, --offload-to-cpu --stream-layers --max-vram 8 on RTX 3060:

| Config   | Before  | After   | Delta |
|----------|---------|---------|-------|
| LoRA 0.8 | 46.78 s | 37.75 s | -19%  |
| LoRA 1.5 | 45.83 s | 37.14 s | -22%  |
| no LoRA  | 44.64 s | 34.72 s | -22%  |
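
The Delta column appears to be the relative change in wall time, rounded to the nearest percent. A small sketch for re-deriving it from the Before and After values (the helper name `percent_delta` is hypothetical):

```cpp
#include <cmath>

// Relative change from `before` to `after`, in percent, rounded to the
// nearest integer. Negative values mean the run got faster.
long percent_delta(double before, double after) {
    return std::lround((after - before) / before * 100.0);
}
```

For example, `percent_delta(46.78, 37.75)` reproduces the LoRA 0.8 row.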

LoRA multiplier scaling is unchanged: the mean pixel difference between the 0.8 and 1.5 outputs is 26.17 before and 26.05 after.

@fszontagh changed the title from "Keep chunk-K residency engaged with runtime LoRA" to "perf: keep chunk-K residency engaged with runtime LoRA" on Jun 2, 2026
@leejet merged commit a7f2e03 into leejet:master on Jun 3, 2026
15 checks passed