How to Use

PiD is NVIDIA's Pixel Diffusion Decoder. It replaces the usual VAE decode or decode-then-upscale path with a pixel-space diffusion decoder conditioned on a source latent and text prompt.

In stable-diffusion.cpp, PiD currently runs as an image-edit pipeline: provide a reference image with -r/--ref-image, have it encoded by a matching VAE (passed with --vae), and the PiD diffusion model then decodes and upscales the resulting latent directly to RGB.

Download weights

Check the official PiD model card before use. At the time of the initial PiD release, the official weights were distributed under the NSCLv1 non-commercial license.

Examples

.\bin\Release\sd-cli.exe --diffusion-model ..\..\ComfyUI\models\diffusion_models\pid_flux1_512_to_2048_4step_bf16.safetensors --llm "..\..\ComfyUI\models\text_encoders\gemma_2_2b_it_elm_bf16.safetensors" --vae ..\..\ComfyUI\models\vae\ae.sft --vae-format flux --cfg-scale 1.0  -p "a lovely cat" -r ..\assets\ernie_image\turbo_example.png --diffusion-fa -v --steps 4 -H 2048 -W 2048 --rng cpu
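Because the command above is long, it can help to assemble it from named parts. The following is a minimal Python sketch, not part of stable-diffusion.cpp: the helper name `build_pid_command` and its parameters are hypothetical, and it only uses the flags and paths that appear in the example above. The default of 4 steps and a 2048 output size mirrors that example and is assumed to suit the `512_to_2048_4step` checkpoint.

```python
# Hypothetical helper: assembles the same sd-cli argument list as the example
# above. Only flags that appear in that example are used.

def build_pid_command(sd_cli, diffusion_model, llm, vae, ref_image, prompt,
                      vae_format="flux", steps=4, size=2048):
    """Return an argument list for a PiD decode/upscale run."""
    return [
        sd_cli,
        "--diffusion-model", diffusion_model,  # PiD checkpoint
        "--llm", llm,                          # text encoder
        "--vae", vae,                          # VAE used to encode the reference image
        "--vae-format", vae_format,            # must match the VAE latent layout
        "--cfg-scale", "1.0",
        "-p", prompt,
        "-r", ref_image,                       # required reference image
        "--diffusion-fa",
        "-v",
        "--steps", str(steps),
        "-H", str(size),
        "-W", str(size),
        "--rng", "cpu",
    ]


# Paths copied from the example above; adjust them to your own layout.
MODELS = r"..\..\ComfyUI\models"

cmd = build_pid_command(
    sd_cli=r".\bin\Release\sd-cli.exe",
    diffusion_model=rf"{MODELS}\diffusion_models\pid_flux1_512_to_2048_4step_bf16.safetensors",
    llm=rf"{MODELS}\text_encoders\gemma_2_2b_it_elm_bf16.safetensors",
    vae=rf"{MODELS}\vae\ae.sft",
    ref_image=r"..\assets\ernie_image\turbo_example.png",
    prompt="a lovely cat",
)

# To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with prompts or paths that contain spaces.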

Before: ERNIE-Image Turbo example

After: PiD example

Notes

  • -r/--ref-image is required. PiD uses the first reference image as the source latent condition.
  • --vae-format should match the VAE latent layout used by the PiD checkpoint. This is important when using standalone VAE files because the PiD diffusion checkpoint alone does not identify the VAE format.
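The two notes above can be turned into a quick pre-flight check before launching a run. This is a minimal sketch under stated assumptions: `check_pid_args` is a hypothetical helper, not something sd-cli provides, and it assumes each flag and its value are separate tokens, as in the example command. It cannot tell whether the chosen format is the right one for a given checkpoint, only whether it was stated explicitly.

```python
# Hypothetical pre-flight check for a PiD argument list. It only encodes the
# two notes above and does not validate anything else about the command.

def check_pid_args(argv):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []

    # Note 1: a reference image is required as the source latent condition.
    if "-r" not in argv and "--ref-image" not in argv:
        problems.append("missing -r/--ref-image: PiD needs a reference image")

    # Note 2: a standalone VAE file should come with an explicit format,
    # because the PiD checkpoint alone does not identify it.
    if "--vae" in argv and "--vae-format" not in argv:
        problems.append("--vae given without --vae-format")

    return problems


incomplete = ["sd-cli", "--vae", "ae.sft", "-p", "a lovely cat"]
complete = ["sd-cli", "--vae", "ae.sft", "--vae-format", "flux",
            "-r", "turbo_example.png", "-p", "a lovely cat"]

for problem in check_pid_args(incomplete):
    print("warning:", problem)
```

Here `incomplete` triggers both warnings, while `complete` passes the check.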