
Qwen/Qwen-Image

Text-to-image diffusion model (20B parameters) from the Qwen-Image family, served via vLLM-Omni.


Overview

Qwen-Image is a diffusion-based text-to-image model. This recipe documents the Qwen-Image family served via vLLM-Omni:

| Model | HuggingFace | Description |
| --- | --- | --- |
| Qwen-Image | `Qwen/Qwen-Image` | Text-to-image (20B, Aug 2025) |
| Qwen-Image-2512 | `Qwen/Qwen-Image-2512` | Updated T2I with enhanced realism (Dec 2025) |
| Qwen-Image-Edit | `Qwen/Qwen-Image-Edit` | Single-image editing (Aug 2025) |
| Qwen-Image-Edit-2509 | `Qwen/Qwen-Image-Edit-2509` | Multi-image editing (Sep 2025) |
| Qwen-Image-Edit-2511 | `Qwen/Qwen-Image-Edit-2511` | Enhanced consistency + built-in LoRA (Nov 2025) |
| Qwen-Image-Layered | `Qwen/Qwen-Image-Layered` | Decomposes input into RGBA layers (Dec 2025) |

All models share the same DiT transformer core — acceleration methods are applicable across the entire series.
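Because the variants share the same DiT core, the acceleration flags shown later in this guide can be applied to any checkpoint in the table. As an illustrative sketch (assuming the cache-backend flag behaves the same on the Edit variant as documented for the base model below):

```shell
# Same cache backend flag, applied to an Edit-family checkpoint
python3 ./examples/offline_inference/image_to_image/image_edit.py \
    --model Qwen/Qwen-Image-Edit \
    --image qwen_bear.png \
    --prompt "..." \
    --cache-backend cache_dit
```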

Prerequisites

git clone https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
uv venv
source .venv/bin/activate
uv pip install -e . vllm==0.18.0
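After installation, a quick sanity check (with the virtualenv still active) confirms the pinned vLLM version is importable before moving on:

```shell
# Should print 0.18.0, matching the version pinned above
python3 -c "import vllm; print(vllm.__version__)"
```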

Usage

Text-to-Image

python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image \
    --prompt "a cup of coffee on the table" \
    --output output_qwen_image.png \
    --num-inference-steps 50 \
    --cfg-scale 4.0

Image Editing (Qwen-Image-Edit)

python3 ./examples/offline_inference/image_to_image/image_edit.py \
    --model Qwen/Qwen-Image-Edit \
    --image qwen_bear.png \
    --prompt "Let this mascot dance under the moon, surrounded by floating stars" \
    --output output_image_edit.png \
    --num-inference-steps 50 \
    --cfg-scale 4.0

Layered RGBA Decomposition

python3 ./examples/offline_inference/image_to_image/image_edit.py \
    --model Qwen/Qwen-Image-Layered \
    --image input.png \
    --prompt "" \
    --output layered \
    --num-inference-steps 50 \
    --cfg-scale 4.0 \
    --layers 4 \
    --color-format "RGBA"

Acceleration

Choose at most one cache backend (Cache-DiT or TeaCache); a cache backend can be combined with any supported parallel strategy.

Cache-DiT

python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image --prompt "..." --cache-backend cache_dit

TeaCache

python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image --prompt "..." --cache-backend tea_cache

Ulysses / Ring Sequence Parallelism

python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image --prompt "..." --ulysses-degree 4
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image --prompt "..." --ring-degree 4

CFG Parallelism (2 GPUs, non-distilled models with cfg-scale > 1)

python3 ./examples/offline_inference/image_to_image/image_edit.py \
    --model Qwen/Qwen-Image-Edit --image qwen_bear.png --prompt "..." \
    --cfg-parallel-size 2 --num-inference-steps 50 --cfg-scale 4.0

Tensor Parallelism

python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image --prompt "..." --tensor-parallel-size 2

CPU / Layerwise Offload (low VRAM)

python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image --prompt "..." --enable-cpu-offload
python3 ./examples/offline_inference/image_to_image/image_edit.py \
    --model Qwen/Qwen-Image-Edit --image qwen_bear.png --prompt "..." \
    --enable-layerwise-offload

VAE Patch Parallelism

python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image --prompt "..." \
    --height 1536 --width 1536 \
    --ulysses-degree 2 --vae-patch-parallel-size 2

VAE patch parallelism must be combined with another parallel method (here, Ulysses).

Quantization (Qwen-Image / Qwen-Image-2512 only)

python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image --prompt "..." --quantization fp8 \
    --ignored-layers "img_mlp"
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image --prompt "..." --quantization int8

Qwen-Image-Edit variants do not support quantization.

Configuration Tips

  • Cache + sequence parallelism (SP) is the recommended combination for long-sequence generation.
  • Sequence parallelism (Ulysses / Ring) outperforms tensor parallelism for high-resolution and long-sequence workloads.
  • Tensor parallelism is most useful when the model weights alone don't fit on one GPU.
  • CFG parallelism targets non-distilled diffusion models running full CFG; it does not apply to guidance-distilled models.
  • To reduce peak VRAM, use CPU/layerwise offload and/or VAE patch parallelism.
  • TeaCache and Cache-DiT cannot be used together.
  • Pass --enforce-eager to disable torch.compile if compilation causes problems.
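Putting the first two tips together, a sketch of the recommended cache + sequence-parallel combination (all flags are taken from the examples above; the degree value is illustrative and should match your GPU count):

```shell
# Cache-DiT combined with Ulysses sequence parallelism across 2 GPUs
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
    --model Qwen/Qwen-Image \
    --prompt "a cup of coffee on the table" \
    --cache-backend cache_dit \
    --ulysses-degree 2
```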

See the Feature Support Table and Feature Compatibility Guide for combinations.

References