meituan-longcat/LongCat-Image-Edit
Bilingual (Chinese-English) image editing model from Meituan LongCat, served via vLLM-Omni
Overview
LongCat-Image-Edit is the image-editing variant of Meituan's LongCat-Image. It accepts bilingual (Chinese-English) editing instructions and is served via vLLM-Omni (not standard vLLM). Meituan reports state-of-the-art results among open-source image-editing models.
Prerequisites
- Hardware: 1x GPU with >=40 GB VRAM
- vLLM-Omni (runs on top of vLLM 0.12.0)
- diffusers (latest from source)
- xformers (latest)
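Before installing, it can help to confirm the hardware requirement is met. The check below is a sketch that assumes an NVIDIA GPU with the driver already installed; it is not part of the official setup.

```shell
# Pre-flight check (assumes the NVIDIA driver is installed): list each
# GPU's total memory and confirm at least one card has >= 40 GB (40960 MiB).
nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader \
  || echo "nvidia-smi not found: install the NVIDIA driver first"
```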
Installation
# Clone and install vllm-omni
git clone https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
uv venv
source .venv/bin/activate
uv pip install -e . vllm==0.12.0
# Update xformers to the latest version
uv pip install -U xformers --index-url https://download.pytorch.org/whl/cu128
# Update diffusers to the latest version (clone it alongside vllm-omni,
# not inside it, so the later `cd vllm-omni` step still works)
cd ..
git clone https://github.com/huggingface/diffusers.git
cd diffusers
uv pip install -e .
cd ..
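A quick way to verify the environment before running inference is to print the installed versions; this snippet is an illustrative sanity check, not an official step. vLLM should report 0.12.0, while the diffusers and xformers versions depend on when you install.

```shell
# Sanity-check the install from inside the activated .venv.
python3 - <<'EOF'
import importlib.metadata as md
for pkg in ("vllm", "diffusers", "xformers"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "NOT INSTALLED")
EOF
```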
Usage
cd vllm-omni
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--image qwen_bear.png \
--prompt "Add a white art board written with colorful text 'vLLM-Omni' on grassland. Add a paintbrush in the bear's hands. Position the bear standing in front of the art board as if painting." \
--output output_image_edit.png \
--num_inference_steps 50 \
--guidance_scale 4.5 \
--seed 42 \
--model meituan-longcat/LongCat-Image-Edit \
--cache_backend cache_dit \
--cache_dit_max_continuous_cached_steps 2
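The same script can be driven in a loop to apply one instruction to many images. The sketch below is hypothetical: the `inputs/` directory, output naming, and prompt are illustrative, and only the script path and flag names come from the example above.

```shell
# Batch edit: apply one instruction to every PNG in ./inputs, reusing
# the flags from the single-image example.
for img in inputs/*.png; do
  python3 ./examples/offline_inference/image_to_image/image_edit.py \
    --image "$img" \
    --prompt "Replace the background with a snowy mountain." \
    --output "edited_$(basename "$img")" \
    --num_inference_steps 50 \
    --guidance_scale 4.5 \
    --seed 42 \
    --model meituan-longcat/LongCat-Image-Edit
done
```

Keeping `--seed` fixed makes the batch reproducible; vary it per image if you want diverse outputs.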