PaddlePaddle/PaddleOCR-VL
PaddleOCR-VL (0.9B) — compact vision-language model for document parsing, OCR, tables, formulas, charts
Overview
PaddleOCR-VL is a SOTA resource-efficient model for document parsing. Its core (PaddleOCR-VL-0.9B) combines a NaViT-style dynamic resolution visual encoder with an ERNIE-4.5-0.3B language model, optimized for OCR, tables, formulas, and chart recognition.
Prerequisites
- Hardware: 1x GPU (small VRAM footprint)
- vLLM >= 0.11.1 (use the nightly wheel if this version is not yet released)
Install vLLM
uv venv
source .venv/bin/activate
uv pip install -U vllm --pre \
--extra-index-url https://wheels.vllm.ai/nightly \
--extra-index-url https://download.pytorch.org/whl/cu129 \
--index-strategy unsafe-best-match
Launch command
vllm serve PaddlePaddle/PaddleOCR-VL \
--trust-remote-code \
--max-num-batched-tokens 16384 \
--no-enable-prefix-caching \
--mm-processor-cache-gb 0
Tip: OCR workloads don't benefit much from prefix caching or image reuse, so disable those to avoid hashing/caching overhead.
Client Usage
Task-specific prompts:
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1", timeout=3600)

TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://.../receipt.png"}},
            {"type": "text", "text": TASKS["ocr"]},
        ],
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
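The example above fetches the image over HTTP. For local files, the usual OpenAI-compatible route is a base64 data URL in the same image_url field. A minimal sketch; the to_data_url helper name is my own:

```python
import base64


def to_data_url(image_bytes, mime="image/png"):
    # Encode raw image bytes as a data URL, usable anywhere an http(s)
    # image_url is accepted by the chat completions API.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"


# Usage with the client above (assumes a local receipt.png):
# with open("receipt.png", "rb") as f:
#     url = to_data_url(f.read())
# ...then pass {"type": "image_url", "image_url": {"url": url}} as before.
```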
Offline Inference with PP-DocLayoutV2
Use separate venvs for vLLM and PaddlePaddle to avoid dependency conflicts. If the pipeline fails with
"The model PaddleOCR-VL-0.9B does not exist.", relaunch the server with --served-model-name PaddleOCR-VL-0.9B.
uv pip install paddlepaddle-gpu==3.2.1 --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/
uv pip install -U "paddleocr[doc-parser]"
uv pip install safetensors
from paddleocr import PaddleOCRVL

pipeline = PaddleOCRVL(
    vl_rec_backend="vllm-server",
    vl_rec_server_url="http://localhost:8000/v1",
    layout_detection_model_name="PP-DocLayoutV2",
    layout_detection_model_dir="/path/to/your/PP-DocLayoutV2/",
)
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")
for i, res in enumerate(output):
    res.save_to_json(save_path=f"output_{i}.json")
    res.save_to_markdown(save_path=f"output_{i}.md")