InternVL (OpenGVLab)

OpenGVLab/InternVL3_5-8B

InternVL 3.5 vision-language models from Shanghai AI Lab with thinking-mode prompting

Dense, 8B parameters · 40,960-token context · vLLM 0.10.0+ · multimodal

Overview

InternVL3.5 is a vision-language model developed by Shanghai AI Laboratory. It supports single-image and multi-image prompts, plus an optional "thinking mode" via a custom system prompt.

Prerequisites

  • Hardware: 1x GPU with >=20 GB VRAM (A100, L40S, H100, etc.)
  • vLLM >= 0.10.0
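The 20 GB figure follows from back-of-the-envelope arithmetic (assuming bf16 weights; actual usage also depends on `--gpu-memory-utilization`, image resolution, and KV-cache size):

```python
# Rough VRAM arithmetic behind the >=20 GB recommendation: bf16 weights alone.
params = 8e9            # 8B parameters
bytes_per_param = 2     # bf16
weights_gb = params * bytes_per_param / 1024**3
print(f"weights ≈ {weights_gb:.1f} GB")
# The remaining headroom goes to the KV cache, vision-encoder activations,
# and CUDA graphs, which is why a 24 GB+ card is a comfortable fit.
```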

Install vLLM (CUDA)

uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto

Install vLLM (AMD ROCm MI300X/MI325X/MI355X)

uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700

Launch command

vllm serve OpenGVLab/InternVL3_5-8B --trust-remote-code

On AMD:

export VLLM_ROCM_USE_AITER=1
vllm serve OpenGVLab/InternVL3_5-8B --trust-remote-code

Client Usage

Single image:

from openai import OpenAI
client = OpenAI(api_key="", base_url="http://0.0.0.0:8000/v1")
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the image."},
            {"type": "image_url", "image_url": {"url": "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"}},
        ],
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
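Multi-image prompts use the same message schema, with one `image_url` part per image. The helper below is illustrative (not part of any API), and the second URL is a placeholder to replace with your own image:

```python
def build_multi_image_message(prompt: str, urls: list[str]) -> dict:
    """One user turn: a text part followed by one image_url part per URL."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}]
        + [{"type": "image_url", "image_url": {"url": u}} for u in urls],
    }

message = build_multi_image_message(
    "What are the differences between these images?",
    [
        "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg",
        "https://example.com/second-image.jpeg",  # placeholder; use your own URL
    ],
)
# Send it exactly like the single-image request:
# response = client.chat.completions.create(model=model_name, messages=[message])
```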

Thinking Mode

Set a thinking system prompt and use temperature=0.6 to mitigate repetition:

THINKING_SYSTEM_PROMPT = """
You are an AI assistant that rigorously follows this response protocol:

1. First, conduct a detailed analysis of the question. Consider different angles, potential
solutions, and reason through the problem step-by-step. Enclose this entire thinking process
within <think> and </think> tags.

2. After the thinking section, provide a clear, concise, and direct answer to the user's
question. Separate the answer from the think section with a newline.
""".strip()
