OpenGVLab/InternVL3_5-8B
InternVL 3.5 vision-language models from Shanghai AI Lab with thinking-mode prompting
Overview
InternVL3.5 is a vision-language model developed by Shanghai AI Laboratory. It supports single-image and multi-image prompts, plus an optional "thinking mode" via a custom system prompt.
Prerequisites
- Hardware: 1x GPU with >=20 GB VRAM (A100, L40S, H100, etc.)
- vLLM >= 0.10.0
Install vLLM (CUDA)
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
Install vLLM (AMD ROCm MI300X/MI325X/MI355X)
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
Launch command
vllm serve OpenGVLab/InternVL3_5-8B --trust-remote-code
On AMD:
export VLLM_ROCM_USE_AITER=1
vllm serve OpenGVLab/InternVL3_5-8B --trust-remote-code
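If the default launch does not fit in the available VRAM, vLLM's standard engine flags can cap the context length and memory headroom; a minimal sketch (the 8192 and 0.90 values are illustrative, not part of this recipe):

```shell
# Illustrative values: shrink the KV-cache footprint for smaller GPUs.
vllm serve OpenGVLab/InternVL3_5-8B \
  --trust-remote-code \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90
```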
Client Usage
Single image:
from openai import OpenAI

client = OpenAI(api_key="", base_url="http://0.0.0.0:8000/v1")
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the image."},
            {"type": "image_url", "image_url": {"url": "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"}},
        ],
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
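The overview mentions multi-image prompts as well. In the OpenAI-compatible chat format, that just means one user turn whose content list holds several image_url parts. A minimal sketch (build_multi_image_message is a hypothetical helper and the URLs are placeholders):

```python
def build_multi_image_message(prompt: str, image_urls: list[str]) -> dict:
    """Assemble one user turn: a text part followed by one image part per URL."""
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    return {"role": "user", "content": content}

# Example payload (placeholder URLs):
message = build_multi_image_message(
    "What is the difference between these two images?",
    [
        "https://example.com/image-1.jpg",
        "https://example.com/image-2.jpg",
    ],
)
# Pass it to the same client as in the single-image example:
# client.chat.completions.create(model=model_name, messages=[message], temperature=0.0)
```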
Thinking Mode
Set a thinking system prompt and use temperature=0.6 to mitigate repetition:
THINKING_SYSTEM_PROMPT = """
You are an AI assistant that rigorously follows this response protocol:
1. First, conduct a detailed analysis of the question. Consider different angles, potential
solutions, and reason through the problem step-by-step. Enclose this entire thinking process
within <think> and </think> tags.
2. After the thinking section, provide a clear, concise, and direct answer to the user's
question. Separate the answer from the think section with a newline.
""".strip()