Qwen/Qwen3-ASR-1.7B
Speech-to-text model supporting 11 languages, multiple accents, and singing voice with customizable text-context prompting.
Overview
Qwen3-ASR is a speech-to-text model that achieves accurate and robust recognition across 11 languages and multiple accents. It supports prompting the model with text context in any format to produce customized ASR results and performs well on singing-voice recognition. This guide demonstrates how to deploy Qwen3-ASR efficiently with vLLM.
Prerequisites
Install vLLM with audio dependencies:
uv venv
source .venv/bin/activate
uv pip install -U vllm --pre \
--extra-index-url https://wheels.vllm.ai/nightly/cu129 \
--extra-index-url https://download.pytorch.org/whl/cu129 \
--index-strategy unsafe-best-match
uv pip install "vllm[audio]"
Launching with vLLM
vllm serve Qwen/Qwen3-ASR-1.7B
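Depending on your hardware, you may want to cap the context length or pin the host and port. These are standard vLLM serve flags; the values below are illustrative, not requirements:

```shell
# Illustrative values — tune for your GPU and deployment.
vllm serve Qwen/Qwen3-ASR-1.7B \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 8192
```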
Client Usage
Chat Completions (OpenAI SDK)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-ASR-1.7B",
    messages=[{
        "role": "user",
        "content": [{
            "type": "audio_url",
            "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"},
        }],
    }],
)
print(response.choices[0].message.content)
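Text-context prompting works through this same chat interface: include a text part alongside the audio part in the message content. A minimal sketch of the payload shape, assuming the standard multimodal content-part format (the context string here is illustrative; send the resulting messages with client.chat.completions.create exactly as above):

```python
import json

# Illustrative context: bias the transcript toward domain vocabulary.
context = "Vocabulary: vLLM, Qwen3-ASR, Hugging Face"

audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"

# The context goes in as an ordinary text part next to the audio part.
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": context},
        {"type": "audio_url", "audio_url": {"url": audio_url}},
    ],
}]

print(json.dumps(messages, indent=2))
```

The context can be any free-form text (vocabulary lists, speaker names, prior sentences); the model uses it to steer the transcription.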
Transcription API
import httpx
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
audio_bytes = httpx.get(audio_url).content

# Pass a (filename, bytes) pair so the SDK sends a proper multipart upload.
transcription = client.audio.transcriptions.create(
    model="Qwen/Qwen3-ASR-1.7B",
    file=("asr_en.wav", audio_bytes),
)
print(transcription.text)
cURL
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-ASR-1.7B",
    "messages": [
      {"role": "user", "content": [
        {"type": "audio_url", "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"}}
      ]}
    ]
  }'
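The transcription endpoint can also be exercised with cURL as a multipart upload. This sketch assumes a local file named audio.wav; the endpoint path follows the OpenAI audio API:

```shell
# Multipart upload of a local file to the OpenAI-compatible endpoint.
curl http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=Qwen/Qwen3-ASR-1.7B"
```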
Offline Inference
from vllm import LLM, SamplingParams
from vllm.assets.audio import AudioAsset

llm = LLM(model="Qwen/Qwen3-ASR-1.7B")

audio_asset = AudioAsset("winning_call")
conversation = [{
    "role": "user",
    "content": [{"type": "audio_url", "audio_url": {"url": audio_asset.url}}],
}]

sampling_params = SamplingParams(temperature=0.01, max_tokens=256)
outputs = llm.chat(conversation, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
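llm.chat also accepts a list of conversations, which vLLM batches in a single pass. A sketch of building that list (the clip URLs are placeholders; pass conversations to llm.chat exactly like the single conversation above):

```python
# Placeholder clip URLs — substitute your own audio files.
clip_urls = [
    "https://example.com/clip1.wav",
    "https://example.com/clip2.wav",
]

# One single-turn conversation per clip; llm.chat batches the whole list.
conversations = [
    [{
        "role": "user",
        "content": [{"type": "audio_url", "audio_url": {"url": url}}],
    }]
    for url in clip_urls
]

print(len(conversations))
```

llm.chat(conversations, sampling_params=sampling_params) then returns one output per conversation, in the same order as the input list.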
Troubleshooting
- Make sure the vllm[audio] extras are installed, or audio requests will fail.
- Use the nightly wheel until Qwen3-ASR support lands in a stable release.