tencent/Hunyuan-A13B-Instruct
Tencent Hunyuan A13B instruct-tuned MoE language model with AITER-accelerated AMD ROCm deployment
Overview
Hunyuan-A13B-Instruct is Tencent's instruct-tuned Hunyuan MoE model. This recipe covers deployment on AMD ROCm GPUs (MI300X / MI325X / MI355X) with AITER acceleration enabled via VLLM_ROCM_USE_AITER=1.
Prerequisites
- vLLM version: ROCm build
- Python: 3.12
- Hardware: AMD MI300X / MI325X / MI355X
- ROCm: 7.0+, glibc >= 2.35 (or use Docker)
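Before installing, it can help to sanity-check the host against these requirements. The commands below are a sketch; they assume a standard Linux environment with ROCm installed, where `rocm-smi` ships on the PATH:

```shell
# Check Python version (3.12 expected for this recipe)
python3 --version

# Check glibc version (>= 2.35 required for the wheels)
ldd --version | head -n 1

# Confirm ROCm can enumerate the GPUs
rocm-smi --showproductname
```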
Install vLLM (ROCm)
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
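After installation, a quick import check confirms the wheel resolved correctly (a minimal sketch; the exact version string depends on the ROCm wheel build you pulled):

```shell
# Should print the installed vLLM version without import errors
python -c "import vllm; print(vllm.__version__)"
```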
If the environment does not meet the Python/ROCm/glibc requirements, use the Docker-based setup from the vLLM install docs.
Launching the Server
export VLLM_ROCM_USE_AITER=1
vllm serve tencent/Hunyuan-A13B-Instruct \
--tensor-parallel-size 2 \
--trust-remote-code
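Once the server finishes loading, it exposes the standard OpenAI-compatible API (on port 8000 by default). A minimal smoke test against that endpoint, assuming the default host and port:

```shell
# Send a single chat completion request to the running server
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "tencent/Hunyuan-A13B-Instruct",
        "messages": [{"role": "user", "content": "Summarize mixture-of-experts models in one sentence."}],
        "max_tokens": 128
      }'
```

The same endpoint works with any OpenAI-compatible client library by pointing its base URL at http://localhost:8000/v1.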
Benchmarking
vllm bench serve \
--model "tencent/Hunyuan-A13B-Instruct" \
--dataset-name random \
--random-input-len 8000 \
--random-output-len 1000 \
--request-rate 10000 \
--num-prompts 16 \
--ignore-eos
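Because `--ignore-eos` forces every request to generate the full output length, the token volume of this run is fully determined by the flags. A back-of-the-envelope sketch of the totals (this is arithmetic on the flag values above, not output from the tool):

```python
# Approximate token volume implied by the benchmark flags above.
num_prompts = 16   # --num-prompts
input_len = 8000   # --random-input-len
output_len = 1000  # --random-output-len

total_input = num_prompts * input_len    # prompt tokens processed (prefill)
total_output = num_prompts * output_len  # tokens generated (decode)

print(f"prefill tokens: {total_input}")                  # 128000
print(f"decode tokens:  {total_output}")                 # 16000
print(f"total tokens:   {total_input + total_output}")   # 144000
```

The very high `--request-rate 10000` effectively submits all 16 prompts at once, so the run measures throughput under a fully saturated batch rather than a steady arrival rate.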
Troubleshooting
- First launch delay: AITER JIT-compiles its optimized kernels the first time the server starts, which can take several minutes. Subsequent runs reuse the cached kernels and start much faster.
- Environment mismatch: If wheel install fails, fall back to the vLLM ROCm Docker image.