tencent/Hunyuan-A13B-Instruct
Tencent Hunyuan A13B instruct-tuned MoE language model with AITER-accelerated AMD ROCm deployment
Overview
Hunyuan-A13B-Instruct is Tencent's instruct-tuned Hunyuan MoE model. This recipe covers deployment on AMD ROCm GPUs (MI300X / MI325X / MI355X) with AITER acceleration enabled via VLLM_ROCM_USE_AITER=1.
Prerequisites
- vLLM version: ROCm build
- Python: 3.12
- Hardware: AMD MI300X / MI325X / MI355X
- ROCm: 7.0+, glibc >= 2.35 (or use Docker)
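Before installing, it can help to sanity-check the host against these requirements. The commands below are a sketch; they assume a standard Linux environment with ROCm installed, where `rocm-smi` ships on the PATH:

```shell
# Check Python version (3.12 expected for this recipe)
python3 --version

# Check glibc version (>= 2.35 required for the wheels)
ldd --version | head -n 1

# Confirm ROCm can enumerate the GPUs
rocm-smi --showproductname
```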
Install vLLM (ROCm)
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
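After installation, a quick import check confirms the wheel resolved correctly (a minimal sketch; the exact version string depends on the ROCm wheel build you pulled):

```shell
# Should print the installed vLLM version without import errors
python -c "import vllm; print(vllm.__version__)"
```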
If the environment does not meet the Python/ROCm/glibc requirements, use the Docker-based setup from the vLLM install docs.
Launching the Server
export VLLM_ROCM_USE_AITER=1
vllm serve tencent/Hunyuan-A13B-Instruct \
--tensor-parallel-size 2 \
--trust-remote-code
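Once the server finishes loading, it exposes the standard OpenAI-compatible API (on port 8000 by default). A minimal smoke test against that endpoint, assuming the default host and port:

```shell
# Send a single chat completion request to the running server
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "tencent/Hunyuan-A13B-Instruct",
        "messages": [{"role": "user", "content": "Summarize mixture-of-experts models in one sentence."}],
        "max_tokens": 128
      }'
```

The same endpoint works with any OpenAI-compatible client library by pointing its base URL at http://localhost:8000/v1.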
Benchmarking
vllm bench serve \
--model "tencent/Hunyuan-A13B-Instruct" \
--dataset-name random \
--random-input-len 8000 \
--random-output-len 1000 \
--request-rate 10000 \
--num-prompts 16 \
--ignore-eos
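Because `--ignore-eos` forces every request to generate the full output length, the token volume of this run is fully determined by the flags. A back-of-the-envelope sketch of the totals (this is arithmetic on the flag values above, not output from the tool):

```python
# Approximate token volume implied by the benchmark flags above.
num_prompts = 16   # --num-prompts
input_len = 8000   # --random-input-len
output_len = 1000  # --random-output-len

total_input = num_prompts * input_len    # prompt tokens processed (prefill)
total_output = num_prompts * output_len  # tokens generated (decode)

print(f"prefill tokens: {total_input}")                  # 128000
print(f"decode tokens:  {total_output}")                 # 16000
print(f"total tokens:   {total_input + total_output}")   # 144000
```

The very high `--request-rate 10000` effectively submits all 16 prompts at once, so the run measures throughput under a fully saturated batch rather than a steady arrival rate.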
Troubleshooting
- First launch delay: AITER JIT-compiles its optimized kernels the first time the server starts, which can take several minutes. Subsequent runs reuse the cached kernels and start much faster.
- Environment mismatch: If wheel install fails, fall back to the vLLM ROCm Docker image.