stabilityai/stable-audio-open-1.0

Text-to-audio generation model (1.2B params) producing up to ~47 s stereo audio at 44.1 kHz, served via vLLM-Omni

Overview

Stable Audio Open 1.0 is Stability AI's text-to-audio generation model (~1.2B parameters). It produces stereo audio at 44.1 kHz, with clips up to ~47 seconds long. It is served via vLLM-Omni, not standard vLLM.

Limitations:

  • No realistic vocals (no singing or speech).
  • English-only training data.
  • Better at sound effects than complex music.

Prerequisites

  • vLLM-Omni on top of vLLM 0.14.1
  • soundfile or scipy for saving audio

Installation

uv venv
source .venv/bin/activate
uv pip install vllm==0.14.1
uv pip install git+https://github.com/vllm-project/vllm-omni.git

# Audio saving
uv pip install soundfile
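
Before loading the model, it can help to confirm that the packages above are visible to the active environment. The following is a minimal sketch, not part of vLLM-Omni: the helper names (is_installed, check_environment) are hypothetical, and the import names are assumed to be vllm, vllm_omni, soundfile, and scipy, matching the packages named in this recipe.

```python
import importlib.util

# Import names assumed from this recipe (hypothetical check, not a vLLM-Omni tool).
REQUIRED = ("vllm", "vllm_omni")
AUDIO_WRITERS = ("soundfile", "scipy")  # either one is enough for saving audio


def is_installed(module_name):
    """Return True if the top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def check_environment():
    """Report which packages are present and whether an audio writer exists."""
    status = {name: is_installed(name) for name in REQUIRED + AUDIO_WRITERS}
    missing = [name for name in REQUIRED if not status[name]]
    has_writer = any(status[name] for name in AUDIO_WRITERS)
    return status, missing, has_writer


status, missing, has_writer = check_environment()
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'missing'}")
if missing:
    print("Missing required packages:", ", ".join(missing))
if not has_writer:
    print("Install soundfile or scipy to save audio.")
```

The check only locates the modules; it does not verify versions, so the vLLM 0.14.1 pin still has to be confirmed separately.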

Python Usage

import torch
import soundfile as sf
from vllm_omni.entrypoints.omni import Omni

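# Load the model through the vLLM-Omni entrypoint.
omni = Omni(model="stabilityai/stable-audio-open-1.0")
# Seeded generator so repeated runs use the same starting noise.
generator = torch.Generator(device="cuda").manual_seed(42)

audio = omni.generate(
    "The sound of a dog barking",
    negative_prompt="Low quality.",
    generator=generator,
    guidance_scale=7.0,
    num_inference_steps=100,
    # Requested audio window in seconds (here: a 10-second clip).
    extra={"audio_start_in_s": 0.0, "audio_end_in_s": 10.0},
)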

audio_data = audio[0].cpu().float().numpy().T  # [samples, channels]
sf.write("output.wav", audio_data, 44100)
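
The prerequisites list scipy as an alternative to soundfile. A minimal sketch of the scipy route follows, assuming the audio is a float array shaped [channels, samples] with values in [-1, 1]. The helper name save_wav_scipy is hypothetical, and a synthetic sine tone stands in for model output so the snippet runs without a GPU.

```python
import numpy as np
from scipy.io import wavfile


def save_wav_scipy(path, audio, sample_rate=44100):
    """Write a [channels, samples] float array to a 16-bit PCM WAV file.

    Hypothetical helper; assumes float samples in the range [-1, 1].
    """
    data = np.asarray(audio, dtype=np.float32)
    if data.ndim == 1:
        data = data[np.newaxis, :]  # treat a 1-D array as mono
    # scipy expects [samples, channels], so clip and transpose.
    data = np.clip(data, -1.0, 1.0).T
    pcm = (data * 32767.0).astype(np.int16)
    wavfile.write(path, sample_rate, pcm)
    return pcm.shape


# Synthetic 1-second stereo 440 Hz tone in place of generated audio.
t = np.arange(44100) / 44100.0
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
stereo = np.stack([tone, tone])  # [channels, samples]
shape = save_wav_scipy("tone_demo.wav", stereo)
print(shape)
```

With real output, the array passed in would be the [channels, samples] array obtained before the transpose in the example above. Converting to 16-bit PCM keeps the file widely playable; passing a float32 array to wavfile.write instead would produce a floating-point WAV.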

CLI Usage (run from a checkout of the vLLM-Omni repo)

python examples/offline_inference/text_to_audio/text_to_audio.py \
  --model stabilityai/stable-audio-open-1.0 \
  --prompt "The sound of a dog barking" \
  --audio-length 10.0 \
  --num-inference-steps 100 \
  --guidance-scale 7.0 \
  --output dog_barking.wav

Key Parameters

Parameter              Default          Description
audio_start_in_s       0.0              Start time in seconds
audio_end_in_s         10.0             End time in seconds
num_inference_steps    100              Denoising steps (higher = better quality, slower)
guidance_scale         7.0              Classifier-free guidance scale
negative_prompt        "Low quality."   Text to avoid
num_waveforms          1                Samples per prompt
sample_rate            44100            Output sample rate (Hz)
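
The timing parameters relate to output length by simple arithmetic: duration is audio_end_in_s minus audio_start_in_s, and the sample count per channel is duration times sample_rate. The sketch below checks a requested window before generation. The helper name clip_window is hypothetical, the 47-second cap is the approximate limit quoted in the overview rather than an exact model constant, and it is an assumption that the returned audio is trimmed to exactly this window.

```python
SAMPLE_RATE = 44100     # Hz, from the parameter table
MAX_DURATION_S = 47.0   # approximate limit from the overview (assumption)


def clip_window(audio_start_in_s=0.0, audio_end_in_s=10.0, sample_rate=SAMPLE_RATE):
    """Validate a time window and return (duration_s, samples_per_channel)."""
    if audio_start_in_s < 0:
        raise ValueError("audio_start_in_s must be non-negative")
    if audio_end_in_s <= audio_start_in_s:
        raise ValueError("audio_end_in_s must be greater than audio_start_in_s")
    duration = audio_end_in_s - audio_start_in_s
    if duration > MAX_DURATION_S:
        raise ValueError(f"requested {duration} s exceeds ~{MAX_DURATION_S} s")
    return duration, round(duration * sample_rate)


duration, num_samples = clip_window(0.0, 10.0)
print(duration, num_samples)  # 10.0 441000
```

For the defaults, a 10-second stereo clip at 44.1 kHz therefore corresponds to 441,000 samples per channel.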

License

Released under the Stability AI Community License. Commercial use requires a separate license.
