Qwen3 TTS 1.7B
Run it locally
Copy-paste - running in under a minute
vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
New to this? Start with Ollama · serve to many users with vLLM.
Deep dive
Notes, sources, and the full write-up
Qwen3 TTS 1.7B is Alibaba's text-to-speech model, with 1.87 million monthly downloads. It supports 10 languages, voice cloning, and voice design, with a stated streaming latency of 97 ms. It is released under the Apache 2.0 license.
Key features
- 10 languages - Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
- Voice cloning - 3-second reference audio to clone any voice
- Voice design - natural language voice descriptions
- 97 ms latency - extremely low-latency streaming
- 9 premium speakers - diverse built-in voices
- vLLM support - production deployment via vLLM-Omni
- 3 model variants - CustomVoice, VoiceDesign, Base (clone)
Model variants
| Model | Params | Best For |
|---|---|---|
| CustomVoice | 1.7B | 9 premium speakers with style control |
| VoiceDesign | 1.7B | Natural language voice design |
| Base | 1.7B | 3-second voice cloning |
| CustomVoice (0.6B) | 0.6B | Lightweight, same 9 speakers |
| Base (0.6B) | 0.6B | Lightweight voice cloning |
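The table can be read as a small lookup from task and size to checkpoint. The sketch below assumes the other variants follow the same Hugging Face naming pattern as the `Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice` ID used on this page. Only that one ID is confirmed here, so verify the others before relying on them. `VARIANTS` and `pick_variant` are hypothetical helper names, not part of any library.

```python
# Variant table as a lookup. Only the 1.7B CustomVoice repo ID appears on this
# page; every other ID built below is an ASSUMPTION based on the same pattern.
VARIANTS = {
    ("custom_voice", "1.7B"): "CustomVoice",
    ("voice_design", "1.7B"): "VoiceDesign",
    ("voice_clone", "1.7B"): "Base",
    ("custom_voice", "0.6B"): "CustomVoice",
    ("voice_clone", "0.6B"): "Base",
}


def pick_variant(task: str, size: str = "1.7B") -> str:
    """Return an assumed repo ID for a task/size pair listed in the table."""
    try:
        suffix = VARIANTS[(task, size)]
    except KeyError:
        raise ValueError(f"no {size} variant listed for task {task!r}") from None
    return f"Qwen/Qwen3-TTS-12Hz-{size}-{suffix}"


print(pick_variant("custom_voice"))         # the ID used elsewhere on this page
print(pick_variant("voice_clone", "0.6B"))  # assumed ID, verify before use
```

Asking for a combination the table does not list, such as voice design at 0.6B, raises a `ValueError` rather than guessing a name.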
Quick start
pip install -U qwen-tts

from qwen_tts import Qwen3TTSModel
import torch
import soundfile as sf

# Load the CustomVoice checkpoint on the first GPU in bfloat16
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
)

# Synthesize one utterance with a built-in speaker
wavs, sr = model.generate_custom_voice(
    text="Hello! This is Qwen3 TTS speaking.",
    language="English",
    speaker="Ryan",
)

# Write the first waveform to disk at the returned sample rate
sf.write("output.wav", wavs[0], sr)

When to use
- Voice assistants - low-latency streaming TTS
- Content creation - multilingual voiceovers
- Accessibility - screen readers and narration
- Audiobooks - emotional tone control
- Voice cloning - personalized voices
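For the multilingual voiceover case, the single `generate_custom_voice` call from the quick start can be looped over languages. This is a minimal sketch: `multilingual_voiceovers` is a hypothetical helper, and it assumes only that the model object exposes `generate_custom_voice(text=, language=, speaker=)` returning `(wavs, sample_rate)` as in the quick start. Whether a given built-in speaker suits every language is not covered on this page.

```python
from typing import Any, Dict, Tuple


def multilingual_voiceovers(
    model: Any, lines: Dict[str, str], speaker: str = "Ryan"
) -> Dict[str, Tuple[Any, int]]:
    """Generate one clip per language using the quick-start call.

    `lines` maps a language name (e.g. "English") to the text to speak.
    Returns a mapping of language name to (waveform, sample_rate).
    """
    clips: Dict[str, Tuple[Any, int]] = {}
    for language, text in lines.items():
        wavs, sr = model.generate_custom_voice(
            text=text, language=language, speaker=speaker
        )
        clips[language] = (wavs[0], sr)  # keep the first waveform, as in the quick start
    return clips
```

With a loaded model, `multilingual_voiceovers(model, {"English": "Welcome.", "German": "Willkommen."})` returns a dictionary whose values can each be passed to `sf.write`.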
Frequently asked
Quick answers to common questions
How much VRAM does Qwen3 TTS 1.7B need?
Qwen3 TTS 1.7B (listed here at 2B parameters) needs approximately 4 GB of VRAM at Q4_K_M quantization. Use our VRAM calculator for an exact estimate.
Is Qwen3 TTS 1.7B better than other Qwen models?
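Qwen3 TTS 1.7B targets speech synthesis rather than general text generation, so the comparison depends on the task. With 2B parameters and an 8,192-token context, it is a strong choice for text-to-speech, voice cloning, and voice design.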
What license is Qwen3 TTS 1.7B under?
Qwen3 TTS 1.7B is released under the Apache 2.0 license, making it suitable for most commercial and personal projects.
What hardware runs Qwen3 TTS 1.7B well?
At approximately 4 GB for Q4_K_M, Qwen3 TTS 1.7B does not require a top-tier card. High-end options such as the RTX 4090 (24 GB), RTX 5090 (32 GB), or a Mac Studio with unified memory leave ample headroom. Check our hardware directory for specific recommendations.
What is the best quantization for Qwen3 TTS 1.7B?
Q4_K_M is the recommended sweet spot - ~98% of FP16 quality at ~27% of the size. Step up to Q5_K_M or Q8_0 only if you have spare VRAM. Use our VRAM calculator to compare.
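To make the "~27% of the size" figure concrete, here is a weights-only estimate. It is a rough sketch that uses this page's numbers (2B parameters, 27% ratio) as inputs rather than measurements, and `weight_size_gb` is a hypothetical helper.

```python
def weight_size_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weights-only size in GB (1 GB = 1e9 bytes); excludes runtime overhead."""
    return params_billion * bytes_per_param


FP16_BYTES = 2.0     # 16 bits per weight
Q4_K_M_RATIO = 0.27  # ASSUMPTION: the page's "~27% of the size", not measured

fp16_gb = weight_size_gb(2.0, FP16_BYTES)  # 2B parameters, as listed on this page
q4_gb = fp16_gb * Q4_K_M_RATIO

print(f"FP16 weights:   {fp16_gb:.2f} GB")
print(f"Q4_K_M weights: {q4_gb:.2f} GB")
```

These are weights-only numbers. Activations, caches, and framework overhead come on top, so actual VRAM use is higher.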
What models compete with Qwen3 TTS 1.7B?
Qwen3 TTS 1.7B competes with other models in its class. Browse our model directory for comparisons, benchmarks, and community reviews to find the best fit.