Qwen3 TTS 1.7B
Qwen · Featured · Apache 2.0 · audio

Updated Jun 7, 2026
Parameters: 2B
Context: 8,192
License: Apache 2.0

Will it run on your hardware?

Quantizations of Qwen3 TTS 1.7B that fit in 24 GB of GPU memory, with real runtime overhead: 1 of 1.

FP16: 4 GB (fits)

Need an exact figure for your context length? Use the VRAM calculator.

Run it locally

Copy-paste - running in under a minute

vLLM (OpenAI-compatible API)
vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
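
Once the server is up, a client can send it text over HTTP. The sketch below only builds the request. It assumes the server listens on vLLM's default port 8000 and exposes the OpenAI-style `/v1/audio/speech` route with `model`, `input`, and `voice` fields. Whether this model's server accepts exactly that route and those fields is an assumption to check against the vLLM documentation. `build_speech_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumptions (not confirmed by this page): the endpoint path, the field names,
# and "Ryan" being accepted as a voice by the server.
def build_speech_request(
    text: str,
    voice: str = "Ryan",
    base_url: str = "http://localhost:8000",
    model: str = "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
) -> urllib.request.Request:
    """Build (but do not send) a POST request in the OpenAI speech-API shape."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and write the response bytes to a file. The audio format of the response depends on the server.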

New to this? Start with Ollama · serve to many users with vLLM.

Deep dive

Notes, sources, and the full write-up

Qwen3 TTS 1.7B is Alibaba's state-of-the-art text-to-speech model, with 1.87 million monthly downloads. It supports 10 languages, voice cloning from 3 seconds of reference audio, voice design from natural-language descriptions, and streaming with latency as low as 97 ms. It is released under the Apache 2.0 license.

Key features

  1. 10 languages - Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
  2. Voice cloning - 3-second reference audio to clone any voice
  3. Voice design - natural language voice descriptions
  4. 97ms latency - extreme low-latency streaming
  5. 9 premium speakers - diverse built-in voices
  6. vLLM support - production deployment via vLLM-Omni
  7. 3 model variants - CustomVoice, VoiceDesign, Base (clone)
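
Since the model covers a fixed set of 10 languages, it can help to validate user input before calling it. This is a minimal sketch using only the language names from the list above. `normalize_language` is a hypothetical helper and is not part of the qwen-tts package.

```python
# Language names copied from the feature list; the helper name and the
# case-insensitive matching are illustrative choices, not a library API.
SUPPORTED_LANGUAGES = (
    "Chinese", "English", "Japanese", "Korean", "German",
    "French", "Russian", "Portuguese", "Spanish", "Italian",
)

def normalize_language(name: str) -> str:
    """Return the canonical language name, or raise ValueError if unsupported."""
    wanted = name.strip().casefold()
    for lang in SUPPORTED_LANGUAGES:
        if lang.casefold() == wanted:
            return lang
    raise ValueError(
        f"Unsupported language {name!r}; choose one of: {', '.join(SUPPORTED_LANGUAGES)}"
    )
```

The returned string has the same spelling as the `language="English"` argument shown in the quick start.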

Model variants

Model               Params  Best for
CustomVoice         1.7B    9 premium speakers with style control
VoiceDesign         1.7B    Natural language voice design
Base                1.7B    3-second voice cloning
CustomVoice (0.6B)  0.6B    Lightweight, same 9 speakers
Base (0.6B)         0.6B    Lightweight voice cloning
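
The choice among these variants comes down to the task and whether a lighter model is acceptable. The sketch below encodes the table as a lookup. The task keys (`preset_speakers`, `voice_design`, `voice_cloning`) and `pick_variant` are hypothetical names for illustration. It returns the variant labels from the table, not repository IDs.

```python
# Rows copied from the model-variant table. No 0.6B VoiceDesign row is listed,
# so voice design always resolves to the 1.7B variant.
VARIANTS = {
    "preset_speakers": {"1.7B": "CustomVoice", "0.6B": "CustomVoice (0.6B)"},
    "voice_design": {"1.7B": "VoiceDesign"},
    "voice_cloning": {"1.7B": "Base", "0.6B": "Base (0.6B)"},
}

def pick_variant(task: str, lightweight: bool = False) -> str:
    """Return the table label for a task; prefer 0.6B when asked and available."""
    sizes = VARIANTS[task]  # raises KeyError for an unknown task
    if lightweight and "0.6B" in sizes:
        return sizes["0.6B"]
    return sizes["1.7B"]
```

For example, `pick_variant("voice_cloning", lightweight=True)` returns `"Base (0.6B)"`.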

Quick start

Install the package:

pip install -U qwen-tts

Then generate speech in Python:

from qwen_tts import Qwen3TTSModel
import torch
import soundfile as sf
 
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
)
 
wavs, sr = model.generate_custom_voice(
    text="Hello! This is Qwen3 TTS speaking.",
    language="English",
    speaker="Ryan",
)
sf.write("output.wav", wavs[0], sr)
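
For multilingual voiceovers, the same call can be looped over several lines of text. The sketch below takes the generation function as an argument, so it can be tried without a GPU. It assumes only the keyword arguments shown in the quick start (`text`, `language`, `speaker`). `batch_voiceovers` and the file-naming scheme are hypothetical, and reusing the speaker "Ryan" for every language is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

def batch_voiceovers(
    generate: Callable[..., tuple],
    jobs: Iterable[Tuple[str, str]],
    speaker: str = "Ryan",
) -> List[tuple]:
    """Run generate() once per (language, text) job.

    Returns a list of (suggested_filename, waveform, sample_rate) tuples;
    writing the files is left to the caller.
    """
    results = []
    for index, (language, text) in enumerate(jobs):
        wavs, sr = generate(text=text, language=language, speaker=speaker)
        filename = f"voiceover_{index:02d}_{language.lower()}.wav"
        results.append((filename, wavs[0], sr))
    return results
```

With the quick-start model loaded, this would be called as `batch_voiceovers(model.generate_custom_voice, [("English", "Hello!"), ("German", "Hallo!")])`, followed by `sf.write(name, wav, sr)` for each result.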

When to use

  • Voice assistants - low-latency streaming TTS
  • Content creation - multilingual voiceovers
  • Accessibility - screen readers and narration
  • Audiobooks - emotional tone control
  • Voice cloning - personalized voices

Frequently asked

Quick answers to common questions

How much VRAM does Qwen3 TTS 1.7B need?

Qwen3 TTS 1.7B (1.7B parameters, rounded to 2B in the spec listing) needs approximately 4 GB at FP16, the only precision listed on this page. Use our VRAM calculator for an exact estimate.
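
The 4 GB FP16 figure from the hardware check can be sanity-checked from the parameter count: weight memory is roughly parameters times bytes per parameter. The sketch below covers the weights only. Runtime overhead (activations, CUDA context, and so on) is not modeled, and `weight_memory_gb` is a hypothetical helper.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# 1.7e9 comes from the model name; bytes per parameter are the standard
# sizes for each dtype. Overhead beyond the weights is NOT included.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}

def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Return the size of the weights alone in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

if __name__ == "__main__":
    for dtype in ("fp16", "bf16", "fp32"):
        print(f"{dtype}: {weight_memory_gb(1.7e9, dtype):.1f} GB")
```

At FP16 or BF16 the weights come to about 3.4 GB, so the roughly 4 GB total leaves the remainder for runtime overhead.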

Is Qwen3 TTS 1.7B better than other Qwen models?

Qwen3 TTS 1.7B is a speech-synthesis model, so it is not directly comparable to Qwen's text models. With 1.7B parameters and an 8,192 context, it is a strong choice for text-to-speech, voice cloning, and voice design.

What license is Qwen3 TTS 1.7B under?

Qwen3 TTS 1.7B is released under the Apache 2.0 license, making it suitable for most commercial and personal projects.

What hardware runs Qwen3 TTS 1.7B well?

At about 4 GB in FP16, Qwen3 TTS 1.7B fits with plenty of headroom on GPUs like the RTX 4090 (24 GB) or RTX 5090 (32 GB); a Mac Studio with unified memory is also listed as an option. Check our hardware directory for specific recommendations.

What is the best quantization for Qwen3 TTS 1.7B?

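FP16 is the only quantization listed for Qwen3 TTS 1.7B on this page, and at about 4 GB it already fits most GPUs. The general guidance for models that ship lower-bit builds is that Q4_K_M is the sweet spot (~98% of FP16 quality at ~27% of the size), with Q5_K_M or Q8_0 worth it only if you have spare VRAM. Use our VRAM calculator to compare.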

What models compete with Qwen3 TTS 1.7B?

Qwen3 TTS 1.7B competes with other models in its class. Browse our model directory for comparisons, benchmarks, and community reviews to find the best fit.

