Qwen2.5 72B
Standard benchmarks
Performance across standard evaluations
| Benchmark | Score |
|---|---|
| MMLU | 85 |
| HumanEval | 82 |
| MT-Bench | 8.6 |
| GSM8K | 92 |
Will it run on your hardware?
Pick your GPU memory - see which quantizations fit, and the cheapest card for the rest
Need an exact figure for your context length? Use the VRAM calculator.
Run it locally
Copy-paste - running in under a minute
ollama run qwen2.5:72bvllm serve Qwen/Qwen2.5-72B-InstructNew to this? Start with Ollama · serve to many users with vLLM.
Deep dive
Notes, sources, and the full write-up
Qwen2.5 72B
Qwen2.5 72B is Alibaba's largest dense model (excluding MoE variants). With 72.7 billion parameters and 128K context, it rivals Llama 3.3 70B while offering Apache 2.0 licensing.
Key features
- 85 MMLU - top-tier general knowledge
- 128K context - full document support
- Apache 2.0 - less restrictive than Llama license
- Extensive fine-tune ecosystem
VRAM math
| Quant | VRAM | Recommended Hardware |
|---|---|---|
| Q3_K_M | ~30 GB | RTX 5090 |
| Q4_K_M | ~40 GB | Dual RTX 3090 |
| Q5_K_M | ~48 GB | Dual RTX 4090 |
| Q8_0 | ~78 GB | Quad GPU server |
How to run
ollama run qwen2.5:72bFrequently asked
Quick answers to common questions
How much VRAM does Qwen2.5 72B need?
Qwen2.5 72B with 72B parameters needs approximately 40 GB at Q4_K_M quantization. Use our VRAM calculator for an exact estimate.
Is Qwen2.5 72B better than other Qwen models?
Qwen2.5 72B scores 85 on MMLU and 82 on HumanEval. It has 72B parameters with 131,072 context - a strong choice for max-quality, coding, general-purpose.
What license is Qwen2.5 72B under?
Qwen2.5 72B is released under the Apache 2.0 license, making it suitable for most commercial and personal projects.
What hardware runs Qwen2.5 72B well?
With 72B parameters, Qwen2.5 72B requires adequate VRAM. High-end GPUs like the RTX 4090 (24GB), RTX 5090 (32GB), or Mac Studio with unified memory are good options. Check our hardware directory for specific recommendations.
What is the best quantization for Qwen2.5 72B?
Q4_K_M is the recommended sweet spot - ~98% of FP16 quality at ~27% of the size. Q5_K_M (~48 GB) is an option if you have spare VRAM. Use our VRAM calculator to compare.
How long can Qwen2.5 72B's context window handle?
Qwen2.5 72B supports a 131,072-token context window - enough for very long documents, codebases, or multi-turn conversations. Real-world usable context may vary by implementation.
What models compete with Qwen2.5 72B?
Qwen2.5 72B competes with other 36B–108B. Browse our model directory for comparisons, benchmarks, and community reviews to find the best fit.
Compare & pair with
Similar models and recommended hardware
Related models
Nearby options
Similar models and compatible hardware by spec
Similar by size
Comments coming soon
Configure NEXT_PUBLIC_GISCUS_REPO_ID and NEXT_PUBLIC_GISCUS_CATEGORY_ID at giscus.app to enable.