Qwen2.5 72B
QwenFeaturedApache 2.0text

Qwen2.5 72B

Updated Jun 7, 2026
Parameters
72B
Context
131,072
License
Apache 2.0
Updated
Jun 7, 2026

Standard benchmarks

Performance across standard evaluations

BenchmarkScore
MMLU85
HumanEval82
MT-Bench8.6
GSM8K92

Will it run on your hardware?

Pick your GPU memory - see which quantizations fit, and the cheapest card for the rest

Too big for 24 GB at any quant
0 of 4 quantizations fit Qwen2.5 72B with real runtime overhead.

Need an exact figure for your context length? Use the VRAM calculator.

Run it locally

Copy-paste - running in under a minute

Ollamaeasiest
ollama run qwen2.5:72b
vLLMOpenAI-compatible API
vllm serve Qwen/Qwen2.5-72B-Instruct

New to this? Start with Ollama · serve to many users with vLLM.

Deep dive

Notes, sources, and the full write-up

Qwen2.5 72B

Qwen2.5 72B is Alibaba's largest dense model (excluding MoE variants). With 72.7 billion parameters and 128K context, it rivals Llama 3.3 70B while offering Apache 2.0 licensing.

Key features

  1. 85 MMLU - top-tier general knowledge
  2. 128K context - full document support
  3. Apache 2.0 - less restrictive than Llama license
  4. Extensive fine-tune ecosystem

VRAM math

QuantVRAMRecommended Hardware
Q3_K_M~30 GBRTX 5090
Q4_K_M~40 GBDual RTX 3090
Q5_K_M~48 GBDual RTX 4090
Q8_0~78 GBQuad GPU server

How to run

ollama run qwen2.5:72b

Frequently asked

Quick answers to common questions

How much VRAM does Qwen2.5 72B need?

Qwen2.5 72B with 72B parameters needs approximately 40 GB at Q4_K_M quantization. Use our VRAM calculator for an exact estimate.

Is Qwen2.5 72B better than other Qwen models?

Qwen2.5 72B scores 85 on MMLU and 82 on HumanEval. It has 72B parameters with 131,072 context - a strong choice for max-quality, coding, general-purpose.

What license is Qwen2.5 72B under?

Qwen2.5 72B is released under the Apache 2.0 license, making it suitable for most commercial and personal projects.

What hardware runs Qwen2.5 72B well?

With 72B parameters, Qwen2.5 72B requires adequate VRAM. High-end GPUs like the RTX 4090 (24GB), RTX 5090 (32GB), or Mac Studio with unified memory are good options. Check our hardware directory for specific recommendations.

What is the best quantization for Qwen2.5 72B?

Q4_K_M is the recommended sweet spot - ~98% of FP16 quality at ~27% of the size. Q5_K_M (~48 GB) is an option if you have spare VRAM. Use our VRAM calculator to compare.

How long can Qwen2.5 72B's context window handle?

Qwen2.5 72B supports a 131,072-token context window - enough for very long documents, codebases, or multi-turn conversations. Real-world usable context may vary by implementation.

What models compete with Qwen2.5 72B?

Qwen2.5 72B competes with other 36B–108B. Browse our model directory for comparisons, benchmarks, and community reviews to find the best fit.

Compare & pair with

Similar models and recommended hardware

Nearby options

Similar models and compatible hardware by spec

Comments coming soon

Configure NEXT_PUBLIC_GISCUS_REPO_ID and NEXT_PUBLIC_GISCUS_CATEGORY_ID at giscus.app to enable.