GLM-5.1
Intelligence benchmarks
Artificial Analysis indexes - compared with the best open and proprietary models
Intelligence
51.4
AA Index
Coding
43.4
AA Index
Agentic
67.1
AA Index
Intelligence Index - GLM-5.1 vs. the field
Best open-weight models (you can run locally) and leading proprietary models for context.
Coding Index comparison
Agentic Index comparison
Benchmark data from Artificial Analysis · updated 2026-06-07.
Standard benchmarks
Performance across standard evaluations
| Benchmark | Score |
|---|---|
| MMLU | 86 |
| GSM8K | 95.3 |
| GPQA | 86.8 |
Will it run on your hardware?
Pick your GPU memory - see which quantizations fit, and the cheapest card for the rest
Need an exact figure for your context length? Use the VRAM calculator.
Run it locally
Copy-paste - running in under a minute
vllm serve zai-org/GLM-5.1New to this? Start with Ollama · serve to many users with vLLM.
Deep dive
Notes, sources, and the full write-up
GLM-5.1
GLM-5.1 is Z.ai's 744-billion parameter hybrid MoE flagship for agentic engineering. It ships under the cleanest MIT license of any frontier-weight model, achieves state-of-the-art on SWE-Bench Pro (58.4%), and is built to sustain productivity over hundreds of tool-calling rounds - the longer it runs, the better the result.
What makes GLM-5.1 special
- Cleanest MIT license in the frontier tier - no modifications, no extra conditions. True open.
- SWE-Bench Pro 58.4% - leads the leaderboard among open models
- Long-horizon agentic design - stays effective over hundreds of rounds and thousands of tool calls
- MCP integration built-in - native Model Context Protocol support
- 200K context - extended sessions with full project awareness
- GlmMoeDSA architecture - hybrid Gated DeltaNet linear attention + sparse MoE
- 15.9k HF likes - one of the most popular models on HuggingFace
Benchmarks
| Benchmark | GLM-5.1 | GLM-5 | Qwen3.6-Plus | MiniMax M2.7 | DeepSeek V4 Pro-Max |
|---|---|---|---|---|---|
| SWE-Bench Pro | 58.4 | 55.1 | 56.6 | 56.2 | - |
| AIME 2026 | 95.3 | 95.4 | 95.1 | 89.8 | - |
| GPQA-Diamond | 86.2 | 86.0 | 90.4 | 87.0 | 90.1 |
| HLE (w/ Tools) | 52.3 | 50.4 | 50.6 | - | - |
| Terminal-Bench 2.0 | 63.5 | 56.2 | 61.6 | - | - |
| NL2Repo | 42.7 | 35.9 | 37.9 | 39.8 | - |
| MCP-Atlas | 71.8 | 69.2 | 74.1 | 48.8 | - |
Source: Z.ai GLM-5.1 model card and technical report. Verified against independent benchmarks where available.
VRAM requirements
| Precision | VRAM | Recommended Hardware |
|---|---|---|
| FP8 | ~410 GB | NVIDIA L40S (x8), NVIDIA A6000 (x6) |
| Q4 (GGUF) | ~165 GB | NVIDIA A6000 (x3) |
GLM-5.1 requires enterprise GPU infrastructure. The 40B active parameters keep inference costs manageable relative to its 744B total size.
How to run
# Via vLLM (0.19.0+)
vllm serve zai-org/GLM-5.1 --port 8010 --tensor-parallel-size 8Community quotes
"GLM-5.1 has the cleanest MIT license of any frontier model. No modified terms, no extra conditions. Just weights."
- r/LocalLLaMA, 167 upvotes
How it compares
GLM-5.1 fills a specific niche: it is the best model for teams that need a permissively licensed, long-horizon agentic workhorse. It beats DeepSeek V4 Pro on SWE-Bench Pro and Terminal-Bench, trails on pure knowledge tasks, and offers the cleanest license terms of any model in its class.
Compared to Kimi K2.6: GLM-5.1 lacks vision but has a truly clean MIT license (vs Modified MIT) and MCP built-in. On agentic benchmarks, it competes head-to-head while having the advantage of simpler licensing.
Use it with
vLLM, KTransformers, Open WebUI
When to use something else
If you need vision capabilities, Kimi K2.6 or Qwen3.6 27B are better choices. If coding benchmarks alone drive your decision, DeepSeek V4 Pro edges ahead on Codeforces. For teams without multi-GPU servers, none of these models run locally - consider Gemma 4 31B instead.
Frequently asked
Quick answers to common questions
How much VRAM does GLM-5.1 need?
GLM-5.1 with 744B parameters needs approximately 1488 GB at Q4_K_M quantization. Use our VRAM calculator for an exact estimate.
Is GLM-5.1 better than other GLM models?
GLM-5.1 scores 86 on MMLU. It has 744B parameters with 2,097,152 context - a strong choice for agentic-engineering, long-horizon-tasks, swe-bench.
What license is GLM-5.1 under?
GLM-5.1 is released under the MIT license, making it suitable for most commercial and personal projects.
What hardware runs GLM-5.1 well?
With 744B parameters, GLM-5.1 requires adequate VRAM. High-end GPUs like the RTX 4090 (24GB), RTX 5090 (32GB), or Mac Studio with unified memory are good options. Check our hardware directory for specific recommendations.
What is the best quantization for GLM-5.1?
Q4_K_M is the recommended sweet spot - ~98% of FP16 quality at ~27% of the size. Step up to Q5_K_M or Q8_0 only if you have spare VRAM. Use our VRAM calculator to compare.
How long can GLM-5.1's context window handle?
GLM-5.1 supports a 2,097,152-token context window - enough for very long documents, codebases, or multi-turn conversations. Real-world usable context may vary by implementation.
What models compete with GLM-5.1?
GLM-5.1 competes with other 372B–1116B. Browse our model directory for comparisons, benchmarks, and community reviews to find the best fit.
Compare & pair with
Similar models and recommended hardware
Related models
Recommended hardware
Nearby options
Similar models and compatible hardware by spec
Comments coming soon
Configure NEXT_PUBLIC_GISCUS_REPO_ID and NEXT_PUBLIC_GISCUS_CATEGORY_ID at giscus.app to enable.