RTX PRO 6000 Blackwell Max-Q

Updated Jun 8, 2026
VRAM: 96 GB
Bandwidth: 1790 GB/s
TDP: 300 W
MSRP: not listed
Category: GPU

The RTX PRO 6000 Blackwell Max-Q is an NVIDIA GPU built on the Blackwell 2.0 architecture, released 2025-03-18. For running AI locally, the numbers that matter are its 96 GB of GDDR7 and 1790 GB/s of memory bandwidth. VRAM decides which models fit at all; bandwidth sets how fast they generate text.

What you can run on 96 GB

At Q4_K_M quantization (the usual local default), 96 GB holds models up to roughly 152B parameters, leaving headroom for context.

Larger models need a higher-VRAM card, a second GPU, or CPU offload (which is much slower). Check any specific model with the VRAM calculator, or see the full picture on what can I run.
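As a rough illustration of how a fit estimate like this can be computed, the sketch below divides the usable VRAM by an assumed bytes-per-weight figure. The function names, the 4.85 bits per weight used for Q4_K_M, the 8.5 bits per weight used for Q8_0, and the 8 GB headroom are assumptions for illustration; this is not the method behind this page's numbers or the VRAM calculator.

```python
def max_params_billion(vram_gb: float, bits_per_weight: float,
                       headroom_gb: float = 0.0) -> float:
    """Largest parameter count (in billions) whose weights fit in the budget.

    Uses decimal gigabytes (1 GB = 1e9 bytes), so GB divided by
    bytes-per-weight gives billions of parameters directly.
    """
    if bits_per_weight <= 0:
        raise ValueError("bits_per_weight must be positive")
    usable_gb = vram_gb - headroom_gb
    if usable_gb <= 0:
        return 0.0
    bytes_per_weight = bits_per_weight / 8
    return usable_gb / bytes_per_weight


def fits(params_billion: float, vram_gb: float, bits_per_weight: float,
         headroom_gb: float = 0.0) -> bool:
    """True if a model of the given size fits in VRAM at this quantization."""
    return params_billion <= max_params_billion(
        vram_gb, bits_per_weight, headroom_gb
    )


if __name__ == "__main__":
    # Assumed average bits per weight; real files vary by model and tooling.
    ASSUMED_BPW = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}
    for quant, bpw in ASSUMED_BPW.items():
        limit = max_params_billion(96, bpw, headroom_gb=8)
        print(f"{quant}: up to about {limit:.0f}B parameters")
    print("70B at Q4_K_M fits:", fits(70, 96, ASSUMED_BPW["Q4_K_M"], 8))
```

The headroom argument stands in for the KV cache and runtime buffers; changing it moves the ceiling noticeably, which is why published limits for the same card can differ by a few billion parameters.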

Local LLM speed (Llama 3, llama.cpp)

Single-stream token-generation throughput - estimated from memory bandwidth:

Model (quant)          Speed on RTX PRO 6000 Blackwell Max-Q
Llama 3 8B (Q4_K_M)    209.5 tok/s
Llama 3 8B (F16)       95.1 tok/s
Llama 3 70B (Q4_K_M)   24.6 tok/s

Because decode is memory-bandwidth bound, the 1790 GB/s figure is the best single predictor of chat speed on this card. Estimates are calibrated against measured RTX-40-series cards and are typically within ~15%.
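The bandwidth-bound reasoning can be sketched as a ceiling: each generated token requires reading roughly all the weights once, so tokens per second cannot exceed bandwidth divided by model size. The function names, the efficiency factor, and the 16 GB size assumed for Llama 3 8B at F16 are illustrative assumptions; the estimator actually used for the table above is not described on this page.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float,
                             efficiency: float = 1.0) -> float:
    """Bandwidth-bound estimate of single-stream decode speed.

    efficiency=1.0 gives the theoretical ceiling; real runs land below it.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return efficiency * bandwidth_gb_s / model_size_gb


def implied_efficiency(observed_tok_s: float, bandwidth_gb_s: float,
                       model_size_gb: float) -> float:
    """Fraction of the bandwidth ceiling that an observed speed represents."""
    return observed_tok_s / decode_tokens_per_second(bandwidth_gb_s, model_size_gb)


if __name__ == "__main__":
    BANDWIDTH = 1790.0      # GB/s, from the spec list
    F16_8B_SIZE_GB = 16.0   # assumption: about 8B parameters at 2 bytes each

    ceiling = decode_tokens_per_second(BANDWIDTH, F16_8B_SIZE_GB)
    eff = implied_efficiency(95.1, BANDWIDTH, F16_8B_SIZE_GB)
    print(f"F16 ceiling: {ceiling:.1f} tok/s, table value implies {eff:.0%} of it")
```

The gap between the ceiling and the table value covers KV-cache reads, kernel overhead, and any calibration the estimate applies.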

Memory and power

  • VRAM: 96 GB GDDR7 (512-bit bus)
  • Bandwidth: 1790 GB/s
  • TDP: 300 W - a 700 W+ power supply is recommended
  • Process: 5 nm
  • Interface: PCIe 5.0 x16
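The bandwidth figure follows from the bus width and the per-pin data rate of the memory. The per-pin rate is not listed here, so the sketch below derives the implied value from the two figures in the list; the 28 Gbit/s used in the second call is an assumption (the implied value rounded up), and the function names are illustrative.

```python
def per_pin_rate_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbit/s) implied by total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


def bandwidth_gb_s(bus_width_bits: int, per_pin_gbps: float) -> float:
    """Total memory bandwidth (GB/s) for a given bus width and per-pin rate."""
    return bus_width_bits * per_pin_gbps / 8


if __name__ == "__main__":
    implied = per_pin_rate_gbps(1790, 512)
    print(f"Implied per-pin rate: {implied:.2f} Gbit/s")
    # Assumed round per-pin rate, to show how close it lands to 1790 GB/s.
    print(f"512-bit at 28 Gbit/s: {bandwidth_gb_s(512, 28):.0f} GB/s")
```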

Quantization and context

Quantization trades a little quality for a lot of VRAM. On 96 GB you can fit roughly a 152B model at Q4_K_M, about an 82B model at the higher-quality Q8, or a smaller model at full FP16. Longer context windows also consume VRAM because the KV cache grows linearly with context length and with the number of concurrent requests, so leave a few GB of headroom if you plan to use large prompts or serve several requests at once. For most chat and coding use, Q4_K_M on this card is the sweet spot between speed, quality, and the 96 GB budget.
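To put a number on the KV-cache headroom, the sketch below uses the standard size formula for a transformer with grouped-query attention: two tensors (keys and values) per layer, each holding one vector per KV head per token. The layer counts, KV-head counts, and head dimension are taken from the published Llama 3 8B and 70B configurations rather than from this page, and the 16-bit cache is an assumption; the function name is illustrative.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2,
                   n_sequences: int = 1) -> int:
    """Bytes needed to cache keys and values for the given context length."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * n_sequences


if __name__ == "__main__":
    GIB = 2**30
    # (layers, KV heads, head dim) from the published model configs.
    models = {"Llama 3 8B": (32, 8, 128), "Llama 3 70B": (80, 8, 128)}
    for name, (layers, kv_heads, head_dim) in models.items():
        size = kv_cache_bytes(layers, kv_heads, head_dim, context_tokens=8192)
        print(f"{name}: {size / GIB:.2f} GiB of KV cache at 8192 tokens")
```

Because the result scales linearly with both context_tokens and n_sequences, doubling either one doubles the cache, which is why long prompts and concurrent requests eat into the budget quickly.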

How it compares

Similar cards for local AI, by VRAM and 8B-Q4 speed:

GPU                            VRAM    Bandwidth    Llama 3 8B Q4
RTX PRO 6000 Blackwell Max-Q   96 GB   1790 GB/s    209.5 tok/s
RTX PRO 6000 Blackwell         96 GB   1790 GB/s    209.5 tok/s
RTX PRO 5000 72 GB Blackwell   72 GB   1340 GB/s    156.8 tok/s
NVIDIA RTX A6000               48 GB   768 GB/s     89.9 tok/s
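The speeds in this comparison are consistent with scaling linearly in memory bandwidth, which is what a bandwidth-bound estimate would produce. The sketch below takes this card's 8B Q4 figure as the reference and rescales it by the bandwidth ratio; the function name is illustrative, and real measurements on other architectures may not follow the ratio exactly.

```python
def scale_speed(reference_tok_s: float, reference_bandwidth_gb_s: float,
                target_bandwidth_gb_s: float) -> float:
    """Rescale a decode speed by the ratio of memory bandwidths."""
    return reference_tok_s * target_bandwidth_gb_s / reference_bandwidth_gb_s


if __name__ == "__main__":
    REF_SPEED = 209.5   # tok/s, Llama 3 8B Q4 on this card
    REF_BW = 1790.0     # GB/s
    for name, bw in [("RTX PRO 5000 72 GB Blackwell", 1340.0),
                     ("NVIDIA RTX A6000", 768.0)]:
        print(f"{name}: {scale_speed(REF_SPEED, REF_BW, bw):.1f} tok/s")
```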

Bottom line

The RTX PRO 6000 Blackwell Max-Q is best suited to LLM inference, image generation, and ComfyUI workflows. With 96 GB it comfortably handles most open models, including 70B-class models at Q4. For alternatives, compare with RTX PRO 6000 Blackwell and RTX PRO 5000 72 GB Blackwell.

Sources

Specs and benchmarks last checked 2026-06-08. Verify current pricing before buying.

Frequently asked

Quick answers to common questions

How much VRAM does the RTX PRO 6000 Blackwell Max-Q have?

The RTX PRO 6000 Blackwell Max-Q has 96 GB of VRAM with 1790 GB/s memory bandwidth.

What local AI models can run on the RTX PRO 6000 Blackwell Max-Q?

With 96 GB of VRAM, the RTX PRO 6000 Blackwell Max-Q can run many models, depending on quantization. Models up to roughly 152B parameters may fit at Q4_K_M. Use our VRAM calculator to check specific models.

Is the RTX PRO 6000 Blackwell Max-Q good for local AI inference?

Yes. The RTX PRO 6000 Blackwell Max-Q is best suited to LLM inference, image generation, and ComfyUI workflows. With 96 GB of VRAM it handles most open models well.

Where can I buy the RTX PRO 6000 Blackwell Max-Q?

Check our buy links above for the best current prices on Amazon, Newegg, and B&H. Prices vary by retailer and availability.

How does the RTX PRO 6000 Blackwell Max-Q compare to other GPUs?

RTX PRO 6000 Blackwell Max-Q has 96 GB VRAM and 1790 GB/s bandwidth. This puts it in the high-end category, suitable for most open models. Browse our hardware directory for side-by-side comparisons.

What power supply do I need for the RTX PRO 6000 Blackwell Max-Q?

The RTX PRO 6000 Blackwell Max-Q has a TDP of 300 W. A quality PSU of 700 W or more is recommended. Always check the manufacturer's recommendations for your specific build.
