GeForce RTX 5090 D V2
GeForce RTX 5090 D V2
The GeForce RTX 5090 D V2 is an NVIDIA GPU built on the Blackwell 2.0 architecture, released 2025-08-15. For running AI locally, the numbers that matter are its 24 GB of GDDR7 and 1340 GB/s of memory bandwidth. VRAM decides which models fit at all; bandwidth sets how fast they generate text.
What you can run on 24 GB
At Q4_K_M quantization (the usual local default), 24 GB holds models up to roughly 38B parameters, leaving headroom for context. On this card you can run, among others:
- Qwen3.5 35B A3B - 36B parameters
- Qwen3.6 35B A3B - 36B parameters
- dolphin-2.9.1-yi-1.5-34b - 34.4B parameters
- EXAONE 4.5 33B - 34.4B parameters
- Qwen2.5-32B-Instruct - 32.8B parameters
Larger models need a higher-VRAM card, a second GPU, or CPU offload (which is much slower). Check any specific model with the VRAM calculator, or see the full picture on what can I run.
Local LLM speed (LLaMA 3, llama.cpp)
Single-stream token-generation throughput - estimated from memory bandwidth:
| Model (quant) | Speed on GeForce RTX 5090 D V2 |
|---|---|
| Llama 3 8B (Q4_K_M) | 156.8 tok/s |
| Llama 3 8B (F16) | 71.2 tok/s |
| Llama 3 70B (Q4_K_M) | ✗ won't fit |
Because decode is memory-bandwidth bound, the 1340 GB/s figure is the best single predictor of chat speed on this card. Estimates are calibrated against measured RTX-40-series cards and are typically within ~15%.
Memory and power
- VRAM: 24 GB GDDR7 (384-bit bus)
- Bandwidth: 1340 GB/s
- TDP: 575 W - a 950 W+ power supply is recommended
- Process: 5 nm
- Interface: PCIe 5.0 x16
Quantization and context
Quantization trades a little quality for a lot of VRAM. On 24 GB you can fit roughly a 38B model at Q4_K_M, about a 20B model at the higher-quality Q8, or a smaller model at full FP16. Longer context windows also consume VRAM (the KV cache grows with context length), so leave a few GB of headroom if you plan to use large prompts or many concurrent requests. For most chat and coding use, Q4_K_M on this card is the sweet spot between speed, quality, and the 24 GB budget.
How it compares
Similar cards for local AI, by VRAM and 8B-Q4 speed:
| GPU | VRAM | Bandwidth | Llama 3 8B Q4 |
|---|---|---|---|
| GeForce RTX 5090 D V2 | 24 GB | 1340 GB/s | 156.8 tok/s |
| AMD Radeon RX 7900 XTX | 24 GB | 960 GB/s | 112.3 tok/s |
| Arc Pro B60 | 24 GB | 456 GB/s | 53.4 tok/s |
| NVIDIA Tesla P40 | 24 GB | 347 GB/s | 40.6 tok/s |
Bottom line
The GeForce RTX 5090 D V2 is best for llm-inference, image-gen, comfyui. With 24 GB+ it comfortably handles most open models, including 30B-class at Q4. If you need more, compare with AMD Radeon RX 7900 XTX and Arc Pro B60.
Sources
- Specifications: RightNow GPU Database (TechPowerUp data)
- Benchmarks: GPU-Benchmarks-on-LLM-Inference (basis for the bandwidth estimate)
Specs and benchmarks last checked 2026-06-08. Verify current pricing before buying.
Frequently asked
Quick answers to common questions
How much VRAM does the GeForce RTX 5090 D V2 have?
The GeForce RTX 5090 D V2 has 24 GB of VRAM with 1340 GB/s memory bandwidth.
What local AI models can run on the GeForce RTX 5090 D V2?
The GeForce RTX 5090 D V2 with 24 GB VRAM can run many models depending on quantization. Models up to ~36B params may fit at Q4_K_M. Use our VRAM calculator to check specific models.
Is the GeForce RTX 5090 D V2 good for local AI inference?
GeForce RTX 5090 D V2 is best for llm-inference, image-gen, comfyui. With ample VRAM it handles most open models well.
Where can I buy the GeForce RTX 5090 D V2?
Check our buy links above for the best current prices on Amazon, Newegg, and B&H. Prices vary by retailer and availability.
How does the GeForce RTX 5090 D V2 compare to other GPUs?
GeForce RTX 5090 D V2 has 24 GB VRAM and 1340 GB/s bandwidth. This puts it in the high-end category, suitable for most open models. Browse our hardware directory for side-by-side comparisons.
What power supply do I need for the GeForce RTX 5090 D V2?
The GeForce RTX 5090 D V2 has a TDP of 575W. This requires a high-wattage PSU (850W+ recommended). Always check the manufacturer's recommendations for your specific build.
Nearby options
Similar hardware and models that fit
Similar hardware
Comments coming soon
Configure NEXT_PUBLIC_GISCUS_REPO_ID and NEXT_PUBLIC_GISCUS_CATEGORY_ID at giscus.app to enable.