What Can I Run?

Select your GPU or enter your VRAM to see which local AI models run on your machine, with fit levels, quantization options, and performance estimates.

Frequently asked

How do I auto-detect my GPU?

Click “Auto-detect my machine” at the top. We read your GPU name via WebGL and match it against known VRAM sizes. If auto-detection doesn't work, you can select your GPU manually or enter your VRAM amount.
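As a rough illustration of the lookup described above, here is a minimal sketch assuming the renderer string comes from the standard WEBGL_debug_renderer_info extension. The table KNOWN_VRAM_GB, the helper names, and the GlLike interface are hypothetical and are not this site's actual code.

```typescript
// Minimal shape of the WebGL context used here, so the sketch also type-checks outside a browser.
interface GlLike {
  getExtension(name: string): { UNMASKED_RENDERER_WEBGL: number } | null;
  getParameter(pname: number): unknown;
}

// Hypothetical lookup table: substring of the renderer string -> VRAM in GB (desktop cards only).
const KNOWN_VRAM_GB: Array<[string, number]> = [
  ["RTX 4090", 24],
  ["RTX 3090", 24],
  ["RTX 4080", 16],
];

// Reads the unmasked GPU name, or returns null if the browser does not expose it.
function readRendererName(gl: GlLike | null): string | null {
  if (!gl) return null;
  const ext = gl.getExtension("WEBGL_debug_renderer_info");
  if (!ext) return null;
  const name = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
  return typeof name === "string" ? name : null;
}

// Maps a renderer string to VRAM; null means "ask the user to enter it manually".
function lookupVramGb(renderer: string | null): number | null {
  if (!renderer) return null;
  const upper = renderer.toUpperCase();
  // Assumption: laptop variants often ship with different VRAM, so this sketch skips them.
  if (upper.includes("LAPTOP")) return null;
  for (const [key, gb] of KNOWN_VRAM_GB) {
    if (upper.includes(key)) return gb;
  }
  return null;
}
```

In a browser, the context would come from something like document.createElement("canvas").getContext("webgl"). Some browsers may mask or generalize the renderer string, which is one reason a manual fallback is needed.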

What do the fit levels mean?

Each level compares the model's memory use to your VRAM.
Great - the model uses ≤60% of your VRAM, leaving headroom for context and activations.
OK - 60–85% of VRAM, solid performance.
Tight - 85–100% of VRAM, still usable but with little headroom.
No - more than 100% of VRAM, won't fit.
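The thresholds above can be sketched as a small classifier. The function name fitLevel and the treatment of the exact boundaries (60% counts as Great, 85% as OK, 100% as Tight) are assumptions made for illustration.

```typescript
type FitLevel = "Great" | "OK" | "Tight" | "No";

// Classifies a model by the fraction of VRAM it would occupy.
function fitLevel(modelGb: number, vramGb: number): FitLevel {
  if (vramGb <= 0) return "No"; // no usable VRAM reported
  const usage = modelGb / vramGb;
  if (usage <= 0.6) return "Great";
  if (usage <= 0.85) return "OK";
  if (usage <= 1.0) return "Tight";
  return "No";
}

console.log(fitLevel(4, 8));   // 50% of VRAM
console.log(fitLevel(7.5, 8)); // about 94% of VRAM
```

Checking the thresholds from smallest to largest keeps each level's range implicit, so no range needs to be written twice.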

How does quantization affect what I can run?

Quantization stores model weights at lower precision, which reduces model size, usually with little quality loss. Q4_K_M uses ~0.55 bytes/param. Q8_0 (near-lossless) uses ~1.1 bytes/param. FP16 (unquantized, half precision) uses ~2.0 bytes/param. Fewer bytes per parameter means a smaller model, which leaves more VRAM for context and other tasks.
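To make the arithmetic concrete, here is a minimal sketch that multiplies parameter count by the bytes-per-parameter figures quoted above. It assumes 1 GB = 10^9 bytes and counts weights only, ignoring context and activations. The function name estimateWeightsGb is hypothetical.

```typescript
// Approximate bytes per parameter, taken from the figures in the answer above.
const BYTES_PER_PARAM = { Q4_K_M: 0.55, Q8_0: 1.1, FP16: 2.0 } as const;
type Quant = keyof typeof BYTES_PER_PARAM;

// Estimated weight size in GB (assumption: decimal GB, weights only).
// Billions of params * bytes per param = billions of bytes = GB.
function estimateWeightsGb(paramsBillions: number, quant: Quant): number {
  return paramsBillions * BYTES_PER_PARAM[quant];
}

// A 7B-parameter model at each quantization level:
for (const quant of Object.keys(BYTES_PER_PARAM) as Quant[]) {
  console.log(quant, estimateWeightsGb(7, quant).toFixed(2), "GB");
}
```

For a 7B model this gives roughly 3.85 GB at Q4_K_M, 7.7 GB at Q8_0, and 14 GB at FP16, so on an 8 GB card only the Q4_K_M weights fit with room left over.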

Can I run these models on CPU only?

Yes, with caveats. CPUs can run smaller models (up to ~20B parameters), but generation is much slower than on a GPU (typically <1 token/sec). On CPU-only machines we filter out heavy media models (image and video generation), since they're impractical without a GPU.
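The CPU-only rule can be sketched as a simple filter. The ModelEntry shape, the kind field, and the model names below are made up for illustration; only the ~20B limit and the exclusion of image and video models come from the answer above.

```typescript
// Hypothetical catalog entry; the real site's data model is not shown here.
interface ModelEntry {
  name: string;
  paramsBillions: number;
  kind: "text" | "image" | "video";
}

const CPU_MAX_PARAMS_BILLIONS = 20; // approximate limit quoted above

// Keeps only models that are practical without a GPU:
// no image or video generation, and at most ~20B parameters.
function runnableOnCpuOnly(models: ModelEntry[]): ModelEntry[] {
  return models.filter(
    (m) => m.kind === "text" && m.paramsBillions <= CPU_MAX_PARAMS_BILLIONS
  );
}

// Made-up example entries:
const catalog: ModelEntry[] = [
  { name: "example-chat-7b", paramsBillions: 7, kind: "text" },
  { name: "example-chat-70b", paramsBillions: 70, kind: "text" },
  { name: "example-image-gen", paramsBillions: 3, kind: "image" },
];
console.log(runnableOnCpuOnly(catalog).map((m) => m.name));
```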