Private ChatGPT
Run a ChatGPT-style assistant entirely on your own machine - Ollama + Open WebUI, no data leaves your network, $0/mo. Works on a $300 used GPU.
- Cost: ~$300 hardware, $0/mo vs cloud
- Difficulty: beginner
- Setup time: ~15 min
- Use case: A private ChatGPT on your own GPU
A ChatGPT-style chat assistant that runs entirely on your own hardware. Your conversations never leave your machine, there's no monthly fee, and it works offline. Two open-source components - Ollama to serve the model and Open WebUI for the familiar chat interface - get you there in about 15 minutes.
What you get
- A polished, ChatGPT-like web UI with chat history, system prompts, and multi-model switching
- 100% local inference - no data sent to any cloud, works on an air-gapped network
- Multiple users / accounts on your LAN
- $0/month - the only cost is the GPU you already own (or a ~$300 used one)
Architecture
| Component | Role |
|---|---|
| Ollama | Pulls and serves the model over a local API (port 11434) |
| Open WebUI | Browser chat front-end, talks to Ollama (port 3000) |
| Qwen3 14B | The default model - strong general chat, fits 12GB at Q4 |
For an 8GB card, swap in Llama 3.1 8B. Recommended GPU: RTX 3060 12GB - the cheapest card that runs a 14B model comfortably.
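Because Ollama serves a plain HTTP API on port 11434, anything Open WebUI does can also be scripted. The sketch below is a minimal illustration, assuming the stack is already running on localhost and qwen3:14b has been pulled. The helper names `generate_body` and `ask` are hypothetical, and the JSON quoting is deliberately naive (the prompt must not contain double quotes or backslashes).

```shell
# Base URL of the Ollama API; override via the environment for a remote host.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# generate_body MODEL PROMPT (hypothetical helper)
# Print the JSON body for a non-streaming POST /api/generate request.
# Naive quoting: PROMPT must not contain double quotes or backslashes.
generate_body() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# ask MODEL PROMPT (hypothetical helper)
# Send one prompt to Ollama and print the raw JSON reply.
ask() {
  curl -fsS "$OLLAMA_URL/api/generate" -d "$(generate_body "$1" "$2")"
}
```

Calling `ask qwen3:14b "Say hello in five words"` should print a JSON object whose `response` field holds the generated text. Open WebUI talks to the same API through the `OLLAMA_BASE_URL` setting.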
Prerequisites
- A GPU with ≥12 GB VRAM (RTX 3060 12GB or better) - or an 8GB card if you run a smaller model such as Llama 3.1 8B
- Docker + Docker Compose, with the NVIDIA Container Toolkit for GPU passthrough
- ~10 GB free disk for the model
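The prerequisites above can be checked with a short preflight script before you start. This is a sketch rather than an official tool: `vram_ok` and `preflight` are hypothetical names, and the 1000 MiB-per-GB cutoff is an assumption chosen to be lenient, since drivers often report slightly less than a card's nominal size.

```shell
# vram_ok MIB MIN_GB (hypothetical helper)
# Succeed when a card reporting MIB MiB of memory covers MIN_GB.
# Uses 1000 MiB per GB as a lenient cutoff (an assumption, see above).
vram_ok() {
  [ "$1" -ge $(( $2 * 1000 )) ]
}

# preflight: print a short report on GPU memory, Docker Compose, and free disk.
preflight() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # Total memory of the first GPU, in MiB, without header or units.
    mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
    if vram_ok "$mib" 12; then
      echo "GPU: ${mib} MiB, enough for the 14B default"
    else
      echo "GPU: ${mib} MiB, plan on an 8B model"
    fi
  else
    echo "GPU: nvidia-smi not found, is the NVIDIA driver installed?"
  fi
  if docker compose version >/dev/null 2>&1; then
    echo "Docker Compose: found"
  else
    echo "Docker Compose: missing"
  fi
  # Free space on the current filesystem; the model needs roughly 10 GB.
  # Docker volumes may live on a different filesystem.
  df -h . | tail -n 1
}
```

Run `preflight` in a terminal. It does not verify the NVIDIA Container Toolkit; the Troubleshooting section covers that check.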
Setup
Save this as docker-compose.yml:

    services:
      ollama:
        image: ollama/ollama:latest
        container_name: ollama
        volumes:
          - ollama:/root/.ollama
        ports:
          - "11434:11434"
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]
      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        container_name: open-webui
        depends_on:
          - ollama
        environment:
          - OLLAMA_BASE_URL=http://ollama:11434
        volumes:
          - open-webui:/app/backend/data
        ports:
          - "3000:8080"
        restart: unless-stopped
    volumes:
      ollama:
      open-webui:

Bring it up and pull the model:

    docker compose up -d
    docker exec ollama ollama pull qwen3:14b

Open http://localhost:3000, create the first account (it becomes admin), pick qwen3:14b, and chat.
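If you would rather confirm the stack from a terminal before opening the browser, a small smoke test can poll Ollama and check that the model is listed. This is a sketch under assumptions: `has_model` and `wait_for_ollama` are hypothetical helpers, and the name match relies on a simple pattern over the JSON from `/api/tags` rather than a real JSON parser.

```shell
# Base URL of the Ollama API published by the compose file.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# has_model NAME (hypothetical helper)
# Read the JSON from GET /api/tags on stdin; succeed if NAME is listed.
has_model() {
  grep -Eq "\"name\":[[:space:]]*\"$1\""
}

# wait_for_ollama [TRIES] (hypothetical helper)
# Poll the API once per second until it answers or TRIES (default 30) run out.
wait_for_ollama() {
  tries="${1:-30}"
  while [ "$tries" -gt 0 ]; do
    curl -fsS "$OLLAMA_URL/api/version" >/dev/null 2>&1 && return 0
    tries=$(( tries - 1 ))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_ollama && curl -fsS "$OLLAMA_URL/api/tags" | has_model qwen3:14b && echo ready` prints `ready` only once the API is up and the pull has finished.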
Use it
- Daily assistant - drafting, summarizing, brainstorming, with full history
- Private document Q&A - paste sensitive text you'd never send to a cloud API
- Team chat - host it on a homelab box; everyone on the LAN gets an account
Cost vs cloud
| | Private ChatGPT | ChatGPT Plus |
|---|---|---|
| Monthly | $0 | $20 |
| Hardware | ~$300 once (RTX 3060 12GB) | $0 |
| Data privacy | Stays on your machine | Sent to OpenAI |
| Break-even | ~15 months, then free forever | - |
After ~15 months ($300 ÷ $20/mo, electricity aside) the GPU has paid for itself versus a single ChatGPT Plus seat - and it serves your whole household, runs offline, and keeps every conversation private. See the cost-vs-cloud calculator for your usage.
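The break-even figure is simple division, and it can be re-run for your own numbers. The sketch below uses a hypothetical helper, `break_even_months`, with an optional third argument for monthly electricity cost. That argument is an assumption added for illustration; the table above treats running cost as $0.

```shell
# break_even_months HARDWARE_COST MONTHLY_FEE [MONTHLY_POWER_COST]
# Print the number of whole months until the hardware cost is recovered,
# or "never" if the monthly saving is zero or negative.
break_even_months() {
  awk -v hw="$1" -v fee="$2" -v power="${3:-0}" 'BEGIN {
    saving = fee - power               # what you save each month vs the subscription
    if (saving <= 0) { print "never"; exit }
    m = hw / saving
    months = int(m)
    if (months < m) months++           # round up to whole months
    print months
  }'
}
```

`break_even_months 300 20` prints 15, matching the table. With a hypothetical $5/mo of electricity, `break_even_months 300 20 5` prints 20.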
Troubleshooting
- Ollama can't see the GPU → install the NVIDIA Container Toolkit and confirm that `docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi` works.
- Open WebUI shows no models → the model pull (`ollama pull`) must finish first; refresh the model list.
- Slow / CPU-only → check that `docker exec ollama ollama ps` shows the GPU; without it you'll get 2-5 tok/s instead of 40+.
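The GPU check in the last item can be scripted. This sketch assumes the PROCESSOR column of `ollama ps` reads `100% GPU` for a fully offloaded model, as in current Ollama releases; `fully_on_gpu` is a hypothetical helper name.

```shell
# fully_on_gpu (hypothetical helper)
# Read `ollama ps` output on stdin; succeed if a loaded model is 100% on the GPU.
fully_on_gpu() {
  grep -q '100% GPU'
}
```

Use it as `docker exec ollama ollama ps | fully_on_gpu && echo "GPU in use"`. A model only appears in `ollama ps` while it is loaded, so send a chat message first. A split such as CPU/GPU percentages in that column suggests the model did not fit entirely in VRAM.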
Swap components
- Prefer a desktop app over a browser? Use Jan or LM Studio instead of Open WebUI.
- Want a lighter model on 8GB? Use Llama 3.1 8B or Phi-4 mini.
- More VRAM? Step up to Qwen3 32B on an RTX 4090.
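When picking a model for a given card, a rough estimate of its memory footprint helps. The sketch below is a back-of-the-envelope rule, not a measurement: it assumes about 4.8 bits per weight for a Q4 quantization and a flat 1.5 GB for context and runtime overhead. Both defaults, and the helper names `vram_needed_gb` and `fits_vram`, are assumptions for illustration; real usage varies with quantization variant and context length.

```shell
# vram_needed_gb PARAMS_B [BITS] [OVERHEAD_GB] (hypothetical helper)
# Rough estimate in GB: weights (PARAMS_B billion * BITS / 8) plus overhead.
# BITS=4.8 and OVERHEAD_GB=1.5 are assumed ballpark defaults.
vram_needed_gb() {
  awk -v p="$1" -v b="${2:-4.8}" -v o="${3:-1.5}" \
    'BEGIN { printf "%.1f\n", p * b / 8 + o }'
}

# fits_vram PARAMS_B VRAM_GB (hypothetical helper)
# Succeed if the default estimate for PARAMS_B fits in VRAM_GB.
fits_vram() {
  awk -v need="$(vram_needed_gb "$1")" -v vram="$2" \
    'BEGIN { exit !(need <= vram) }'
}
```

Under these assumptions `vram_needed_gb 14` prints 9.9, which is consistent with a 14B model fitting a 12GB card at Q4, and `fits_vram 32 24` succeeds for a 32B model on a 24GB card.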