Local Cursor (AI coding)
A Cursor/Copilot-style coding assistant that runs locally - Continue in VS Code + a Qwen Coder model on Ollama. Tab-complete and chat, $0/mo, your code never leaves your machine.
- Cost: ~$800 hardware, $0/mo vs cloud
- Difficulty: beginner
- Setup time: ~20 min
- Use case: A private AI coding assistant in your editor
A Copilot/Cursor-style coding assistant - inline autocomplete plus an in-editor chat that knows your codebase - running entirely on your own GPU. Your proprietary code never touches a third-party server, and there's no per-seat subscription. Built from Continue (the open-source VS Code / JetBrains extension) and a Qwen Coder model served by Ollama.
What you get
- Tab autocomplete in your editor, like Copilot
- Chat with your code - ask about files, generate functions, write tests
- Zero code exfiltration - everything runs on localhost
- $0/month vs $20/mo for Cursor or Copilot
Architecture
| Component | Role |
|---|---|
| Continue | VS Code / JetBrains extension - autocomplete + chat UI |
| Ollama | Serves the coding model locally (port 11434) |
| Qwen2.5 Coder 14B | Strong code model that fits a 24GB GPU at Q4 |
For more headroom and a sharper model, use Qwen3 Coder 30B A3B (MoE, fast). Recommended GPU: RTX 3090 (best value, 24GB) or RTX 4090.
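The pieces in the table talk over plain HTTP on localhost, so a quick way to see whether the Ollama side is up is to check that something is listening on port 11434. A minimal Python sketch, assuming the default port from the table; the helper name port_is_open is made up for illustration, and it only confirms an open port, not that the listener is actually Ollama.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling port_is_open() with no arguments checks the default Ollama address; pass a different host or port if you run Ollama in Docker with a remapped port.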
Prerequisites
- A GPU with ≥24 GB VRAM (RTX 3090 / RTX 4090) for the 14B at good speed
- Ollama installed (native or Docker)
- VS Code (or a JetBrains IDE)
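For a rough feel of why VRAM matters, the weights alone take about (parameters × bits per weight ÷ 8) bytes. The sketch below is back-of-the-envelope arithmetic only: the 4.5 bits per weight figure is an assumed average for a 4-bit quant, not a measured value, and the KV cache, context length and runtime overhead all come on top, which is where the extra headroom of a 24GB card goes.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    # (params_billions * 1e9 weights) * (bits / 8 bytes per weight) / 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


# Assumption: ~4.5 bits per weight on average for a Q4-style quant.
print(f"14B at ~4.5 bits: {weight_memory_gb(14, 4.5):.1f} GB of weights")  # → 7.9 GB
```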
Setup
- Pull the coding model with Ollama:

      ollama pull qwen2.5-coder:14b

- Install the Continue extension in VS Code (Extensions → search "Continue").
- Point Continue at your local Ollama. Open ~/.continue/config.yaml:

      models:
        - name: Qwen2.5 Coder 14B
          provider: ollama
          model: qwen2.5-coder:14b
          roles:
            - chat
            - edit
        - name: Qwen2.5 Coder (autocomplete)
          provider: ollama
          model: qwen2.5-coder:14b
          roles:
            - autocomplete

- Reload VS Code. You now have inline completions and a Continue chat panel - both fully local.
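If the chat panel stays silent after setup, it can help to test Ollama directly, outside the editor. The sketch below is one way to do that from Python, assuming Ollama's REST API on the default localhost:11434: it lists the pulled models via /api/tags, then sends one non-streaming prompt to /api/generate. The function names and the test prompt are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address used in this guide
MODEL = "qwen2.5-coder:14b"            # the tag pulled in the first setup step


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def build_generate_request(prompt: str, model: str = MODEL) -> dict:
    """Body for POST /api/generate; stream=False asks for a single JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}


def smoke_test() -> str:
    """Check that the model is pulled, then ask it for a short completion."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as resp:
        names = installed_models(json.load(resp))
    if MODEL not in names:
        raise RuntimeError(f"{MODEL} not found; run: ollama pull {MODEL}")

    body = json.dumps(build_generate_request("Write a Python hello world.")).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]
```

Run print(smoke_test()) with Ollama started. If this returns text but Continue does not, the problem is more likely in config.yaml than in the model.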
Use it
- Autocomplete - write a comment, get the implementation; Tab to accept
- Refactor - select code, press ⌘L (Ctrl+L on Windows/Linux), and ask Continue to rewrite or optimize it
- Tests + docs - "write unit tests for this file", "document these functions"
Cost vs cloud
| | Local Cursor | Cursor / Copilot |
|---|---|---|
| Monthly | $0 | $20/seat |
| Hardware | ~$800 once (used RTX 3090) | $0 |
| Code privacy | Never leaves your machine | Uploaded to vendor |
| Break-even | ~40 months per seat - or instant for teams that can't upload code | - |
For solo devs it's a long payback, but for any team under a no-cloud-code policy (finance, defense, health) it's the only option that works at all. See the cost-vs-cloud calculator.
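The break-even figure in the table is simple division: one-off hardware cost over the subscription fees you stop paying. A small sketch of that arithmetic, using the ~$800 and $20/seat numbers from the table; the seats parameter is a hypothetical extension for several developers sharing one GPU box, and electricity is ignored.

```python
import math


def break_even_months(hardware_cost: float, monthly_fee_per_seat: float, seats: int = 1) -> int:
    """Months until a one-off hardware cost is covered by avoided subscription fees."""
    return math.ceil(hardware_cost / (monthly_fee_per_seat * seats))


print(break_even_months(800, 20))           # one seat → 40
print(break_even_months(800, 20, seats=4))  # hypothetical: four seats on one GPU → 10
```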
Troubleshooting
- Autocomplete feels slow → a 14B at Q4 wants a 24GB card; on less VRAM use Qwen2.5 Coder 7B or a smaller quant.
- Completions are off-topic → make sure the autocomplete role uses a base/instruct coder model, not a chat model.
- Want repo-wide context → enable Continue's @codebase and index the workspace.
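To put a number on "feels slow", Ollama's non-streaming /api/generate reply includes eval_count (tokens generated) and eval_duration (in nanoseconds), which give tokens per second. A minimal sketch of that calculation; the sample values are invented for illustration, not a benchmark of any card.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from eval_count (tokens) and eval_duration (nanoseconds)."""
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


# Made-up numbers for illustration: 120 tokens generated in 4 seconds.
sample = {"eval_count": 120, "eval_duration": 4_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")  # → 30.0 tokens/s
```

Comparing this figure before and after switching to the 7B or a smaller quant shows whether the change actually helped.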
Swap components
- Heavier model, more speed (MoE): Qwen3 Coder 30B A3B on an RTX 4090.
- Prefer vLLM for throughput over Ollama? Serve the model with vLLM.
- Terminal-first workflow? Pair the model with Open Interpreter.