Windows Desktop AI with RAG
LM Studio + AnythingLLM = local AI on Windows with a friendly GUI. Download models visually, chat with documents, no terminal needed. Runs on a $200 used GPU.
Windows Desktop AI with RAG is a local AI stack for A private AI assistant on Windows with document RAG - no CLI, no Docker, all desktop GUI. LM Studio + AnythingLLM = local AI on Windows with a friendly GUI. Download models visually, chat with documents, no terminal needed. Runs on a $200 used GPU. It combines 6 components, is rated beginner, and takes about 15 minutes to set up. Expect around $200 in hardware and $0/month versus cloud.
- Cost
- ~$200
- $0/mo vs cloud
- Difficulty
- beginner
- Setup time
- ~15 min
- Use case
- A private AI assistant on Windows with document RAG - no CLI, no Docker, all desktop GUI
~$200 hardware · $0/mo vs cloud
Windows Desktop AI with RAG (LM Studio + AnythingLLM)
A fully private AI stack for Windows that never touches the cloud. LM Studio serves as your local model server with a visual interface — browse, download, and run models from HuggingFace without typing a single command. AnythingLLM adds document RAG, workspace isolation, and agent skills on top. Your documents, your GPU, your data.
This stack is built for Windows users who prefer a graphical interface — no Docker, no terminal commands beyond the basics. If you've wanted to run local AI but found Ollama's CLI intimidating, this is your setup.
What you get
- Visual model browser — search HuggingFace models inside LM Studio, download with one click
- Drop-in document Q&A — PDF, DOCX, TXT, CSV, code files. Drag them into AnythingLLM and ask questions
- Workspace isolation — separate knowledge bases for work, personal, or different projects
- OpenAI-compatible local API — LM Studio exposes
http://localhost:1234/v1so AnythingLLM and any other tool can connect - No data leaves your PC — all inference and embedding runs locally, works completely offline
- No Docker, no WSL, no CLI — both apps are native Windows desktop installers
- $0/month — the only cost is the GPU you already own
Architecture
| Component | Role |
|---|---|
| LM Studio | Model manager + local API server (port 1234) |
| AnythingLLM | Chat UI + RAG engine + vector DB (LanceDB) |
| Qwen2.5 14B | General chat model, fits 12GB at Q4 |
| LanceDB (built-in) | Local vector database for document embeddings |
LM Studio runs the model and exposes an OpenAI-compatible endpoint. AnythingLLM connects to it, handles document chunking and embeddings, and provides the chat interface. Everything runs as native Windows apps — no containers.
Recommended GPU: RTX 3060 12GB (~$200 used) for running 7B-14B models. If you have an RTX 4060 (8GB), stick with 3B-7B models for good speed. No GPU required — both apps fall back to CPU, though responses will be slower.
Prerequisites
- Windows 11 (64-bit, Home or Pro)
- GPU with 4GB+ VRAM recommended (6GB+ preferred). CPU-only works but is slower
- 16GB RAM (8GB minimum, 16GB for comfortable multitasking)
- 10-30GB free disk for models
- LM Studio (free) and AnythingLLM Desktop (free, MIT license)
Setup
Step 1: Install LM Studio
- Go to lmstudio.ai and download the Windows installer
- Run the installer — default path is fine
- Open LM Studio
You'll see a clean interface with a search bar and model browser. No terminal commands needed.
Step 2: Download a model
- In LM Studio, go to the Discover tab (magnifying glass icon in left sidebar)
- Search for
Qwen2.5-14B— a capable 14B model for general chat and RAG - Look for a Q4_K_M quantized version — best balance of quality and size
- Click Download and wait (~8 GB for Q4_K_M)
If you have 8GB VRAM or less, search for Qwen2.5-7B or Llama 3.2 3B instead. The download process is the same.
Step 3: Start the local server
- In LM Studio, go to the Developer tab (
</>icon in left sidebar) - Select your downloaded model from the dropdown at the top
- Click Start Server
- You should see:
Server listening on http://localhost:1234
Leave LM Studio open and the server running — AnythingLLM connects to it over this local port.
Step 4: Install AnythingLLM
- Go to anythingllm.com/desktop and download the Windows installer
- Important: When prompted, install for Current User only — not "All Users". Installing to Program Files causes a known spawn error
- Open AnythingLLM Desktop
Step 5: Connect AnythingLLM to LM Studio
- In AnythingLLM, open Settings (gear icon, bottom left)
- Go to LLM Preference
- Select LM Studio as the provider
- Set the base URL to
http://localhost:1234 - Click Save changes
- Go to Embedding Model in the same settings panel and set it to AnythingLLM built-in (or LM Studio if you prefer)
AnythingLLM will show a green indicator if it can reach the LM Studio server.
Step 6: Create a workspace and chat
- Click New Workspace
- Name it — for example "Work" or "Personal"
- Optionally set a system prompt
- Click Save
You can now chat with your local model. Type any question to confirm the connection is working.
Step 7: Add documents (RAG)
- Inside your workspace, click the paperclip icon or drag a file into the chat area
- Supported formats:
.txt,.pdf,.md,.docx,.csv - AnythingLLM splits the document into chunks and creates embeddings locally
- Once indexed (green indicator next to filename), ask questions about the document
- The model retrieves relevant chunks and answers using your document as context
Workspaces are isolated — documents in one workspace are invisible to others. This lets you keep separate knowledge bases for different projects.
Use it
Once set up, you can:
- Chat with local models — full conversational AI, no internet needed
- Query your documents — upload PDFs, code files, meeting notes, and ask questions
- Switch models on the fly — download a different model in LM Studio and select it from the dropdown
- Use any OpenAI-compatible tool — since LM Studio exposes
http://localhost:1234/v1, you can connect other tools (SillyTavern, Obsidian plugins, etc.)
Performance tips
| GPU | Max model size | Typical speed |
|---|---|---|
| RTX 3060 12GB | 14B at Q4 | 15-20 tok/s |
| RTX 4060 8GB | 7B at Q4 | 20-30 tok/s |
| CPU-only (16GB) | 3B at Q4 | 3-5 tok/s |
| CPU-only (32GB) | 7B at Q4 | 2-4 tok/s |
CPU-only inference works but expect slower responses. The RAG pipeline (embedding + retrieval) runs fine on CPU.
Cost vs cloud
| Local (this stack) | Cloud ChatGPT Plus | |
|---|---|---|
| Monthly | $0 | $20 |
| GPU (used) | $200 (RTX 3060) | $0 (included) |
| Data privacy | Complete | Shared with OpenAI |
| Offline capable | Yes | No |
| Model selection | Any open model | GPT-4o only |
Troubleshooting
AnythingLLM won't start — spawn error This happens when installed under Program Files (All Users). Uninstall and reinstall using "Current User" option.
Model loads but responds very slowly You're likely running on CPU. Check that LM Studio detects your GPU in the Developer tab. Try a smaller model (3B or 7B).
"Failed to connect to LM Studio" in AnythingLLM
Make sure LM Studio's server is running (Developer tab > Start Server). Verify the port: http://localhost:1234 should show a page in your browser.
Documents are uploaded but the model ignores them Ensure the document has a green indicator in the workspace sidebar. Check that the embedding model is active in Settings > Embedding Model.
Frequently asked
What is the Windows Desktop AI with RAG stack for?
LM Studio + AnythingLLM = local AI on Windows with a friendly GUI. Download models visually, chat with documents, no terminal needed. Runs on a $200 used GPU. It is purpose-built for A private AI assistant on Windows with document RAG - no CLI, no Docker, all desktop GUI and runs entirely on your own hardware.
How much does the Windows Desktop AI with RAG stack cost?
Windows Desktop AI with RAG costs around $200 in hardware up front and $0/month to run, since everything is self-hosted — no per-token or subscription fees versus a cloud equivalent.
How long does it take to set up Windows Desktop AI with RAG?
Plan for roughly 15 minutes. The stack is rated beginner.
What do I need to run Windows Desktop AI with RAG?
Windows Desktop AI with RAG is built from 2 tool(s), 2 model(s), 2 hardware item(s). Each is listed below with a link.