Windows Desktop AI with RAG

LM Studio + AnythingLLM = local AI on Windows with a friendly GUI. Download models visually, chat with documents, no terminal needed. Runs on a $200 used GPU.

The short answer

Windows Desktop AI with RAG is a local AI stack for A private AI assistant on Windows with document RAG - no CLI, no Docker, all desktop GUI. LM Studio + AnythingLLM = local AI on Windows with a friendly GUI. Download models visually, chat with documents, no terminal needed. Runs on a $200 used GPU. It combines 6 components, is rated beginner, and takes about 15 minutes to set up. Expect around $200 in hardware and $0/month versus cloud.

Cost
~$200
$0/mo vs cloud
Difficulty
beginner
Setup time
~15 min
Use case
A private AI assistant on Windows with document RAG - no CLI, no Docker, all desktop GUI

Windows Desktop AI with RAG (LM Studio + AnythingLLM)

A fully private AI stack for Windows that never touches the cloud. LM Studio serves as your local model server with a visual interface — browse, download, and run models from HuggingFace without typing a single command. AnythingLLM adds document RAG, workspace isolation, and agent skills on top. Your documents, your GPU, your data.

This stack is built for Windows users who prefer a graphical interface — no Docker, no terminal commands beyond the basics. If you've wanted to run local AI but found Ollama's CLI intimidating, this is your setup.

What you get

  • Visual model browser — search HuggingFace models inside LM Studio, download with one click
  • Drop-in document Q&A — PDF, DOCX, TXT, CSV, code files. Drag them into AnythingLLM and ask questions
  • Workspace isolation — separate knowledge bases for work, personal, or different projects
  • OpenAI-compatible local API — LM Studio exposes http://localhost:1234/v1 so AnythingLLM and any other tool can connect
  • No data leaves your PC — all inference and embedding runs locally, works completely offline
  • No Docker, no WSL, no CLI — both apps are native Windows desktop installers
  • $0/month — the only cost is the GPU you already own

Architecture

ComponentRole
LM StudioModel manager + local API server (port 1234)
AnythingLLMChat UI + RAG engine + vector DB (LanceDB)
Qwen2.5 14BGeneral chat model, fits 12GB at Q4
LanceDB (built-in)Local vector database for document embeddings

LM Studio runs the model and exposes an OpenAI-compatible endpoint. AnythingLLM connects to it, handles document chunking and embeddings, and provides the chat interface. Everything runs as native Windows apps — no containers.

Recommended GPU: RTX 3060 12GB (~$200 used) for running 7B-14B models. If you have an RTX 4060 (8GB), stick with 3B-7B models for good speed. No GPU required — both apps fall back to CPU, though responses will be slower.

Prerequisites

  • Windows 11 (64-bit, Home or Pro)
  • GPU with 4GB+ VRAM recommended (6GB+ preferred). CPU-only works but is slower
  • 16GB RAM (8GB minimum, 16GB for comfortable multitasking)
  • 10-30GB free disk for models
  • LM Studio (free) and AnythingLLM Desktop (free, MIT license)

Setup

Step 1: Install LM Studio

  1. Go to lmstudio.ai and download the Windows installer
  2. Run the installer — default path is fine
  3. Open LM Studio

You'll see a clean interface with a search bar and model browser. No terminal commands needed.

Step 2: Download a model

  1. In LM Studio, go to the Discover tab (magnifying glass icon in left sidebar)
  2. Search for Qwen2.5-14B — a capable 14B model for general chat and RAG
  3. Look for a Q4_K_M quantized version — best balance of quality and size
  4. Click Download and wait (~8 GB for Q4_K_M)

If you have 8GB VRAM or less, search for Qwen2.5-7B or Llama 3.2 3B instead. The download process is the same.

Step 3: Start the local server

  1. In LM Studio, go to the Developer tab (</> icon in left sidebar)
  2. Select your downloaded model from the dropdown at the top
  3. Click Start Server
  4. You should see: Server listening on http://localhost:1234

Leave LM Studio open and the server running — AnythingLLM connects to it over this local port.

Step 4: Install AnythingLLM

  1. Go to anythingllm.com/desktop and download the Windows installer
  2. Important: When prompted, install for Current User only — not "All Users". Installing to Program Files causes a known spawn error
  3. Open AnythingLLM Desktop

Step 5: Connect AnythingLLM to LM Studio

  1. In AnythingLLM, open Settings (gear icon, bottom left)
  2. Go to LLM Preference
  3. Select LM Studio as the provider
  4. Set the base URL to http://localhost:1234
  5. Click Save changes
  6. Go to Embedding Model in the same settings panel and set it to AnythingLLM built-in (or LM Studio if you prefer)

AnythingLLM will show a green indicator if it can reach the LM Studio server.

Step 6: Create a workspace and chat

  1. Click New Workspace
  2. Name it — for example "Work" or "Personal"
  3. Optionally set a system prompt
  4. Click Save

You can now chat with your local model. Type any question to confirm the connection is working.

Step 7: Add documents (RAG)

  1. Inside your workspace, click the paperclip icon or drag a file into the chat area
  2. Supported formats: .txt, .pdf, .md, .docx, .csv
  3. AnythingLLM splits the document into chunks and creates embeddings locally
  4. Once indexed (green indicator next to filename), ask questions about the document
  5. The model retrieves relevant chunks and answers using your document as context

Workspaces are isolated — documents in one workspace are invisible to others. This lets you keep separate knowledge bases for different projects.

Use it

Once set up, you can:

  • Chat with local models — full conversational AI, no internet needed
  • Query your documents — upload PDFs, code files, meeting notes, and ask questions
  • Switch models on the fly — download a different model in LM Studio and select it from the dropdown
  • Use any OpenAI-compatible tool — since LM Studio exposes http://localhost:1234/v1, you can connect other tools (SillyTavern, Obsidian plugins, etc.)

Performance tips

GPUMax model sizeTypical speed
RTX 3060 12GB14B at Q415-20 tok/s
RTX 4060 8GB7B at Q420-30 tok/s
CPU-only (16GB)3B at Q43-5 tok/s
CPU-only (32GB)7B at Q42-4 tok/s

CPU-only inference works but expect slower responses. The RAG pipeline (embedding + retrieval) runs fine on CPU.

Cost vs cloud

Local (this stack)Cloud ChatGPT Plus
Monthly$0$20
GPU (used)$200 (RTX 3060)$0 (included)
Data privacyCompleteShared with OpenAI
Offline capableYesNo
Model selectionAny open modelGPT-4o only

Troubleshooting

AnythingLLM won't start — spawn error This happens when installed under Program Files (All Users). Uninstall and reinstall using "Current User" option.

Model loads but responds very slowly You're likely running on CPU. Check that LM Studio detects your GPU in the Developer tab. Try a smaller model (3B or 7B).

"Failed to connect to LM Studio" in AnythingLLM Make sure LM Studio's server is running (Developer tab > Start Server). Verify the port: http://localhost:1234 should show a page in your browser.

Documents are uploaded but the model ignores them Ensure the document has a green indicator in the workspace sidebar. Check that the embedding model is active in Settings > Embedding Model.

Frequently asked

What is the Windows Desktop AI with RAG stack for?

LM Studio + AnythingLLM = local AI on Windows with a friendly GUI. Download models visually, chat with documents, no terminal needed. Runs on a $200 used GPU. It is purpose-built for A private AI assistant on Windows with document RAG - no CLI, no Docker, all desktop GUI and runs entirely on your own hardware.

How much does the Windows Desktop AI with RAG stack cost?

Windows Desktop AI with RAG costs around $200 in hardware up front and $0/month to run, since everything is self-hosted — no per-token or subscription fees versus a cloud equivalent.

How long does it take to set up Windows Desktop AI with RAG?

Plan for roughly 15 minutes. The stack is rated beginner.

What do I need to run Windows Desktop AI with RAG?

Windows Desktop AI with RAG is built from 2 tool(s), 2 model(s), 2 hardware item(s). Each is listed below with a link.