Local Document RAG (AnythingLLM + Ollama)
AnythingLLM + Ollama = a private RAG system where you drop in PDFs, Word docs, and code files and ask questions. Runs on any machine, no cloud dependency, $0/mo.
Local Document RAG (AnythingLLM + Ollama) is a local AI stack for chatting with your documents using RAG: private, offline, and with minimal setup. It combines 6 components, is rated beginner, and takes about 10 minutes to set up. Expect around $300 in hardware and $0/month versus cloud.
- Cost: ~$300 hardware, $0/mo vs cloud
- Difficulty: beginner
- Setup time: ~10 min
- Use case: Chat with your documents locally using RAG - private, offline, zero setup
A private, zero-setup RAG system for your documents. AnythingLLM is an all-in-one desktop application that combines local LLM inference, RAG (retrieval-augmented generation), agent skills, and a vector database into a single clean interface. Connect it to Ollama for free local inference, and you can chat with your PDFs, Word docs, code files, and even web pages - entirely offline, entirely private.
What you get
- Drop-in document Q&A - PDFs, Word docs, CSV, code files, YouTube transcripts, GitHub repos. Just add them and ask questions
- Workspace isolation - separate knowledge bases for different projects or clients
- Built-in vector DB - LanceDB ships with AnythingLLM, no separate setup needed
- Agent skills - built-in tools for web search, summarization, and more
- Desktop app or Docker - runs on Windows, Mac, Linux, or as a server
- Multi-user - share with your team on LAN
- $0/month - all local, no API keys required
Architecture
| Component | Role |
|---|---|
| AnythingLLM | Desktop/server app - RAG, agents, vector DB |
| Ollama | Serves local LLM models |
| Qwen3 14B | Default chat model |
| nomic-embed-text | Embedding model for document vectorization |
| LanceDB (built-in) | Vector database for document embeddings |
AnythingLLM runs on any machine, with or without a GPU. CPU-only works fine for embedding and retrieval; a GPU speeds up LLM responses. Recommended: an RTX 3060 12GB, or a Beelink S9 mini PC for a CPU-only setup.
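As a rough check on the hardware recommendation, the memory taken by model weights can be estimated from parameter count and quantization width. The sketch below is back-of-the-envelope arithmetic only: the 4-bit width is an assumed quantization level (not stated above), the function name is hypothetical, and the estimate ignores the KV cache and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 weights) * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Assumed 4-bit quantization; KV cache and runtime overhead are not included.
    for name, size_b in [("Qwen3 14B", 14), ("Qwen3 32B", 32)]:
        print(f"{name}: ~{weight_memory_gb(size_b, 4):.1f} GB of weights at 4-bit")
```

Under these assumptions a 14B model needs about 7 GB for weights alone, which leaves headroom on a 12 GB card.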
Prerequisites
- Any computer with Windows, Mac, or Linux (GPU optional)
- Ollama installed and running
- Free disk for AnythingLLM + models: at least ~3 GB, and more for larger models (the `qwen3:14b` download alone is roughly 9 GB)
Setup
Step 1: Install Ollama
Download and install Ollama from ollama.com, or run it with Docker (omit `--gpus all` on machines without an NVIDIA GPU):

    docker run -d --gpus all -p 11434:11434 --name ollama \
      -v ollama:/root/.ollama \
      ollama/ollama

Pull your default model:

    ollama pull qwen3:14b

Also pull an embedding model (AnythingLLM uses it for document vectorization):

    ollama pull nomic-embed-text

If Ollama runs in the Docker container above, prefix these commands with `docker exec -it ollama`.

Step 2: Install AnythingLLM
Desktop app (easiest): Download from anythingllm.com and install like any app.
Docker (server mode):

    docker run -d -p 3001:3001 \
      --name anythingllm \
      --add-host host.docker.internal:host-gateway \
      -v anythingllm:/app/server/storage \
      -e STORAGE_DIR=/app/server/storage \
      mintplexlabs/anythingllm

Step 3: Connect Ollama to AnythingLLM
- Open AnythingLLM (http://localhost:3001 for Docker, or launch the desktop app)
- Go to Settings → LLM Provider
- Select Ollama as the provider
- Set Base URL to `http://localhost:11434` (use `http://host.docker.internal:11434` when AnythingLLM runs in Docker)
- Select model: `qwen3:14b`
- Go to Embedder settings, select Ollama, choose `nomic-embed-text`
That's it. You're ready to add documents.
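Before adding documents, it can help to confirm that both models from Step 1 are actually available to Ollama. The sketch below assumes Ollama's `/api/tags` endpoint, which lists local models, and that the default port 11434 is in use; the helper names `missing_models` and `fetch_tags` are hypothetical, not part of either tool.

```python
import json
import urllib.request


def missing_models(tags: dict, required: list[str]) -> list[str]:
    """Return the required model names that are absent from an /api/tags payload."""
    installed = set()
    for model in tags.get("models", []):
        name = model.get("name", "")
        installed.add(name)
        # Ollama lists untagged pulls as "<name>:latest", so also record the bare name.
        if name.endswith(":latest"):
            installed.add(name[: -len(":latest")])
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)


# Example (requires a running Ollama server):
#   gaps = missing_models(fetch_tags(), ["qwen3:14b", "nomic-embed-text"])
#   print("All models present" if not gaps else f"Still to pull: {gaps}")
```

If the returned list is non-empty, pulling the named models again with `ollama pull` is the likely fix.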
Use it
Chat with a PDF
- Click New Workspace and name it (e.g., "Research Papers")
- Drag and drop a PDF into the workspace
- AnythingLLM automatically chunks, embeds, and indexes it
- Ask questions in the chat - responses come from your document
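The chunk, embed, and index steps above can be illustrated with a toy pipeline. This is a minimal sketch of the general RAG retrieval technique, not AnythingLLM's implementation: the word-count `embed` function is a stand-in for a real embedding model such as `nomic-embed-text`, and the chunk size, overlap, and all function names are illustrative assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i : i + size]) for i in starts]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real setup calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble retrieved chunks and the question into one prompt for the LLM."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real deployment the retrieved chunks come from the vector database and the prompt is sent to the chat model; the structure of the flow is the same.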
Build a Project Knowledge Base
- Create a workspace per project
- Upload: spec docs, meeting notes, code files, design briefs
- Ask questions across all documents in that workspace
- Workspaces are isolated - docs in one don't leak to another
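Workspace isolation can be pictured as a separate document collection per workspace, with every query scoped to exactly one of them. The class below is a toy model of that idea under stated assumptions: it is not how AnythingLLM stores data, the class and method names are hypothetical, and substring matching stands in for vector search.

```python
from collections import defaultdict


class WorkspaceStore:
    """Toy model of per-workspace document isolation (not AnythingLLM's storage)."""

    def __init__(self) -> None:
        self._docs: dict[str, list[str]] = defaultdict(list)

    def add(self, workspace: str, text: str) -> None:
        """Add a document to one workspace only."""
        self._docs[workspace].append(text)

    def search(self, workspace: str, term: str) -> list[str]:
        """Search a single workspace; other workspaces are never read."""
        docs = self._docs.get(workspace, [])
        return [d for d in docs if term.lower() in d.lower()]


if __name__ == "__main__":
    store = WorkspaceStore()
    store.add("client-a", "Invoice terms: net 30")
    store.add("client-b", "Invoice terms: net 60")
    print(store.search("client-a", "invoice"))
```

A query against `client-a` only ever sees documents added to `client-a`, which is the property that makes one workspace per project or client safe.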
Use Agent Skills
Click the agent icon in chat to enable tools:
- Web search - fetch and summarize web pages
- Code execution - run Python/JS snippets
- Built-in prompts - summarize, explain, translate
Cost vs cloud
| | Local AnythingLLM + Ollama | ChatGPT + Custom GPTs |
|---|---|---|
| Monthly | $0 | $20-200 (subscription) |
| Hardware | ~$0-300 (any machine) | $0 |
| Data privacy | Stays on your machine | Sent to OpenAI |
| Document cap | Unlimited | Token/file limits |
| RAG processing | Local; speed depends on your hardware | Cloud; fast |
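To put the table in perspective, the break-even point for the hardware can be worked out with simple arithmetic. This sketch uses the ~$300 hardware figure and the $20 and $200 monthly figures from the table; it assumes electricity and your own time cost nothing, which is a simplification, and the function name is hypothetical.

```python
def months_to_break_even(hardware_cost: float, cloud_monthly: float) -> float:
    """Months of cloud fees that add up to the one-time hardware cost."""
    if cloud_monthly <= 0:
        raise ValueError("cloud_monthly must be positive")
    return hardware_cost / cloud_monthly


if __name__ == "__main__":
    for monthly in (20, 200):
        months = months_to_break_even(300, monthly)
        print(f"${monthly}/mo cloud plan: hardware pays for itself in {months:.1f} months")
```

If you already own a suitable machine, the hardware cost is $0 and the local setup is cheaper from the first month.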
Troubleshooting
- AnythingLLM can't find Ollama → Make sure Ollama is running (`ollama list`). On Docker, use `http://host.docker.internal:11434`.
- Document answers are wrong → Try a larger model such as Qwen3 14B (or Qwen3 32B if you are already on 14B). The built-in LanceDB works best with `nomic-embed-text`.
- Slow on CPU → Use a smaller model like Llama 3.1 8B at Q4. Responses will be a few tok/s but still usable.
- Docker Desktop can't reach Ollama → Ensure `--add-host host.docker.internal:host-gateway` is in the run command.
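When the connection problems above are hard to pin down, a small probe can show which base URL actually answers. This is a sketch under assumptions: it relies on Ollama answering plain HTTP GET requests on port 11434, the helper names are hypothetical, and it must be run from the same environment as AnythingLLM (inside the container for Docker installs), since `host.docker.internal` typically resolves only there.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

CANDIDATE_URLS = [
    "http://localhost:11434",             # desktop app, Ollama on the same host
    "http://host.docker.internal:11434",  # AnythingLLM in Docker, Ollama on the host
]


def http_probe(base_url: str) -> bool:
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=3) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def first_reachable(
    urls: list[str], probe: Callable[[str], bool] = http_probe
) -> Optional[str]:
    """Return the first URL the probe can reach, or None if none respond."""
    for url in urls:
        if probe(url):
            return url
    return None


# Example (performs real network requests):
#   print(first_reachable(CANDIDATE_URLS) or "Ollama not reachable on either URL")
```

Whichever URL is returned is the value to enter as the Base URL in the LLM Provider and Embedder settings.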
Swap components
- Desktop vs Server → Desktop app is simpler for single user. Docker server enables multi-user and remote access.
- Larger model → Qwen3 32B on an RTX 4090 for sharper answers.
- Alternative LLM provider → AnythingLLM also supports OpenAI, Anthropic, vLLM, LM Studio, and more.
- Prefer Open WebUI → For a browser-based RAG experience, try Open WebUI instead.
Frequently asked
What is the Local Document RAG (AnythingLLM + Ollama) stack for?
It is a private RAG system purpose-built for chatting with your documents locally: you drop in PDFs, Word docs, and code files and ask questions, and everything runs offline on your own hardware.
How much does the Local Document RAG (AnythingLLM + Ollama) stack cost?
Local Document RAG (AnythingLLM + Ollama) costs around $300 in hardware up front and $0/month to run. Everything is self-hosted, so there are no per-token or subscription fees, unlike a cloud equivalent.
How long does it take to set up Local Document RAG (AnythingLLM + Ollama)?
Plan for roughly 10 minutes. The stack is rated beginner.
What do I need to run Local Document RAG (AnythingLLM + Ollama)?
Local Document RAG (AnythingLLM + Ollama) is built from 2 tools, 2 models, and 2 hardware items. Each is listed below with a link.