Local Document RAG (AnythingLLM + Ollama)

AnythingLLM + Ollama = a private RAG system where you drop in PDFs, Word docs, and code files and ask questions. Runs on any machine, no cloud dependency, $0/mo.

The short answer

Local Document RAG (AnythingLLM + Ollama) is a local AI stack for chatting with your documents using RAG: private, offline, and with almost no setup. It combines 6 components, is rated beginner, and takes about 10 minutes to set up. Expect around $300 in hardware and $0/month in running costs, compared with a paid cloud service.

Cost: ~$300 hardware, $0/mo vs cloud
Difficulty: beginner
Setup time: ~10 min
Use case: Chat with your documents locally using RAG - private, offline, zero setup

A private, zero-setup RAG system for your documents. AnythingLLM is an all-in-one desktop application that combines local LLM inference, RAG (retrieval-augmented generation), agent skills, and a vector database into a single clean interface. Connect it to Ollama for free local inference, and you can chat with your PDFs, Word docs, code files, and even web pages - entirely offline, entirely private.

What you get

  • Drop-in document Q&A - PDFs, Word docs, CSV, code files, YouTube transcripts, GitHub repos. Just add them and ask questions
  • Workspace isolation - separate knowledge bases for different projects or clients
  • Built-in vector DB - LanceDB ships with AnythingLLM, no separate setup needed
  • Agent skills - built-in tools for web search, summarization, and more
  • Desktop app or Docker - runs on Windows, Mac, Linux, or as a server
  • Multi-user - share with your team on LAN
  • $0/month - all local, no API keys required

Architecture

Component            Role
AnythingLLM          Desktop/server app - RAG, agents, vector DB
Ollama               Serves local LLM models
Qwen3 14B            Default chat model
nomic-embed-text     Embedding model for document vectorization
LanceDB (built-in)   Vector database for document embeddings

AnythingLLM runs on any machine, with or without a GPU. CPU-only works fine for embedding and retrieval; a GPU speeds up the LLM responses. Recommended: an RTX 3060 12GB for GPU inference, or a Beelink S9 mini PC for CPU-only use.
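
To make the division of labor concrete, here is a minimal sketch of the retrieval step that sits between the vector database and the chat model. It is an illustration, not AnythingLLM's internal code: the function names, the prompt template, and the toy 3-dimensional vectors are all assumptions (real embeddings from nomic-embed-text have hundreds of dimensions).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are closest to the query."""
    ranked = sorted(indexed_chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    """Assemble the text sent to the chat model (hypothetical template)."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy vectors stand in for real embeddings (assumption for illustration).
INDEX = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office closes at 6 pm.", [0.0, 0.2, 0.9]),
    ("Late invoices incur a 2% fee.", [0.8, 0.3, 0.1]),
]
```

In this picture the embedding model produces the vectors, the vector database stores and ranks them, and only the final prompt reaches the chat model, which is why retrieval stays cheap on a CPU while generation benefits from a GPU.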

Prerequisites

  • Any computer with Windows, Mac, or Linux (GPU optional)
  • Ollama installed and running
  • ~12 GB free disk for AnythingLLM + models (the qwen3:14b download alone is roughly 9 GB)

Setup

Step 1: Install Ollama

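Download and install Ollama from ollama.com, or run it with Docker (the --gpus all flag requires an NVIDIA GPU and the NVIDIA Container Toolkit; omit it for CPU-only):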

docker run -d --gpus all -p 11434:11434 --name ollama \
  -v ollama:/root/.ollama \
  ollama/ollama

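Pull your default model (if Ollama runs in the Docker container above, prefix the pull commands with docker exec -it ollama):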

ollama pull qwen3:14b

Also pull an embedding model (AnythingLLM uses it for document vectorization):

ollama pull nomic-embed-text
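
Before moving on, it can help to confirm that both pulls succeeded. The sketch below is one way to do that, assuming Ollama is listening on its default port: it reads the model list from Ollama's /api/tags endpoint and reports anything missing. The helper names missing_models and fetch_tags are hypothetical.

```python
import json
import urllib.request

REQUIRED = ["qwen3:14b", "nomic-embed-text"]

def missing_models(tags_response, required=REQUIRED):
    """Return required models absent from a parsed /api/tags response."""
    installed = {m["name"] for m in tags_response.get("models", [])}
    # Ollama lists untagged pulls with an explicit ":latest" suffix.
    installed |= {n[: -len(":latest")] for n in installed if n.endswith(":latest")}
    return [name for name in required if name not in installed]

def fetch_tags(base_url="http://localhost:11434"):
    """Ask a running Ollama server for its installed models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With Ollama running, missing_models(fetch_tags()) should return an empty list once both models are present.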

Step 2: Install AnythingLLM

Desktop app (easiest): Download from anythingllm.com and install like any app.

Docker (server mode):

docker run -d -p 3001:3001 \
  --name anythingllm \
  --add-host host.docker.internal:host-gateway \
  -v anythingllm:/app/server/storage \
  -e STORAGE_DIR=/app/server/storage \
  mintplexlabs/anythingllm

Step 3: Connect Ollama to AnythingLLM

  1. Open AnythingLLM (http://localhost:3001 for Docker, or launch the desktop app)
  2. Go to Settings → LLM Provider
  3. Select Ollama as the provider
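  4. Set Base URL to http://localhost:11434 (desktop app) or http://host.docker.internal:11434 (AnythingLLM in Docker, where localhost refers to the container itself)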
  5. Select model: qwen3:14b
  6. Go to Embedder settings, select Ollama, choose nomic-embed-text

That's it. You're ready to add documents.
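
If the model list stays empty in the settings screen, a quick request from outside AnythingLLM can show whether Ollama and the model respond at all. This is a minimal sketch using Ollama's /api/generate endpoint with streaming turned off; the helper names and the test prompt are assumptions. Note that host.docker.internal only resolves from inside a container, so a script run on the host itself should use the localhost address.

```python
import json
import urllib.request

def ollama_base_url(anythingllm_in_docker):
    """Pick the Ollama address as seen from where AnythingLLM runs."""
    host = "host.docker.internal" if anythingllm_in_docker else "localhost"
    return f"http://{host}:11434"

def generate_request(base_url, model="qwen3:14b", prompt="Reply with the word ready."):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def smoke_test(base_url):
    """Send the request and return the model's reply text."""
    with urllib.request.urlopen(generate_request(base_url), timeout=120) as resp:
        return json.load(resp)["response"]
```

Calling smoke_test(ollama_base_url(False)) on the host should return a short reply; the first call can be slow while the model loads into memory.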

Use it

Chat with a PDF

  1. Click New Workspace and name it (e.g., "Research Papers")
  2. Drag and drop a PDF into the workspace
  3. AnythingLLM automatically chunks, embeds, and indexes it
  4. Ask questions in the chat - responses come from your document
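
The chunking in step 3 can be pictured as a sliding window over the document text. The sketch below is a simplified stand-in, not AnythingLLM's actual splitter: it cuts fixed-size character windows that overlap, so a sentence falling on a boundary still appears whole in at least one chunk. The size and overlap defaults are arbitrary assumptions.

```python
def chunk_text(text, size=200, overlap=40):
    """Split text into character windows that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk is then embedded separately, which is why smaller chunks tend to give more precise retrieval while larger ones preserve more surrounding context.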

Build a Project Knowledge Base

  1. Create a workspace per project
  2. Upload: spec docs, meeting notes, code files, design briefs
  3. Ask questions across all documents in that workspace
  4. Workspaces are isolated - docs in one don't leak to another
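
Isolation here means that a query only ever searches the documents attached to its own workspace. The toy store below illustrates the idea with keyword matching instead of vector search; the class and its methods are hypothetical and do not mirror AnythingLLM's storage layer.

```python
from collections import defaultdict

class WorkspaceStore:
    """Toy per-workspace document store; each workspace has its own list."""

    def __init__(self):
        self._docs = defaultdict(list)

    def add(self, workspace, text):
        """Attach a document to exactly one workspace."""
        self._docs[workspace].append(text)

    def search(self, workspace, keyword):
        """Return matching documents from the named workspace only."""
        docs = self._docs.get(workspace, [])
        return [d for d in docs if keyword.lower() in d.lower()]
```

A search in one client's workspace never sees another client's documents, even when both contain the same keyword.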

Use Agent Skills

Click the agent icon in chat to enable tools:

  • Web search - fetch and summarize web pages
  • Code execution - run Python/JS snippets
  • Built-in prompts - summarize, explain, translate

Cost vs cloud

               Local AnythingLLM + Ollama   ChatGPT + Custom GPTs
Monthly        $0                           $20-200 (API costs)
Hardware       ~$0-300 (any machine)        $0
Data privacy   Stays on your machine        Sent to OpenAI
Document cap   Unlimited                    Token/file limits
RAG quality    Local, fast                  Cloud, fast
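
The figures in the table imply a simple break-even point: the one-time hardware cost divided by the monthly cloud fee. A rough sketch of that arithmetic follows; it ignores electricity and any resale value, and the function name is an assumption.

```python
import math

def break_even_months(hardware_cost, cloud_monthly):
    """Whole months of cloud fees needed to match the one-time hardware cost."""
    if cloud_monthly <= 0:
        raise ValueError("cloud_monthly must be positive")
    return math.ceil(hardware_cost / cloud_monthly)
```

At $300 of hardware, the local stack pays for itself after 15 months against a $20/month plan, and after 2 months against $200/month.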

Troubleshooting

  • AnythingLLM can't find Ollama → Make sure Ollama is running (ollama list). On Docker, use http://host.docker.internal:11434.
  • Document answers are wrong → Try a larger model such as Qwen3 32B, and check that the embedder is set to nomic-embed-text.
  • Slow on CPU → Use a smaller model like Llama 3.1 8B at Q4. Responses will be a few tok/s but still usable.
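  • AnythingLLM in Docker can't reach Ollama → Ensure --add-host host.docker.internal:host-gateway is in the run command. Ollama listens only on 127.0.0.1 by default, so on Linux it may also need OLLAMA_HOST=0.0.0.0 set to accept connections from the container.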
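
Most connection problems come down to which address is reachable from where AnythingLLM runs. The sketch below probes the two addresses mentioned above and returns the first that answers, relying on the fact that a running Ollama server responds to a plain GET on its root URL. The helper names are hypothetical, and the script has to run in the same environment as AnythingLLM (for example, inside the container) for the result to be meaningful.

```python
import urllib.error
import urllib.request

CANDIDATES = [
    "http://localhost:11434",
    "http://host.docker.internal:11434",
]

def is_reachable(base_url, timeout=3):
    """True if an HTTP GET on the Ollama root URL succeeds."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def first_reachable(candidates=CANDIDATES, probe=is_reachable):
    """Return the first candidate URL the probe accepts, or None."""
    for url in candidates:
        if probe(url):
            return url
    return None
```

Whatever first_reachable() returns is the value to enter as the Base URL; None suggests Ollama is not running or is not listening on an address that environment can reach.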

Swap components

  • Desktop vs Server → Desktop app is simpler for single user. Docker server enables multi-user and remote access.
  • Larger model → Qwen3 32B on an RTX 4090 for sharper answers.
  • Alternative LLM provider → AnythingLLM also supports OpenAI, Anthropic, vLLM, LM Studio, and more.
  • Prefer Open WebUI → For a browser-based RAG experience, try Open WebUI instead.
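
When swapping models, a back-of-the-envelope estimate shows whether the weights will fit in GPU memory: parameters times bits per weight, divided by 8. The sketch below assumes 4-bit quantization and a flat 2 GB allowance for context and runtime overhead; both are rough assumptions, and real quantized files are usually somewhat larger than the 4-bit figure.

```python
def weights_gb(params_billion, bits_per_weight=4.0):
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8

def fits_in_vram(params_billion, vram_gb, bits_per_weight=4.0, overhead_gb=2.0):
    """Rough check: weights plus an assumed overhead allowance fit in VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb
```

By this estimate a 32B model needs about 16 GB for weights, which fits the 24 GB of an RTX 4090 but not a 12 GB card, while a 14B model at about 7 GB fits a 12 GB RTX 3060.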

Frequently asked

What is the Local Document RAG (AnythingLLM + Ollama) stack for?

It is a private RAG system: you drop in PDFs, Word docs, and code files and ask questions about them. It is purpose-built for chatting with your documents locally and runs entirely on your own hardware, with no cloud dependency and no monthly fee.

How much does the Local Document RAG (AnythingLLM + Ollama) stack cost?

Local Document RAG (AnythingLLM + Ollama) costs around $300 in hardware up front and $0/month to run. Everything is self-hosted, so there are no per-token or subscription fees, unlike a cloud equivalent.

How long does it take to set up Local Document RAG (AnythingLLM + Ollama)?

Plan for roughly 10 minutes. The stack is rated beginner.

What do I need to run Local Document RAG (AnythingLLM + Ollama)?

Local Document RAG (AnythingLLM + Ollama) is built from 2 tools, 2 models, and 2 hardware items, which are covered in the Architecture and Setup sections above.