What it does
Core capabilities at a glance
- AI Chat
- AI Tools
- Apple Silicon
- ChatGPT
- DeepSeek
- Desktop App
- Gemma
- GGUF
Deep dive
The full breakdown - performance, comparisons, and setup
Atomic-Chat
Atomic-Chat is a local AI app and inference engine for agents, listed here as an agent framework. It runs open-weight LLMs on your own computer, privately and 100% offline.
Overview
Local AI app and inference engine for agents. Run open-weight LLMs locally — private, on your machine.
Builds are available from atomic.chat and GitHub Releases (latest listed: v1.1.95).
Atomic Chat is built by a small team and a handful of community contributors. Pull requests welcome — see CONTRIBUTING.md for how to get started.
- Run open-weight LLMs locally from HuggingFace: Llama, Gemma, Qwen, Mistral, Phi, and others
- Multi-Token Prediction (MTP) speculative decoding: 30–70% throughput boost on supported models, up to 3× on Gemma 4
- DFlash block-diffusion decoding: up to 6× faster on Qwen 3.6, Gemma 4, Kimi K2.5
- Flash Attention toggle ('on' / 'off' / 'auto')
- Automatic reasoning-context tracking for chain-of-thought models
- Auto context-window expansion with overflow notifications
- EAGLE-3 speculative decoding for Gemma 4 on Apple Silicon (MLX)
- MTP on MLX for Qwen 3.5 / 3.6 and DeepSeek V4
- TurboQuant KV cache ('turbo3' / 'turbo4') on llama.cpp, now on Windows and Linux as well as macOS: up to ~4.3× smaller KV cache footprint, on CPU and GPU (CUDA / Vulkan)
- TurboQuant KV cache on MLX-VLM: smaller memory footprint via RHT-correct fast paths
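To put the "up to ~4.3× smaller KV cache" figure from the list above in concrete terms, here is a rough Python sketch. It uses the usual size estimate for a transformer KV cache and treats 4.3× as a best-case divisor. The model shape (32 layers, 8 KV heads, head dimension 128, 32K context, 16-bit cache entries) is a made-up example, not a measurement of any model Atomic-Chat ships, and the function names are illustrative, not part of the app.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate the uncompressed KV cache size for a single sequence."""
    # Factor 2: each layer stores one tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def compressed_kv_bytes(baseline_bytes, reduction=4.3):
    """Apply a quoted reduction factor (assumed best case) to a baseline size."""
    return baseline_bytes / reduction


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical model shape, chosen only to make the arithmetic easy to follow.
    baseline = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32768)
    compressed = compressed_kv_bytes(baseline)
    print(f"16-bit KV cache: {to_gib(baseline):.2f} GiB")    # 4.00 GiB
    print(f"at 4.3x smaller: {to_gib(compressed):.2f} GiB")  # 0.93 GiB
```

Because the quoted figure is an upper bound, the real saving depends on the model, the cache mode, and the backend.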
Atomic-Chat is open source, written primarily in TypeScript, with 982 GitHub stars. GitHub lists its license as "Other". The latest release is v1.1.119 (2026-06-18).
How it fits a local-AI stack
Atomic-Chat runs on your own hardware, so pair it with a model and a GPU sized to your needs. Use the VRAM calculator to pick a model that fits your card, and see what you can run for hardware guidance.
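As a back-of-the-envelope version of that sizing step, the Python sketch below estimates whether a quantized model fits a given card. It is not the directory's VRAM calculator. The formula (parameters × bits per weight, plus a KV cache allowance and a fixed overhead) is a common rule of thumb, and the 4.5 bits per weight, 1 GiB KV cache, 1 GiB overhead, and 24 GiB card are all assumed values for illustration.

```python
def weights_gib(n_params, bits_per_weight):
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits(n_params, bits_per_weight, kv_cache_gib, vram_gib, overhead_gib=1.0):
    """Return (fits, needed_gib) for a model on a card with vram_gib of memory.

    overhead_gib is an assumed allowance for activations and runtime buffers.
    """
    needed = weights_gib(n_params, bits_per_weight) + kv_cache_gib + overhead_gib
    return needed <= vram_gib, needed


if __name__ == "__main__":
    # Hypothetical models at roughly 4.5 bits per weight on a 24 GiB card.
    for name, n_params in [("8B model", 8e9), ("70B model", 70e9)]:
        ok, needed = fits(n_params, 4.5, kv_cache_gib=1.0, vram_gib=24.0)
        verdict = "fits" if ok else "does not fit"
        print(f"{name}: needs about {needed:.1f} GiB, {verdict} in 24 GiB")
```

Treat the result as a first filter only. Actual memory use varies with context length, quantization format, and backend.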
Sources
- Source code & docs: AtomicBot-ai/Atomic-Chat
- Official website: https://atomic.chat
Stats from GitHub, 2026-06-27.
Frequently asked
Quick answers to common questions
What is Atomic-Chat?
Atomic-Chat is an agent-framework tool for local AI workloads: a local AI app and inference engine for agents that runs open-weight LLMs privately and 100% offline on your computer.
Is Atomic-Chat free and open source?
Yes. Atomic-Chat is open source, with 982 GitHub stars and a license that GitHub lists as "Other". You can self-host it for free on macOS, Linux, and Windows.
What platforms does Atomic-Chat support?
Atomic-Chat runs on macOS, Linux, and Windows.
What hardware do I need for Atomic-Chat?
The hardware requirements depend on which models you run. Check our hardware directory for compatible GPUs and systems. Atomic-Chat has 982 GitHub stars and an active community.
Does Atomic-Chat support GPU acceleration?
Atomic-Chat supports GPU acceleration via CUDA, Metal, or Vulkan depending on your platform. For the best performance, pair it with an NVIDIA RTX 4090 or 5090.
What are the best alternatives to Atomic-Chat?
Popular alternatives include other agent-framework tools in our directory. Browse our full collection at /tool for comparisons, community reviews, and benchmark data to find the right fit for your workflow.
How much does Atomic-Chat cost?
Atomic-Chat is free and open source, and costs nothing to self-host.