What it does
Core capabilities at a glance
- ExLlamaV2 backend for fastest single-GPU inference
- Dynamic model loading and unloading at runtime
- OpenAI-compatible chat and completion API
- LoRA adapter switching without reloading
- Flash Attention and cache quantization
- Jinja2-based chat templates
Deep dive
The full breakdown - performance, comparisons, and setup
TabbyAPI
TabbyAPI combines the raw speed of ExLlamaV2 with a clean, OpenAI-compatible API interface and unique features like dynamic LoRA switching.
What it is
TabbyAPI is a Python API server that wraps ExLlamaV2 into an OpenAI-compatible endpoint. It supports dynamic model loading/unloading, on-the-fly LoRA adapter switching, and full chat completion and embedding endpoints.
Get started
git clone https://github.com/theroyallab/tabbyAPI.git
cd tabbyAPI
pip install -r requirements.txt
python main.pyWhen to use something else
Frequently asked
Quick answers to common questions
What is TabbyAPI?
TabbyAPI is a inference-server tool for local AI workloads. Lightweight, OpenAI-compatible API server for ExLlamaV2 with dynamic model loading and LoRA switching.
Is TabbyAPI free and open source?
Yes, TabbyAPI has 1,239 GitHub stars and is licensed under MIT. You can self-host it for free on linux, windows.
What platforms does TabbyAPI support?
TabbyAPI runs on linux, windows.
What hardware do I need for TabbyAPI?
The hardware requirements depend on which models you run. Check our hardware directory for compatible GPUs and systems. TabbyAPI has 1,239 GitHub stars and an active community.
Does TabbyAPI support GPU acceleration?
TabbyAPI supports GPU acceleration via CUDA, Metal, or Vulkan depending on your platform. For the best performance, pair it with an NVIDIA RTX 4090 or 5090.
What are the best alternatives to TabbyAPI?
Popular alternatives include other inference-server tools in our directory. Browse our full collection at /tool for comparisons, community reviews, and benchmark data to find the right fit for your workflow.
How much does TabbyAPI cost?
TabbyAPI is free-open-source. It is completely free and open source to self-host.
Pairs well with
Complementary tools, models, and hardware
Comments coming soon
Configure NEXT_PUBLIC_GISCUS_REPO_ID and NEXT_PUBLIC_GISCUS_CATEGORY_ID at giscus.app to enable.