shimmy social preview
inference-server5,339Apache 2.0

shimmy

⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.

Updated Jun 8, 2026
Platforms
Pricing
free-open-source
Status
active
License
Apache 2.0

What it does

Core capabilities at a glance

  • API Server
  • Command Line Tool
  • Developer Tools
  • Gguf
  • Huggingface
  • Huggingface Models
  • Huggingface Transformers
  • Inference Server

Deep dive

The full breakdown - performance, comparisons, and setup

shimmy

shimmy is a local inference server - ⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.

Overview

Shimmy will be free forever. No asterisks. No "free for now." No pivot to paid.

🚀 If Shimmy helps you, consider sponsoring — 100% of support goes to keeping it free forever.

  • $5/month: Coffee tier ☕ - Eternal gratitude + sponsor badge - $25/month: Bug prioritizer 🐛 - Priority support + name in SPONSORS.md - $100/month: Corporate backer 🏢 - Logo placement + monthly office hours - $500/month: Infrastructure partner 🚀 - Direct support + roadmap input

  • What Is Shimmy? - 🔥 Airframe Engine (v2.0) - ⚡ TurboShimmy INT4 KV (v2.1) - 🎯 Supported Models - 📦 Migrating from v1.x - ⚡ Quick Start (30 seconds) - 🚀 OpenAI SDK Compatibility - 🔧 Extended Context - 📥 Download & Install - 🔗 Integration Examples - 📖 API Reference - ❓ FAQ - 🏛️ Technical Architecture - 📚 Documentation Hub - 🌍 Community & Support - ⚡ Performance - License

Shimmy is a single-binary that provides 100% OpenAI-compatible endpoints for GGUF models. Point your existing AI tools to Shimmy and they just work — locally, privately, and free.

🎉 NEW in v2.0.0: Shimmy now runs on Airframe, a pure-Rust WGSL GPU engine. No C++ toolchain, no backend flags, no compilation required.

shimmy is open-source, written primarily in Rust, with 5,339 GitHub stars under the Apache 2.0 license. The latest release is v2.0.1 (2026-05-26).

Key capabilities

From the project's documentation:

  • $5/month: Coffee tier ☕ - Eternal gratitude + sponsor badge
  • $25/month: Bug prioritizer 🐛 - Priority support + name in SPONSORS.md
  • $100/month: Corporate backer 🏢 - Logo placement + monthly office hours
  • $500/month: Infrastructure partner 🚀 - Direct support + roadmap input
  • 🔥 Airframe Engine (v2.0)
  • ⚡ TurboShimmy INT4 KV (v2.1)

Install

A quick way to get started (always check the official docs for the latest):

go install shimmy

How it fits a local-AI stack

shimmy runs on your own hardware, so pair it with a model and a GPU sized to your needs. Use the VRAM calculator to pick a model that fits your card, and see what you can run for hardware guidance. Related local inference servers in the directory:

Sources

Stats from GitHub, 2026-06-08.

Frequently asked

Quick answers to common questions

What is shimmy?

shimmy is a inference-server tool for local AI workloads. ⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.

Is shimmy free and open source?

Yes, shimmy has 5,339 GitHub stars and is licensed under Apache 2.0. You can self-host it for free on .

What hardware do I need for shimmy?

The hardware requirements depend on which models you run. Check our hardware directory for compatible GPUs and systems. shimmy has 5,339 GitHub stars and an active community.

Does shimmy support GPU acceleration?

shimmy's GPU support depends on your specific setup. Check the documentation for details. For the best performance, pair it with an NVIDIA RTX 4090 or 5090.

What are the best alternatives to shimmy?

Popular alternatives include other inference-server tools in our directory. Browse our full collection at /tool for comparisons, community reviews, and benchmark data to find the right fit for your workflow.

How much does shimmy cost?

shimmy is free-open-source. It is completely free and open source to self-host.

Pairs well with

Complementary tools, models, and hardware

Comments coming soon

Configure NEXT_PUBLIC_GISCUS_REPO_ID and NEXT_PUBLIC_GISCUS_CATEGORY_ID at giscus.app to enable.