willow-inference-server

Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS

Updated: Jun 8, 2026
Platforms: linux, docker, web
Pricing: free-open-source
Status: active
License: Apache 2.0

What it does

Core capabilities at a glance

  • CUDA
  • Llama
  • Privacy
  • Speech Recognition
  • Speech to Text
  • Text to Speech
  • Vicuna
  • WebRTC

Deep dive

The full breakdown - performance, comparisons, and setup

willow-inference-server

willow-inference-server is a speech (TTS/STT) tool: an open-source, local, self-hosted, highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WebSockets (WS).

Overview

Willow Inference Server (WIS) is a focused and highly optimized language inference server implementation. The project's stated goal is to "automagically" enable performant, cost-effective self-hosting of released state-of-the-art, best-of-breed models for speech and language tasks.

willow-inference-server is open-source, written primarily in Python, with 503 GitHub stars under the Apache 2.0 license. It was last updated on 2026-02-12.

Key capabilities

From the project's documentation:

  • TTS. Primarily provided for assistant tasks (like Willow!) and visually impaired users.
  • Support for a variety of transports: REST, WebRTC, and WebSockets.
  • Performance and memory optimized. Leverages CTranslate2 for Whisper support.
  • Desktop/mobile transcription apps (look out for a future announcement on this!).
  • Desktop/mobile assistant apps - Willow everywhere!

How it fits a local-AI stack

willow-inference-server runs on your own hardware, so pair it with a model and a GPU sized to your needs. Use the VRAM calculator to pick a model that fits your card, and see what you can run for hardware guidance.

Sources

Stats from GitHub, 2026-06-08.

Frequently asked

Quick answers to common questions

What is willow-inference-server?

willow-inference-server is a speech (TTS/STT) tool for local AI workloads. It is an open-source, local, self-hosted, highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS.

Is willow-inference-server free and open source?

Yes. willow-inference-server is licensed under Apache 2.0, and you can self-host it for free. The listed platforms are Linux, Docker, and web.

What platforms does willow-inference-server support?

The listed platforms for willow-inference-server are Linux, Docker, and web.

What hardware do I need for willow-inference-server?

The hardware requirements depend on which models you run. Check our hardware directory for compatible GPUs and systems.
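
As a rough illustration of why requirements track the model, the memory needed for model weights can be approximated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate and is not taken from willow-inference-server. The function name, the 20% overhead factor, and the 7-billion-parameter example are all assumptions made for illustration.

```python
def estimate_weight_memory_gib(num_params: float,
                               bytes_per_param: float,
                               overhead: float = 0.2) -> float:
    """Rough memory estimate in GiB for holding model weights.

    num_params      -- number of model parameters (e.g. 7e9)
    bytes_per_param -- 2.0 for fp16, 1.0 for int8, 0.5 for 4-bit weights
    overhead        -- ASSUMED fractional headroom for activations and
                       runtime buffers; the real figure varies by workload
    """
    if num_params < 0 or bytes_per_param <= 0 or overhead < 0:
        raise ValueError("inputs must be non-negative, with bytes_per_param > 0")
    raw_bytes = num_params * bytes_per_param
    return raw_bytes * (1.0 + overhead) / (1024 ** 3)


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model at three weight precisions.
    for label, width in (("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)):
        gib = estimate_weight_memory_gib(7e9, width)
        print(f"{label}: ~{gib:.1f} GiB")
```

Treat the result as a lower bound when comparing against a card's VRAM: running speech and language models side by side adds their footprints together.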

Does willow-inference-server support GPU acceleration?

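Yes. CUDA is the only GPU backend listed among the project's topics, so GPU acceleration targets NVIDIA cards; Metal and Vulkan are not listed. A high-end NVIDIA GPU such as an RTX 4090 or 5090 gives the most headroom, though what you actually need depends on the models you run.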

What are the best alternatives to willow-inference-server?

Alternatives include the other speech (TTS/STT) tools in our directory. Browse the full collection at /tool for comparisons, community reviews, and benchmark data.

How much does willow-inference-server cost?

willow-inference-server is free and open source under the Apache 2.0 license, and there is no charge to self-host it.
