What it does

Core capabilities at a glance

CUDA
CUDA Kernels
DFlash
Kernel
llama.cpp
Local AI
Luce
Lucebox

Deep dive

The full breakdown - performance, comparisons, and setup

lucebox-hub

lucebox-hub is a local inference server: a fast speculative LLM inference server for consumer hardware.

Overview

Local LLM inference server built for speed. Custom kernels, speculative prefill & decoding. Each optimization in our engine targets a specific model family and hardware target.

Each one is self-contained with setup instructions and benchmark notes.
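To make the speculative decoding idea mentioned above concrete, here is a minimal toy sketch of the general draft-and-verify technique in its greedy form. It is not lucebox-hub's implementation: the function names, the toy "models" (plain Python callables), and the draft length k are all assumptions made for illustration. A cheap draft model proposes a few tokens, and the target model keeps the longest prefix it agrees with, so the output matches what the target would have produced on its own.

```python
from typing import Callable, List

# A toy "model": maps a token prefix to its greedy next token (illustrative only).
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding, one model call per new token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify decoding with a hypothetical draft length k."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks each proposed token. A real engine would score
        #    all k positions in one batched pass; this loop only mimics that.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch and stop
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds one bonus token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal]


def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] * 3 + 1) % 7


def toy_draft(prefix: List[int]) -> int:
    # Agrees with the target except after token 5, to force some rejections.
    return 0 if prefix[-1] == 5 else toy_target(prefix)
```

With these toy models, speculative_decode(toy_target, toy_draft, [1], 10) returns the same sequence as greedy_decode(toy_target, [1], 10). Any speedup in a real system comes from the target verifying several draft tokens per pass, which this sketch does not model.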

All speedups are measured against the vendored llama.cpp ('-fa 1', matching KV quant). Combined = geometric mean √(TTFT × decode) where both phases were benchmarked; otherwise the single-phase speedup. Drafters are published on huggingface.co/Lucebox.
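The combined figure described above can be reproduced with a few lines of arithmetic. The sketch below assumes each phase is expressed as a speedup factor relative to the baseline (for example 2.0 for twice as fast); the function name and the use of None for an unbenchmarked phase are illustrative choices, not part of the project.

```python
import math
from typing import Optional


def combined_speedup(ttft: Optional[float], decode: Optional[float]) -> float:
    """Combine per-phase speedup factors as described in the text.

    ttft   -- time-to-first-token speedup factor, or None if not benchmarked
    decode -- decode speedup factor, or None if not benchmarked
    """
    phases = [s for s in (ttft, decode) if s is not None]
    if not phases:
        raise ValueError("at least one phase speedup is required")
    if any(s <= 0 for s in phases):
        raise ValueError("speedup factors must be positive")
    if len(phases) == 2:
        # Both phases benchmarked: geometric mean sqrt(TTFT x decode).
        return math.sqrt(phases[0] * phases[1])
    # Only one phase benchmarked: report that speedup unchanged.
    return phases[0]
```

For example, a 4.0x TTFT speedup with a 1.0x decode speedup combines to 2.0x, while a run with only a 3.0x decode measurement is reported as 3.0x.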

Reference target: RTX 3090 (Ampere sm_86), used for all headline numbers. Other NVIDIA architectures are auto-detected by CMake / 'setup.py'; the AMD HIP backend is separate (see the Strix Halo section).

'server/' (DFlash) builds with CMake 3.18+ and '--recurse-submodules' for 'Luce-Org/llama.cpp@luce-dflash' — no PyTorch needed. 'optimizations/megakernel/' is the only component requiring PyTorch 2.0+ (CUDAExtension links against torch C++ libs). Power-tune: 'sudo nvidia-smi -pl 220' (3090 sweet spot, re-sweep for other cards).
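Since the 'server/' build needs CMake 3.18 or newer, a quick pre-flight check can save a failed configure step. The helper below is a small sketch, not part of the repository: the function names are hypothetical, and it only parses the first line that `cmake --version` prints.

```python
import re
from typing import Tuple

MIN_CMAKE = (3, 18)  # minimum version stated in the text above


def parse_cmake_version(output: str) -> Tuple[int, ...]:
    """Extract the version tuple from the output of `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("could not find a CMake version in the given text")
    return tuple(int(part) for part in match.groups() if part is not None)


def cmake_is_new_enough(output: str, minimum: Tuple[int, int] = MIN_CMAKE) -> bool:
    """Return True if the reported major.minor version meets the minimum."""
    return parse_cmake_version(output)[:2] >= minimum
```

In practice the text to parse would come from something like subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout.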

'harness/' contains RTX 3090 client launchers and regression tests for Lucebox server compatibility. Run Lucebox inside Claude Code, Codex, OpenCode, Hermes, Pi, OpenClaw, or Open WebUI, or check if a server change still works with those clients.

All launchers spawn the native C++ HTTP server ('dflash_server'). Override defaults via env vars:

For no-draft targets such as Gemma, set only 'DFLASH_TARGET' or pass 'DRAFT=none'; the harness will not attach the default Qwen draft to a custom target.
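As a rough illustration of the override pattern, the sketch below builds the environment for a launcher process. Only the names 'DFLASH_TARGET' and 'DRAFT' come from the text above; the helper name, the model path, and the assumption that 'DRAFT=none' is read from the environment (rather than passed as an argument) are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def launcher_env(target: Optional[str] = None,
                 no_draft: bool = False,
                 base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with launcher overrides applied.

    target   -- custom target model for DFLASH_TARGET (left unset if None)
    no_draft -- if True, set DRAFT=none (assumed here to be an env var)
    base     -- environment to start from; defaults to os.environ
    """
    env = dict(os.environ if base is None else base)
    if target is not None:
        env["DFLASH_TARGET"] = target
    if no_draft:
        env["DRAFT"] = "none"
    return env


# Hypothetical path, for illustration only.
gemma_env = launcher_env(target="/models/gemma.gguf", no_draft=True, base={})
```

The resulting dictionary can be handed to subprocess.run(..., env=gemma_env) when starting one of the launcher scripts. Building a copy rather than mutating os.environ keeps the override scoped to that one process.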

Launcher scripts install missing real-client CLIs automatically under '.harness-work/'. To preinstall them yourself:

lucebox-hub is open-source, written primarily in C++, with 2,341 GitHub stars under the Apache 2.0 license. It was last updated on 2026-06-08.

How it fits a local-AI stack

lucebox-hub runs on your own hardware, so pair it with a model and a GPU sized to your needs. Use the VRAM calculator to pick a model that fits your card, and see what you can run for hardware guidance. Related local inference servers in the directory:

Sources

Source code & docs: Luce-Org/lucebox-hub
Official website: https://www.lucebox.com

Stats from GitHub, 2026-06-08.

Frequently asked

Quick answers to common questions

What is lucebox-hub?

lucebox-hub is an inference server for local AI workloads: a fast speculative LLM inference server for consumer hardware.

Is lucebox-hub free and open source?

Yes. lucebox-hub is open source under the Apache 2.0 license, and you can self-host it for free.

What hardware do I need for lucebox-hub?

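The hardware requirements depend on which models you run. The reference target for all headline numbers is an RTX 3090 (Ampere sm_86); other NVIDIA architectures are auto-detected at build time, and the AMD HIP backend is separate. Check our hardware directory for compatible GPUs and systems.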

Does lucebox-hub support GPU acceleration?

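Yes. lucebox-hub is built around custom kernels for GPU inference, and its headline numbers are measured on an RTX 3090 (Ampere sm_86). Other NVIDIA architectures are auto-detected by CMake / 'setup.py', and the AMD HIP backend is separate. Because each optimization targets a specific model family and hardware target, check the documentation for details on your card.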

What are the best alternatives to lucebox-hub?

Popular alternatives include other inference-server tools in our directory. Browse our full collection at /tool for comparisons, community reviews, and benchmark data to find the right fit for your workflow.

How much does lucebox-hub cost?

lucebox-hub is free and open source under the Apache 2.0 license, and it costs nothing to self-host.

Pairs well with

Complementary tools, models, and hardware
