fine-tuning | 3,790 stars | Apache 2.0

lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Updated: Jun 8, 2026
Platforms: docker
Pricing: free-open-source
Status: active
License: Apache 2.0

What it does

Core capabilities at a glance

  • Fine Tuning
  • GPT
  • Llama
  • LLM Inference
  • LLM Serving
  • LLMOps
  • LoRA
  • Model Serving

Deep dive

The full breakdown - performance, comparisons, and setup

lorax

lorax is a multi-LoRA inference server that scales to thousands of fine-tuned LLMs. This directory lists it under fine-tuning tools.

Overview

LoRAX (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned models on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency.

  • Dynamic Adapter Loading: include any fine-tuned LoRA adapter from HuggingFace, Predibase, or any filesystem in your request; it is loaded just-in-time without blocking concurrent requests. Merge adapters per request to instantly create powerful ensembles.
  • Heterogeneous Continuous Batching: packs requests for different adapters together into the same batch, keeping latency and throughput nearly constant with the number of concurrent adapters.
  • Adapter Exchange Scheduling: asynchronously prefetches and offloads adapters between GPU and CPU memory, and schedules request batching to optimize the aggregate throughput of the system.
  • Optimized Inference: high-throughput, low-latency optimizations including tensor parallelism, pre-compiled CUDA kernels (flash-attention, paged attention, SGMV), quantization, and token streaming.
  • Ready for Production: prebuilt Docker images, Helm charts for Kubernetes, Prometheus metrics, and distributed tracing with OpenTelemetry. OpenAI-compatible API supporting multi-turn chat conversations. Private adapters through per-request tenant isolation. Structured Output (JSON mode).
  • Free for Commercial Use: Apache 2.0 License. Enough said.

  • Base Model: pretrained large model shared across all adapters.
  • Adapter: task-specific adapter weights dynamically loaded per request.
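
The split between a shared base model and per-request adapters can be illustrated with a toy sketch. It assumes the standard LoRA formulation, where an adapter adds a low-rank update B(Ax) to the base output Wx; the page itself does not spell this out. All names here (`batched_forward`, the adapter IDs, the tiny matrices) are hypothetical, and the Python loop only stands in for the fused GPU kernels a real server would use.

```python
# Toy illustration only: assumes the standard LoRA update y = W x + B (A x).
# Function names, adapter IDs, and weights are made up for this sketch.

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_delta(adapter, vector):
    """Low-rank update B @ (A @ x); A is (r x d_in), B is (d_out x r)."""
    return matvec(adapter["B"], matvec(adapter["A"], vector))


def batched_forward(base_weight, adapters, batch):
    """Process one mixed batch: every request shares base_weight,
    and each request applies only the adapter it names (or none)."""
    outputs = []
    for adapter_id, x in batch:
        y = matvec(base_weight, x)  # shared base-model computation
        if adapter_id is not None:
            delta = lora_delta(adapters[adapter_id], x)  # per-request part
            y = [base + extra for base, extra in zip(y, delta)]
        outputs.append(y)
    return outputs


base_weight = [[1.0, 0.0], [0.0, 1.0]]  # one shared 2x2 "base model"
adapters = {
    "summarize": {"A": [[1.0, 1.0]], "B": [[0.5], [0.0]]},  # rank-1 adapter
    "classify": {"A": [[1.0, -1.0]], "B": [[0.0], [2.0]]},  # rank-1 adapter
}
# Three requests in one batch: two different adapters and one base-only call.
batch = [
    ("summarize", [2.0, 4.0]),
    ("classify", [2.0, 4.0]),
    (None, [2.0, 4.0]),
]
outputs = batched_forward(base_weight, adapters, batch)
```

Each adapter here stores only two small matrices, while the base weight is held once for the whole batch. That is the property that makes serving many adapters per GPU plausible.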

lorax is open-source, written primarily in Python, with 3,790 GitHub stars under the Apache 2.0 license. The latest release is lorax-0.4.0 (2025-01-13).

Key capabilities

From the project's documentation:

  • Prompt via REST API
  • Prompt via Python Client
  • Chat via OpenAI API

Install

A quick way to get started is to install the Python client; the server itself runs from the prebuilt Docker images (always check the official docs for the latest):

pip install lorax-client
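
As a rough sketch of what a per-request adapter looks like on the wire, the snippet below builds a REST request using only the standard library. It assumes a server reachable at `http://127.0.0.1:8080` that exposes a `/generate` route accepting `inputs` plus a `parameters` object with `adapter_id`. The address, the adapter name `my-org/my-adapter`, and the helper names are placeholders, so confirm the route and field names against the official docs before relying on them.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, adapter_id=None, max_new_tokens=64):
    """Build (but do not send) a POST request for the assumed /generate route."""
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        # Naming an adapter asks the server to apply it to this request only.
        parameters["adapter_id"] = adapter_id
    body = {"inputs": prompt, "parameters": parameters}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, **kwargs):
    """Send the request and return the decoded JSON response (needs a live server)."""
    request = build_generate_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder server address and adapter ID, for illustration only.
request = build_generate_request(
    "http://127.0.0.1:8080",
    "Summarize: LoRA adapters are small.",
    adapter_id="my-org/my-adapter",
)
payload = json.loads(request.data)
```

Omitting `adapter_id` leaves the `parameters` object without that key, which in this sketch is meant to address the base model alone. The `lorax-client` package installed above targets the same server; see its documentation for the exact call signatures.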

How it fits a local-AI stack

lorax runs on your own hardware, so pair it with a base model and a GPU sized to your needs. Use the VRAM calculator to pick a model that fits your card, and see the "what you can run" page for hardware guidance.

Sources

Stats from GitHub, 2026-06-08.

Frequently asked

Quick answers to common questions

What is lorax?

lorax (LoRAX, short for LoRA eXchange) is a multi-LoRA inference server that scales to thousands of fine-tuned LLMs by serving them as adapters on a shared base model.

Is lorax free and open source?

Yes. lorax is licensed under Apache 2.0, which permits commercial use, and you can self-host it for free using the prebuilt Docker images.

What platforms does lorax support?

lorax runs on Docker. The project ships prebuilt Docker images and Helm charts for Kubernetes.

What hardware do I need for lorax?

The hardware requirements depend on which base model you run and how many adapters you keep loaded at once. Check our hardware directory for compatible GPUs and systems.

Does lorax support GPU acceleration?

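Yes. lorax is built for GPU inference: its optimizations include tensor parallelism and pre-compiled CUDA kernels (flash-attention, paged attention, SGMV). Check the documentation for supported GPUs and drivers. For a local setup, this directory suggests pairing it with an NVIDIA RTX 4090 or 5090.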

What are the best alternatives to lorax?

Popular alternatives include other fine-tuning tools in our directory. Browse our full collection at /tool for comparisons, community reviews, and benchmark data to find the right fit for your workflow.

How much does lorax cost?

lorax is free and open source under the Apache 2.0 license. Self-hosting costs nothing beyond your own hardware.
