omlx social preview
inference-server16,214Apache 2.0

omlx

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

Updated Jun 8, 2026
Platforms
macos
Pricing
free-open-source
Status
active
License
Apache 2.0

What it does

Core capabilities at a glance

  • Apple Silicon
  • Inference Server
  • Macos
  • MLX
  • Openai API

Deep dive

The full breakdown - performance, comparisons, and setup

omlx

omlx is a local inference server - LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar.

Overview

oMLX LLM inference, optimized for your Mac Continuous batching and tiered KV caching, managed directly from your menu bar.

Install · Quickstart · Features · Models · CLI Configuration · Benchmarks · oMLX.ai

Download the '.dmg' from Releases, drag to Applications, done. The app includes in-app auto-update, so future upgrades are just one click. The macOS app also installs a lightweight '~/.omlx/bin/omlx' CLI shim so terminal commands and Apple Shortcuts can control the app-managed server.

Launch oMLX from your Applications folder. The Welcome screen guides you through three steps - model directory, server start, and first model download. That's it. To connect OpenClaw, OpenCode, Codex, Hermes Agent, or Copilot, see Integrations.

The server discovers LLMs, VLMs, embedding models, and rerankers from subdirectories automatically. Any OpenAI-compatible client can connect to 'http://localhost:8000/v1'. A built-in chat UI is also available at 'http://localhost:8000/admin/chat'.

The service runs 'omlx serve' with zero-config defaults ('/.omlx/models', port 8000). 'omlx start', 'omlx stop', and 'omlx restart' are the portable lifecycle commands; Homebrew installs delegate them to 'brew services'. To customize, either set environment variables ('OMLX_MODEL_DIR', 'OMLX_PORT', etc.) or run 'omlx serve --model-dir /your/path' once to persist settings to '/.omlx/settings.json'.

Logs are written to two locations: - Service log: '$(brew --prefix)/var/log/omlx.log' (stdout/stderr) - Server log: '~/.omlx/logs/server.log' (structured application log)

omlx is open-source, written primarily in Python, with 16,209 GitHub stars under the Apache 2.0 license. The latest release is v0.4.2rc1 (2026-06-06).

Key capabilities

From the project's documentation:

  • Service log: $(brew --prefix)/var/log/omlx.log (stdout/stderr)
  • Server log: ~/.omlx/logs/server.log (structured application log)
  • Hot tier (RAM): Frequently accessed blocks stay in memory for fast access.
  • LRU eviction: Least-recently-used models are evicted automatically when memory runs low.
  • Manual load/unload: Interactive status badges in the admin panel let you load or unload models on demand.
  • Model pinning: Pin frequently used models to keep them always loaded.

Install

A quick way to get started (always check the official docs for the latest):

brew install omlx

How it fits a local-AI stack

omlx runs on your own hardware, so pair it with a model and a GPU sized to your needs. Use the VRAM calculator to pick a model that fits your card, and see what you can run for hardware guidance. Related local inference servers in the directory:

Sources

Stats from GitHub, 2026-06-08.

Frequently asked

Quick answers to common questions

What is omlx?

omlx is a inference-server tool for local AI workloads. LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

Is omlx free and open source?

Yes, omlx has 16,214 GitHub stars and is licensed under Apache 2.0. You can self-host it for free on macos.

What platforms does omlx support?

omlx runs on macos.

What hardware do I need for omlx?

The hardware requirements depend on which models you run. Check our hardware directory for compatible GPUs and systems. omlx has 16,214 GitHub stars and an active community.

Does omlx support GPU acceleration?

omlx supports GPU acceleration via CUDA, Metal, or Vulkan depending on your platform. For the best performance, pair it with an NVIDIA RTX 4090 or 5090.

What are the best alternatives to omlx?

Popular alternatives include other inference-server tools in our directory. Browse our full collection at /tool for comparisons, community reviews, and benchmark data to find the right fit for your workflow.

How much does omlx cost?

omlx is free-open-source. It is completely free and open source to self-host.

Pairs well with

Complementary tools, models, and hardware

Comments coming soon

Configure NEXT_PUBLIC_GISCUS_REPO_ID and NEXT_PUBLIC_GISCUS_CATEGORY_ID at giscus.app to enable.