What it does

Core capabilities at a glance

Apple Silicon
Inference Server
Macos
MLX
Openai API

Deep dive

The full breakdown - performance, comparisons, and setup

omlx

omlx is a local inference server - LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar.

Overview

oMLX LLM inference, optimized for your Mac Continuous batching and tiered KV caching, managed directly from your menu bar.

Install · Quickstart · Features · Models · CLI Configuration · Benchmarks · oMLX.ai

Download the '.dmg' from Releases, drag to Applications, done. The app includes in-app auto-update, so future upgrades are just one click. The macOS app also installs a lightweight '~/.omlx/bin/omlx' CLI shim so terminal commands and Apple Shortcuts can control the app-managed server.

Launch oMLX from your Applications folder. The Welcome screen guides you through three steps - model directory, server start, and first model download. That's it. To connect OpenClaw, OpenCode, Codex, Hermes Agent, or Copilot, see Integrations.

The server discovers LLMs, VLMs, embedding models, and rerankers from subdirectories automatically. Any OpenAI-compatible client can connect to 'http://localhost:8000/v1'. A built-in chat UI is also available at 'http://localhost:8000/admin/chat'.

The service runs 'omlx serve' with zero-config defaults ('/.omlx/models', port 8000). 'omlx start', 'omlx stop', and 'omlx restart' are the portable lifecycle commands; Homebrew installs delegate them to 'brew services'. To customize, either set environment variables ('OMLX_MODEL_DIR', 'OMLX_PORT', etc.) or run 'omlx serve --model-dir /your/path' once to persist settings to '/.omlx/settings.json'.

Logs are written to two locations: - Service log: '$(brew --prefix)/var/log/omlx.log' (stdout/stderr) - Server log: '~/.omlx/logs/server.log' (structured application log)

omlx is open-source, written primarily in Python, with 16,209 GitHub stars under the Apache 2.0 license. The latest release is v0.4.2rc1 (2026-06-06).

Key capabilities

From the project's documentation:

Service log: $(brew --prefix)/var/log/omlx.log (stdout/stderr)
Server log: ~/.omlx/logs/server.log (structured application log)
Hot tier (RAM): Frequently accessed blocks stay in memory for fast access.
LRU eviction: Least-recently-used models are evicted automatically when memory runs low.
Manual load/unload: Interactive status badges in the admin panel let you load or unload models on demand.
Model pinning: Pin frequently used models to keep them always loaded.

Install

A quick way to get started (always check the official docs for the latest):

brew install omlx

How it fits a local-AI stack

omlx runs on your own hardware, so pair it with a model and a GPU sized to your needs. Use the VRAM calculator to pick a model that fits your card, and see what you can run for hardware guidance. Related local inference servers in the directory:

Sources

Source code & docs: jundot/omlx
Official website: https://omlx.ai

Stats from GitHub, 2026-06-08.

Frequently asked

Quick answers to common questions

What is omlx?

omlx is a inference-server tool for local AI workloads. LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

Is omlx free and open source?

Yes, omlx has 18,081 GitHub stars and is licensed under Apache 2.0. You can self-host it for free on macos.

What platforms does omlx support?

omlx runs on macos.

What hardware do I need for omlx?

The hardware requirements depend on which models you run. Check our hardware directory for compatible GPUs and systems. omlx has 18,081 GitHub stars and an active community.

Does omlx support GPU acceleration?

omlx supports GPU acceleration via CUDA, Metal, or Vulkan depending on your platform. For the best performance, pair it with an NVIDIA RTX 4090 or 5090.

What are the best alternatives to omlx?

Popular alternatives include other inference-server tools in our directory. Browse our full collection at /tool for comparisons, community reviews, and benchmark data to find the right fit for your workflow.

How much does omlx cost?

omlx is free-open-source. It is completely free and open source to self-host.

Pairs well with

Complementary tools, models, and hardware

omlx

What it does

Deep dive

omlx

Overview

Key capabilities

Install

How it fits a local-AI stack

Sources

Frequently asked

What is omlx?

Is omlx free and open source?

What platforms does omlx support?

What hardware do I need for omlx?

Does omlx support GPU acceleration?

What are the best alternatives to omlx?

How much does omlx cost?

Pairs well with

Tools

Models

Hardware