Deep dive

The full breakdown - performance, comparisons, and setup

exo

exo is a local-AI tool - Run frontier AI locally.

Overview

exo connects all your devices into an AI cluster. Not only does exo enable running models larger than would fit on a single device, but with day-0 support for RDMA over Thunderbolt, makes models run faster as you add more devices.

Automatic Device Discovery: Devices running exo automatically discover each other - no manual configuration. - RDMA over Thunderbolt: exo ships with day-0 support for RDMA over Thunderbolt 5, enabling 99% reduction in latency between devices. - Topology-Aware Auto Parallel: exo figures out the best way to split your model across all available devices based on a realtime view of your device topology. It takes into account device resources and network latency/bandwidth between each link. - Tensor Parallelism: exo supports sharding models, for up to 1.8x speedup on 2 devices and 3.2x speedup on 4 devices. - MLX Support: exo uses MLX as an inference backend and MLX distributed for distributed communication. - Multiple API Compatibility: Compatible with OpenAI Chat Completions API, Claude Messages API, OpenAI Responses API, and Ollama API - use your existing tools and clients. - Custom Model Support: Load custom models from HuggingFace hub to expand the range of available models.

exo includes a built-in dashboard for managing your cluster and chatting with models.

exo is open-source, written primarily in Python, with 45,208 GitHub stars under the Apache 2.0 license. The latest release is v1.0.71 (2026-04-23).

Key capabilities

From the project's documentation:

Automatic Device Discovery: Devices running exo automatically discover each other - no manual configuration.
Tensor Parallelism: exo supports sharding models, for up to 1.8x speedup on 2 devices and 3.2x speedup on 4 devices.
Custom Model Support: Load custom models from HuggingFace hub to expand the range of available models.
Xcode (provides the Metal ToolChain required for MLX compilation)
brew (for simple package management on macOS)
uv (for Python dependency management)

Install

A quick way to get started (always check the official docs for the latest):

brew install uv node

How it fits a local-AI stack

exo runs on your own hardware, so pair it with a model and a GPU sized to your needs. Use the VRAM calculator to pick a model that fits your card, and see what you can run for hardware guidance. Related local-AI tools in the directory:

Sources

Source code & docs: exo-explore/exo

Stats from GitHub, 2026-06-08.

Frequently asked

Quick answers to common questions

What is exo?

exo is a other tool for local AI workloads. Run frontier AI locally.

Is exo free and open source?

Yes, exo has 46,421 GitHub stars and is licensed under Apache 2.0. You can self-host it for free on macos.

What platforms does exo support?

exo runs on macos.

What hardware do I need for exo?

The hardware requirements depend on which models you run. Check our hardware directory for compatible GPUs and systems. exo has 46,421 GitHub stars and an active community.

Does exo support GPU acceleration?

exo supports GPU acceleration via CUDA, Metal, or Vulkan depending on your platform. For the best performance, pair it with an NVIDIA RTX 4090 or 5090.

What are the best alternatives to exo?

Popular alternatives include other other tools in our directory. Browse our full collection at /tool for comparisons, community reviews, and benchmark data to find the right fit for your workflow.

How much does exo cost?

exo is free-open-source. It is completely free and open source to self-host.

Pairs well with

Complementary tools, models, and hardware

Similar tools

More tools like this one

algernon - Same category: other blinko - Same category: other Continue - Same category: other farfalle - Same category: other lorax - fine-tuning METATRON - other

exo