Local Image Generation (ComfyUI + FLUX)

ComfyUI + FLUX.1 Dev = professional-grade image generation on your own GPU. Node-based workflow, photorealistic output, full privacy. Runs on a 12GB GPU for $0/mo.

The short answer

Local Image Generation (ComfyUI + FLUX) is a local AI stack for generating photorealistic images with node-based workflows. It combines 4 components, is rated intermediate, and takes about 20 minutes to set up. Expect around $1,600 in hardware and $0/month in fees versus cloud.

Cost: ~$1,600 ($0/mo vs cloud)
Difficulty: intermediate
Setup time: ~20 min
Use case: Generate photorealistic images locally with node-based AI workflows
Tools: ComfyUI
Models: FLUX.1 Dev

A professional-grade image generation pipeline that runs entirely on your own hardware. ComfyUI is a powerful node-based workflow engine for AI image, video, and 3D generation. Paired with FLUX.1 Dev - Black Forest Labs' 12B parameter text-to-image model - you get output that rivals Midjourney and DALL-E, with complete creative control and zero data leaving your machine.

What you get

  • Node-based workflow editor - visually construct complex image generation pipelines
  • Photorealistic output - FLUX.1 Dev produces stunning 1024x1024 images from text prompts
  • 5,000+ community nodes - LoRA, ControlNet, IP-Adapter, AnimateDiff, and more
  • Queue-based generation - batch multiple prompts with per-node caching
  • Workflow sharing - export/import workflows as JSON from the community
  • API mode - integrate with n8n, custom apps, or automation pipelines
  • $0/month - after the GPU, every image is free

Architecture

Component | Role
ComfyUI | Node-based workflow engine and UI
FLUX.1 Dev | 12B text-to-image model, photorealistic output
Custom nodes (as needed) | LoRA, ControlNet, upscalers, video extensions

FLUX.1 Dev is a 12B parameter model that needs significant VRAM. An RTX 4090 (24 GB) is recommended for comfortable use, and an RTX 5090 for faster generation. It can also run on 12 GB cards with quantized versions such as FP8.
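
As a rough sanity check on these VRAM figures, the weight footprint can be estimated as parameter count times bytes per parameter. The sketch below covers the weights only; text encoders, VAE, and activations need memory on top of this, and the helper name `weight_size_gb` is just for illustration.

```python
def weight_size_gb(num_params: float, bits_per_param: int) -> float:
    """Return the size of the model weights in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


FLUX_PARAMS = 12e9  # 12B parameters, as stated above

if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("FP8", 8)]:
        size = weight_size_gb(FLUX_PARAMS, bits)
        print(f"{label}: ~{size:.0f} GB of weights")
```

This is why a 24 GB card is the comfortable choice at 16-bit precision and why 8-bit quantization is what brings the model within reach of 12 GB cards.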

Prerequisites

  • A GPU with ≥12 GB VRAM (24GB recommended for FLUX at full quality)
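  • About 35 GB free disk for the full-precision files listed under Option B (flux1-dev.safetensors is roughly 24 GB and t5xxl_fp16.safetensors roughly 10 GB); quantized variants need less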
  • Python 3.10+ or the ComfyUI desktop app
  • Git (for custom nodes)

Setup

Option A: Desktop App (Easiest)

  1. Download the ComfyUI desktop app from comfy.org
  2. Install and launch it
  3. Use the built-in model manager to download FLUX.1 Dev

Option B: Manual Install

git clone https://github.com/Comfy-Org/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

Download the FLUX model files:

# Create model directories
mkdir -p models/unet models/clip models/vae
 
# Download FLUX.1 Dev (requires HuggingFace login)
# Models go in:
# - models/unet/flux1-dev.safetensors
# - models/clip/clip_l.safetensors
# - models/clip/t5xxl_fp16.safetensors
# - models/vae/ae.safetensors
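
Before launching, it can help to confirm that the four files landed where the layout above expects them. The following is a small sketch, assuming it is run from (or pointed at) the ComfyUI checkout; the function name `missing_model_files` is hypothetical and the file list is copied from the comments above.

```python
from pathlib import Path

# Relative paths taken from the layout listed above.
EXPECTED_FILES = [
    "models/unet/flux1-dev.safetensors",
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp16.safetensors",
    "models/vae/ae.safetensors",
]


def missing_model_files(comfyui_root: str) -> list[str]:
    """Return the expected model files that are not present under comfyui_root."""
    root = Path(comfyui_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Missing model files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All FLUX model files found.")
```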

Launch ComfyUI:

python main.py

Open http://localhost:8188 to access the ComfyUI interface.

Option C: Docker

docker run -d --gpus all -p 8188:8188 \
  --name comfyui \
  -v comfyui_models:/app/models \
  comfyui/comfyui:latest

Using the Basic FLUX Workflow

  1. Open http://localhost:8188

  2. Load the default workflow or create a new one

  3. Add these nodes:

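    • Load Diffusion Model → load flux1-dev.safetensors from models/unet (the Checkpoint Loader only lists files in models/checkpoints)
    • DualCLIPLoader → load clip_l.safetensors and t5xxl_fp16.safetensors, with type set to flux
    • Load VAE → load ae.safetensors
    • CLIP Text Encode → enter your prompt (e.g., "a photorealistic cat sitting on a vintage leather chair, warm lighting, depth of field")
    • Empty Latent Image → set the output size (e.g., 1024x1024)
    • KSampler → connect the model, the positive and negative conditioning, and the empty latent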
    • VAE Decode → decode the latent to an image
    • Save Image → save the result
  4. Click Queue Prompt to generate
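
The same graph can be written down in ComfyUI's API (JSON) format, where each node has a `class_type` and links are `[source_node_id, output_index]` pairs. This is a sketch, not an exported workflow: because the Option B layout stores the model, text encoders, and VAE as separate files, it loads them with separate loader nodes. The node class names and input names reflect my understanding of current ComfyUI and may differ between versions, and the sampler settings (20 steps, CFG 1.0, euler/simple) are assumptions. Exporting a working graph from your own UI in API format is the reliable way to confirm them.

```python
import json


def build_flux_workflow(prompt: str, seed: int = 0, steps: int = 20) -> dict:
    """Return a basic FLUX text-to-image graph in ComfyUI's API format.

    Links are written as [source_node_id, output_index].
    """
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-dev.safetensors",
                         "weight_dtype": "default"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "t5xxl_fp16.safetensors",
                         "clip_name2": "clip_l.safetensors",
                         "type": "flux"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "ae.safetensors"}},
        # Positive prompt.
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        # Empty negative prompt; KSampler still requires the input.
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "", "clip": ["2", 0]}},
        "6": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "7": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0],
                         "positive": ["4", 0],
                         "negative": ["5", 0],
                         "latent_image": ["6", 0],
                         "seed": seed,
                         "steps": steps,
                         "cfg": 1.0,              # assumed setting
                         "sampler_name": "euler",  # assumed setting
                         "scheduler": "simple",    # assumed setting
                         "denoise": 1.0}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"images": ["8", 0], "filename_prefix": "flux"}},
    }


if __name__ == "__main__":
    workflow = build_flux_workflow("a photorealistic cat on a vintage leather chair")
    print(json.dumps(workflow, indent=2))
```

Writing the graph as data makes it easy to vary the prompt or seed in a loop and to keep workflows under version control.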

Prompt Tips

  • FLUX responds well to natural language descriptions
  • Add style cues: "photorealistic", "cinematic lighting", "8K", "macro photography"
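  • Negative prompts work differently in FLUX - FLUX.1 Dev is guidance-distilled and is typically sampled at CFG 1.0, where the negative prompt has no effect, so describe what you want in the positive prompt rather than listing what to avoid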

Advanced Workflows

LoRA Loading

Add a LoRA Loader node between the checkpoint and the model input. LoRA files go in models/loras/.
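
In an API-format workflow (a dict of nodes whose links are `[source_node_id, output_index]` pairs), "between the checkpoint and the model input" means re-pointing every consumer of the model output at a new LoRA node. The sketch below assumes the model-only LoRA loader class is named `LoraLoaderModelOnly` with inputs `model`, `lora_name`, and `strength_model`; the helper `add_lora` and the node id `"lora"` are hypothetical.

```python
import copy


def add_lora(workflow: dict, model_node_id: str, lora_name: str,
             strength: float = 1.0, lora_node_id: str = "lora") -> dict:
    """Return a copy of workflow with a LoRA node spliced in after model_node_id.

    Every input that consumed output 0 of the model loader is re-pointed
    at the new LoRA node, which in turn reads from the model loader.
    """
    if lora_node_id in workflow:
        raise ValueError(f"node id {lora_node_id!r} is already in use")
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        for name, value in node["inputs"].items():
            if value == [model_node_id, 0]:
                node["inputs"][name] = [lora_node_id, 0]
    patched[lora_node_id] = {
        "class_type": "LoraLoaderModelOnly",  # assumed class name
        "inputs": {"model": [model_node_id, 0],
                   "lora_name": lora_name,     # file name inside models/loras/
                   "strength_model": strength},
    }
    return patched


if __name__ == "__main__":
    base = {
        "1": {"class_type": "UNETLoader", "inputs": {}},
        "7": {"class_type": "KSampler", "inputs": {"model": ["1", 0]}},
    }
    print(add_lora(base, "1", "my_style.safetensors", strength=0.8))
```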

Image-to-Image

Replace the Empty Latent Image with a VAE Encode node connected to your input image.
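
Expressed on an API-format workflow dict, that swap looks roughly like the sketch below. It assumes the built-in class names `LoadImage` (input `image`, a file in ComfyUI's input folder) and `VAEEncode` (inputs `pixels` and `vae`); the helper `to_img2img` is hypothetical. Lowering `denoise` below 1.0 is my addition rather than something stated above: at 1.0 the sampler largely discards the input image.

```python
import copy


def to_img2img(workflow: dict, latent_node_id: str, vae_node_id: str,
               image_name: str, denoise: float = 0.6) -> dict:
    """Return a copy of workflow where the empty latent is replaced by an encoded image."""
    patched = copy.deepcopy(workflow)
    load_id = f"{latent_node_id}_image"
    patched[load_id] = {"class_type": "LoadImage",
                        "inputs": {"image": image_name}}
    # Reuse the old node id so existing links to the latent stay valid.
    patched[latent_node_id] = {
        "class_type": "VAEEncode",
        "inputs": {"pixels": [load_id, 0], "vae": [vae_node_id, 0]},
    }
    # Samplers fed by this latent should only partially re-noise the image.
    for node in patched.values():
        if node["inputs"].get("latent_image") == [latent_node_id, 0]:
            node["inputs"]["denoise"] = denoise
    return patched


if __name__ == "__main__":
    base = {
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
        "6": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "7": {"class_type": "KSampler",
              "inputs": {"latent_image": ["6", 0], "denoise": 1.0}},
    }
    print(to_img2img(base, "6", "3", "input.png"))
```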

ControlNet

Add a ControlNet Loader node and an Apply ControlNet node for pose or edge guidance, using a ControlNet model trained for FLUX.

Video Generation

Install video custom nodes via ComfyUI Manager and pair them with models like Wan2.1 for AI video.

Cost vs cloud

Aspect | Local ComfyUI + FLUX | Midjourney / DALL-E
Monthly | $0 | $10-60
Per image | $0 | $0.04-0.12
Hardware | ~$1,600 once (4090) | $0
Data privacy | Stays on your GPU | Sent to cloud
Control | Full node-level | Limited
Batch gen | Unlimited, free | Rate-limited
Resolution | Any (VRAM permitting) | Fixed sizes

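If you generate 100+ images/month on Midjourney Pro ($60/month, the top of the range above), a $1,600 4090 pays for itself in a little over 2 years (1,600 / 60 ≈ 27 months). At the per-image prices above, 1,000 images/month costs $40-120, so break-even takes roughly 13-40 months and shortens in proportion as volume grows.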
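
The break-even arithmetic can be reproduced with a few lines, using the hardware cost and price ranges from the table above. This is a simplified sketch: it ignores electricity and resale value, and the function name `break_even_months` is only illustrative.

```python
def break_even_months(hardware_cost: float, monthly_cloud_cost: float) -> float:
    """Return how many months of cloud spending equal the one-time hardware cost."""
    if monthly_cloud_cost <= 0:
        raise ValueError("monthly_cloud_cost must be positive")
    return hardware_cost / monthly_cloud_cost


HARDWARE_COST = 1600.0  # one-time GPU cost from the table

if __name__ == "__main__":
    # Flat subscription at the top of the $10-60 range.
    months = break_even_months(HARDWARE_COST, 60.0)
    print(f"$60/month subscription: {months:.1f} months")

    # Per-image pricing at both ends of the $0.04-0.12 range.
    for images_per_month in (100, 1000):
        for price in (0.04, 0.12):
            months = break_even_months(HARDWARE_COST, images_per_month * price)
            print(f"{images_per_month} images at ${price:.2f}: {months:.1f} months")
```

Plugging in your own monthly volume shows quickly whether the GPU or the subscription is the cheaper route.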

Troubleshooting

  • Out of memory → FLUX.1 Dev's 12B weights take roughly 24 GB at 16-bit precision. Use the FP8 quantized version on 12 GB cards, and lower the resolution to 768x768 if needed.
  • No images showing → Check the console output for errors. The VAE Decode step is often where a run fails.
  • Slow generation → FLUX.1 Dev takes 30-60s per image on a 4090. For faster results, use FLUX.1 Schnell (4-step distilled version).
  • Missing model files → Use the ComfyUI Manager extension to download models from within the interface.
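  • CORS errors in API mode → Start ComfyUI with --enable-cors-header (optionally followed by the allowed origin), and add --listen 0.0.0.0 if other machines need to reach it. extra_model_paths.yaml only configures model search paths, not the API.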

Swap components

  • Faster generation → Use FLUX.1 Schnell (4-step distilled, 4x faster)
  • Video generation → Add Wan2.1 or HunyuanVideo nodes
  • Alternative UI → Try SwarmUI for a simpler interface on top of ComfyUI
  • Lower VRAM → Use SDXL or SD3.5 for 8GB cards
  • n8n integration → Use ComfyUI's API mode with n8n for automated generation pipelines
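
For API mode, ComfyUI accepts a workflow as JSON on its HTTP server. The sketch below assumes the default address http://localhost:8188 and the `/prompt` endpoint, with the workflow exported from the UI in API format; an n8n HTTP Request node would send the same JSON body. The helper names are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.request
import uuid


def build_queue_request(workflow: dict,
                        base_url: str = "http://localhost:8188") -> urllib.request.Request:
    """Build (but do not send) the POST request that queues a workflow."""
    payload = {"prompt": workflow, "client_id": uuid.uuid4().hex}
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_prompt(workflow: dict, base_url: str = "http://localhost:8188") -> dict:
    """Send the workflow to a running ComfyUI server and return its JSON reply."""
    request = build_queue_request(workflow, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumes a workflow saved from the UI in API format as workflow_api.json.
    with open("workflow_api.json", encoding="utf-8") as handle:
        workflow = json.load(handle)
    print(queue_prompt(workflow))
```

Keeping request construction separate from sending makes the payload easy to inspect or reuse in other HTTP clients.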

Frequently asked

What is the Local Image Generation (ComfyUI + FLUX) stack for?

It generates photorealistic images locally with node-based AI workflows. ComfyUI provides the workflow engine and FLUX.1 Dev the model; everything runs on your own hardware (a 12 GB GPU is the minimum), so prompts and images stay private and there are no monthly fees.

How much does the Local Image Generation (ComfyUI + FLUX) stack cost?

Local Image Generation (ComfyUI + FLUX) costs around $1,600 in hardware up front and $0/month to run. Everything is self-hosted, so there are no per-image or subscription fees as there would be with a cloud equivalent.

How long does it take to set up Local Image Generation (ComfyUI + FLUX)?

Plan for roughly 20 minutes. The stack is rated intermediate.

What do I need to run Local Image Generation (ComfyUI + FLUX)?

Local Image Generation (ComfyUI + FLUX) is built from one tool (ComfyUI), one model (FLUX.1 Dev), and two hardware items. See Prerequisites for the GPU, disk, and software requirements.