Local Image Generation (ComfyUI + FLUX)
ComfyUI + FLUX.1 Dev = professional-grade image generation on your own GPU. Node-based workflow, photorealistic output, full privacy. Runs on a 12GB GPU for $0/mo.
Local Image Generation (ComfyUI + FLUX) is a local AI stack for generating photorealistic images with node-based AI workflows. It combines 4 components, is rated intermediate, and takes about 20 minutes to set up. Expect around $1,600 in hardware and $0/month, versus a recurring cloud subscription.
- Cost: ~$1,600 in hardware, $0/mo vs cloud
- Difficulty: intermediate
- Setup time: ~20 min
- Use case: Generate photorealistic images locally with node-based AI workflows
A professional-grade image generation pipeline that runs entirely on your own hardware. ComfyUI is a powerful node-based workflow engine for AI image, video, and 3D generation. Paired with FLUX.1 Dev - Black Forest Labs' 12B parameter text-to-image model - you get output that rivals Midjourney and DALL-E, with complete creative control and zero data leaving your machine.
What you get
- Node-based workflow editor - visually construct complex image generation pipelines
- Photorealistic output - FLUX.1 Dev produces stunning 1024x1024 images from text prompts
- 5,000+ community nodes - LoRA, ControlNet, IP-Adapter, AnimateDiff, and more
- Queue-based generation - batch multiple prompts with per-node caching
- Workflow sharing - export and import workflows as JSON, including workflows shared by the community
- API mode - integrate with n8n, custom apps, or automation pipelines
- $0/month - after the GPU, every image is free
Architecture
| Component | Role |
|---|---|
| ComfyUI | Node-based workflow engine and UI |
| FLUX.1 Dev | 12B text-to-image model, photorealistic output |
| Custom nodes (as needed) | LoRA, ControlNet, upscalers, video extensions |
FLUX.1 Dev is a 12B-parameter model that needs significant VRAM. Recommended: an RTX 4090 (24GB) for comfortable use, or an RTX 5090 for faster generation. It can also run on 12GB cards using quantized versions.
Prerequisites
- A GPU with ≥12 GB VRAM (24GB recommended for FLUX at full quality)
- ~35 GB free disk for the 16-bit FLUX model files listed below (roughly half that with FP8 variants)
- Python 3.10+ or the ComfyUI desktop app
- Git (for custom nodes)
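The 12GB and 24GB figures above follow from simple arithmetic: weight memory is roughly the parameter count multiplied by the bytes per parameter. The sketch below applies that to FLUX.1 Dev's 12B parameters. It estimates the weights alone, in decimal gigabytes, and ignores the text encoders, the VAE, activations, and framework overhead, all of which add to real usage.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: weights only; text encoders, VAE, activations and
# framework overhead all come on top of this figure.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "fp8": 1.0,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return weight storage in decimal GB (10**9 bytes) at the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


FLUX_DEV_PARAMS = 12e9  # 12B parameters, as stated above

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>9}: ~{weight_memory_gb(FLUX_DEV_PARAMS, dtype):.0f} GB")
```

The script prints about 48 GB at FP32, 24 GB at FP16/BF16, and 12 GB at FP8. That explains why the FP8 build is the one aimed at 12GB cards, and why it leaves little headroom even then.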
Setup
Option A: Desktop App (Easiest)
- Download the ComfyUI desktop app from comfy.org
- Install and launch it
- Use the built-in model manager to download FLUX.1 Dev
Option B: Manual Install
```bash
git clone https://github.com/Comfy-Org/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
```
Download the FLUX model files:
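```bash
# Create model directories
mkdir -p models/unet models/clip models/vae
# Download FLUX.1 Dev (gated: accept the license on HuggingFace, then log in)
huggingface-cli login
huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors --local-dir models/unet
huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors --local-dir models/vae
# Text encoders
huggingface-cli download comfyanonymous/flux_text_encoders clip_l.safetensors t5xxl_fp16.safetensors --local-dir models/clip
# Models go in:
# - models/unet/flux1-dev.safetensors
# - models/clip/clip_l.safetensors
# - models/clip/t5xxl_fp16.safetensors
# - models/vae/ae.safetensors
```
Launch ComfyUI: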
```bash
python main.py
```
Open http://localhost:8188 to access the ComfyUI interface.
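If the UI reports a missing model, a short script can confirm that the four files from the manual install are where the loaders expect them. This is a minimal sketch that assumes the unet/clip/vae layout and file names listed above. Adjust `REQUIRED_FILES` if you use FP8 or differently named files.

```python
from pathlib import Path

# File layout from the manual-install step (paths relative to the ComfyUI folder).
# Assumption: 16-bit files with the exact names listed above.
REQUIRED_FILES = [
    "models/unet/flux1-dev.safetensors",
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp16.safetensors",
    "models/vae/ae.safetensors",
]


def missing_model_files(comfy_root: str) -> list[str]:
    """Return the required model paths that are not regular files under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from inside the ComfyUI folder, or pass its path instead of ".".
    missing = missing_model_files(".")
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All FLUX model files found.")
```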
Option C: Docker
```bash
docker run -d --gpus all -p 8188:8188 \
  --name comfyui \
  -v comfyui_models:/app/models \
  comfyui/comfyui:latest
```
Using the Basic FLUX Workflow
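- Load the default workflow or create a new one
- Add these nodes:
  - Load Diffusion Model → load flux1-dev.safetensors
  - DualCLIPLoader → load clip_l.safetensors and t5xxl_fp16.safetensors, with type set to flux
  - Load VAE → load ae.safetensors
  - CLIP Text Encode (Prompt) → enter your prompt (e.g., "a photorealistic cat sitting on a vintage leather chair, warm lighting, depth of field")
  - Empty Latent Image → set the output size, e.g. 1024x1024
  - KSampler → connect the model, the positive and negative conditioning (the negative can be an empty prompt), and the empty latent; FLUX.1 Dev is normally run at CFG 1.0
  - VAE Decode → decode the latent to an image
  - Save Image → save the result
- Click Queue Prompt to generate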
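You can also drive this graph from code through API mode. Below is a minimal sketch. It assumes ComfyUI is listening on the default http://127.0.0.1:8188 and the model files use the separate unet/clip/vae layout from the manual install, so the loaders are UNETLoader, DualCLIPLoader and VAELoader rather than a single checkpoint loader. The node class and input names reflect ComfyUI's core nodes as I understand them, so compare them against a workflow exported in API format from the UI before relying on them. The sampler settings (20 steps, euler/simple, CFG 1.0) are illustrative defaults, not tuned values.

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # default address; an assumption for your setup


def build_flux_workflow(prompt: str, seed: int = 0, steps: int = 20,
                        width: int = 1024, height: int = 1024) -> dict:
    """Build an API-format graph: node id -> {"class_type", "inputs"}.

    A link to another node's output is written as [node_id, output_index].
    """
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-dev.safetensors",
                         "weight_dtype": "default"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "clip_l.safetensors",
                         "clip_name2": "t5xxl_fp16.safetensors",
                         "type": "flux"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "ae.safetensors"}},
        "4": {"class_type": "CLIPTextEncode",  # positive prompt
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        "5": {"class_type": "CLIPTextEncode",  # empty negative; has no effect at cfg 1.0
              "inputs": {"text": "", "clip": ["2", 0]}},
        "6": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "7": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["4", 0],
                         "negative": ["5", 0], "latent_image": ["6", 0],
                         "seed": seed, "steps": steps, "cfg": 1.0,
                         "sampler_name": "euler", "scheduler": "simple",
                         "denoise": 1.0}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"images": ["8", 0], "filename_prefix": "flux"}},
    }


def queue_prompt(workflow: dict, base_url: str = COMFY_URL) -> dict:
    """POST the graph to the /prompt endpoint and return the parsed JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```

With ComfyUI running, `queue_prompt(build_flux_workflow("a photorealistic cat sitting on a vintage leather chair"))` should return a reply containing a `prompt_id`. The image is then saved under the `flux` filename prefix.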
Prompt Tips
- FLUX responds well to natural language descriptions
- Add style cues: "photorealistic", "cinematic lighting", "8K", "macro photography"
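- Negative prompts work differently in FLUX: FLUX.1 Dev is guidance-distilled and normally runs at CFG 1.0, where the negative prompt has no effect, so describe what you want in the positive prompt instead of relying on negatives as with SD models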
Advanced Workflows
LoRA Loading
Add a LoRA Loader node between the model loader and the KSampler's model input. LoRA files go in models/loras/.
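In API-format JSON, putting a LoRA "between" two nodes means rewiring one link. The sketch below assumes a model-only LoRA node (LoraLoaderModelOnly, which patches the diffusion model but not the text encoder). It uses a hypothetical two-node stub standing in for the full workflow. The LoRA file name you pass, such as the hypothetical `my_style.safetensors`, must exist in models/loras/.

```python
import copy

# Hypothetical minimal graph in API format (node id -> class_type + inputs);
# the KSampler's other inputs are omitted for brevity.
EXAMPLE_WORKFLOW = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "flux1-dev.safetensors", "weight_dtype": "default"}},
    "7": {"class_type": "KSampler", "inputs": {"model": ["1", 0]}},
}


def add_lora(workflow: dict, lora_name: str, strength: float = 1.0) -> dict:
    """Return a copy of the graph with a LoRA spliced in between whatever
    currently feeds the KSampler's model input and the KSampler itself."""
    wf = copy.deepcopy(workflow)  # leave the caller's graph untouched
    sampler_id = next(k for k, n in wf.items() if n["class_type"] == "KSampler")
    lora_id = str(max(int(k) for k in wf) + 1)  # next free numeric node id
    wf[lora_id] = {
        "class_type": "LoraLoaderModelOnly",
        "inputs": {"model": wf[sampler_id]["inputs"]["model"],  # previous model source
                   "lora_name": lora_name,                     # file in models/loras/
                   "strength_model": strength},
    }
    wf[sampler_id]["inputs"]["model"] = [lora_id, 0]  # sampler now uses the patched model
    return wf
```

Calling `add_lora` twice chains two LoRAs, because the second call picks up whatever the first left on the sampler's model input.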
Image-to-Image
Replace the Empty Latent Image with a VAE Encode node connected to your input image, and lower the KSampler's denoise below 1.0; at 1.0 the input image is fully replaced by noise.
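The image-to-image swap is also a small rewiring job. This sketch assumes API-format JSON and a LoadImage node that reads a file already placed in ComfyUI's input/ folder. Its default denoise of 0.6 is a hypothetical starting point, not a recommended value. The example graph is a hypothetical stub with most inputs omitted.

```python
import copy

# Hypothetical minimal text-to-image graph in API format (other inputs omitted).
EXAMPLE_WORKFLOW = {
    "3": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
    "6": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "7": {"class_type": "KSampler",
          "inputs": {"latent_image": ["6", 0], "denoise": 1.0}},
}


def to_img2img(workflow: dict, image_name: str, denoise: float = 0.6) -> dict:
    """Return a copy of the graph that starts from an input image instead of
    an empty latent: LoadImage -> VAEEncode -> KSampler.latent_image."""
    wf = copy.deepcopy(workflow)

    def find(class_type: str) -> str:
        return next(k for k, n in wf.items() if n["class_type"] == class_type)

    sampler_id, vae_id = find("KSampler"), find("VAELoader")
    next_id = max(int(k) for k in wf) + 1
    load_id, encode_id = str(next_id), str(next_id + 1)

    wf[load_id] = {"class_type": "LoadImage",    # file in ComfyUI's input/ folder
                   "inputs": {"image": image_name}}
    wf[encode_id] = {"class_type": "VAEEncode",  # pixels -> latent
                     "inputs": {"pixels": [load_id, 0], "vae": [vae_id, 0]}}

    old_latent_id = wf[sampler_id]["inputs"]["latent_image"][0]
    wf[sampler_id]["inputs"]["latent_image"] = [encode_id, 0]
    wf[sampler_id]["inputs"]["denoise"] = denoise  # < 1.0 keeps part of the input image
    if wf[old_latent_id]["class_type"] == "EmptyLatentImage":
        del wf[old_latent_id]  # the empty latent is no longer connected
    return wf
```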
ControlNet
Add ControlNet Loader and Apply ControlNet nodes for pose or edge guidance.
Video Generation
Install Video Nodes (via ComfyUI Manager) and pair with models like Wan2.1 for AI video.
Cost vs cloud
| | Local ComfyUI + FLUX | Midjourney / DALL-E |
|---|---|---|
| Monthly | $0 | $10-60 |
| Per image | $0 | $0.04-0.12 |
| Hardware | ~$1600 once (4090) | $0 |
| Data privacy | Stays on your GPU | Sent to cloud |
| Control | Full node-level | Limited |
| Batch gen | Unlimited, free | Rate-limited |
| Resolution | Any (VRAM permitting) | Fixed sizes |
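If you generate 100+ images/month, a 4090 pays for itself in a little over 2 years versus Midjourney Pro: assuming $60/month (the top of the range above), $1,600 / $60 ≈ 27 months. At the per-image rates above, 1,000 images/month costs $40-120, so payback drops to roughly 13-40 months, and heavier use shortens it further.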
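The payback figures are plain division: hardware cost ÷ monthly savings. The sketch below uses the table's numbers and treats local running cost as $0 by default. `local_monthly` is an optional parameter for your own electricity estimate, which the table does not include.

```python
def break_even_months(hardware_cost: float, cloud_monthly: float,
                      local_monthly: float = 0.0) -> float:
    """Months until a one-off hardware purchase beats a recurring cloud bill.

    local_monthly covers running costs such as electricity (0 by default,
    matching the $0/month figure above; set it to your own estimate).
    """
    savings = cloud_monthly - local_monthly
    if savings <= 0:
        return float("inf")  # local hardware never pays for itself
    return hardware_cost / savings


# Flat subscription at the top of the table's range ($60/month): 26.7 months
print(f"{break_even_months(1600, 60):.1f} months")

# Per-image pricing: 1,000 images/month at the table's low and high rates
for per_image in (0.04, 0.12):
    print(f"${per_image}/image: {break_even_months(1600, 1000 * per_image):.1f} months")
```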
Troubleshooting
- Out of memory → FLUX needs ~20GB+ VRAM at 16-bit (FP16/BF16) precision. Use the FP8 quantized version to fit in 12GB, and lower the resolution to 768x768.
- No images showing → Check the console output for errors. The VAE Decode step is a common failure point at high resolutions; the VAE Decode (Tiled) node uses less memory.
- Slow generation → FLUX.1 Dev takes 30-60s per image on a 4090. For faster results, use FLUX.1 Schnell (4-step distilled version).
- Missing model files → Use ComfyUI Manager (a custom-node extension) to download models from within the interface.
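- CORS errors in API mode → Launch with --enable-cors-header (optionally followed by the allowed origin), and add --listen 0.0.0.0 if clients connect from another machine. extra_model_paths.yaml only configures extra model search paths, not API access.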
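When an out-of-memory error appears, it helps to know how much VRAM is actually free before queuing a FLUX job. Below is a minimal sketch. It assumes that ComfyUI's /system_stats endpoint on the default port returns a `devices` list whose entries carry `name` and `vram_free` (in bytes) fields. Check the raw JSON on your install, because those field names are an assumption here.

```python
import json
import urllib.request


def fetch_system_stats(base_url: str = "http://127.0.0.1:8188") -> dict:
    """GET the /system_stats endpoint and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/system_stats", timeout=5) as resp:
        return json.loads(resp.read())


def free_vram_gb(stats: dict) -> dict[str, float]:
    """Map each device name to its free VRAM in decimal GB.

    Assumes a "devices" list whose entries have "name" and "vram_free" (bytes).
    """
    return {d["name"]: d["vram_free"] / 1e9 for d in stats.get("devices", [])}
```

With ComfyUI running, `free_vram_gb(fetch_system_stats())` reports free VRAM per GPU. Compare that figure with the roughly 12 GB (FP8) or 24 GB (16-bit) of FLUX.1 Dev weights.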
Swap components
- Faster generation → Use FLUX.1 Schnell (4-step distilled, 4x faster)
- Video generation → Add Wan2.1 or HunyuanVideo nodes
- Alternative UI → Try SwarmUI for a simpler interface on top of ComfyUI
- Lower VRAM → Use SDXL or SD3.5 for 8GB cards
- n8n integration → Use ComfyUI's API mode with n8n for automated generation pipelines
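Automation tools such as n8n talk to API mode over plain HTTP. The flow is: queue the graph with POST /prompt, read the returned `prompt_id`, poll GET /history/{prompt_id} until outputs appear, then download each image through GET /view. The sketch below covers the last two steps in Python. It assumes that endpoint layout and a history reply shaped like `{prompt_id: {"outputs": {node_id: {"images": [...]}}}}`, so verify both against your ComfyUI version.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8188"  # default address; an assumption for your setup


def get_history(prompt_id: str, base_url: str = BASE_URL) -> dict:
    """GET /history/{prompt_id}; empty until the job has finished."""
    with urllib.request.urlopen(f"{base_url}/history/{prompt_id}", timeout=10) as resp:
        return json.loads(resp.read())


def image_urls(history: dict, prompt_id: str, base_url: str = BASE_URL) -> list[str]:
    """List /view URLs for every image a finished prompt produced."""
    urls = []
    outputs = history.get(prompt_id, {}).get("outputs", {})
    for node_output in outputs.values():
        for img in node_output.get("images", []):
            query = urllib.parse.urlencode({
                "filename": img["filename"],
                "subfolder": img.get("subfolder", ""),
                "type": img.get("type", "output"),
            })
            urls.append(f"{base_url}/view?{query}")
    return urls
```

An n8n workflow can make the same three HTTP requests with its generic HTTP request node, polling until `image_urls` (or its equivalent) returns a non-empty list.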
Frequently asked
What is the Local Image Generation (ComfyUI + FLUX) stack for?
It is built for generating photorealistic images locally with node-based AI workflows. ComfyUI + FLUX.1 Dev gives professional-grade, fully private image generation on your own GPU, running on a 12GB card for $0/mo entirely on your own hardware.
How much does the Local Image Generation (ComfyUI + FLUX) stack cost?
Local Image Generation (ComfyUI + FLUX) costs around $1,600 in hardware up front and $0/month to run, since everything is self-hosted: there are no per-image or subscription fees as with a cloud equivalent.
How long does it take to set up Local Image Generation (ComfyUI + FLUX)?
Plan for roughly 20 minutes. The stack is rated intermediate.
What do I need to run Local Image Generation (ComfyUI + FLUX)?
Local Image Generation (ComfyUI + FLUX) is built from 1 tool, 1 model, and 2 hardware items.