If you’re looking to generate photorealistic AI images at lightning speed, you’ve probably heard the buzz about Z-Image Turbo. Developed by Alibaba’s Tongyi Lab, this model is blowing minds by generating top-tier outputs in just 8 steps. Yes, eight. No more waiting 30+ steps for a decent render.
But here’s the thing: getting it to run perfectly in ComfyUI requires the right nodes, the correct VAEs, and a clean workflow. Unlike older models, Z-Image Turbo uses a Scalable Single-Stream DiT (S3-DiT) architecture, which means your old SDXL workflows simply won’t work out of the box.
In this guide, I’ll break down the setup so it’s idiot-proof and fully tested. We’ll cover both the official base workflow and the highly-regarded Amazing Z-Image Workflow v4.0 (by martin-rizzo). We’ll cover everything from downloading the right format (GGUF vs. SafeTensors) to optimizing for low VRAM and setting up ControlNet. Let’s build!
⚡ Why Use Z-Image Turbo?
Before we start downloading gigabytes of files, let’s talk about why Z-Image Turbo is worth your time and hard drive space compared to FLUX or SDXL:
- Insane Speed: It’s a distilled model designed to generate highly detailed images in roughly 8 sampling steps. This means sub-second inference on high-end GPUs and just a few seconds on consumer cards.
- Bilingual Text Rendering: It accurately renders complex Chinese and English text — a major struggle for older models and even some newer ones.
- Hardware Agnostic Options: Whether you have a 24GB RTX 4090 or a humble 8GB card, there are GGUF and FP8 quantizations ready for you.
- Photorealism & Reasoning: It features a built-in Prompt Enhancer, making it excellent at skin textures, fabric details, and understanding complex “concept” prompts without needing extremely verbose descriptions.
⚙️ System Requirements
Here is what you need to run Z-Image Turbo smoothly on your local machine:
| Component | Minimum (GGUF/FP8) | Recommended (BF16) |
|---|---|---|
| GPU | NVIDIA 8GB VRAM | 16GB+ VRAM |
| RAM | 16GB | 32GB |
| OS | Windows 10/11 or Linux | Windows 10/11 or Linux |
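If you're not sure which path your card falls into, the table above boils down to a simple decision rule. Here is a minimal sketch of that rule as a helper function; the cutoffs are the rough guidelines from this guide, not official requirements, and the filenames match the downloads listed in Step 1.

```python
def pick_model_path(vram_gb: float) -> str:
    """Suggest a Z-Image Turbo variant for a given amount of VRAM.

    Thresholds follow the requirements table above; they are a
    rule of thumb, not an official spec.
    """
    if vram_gb >= 16:
        return "BF16 (z_image_turbo_bf16.safetensors)"
    if vram_gb >= 8:
        return "GGUF/FP8 (z_image_turbo-Q5_K_S.gguf)"
    return "below minimum: expect OOM errors even with GGUF"

# On an NVIDIA card you could feed this from PyTorch, e.g.:
#   import torch
#   vram = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(pick_model_path(24))  # BF16 (z_image_turbo_bf16.safetensors)
```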
🔽 Step 1: Download Required Models & Checkpoints
Before loading any workflows, you need the actual model files. Depending on your VRAM, you have two paths. Place these files exactly in the directories listed below.
🏆 The High VRAM Path (Recommended: 16GB+ VRAM)
If you want the uncompromised BF16 experience, grab these:
- Diffusion Model: z_image_turbo_bf16.safetensors ➡️ ComfyUI/models/diffusion_models/
- Text Encoder: qwen_3_4b.safetensors ➡️ ComfyUI/models/text_encoders/
- VAE: ae.safetensors (Note: this is similar to the FLUX.1 VAE) ➡️ ComfyUI/models/vae/
🧱 The Low VRAM Path (8GB - 12GB VRAM)
If you have less VRAM, we will use GGUF quantizations. This requires the ComfyUI-GGUF custom node.
- Diffusion Model: z_image_turbo-Q5_K_S.gguf ➡️ ComfyUI/models/diffusion_models/
- Text Encoder: Qwen3-4B.i1-Q5_K_S.gguf ➡️ ComfyUI/models/text_encoders/
- VAE: ae.safetensors ➡️ ComfyUI/models/vae/
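A misplaced file is the most common reason a loader node shows up red, so it's worth sanity-checking the layout before launching ComfyUI. Below is a small sketch that verifies the low-VRAM (GGUF) layout; the filenames are the ones listed above, and you would swap in the BF16 names if you took the high-VRAM path.

```python
import os

# Expected layout for the low-VRAM (GGUF) path, relative to the
# ComfyUI root. Swap in the BF16 filenames from the previous
# section if you took the high-VRAM route.
EXPECTED = {
    "models/diffusion_models": "z_image_turbo-Q5_K_S.gguf",
    "models/text_encoders": "Qwen3-4B.i1-Q5_K_S.gguf",
    "models/vae": "ae.safetensors",
}

def check_layout(comfy_root: str) -> list[str]:
    """Return the paths of any expected model files that are missing."""
    missing = []
    for subdir, fname in EXPECTED.items():
        path = os.path.join(comfy_root, subdir, fname)
        if not os.path.isfile(path):
            missing.append(path)
    return missing

if __name__ == "__main__":
    for path in check_layout("ComfyUI"):
        print("missing:", path)
```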
🔽 Step 2: Choose Your Workflow
You have two main options for running Z-Image Turbo: the official barebones workflow, or a feature-packed community workflow. I recommend starting with the community one.
Option A: Amazing Z-Image Workflow v4.0 (Recommended for Features)
This workflow by martin-rizzo comes pre-configured with a style selector (18 styles!), built-in upscaler, refiner, and custom sigma values for the best results.
- Go to the AmazingZImageWorkflow GitHub Repo.
- Download the workflow JSON file that matches your VRAM:
  - amazing-z-image-a_GGUF.json ➡️ Best for 8GB to 12GB VRAM.
  - amazing-z-image-a_SAFETENSORS.json ➡️ Best for 16GB+ VRAM.
- Drag and drop the .json file directly onto your ComfyUI canvas.
Note: This workflow requires the rgthree-comfy node. If you see red nodes, install it via ComfyUI Manager.
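You can check for the required custom nodes from the command line instead of waiting for red nodes to appear on the canvas. The sketch below lists what's installed under custom_nodes; the folder names assumed here are the repositories' default clone names, so adjust them if you renamed the directories.

```python
import os

# Default clone-folder names for the two custom node packs this
# guide uses; ComfyUI-GGUF is only needed for the low-VRAM path.
REQUIRED_NODES = ("rgthree-comfy", "ComfyUI-GGUF")

def missing_custom_nodes(comfy_root: str) -> list[str]:
    """Compare installed custom_nodes folders against the required list."""
    nodes_dir = os.path.join(comfy_root, "custom_nodes")
    installed = set(os.listdir(nodes_dir)) if os.path.isdir(nodes_dir) else set()
    return [n for n in REQUIRED_NODES if n not in installed]
```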
Option B: Official ComfyUI Base Workflow (Recommended for Modding)
If you prefer a clean slate to build your own complex routing, use the official template.
- Download the Official Z-Image-Turbo Workflow JSON.
- Drag and drop into ComfyUI.
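Drag-and-drop is all you need for normal use, but if a workflow refuses to load, inspecting its JSON tells you exactly which node types it expects (and therefore which custom node pack is missing). This sketch handles both shapes a ComfyUI workflow file can take; the sample JSON at the bottom is a hand-made toy, not a real workflow.

```python
import json

def node_types(workflow_json: str) -> set[str]:
    """Collect the node class types referenced by a ComfyUI workflow file.

    Handles both the UI export format ("nodes": [...]) and the API
    format (a flat dict of numbered nodes carrying "class_type").
    """
    data = json.loads(workflow_json)
    if isinstance(data, dict) and "nodes" in data:
        return {n.get("type", "?") for n in data["nodes"]}
    return {v["class_type"] for v in data.values() if isinstance(v, dict)}

# Tiny hand-made example in the UI export format:
sample = '{"nodes": [{"type": "UnetLoaderGGUF"}, {"type": "KSampler"}]}'
print(sorted(node_types(sample)))  # ['KSampler', 'UnetLoaderGGUF']
```

Any type that isn't a built-in ComfyUI node points at a custom node pack you still need to install via ComfyUI Manager.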
🎯 Step 3: Recommended Generation Settings
The beauty of Z-Image Turbo is that it’s fast, but it is also very sensitive to settings. If you use standard SDXL settings, your images will look deep-fried.
Here are the golden rules for Z-Image Turbo:
| Setting | Recommended Value | Why? |
|---|---|---|
| Steps | 8 | The model is distilled. More steps DO NOT equal better quality. 8 is the absolute sweet spot. |
| CFG Scale | 1.5 - 2.0 | Keep it extremely low. Anything above 2.5 usually results in burned, oversaturated images. |
| Sampler | euler | Euler is fast, reliable, and consistent for this specific architecture. |
| Resolution | 1024x1024 | Native resolution. For low VRAM, try 1216x832 if you’re encountering OOM errors. |
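Since the two settings that most often ruin Z-Image output are steps and CFG, it can help to encode the table above as a sanity check. This is a hypothetical helper using the values from this guide, not anything shipped with ComfyUI.

```python
# The golden values from the table above, kept as a reference.
RECOMMENDED = {"steps": 8, "cfg": 1.8, "sampler_name": "euler",
               "width": 1024, "height": 1024}

def check_settings(steps: int, cfg: float) -> list[str]:
    """Warn about the two settings that most often 'deep-fry' Z-Image output."""
    warnings = []
    if steps != 8:
        warnings.append(f"steps={steps}: the model is distilled for exactly 8 steps")
    if cfg > 2.5:
        warnings.append(f"cfg={cfg}: above ~2.5 images burn; stay in 1.5-2.0")
    return warnings

# Feeding it typical SDXL settings flags both problems:
print(check_settings(steps=30, cfg=7.0))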
Important Note on CLIP Settings: When configuring your text encoder node manually, ensure the CLIP type is set to “Lumina 2” to properly load the Qwen 3 4B file.
🧠 Advanced: Setting up ControlNet (Z-Image-Turbo Fun Union)
Want to control poses, depth, or edges? Z-Image Turbo supports a powerful Union ControlNet model.
- Update ComfyUI: Ensure you are on the absolute latest version of ComfyUI.
- Download the ControlNet Model: Grab Z-Image-Turbo-Fun-Controlnet-Union.safetensors (check HuggingFace/CivitAI) and place it in ComfyUI/models/controlnet/.
- Load the Workflow: Download the Official Z-Image-Turbo Fun Union ControlNet Workflow and drag it into ComfyUI.
This single ControlNet model handles multiple conditions natively, saving you from downloading gigabytes of separate models for Canny, Depth, etc.
🛠 Troubleshooting (The “Idiot-Proof” Rescue Guide)
Things go wrong. It happens. Here is how to fix the most common Z-Image issues in ComfyUI:
| Error / Issue | Cause | Solution |
|---|---|---|
| Black Images / Pure Noise | Using BF16 model on incompatible GPU, or using the wrong VAE. | Ensure you are using the correct ae.safetensors VAE. If on low VRAM, switch to the FP8 or GGUF version. |
| ”Missing Node: GGUFModelLoader” | You didn’t install the GGUF reader. | Open ComfyUI Manager, search for ComfyUI-GGUF by city96, install, and restart. |
| Images look deep-fried/overcooked | CFG is too high, or Steps are too high. | Lower your CFG down to 1.5 or 1.8. Set steps exactly to 8. |
| CUDA Out of Memory (OOM) | VRAM overflow during generation or VAE decoding. | Use the 1216x832 ‘Smaller Image Switch’ in Martin Rizzo’s workflow. Ensure PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 is in your .bat file. |
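The PYTORCH_CUDA_ALLOC_CONF fix from the OOM row doesn't have to live in a .bat file; if you launch ComfyUI from a Python script, you can set it there, as long as it happens before torch is first imported. A minimal sketch:

```python
import os

# Must be set BEFORE the first `import torch`, or it has no effect.
# max_split_size_mb:128 makes PyTorch's CUDA caching allocator split
# large blocks, reducing fragmentation so marginal 8GB cards are
# likelier to survive VAE decode.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # safe to import once the env var is in place
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```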
💡 Quick Tips for Better Prompts
Z-Image Turbo has a surprisingly strong understanding of natural language thanks to the Qwen text encoder.
- Do’s ✅: Write in complete sentences. “A cinematic photograph of a futuristic city street at night, neon lights reflecting in puddles, with a glowing hologram sign reading ‘Neurocanvas’.”
- Don’ts ❌: Don’t use SD 1.5 style keyword dumps. A prompt like masterpiece, best quality, ultra detailed, neon city, puddle will actually confuse the prompt enhancer and yield worse results.
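If you batch-generate from old prompt libraries, a quick heuristic can flag SD 1.5-style tag lists before they hit the encoder. This check is entirely my own rule of thumb (short comma-separated fragments suggest a keyword dump), not anything Z-Image or ComfyUI ships:

```python
def looks_like_keyword_dump(prompt: str) -> bool:
    """Rough heuristic: three or more comma-separated fragments
    averaging under three words each suggests an SD 1.5-style tag
    list rather than the natural sentences Qwen handles best."""
    parts = [p.strip() for p in prompt.split(",") if p.strip()]
    if len(parts) < 3:
        return False
    avg_words = sum(len(p.split()) for p in parts) / len(parts)
    return avg_words < 3

print(looks_like_keyword_dump("masterpiece, best quality, ultra detailed, neon city"))  # True
```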
🔗 Useful Links & Credits
If you want to dive deeper into ComfyUI magic or upgrade your setup, check these out:
- Install FLUX in ComfyUI: Complete Setup Guide
- Stable Diffusion Prompting Guide
- ComfyUI ControlNet & Node Extensions Guide
- AmazingZImageWorkflow on GitHub
- Official ComfyUI Docs for Z-Image
🏁 Final Thoughts
Z-Image Turbo represents a massive leap forward for open-source image generation. It bridges the gap between the speed of SDXL-Lightning models and the intricate photorealism of FLUX, without requiring a server farm to run.
By using the Amazing Z-Image Workflow or the official templates, you bypass the frustrating trial-and-error phase of wiring up complex S3-DiT nodes. Got a low VRAM card? The GGUF models keep the dream alive for 8GB GPUs. Update your ComfyUI, grab the workflow, and start prompting!