If you’re looking to generate photorealistic AI images at lightning speed, you’ve probably heard the buzz about Z-Image Turbo. Developed by Alibaba’s Tongyi Lab, this model is blowing minds by generating top-tier outputs in just 8 steps. Yes, eight. No more waiting 30+ steps for a decent render.
But here’s the thing: getting it to run perfectly in ComfyUI requires the right nodes, the correct VAEs, and a clean workflow. Unlike older models, Z-Image Turbo uses a Scalable Single-Stream DiT (S3-DiT) architecture, which means your old SDXL workflows simply won’t work out of the box.
In this guide, I’ll break down the setup so it’s idiot-proof and fully tested. We’ll cover both the official base workflow and the highly-regarded Amazing Z-Image Workflow v4.0 (by martin-rizzo). We’ll cover everything from downloading the right format (GGUF vs. SafeTensors) to optimizing for low VRAM and setting up ControlNet. Let’s build!
⚡ Why Use Z-Image Turbo?
Before we start downloading gigabytes of files, let’s talk about why Z-Image Turbo is worth your time and hard drive space compared to FLUX or SDXL:
- Insane Speed: It’s a distilled model designed to generate highly detailed images in roughly 8 sampling steps. This means sub-second inference on high-end GPUs and just a few seconds on consumer cards.
- Bilingual Text Rendering: It accurately renders complex Chinese and English text — a major struggle for older models and even some newer ones.
- Hardware Agnostic Options: Whether you have a 24GB RTX 4090 or a humble 8GB card, there are GGUF and FP8 quantizations ready for you.
- Photorealism & Reasoning: It features a built-in Prompt Enhancer, making it excellent at skin textures, fabric details, and understanding complex “concept” prompts without needing extremely verbose descriptions.
⚙️ System Requirements
Here is what you need to run Z-Image Turbo smoothly on your local machine:
| Component | Minimum (GGUF/FP8) | Recommended (BF16) |
|---|---|---|
| GPU | NVIDIA 8GB VRAM | 16GB+ VRAM |
| RAM | 16GB | 32GB |
| OS | Windows 10/11 or Linux | Windows 10/11 or Linux |
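If you're not sure which path your card falls into, the table above boils down to a simple decision rule. Here is a minimal sketch of that rule as a helper function; the cutoffs are the rough guidelines from this guide, not official requirements, and the filenames match the downloads listed in Step 1.

```python
def pick_model_path(vram_gb: float) -> str:
    """Suggest a Z-Image Turbo variant for a given amount of VRAM.

    Thresholds follow the requirements table above; they are a
    rule of thumb, not an official spec.
    """
    if vram_gb >= 16:
        return "BF16 (z_image_turbo_bf16.safetensors)"
    if vram_gb >= 8:
        return "GGUF/FP8 (z_image_turbo-Q5_K_S.gguf)"
    return "below minimum: expect OOM errors even with GGUF"

# On an NVIDIA card you could feed this from PyTorch, e.g.:
#   import torch
#   vram = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(pick_model_path(24))  # BF16 (z_image_turbo_bf16.safetensors)
```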
🔽 Step 1: Download Required Models & Checkpoints
Before loading any workflows, you need the actual model files. Depending on your VRAM, you have two paths. Place these files exactly in the directories listed below.
🏆 The High VRAM Path (Recommended: 16GB+ VRAM)
If you want the uncompromised BF16 experience, grab these:
- Diffusion Model: z_image_turbo_bf16.safetensors ➡️ ComfyUI/models/diffusion_models/
- Text Encoder: qwen_3_4b.safetensors ➡️ ComfyUI/models/text_encoders/
- VAE: ae.safetensors (Note: this is similar to the FLUX.1 VAE) ➡️ ComfyUI/models/vae/
🧱 The Low VRAM Path (8GB - 12GB VRAM)
If you have less VRAM, we will use GGUF quantizations. This requires the ComfyUI-GGUF custom node.
- Diffusion Model: z_image_turbo-Q5_K_S.gguf ➡️ ComfyUI/models/diffusion_models/
- Text Encoder: Qwen3-4B.i1-Q5_K_S.gguf ➡️ ComfyUI/models/text_encoders/
- VAE: ae.safetensors ➡️ ComfyUI/models/vae/
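A misplaced file is the most common reason a loader node shows up red, so it's worth sanity-checking the layout before launching ComfyUI. Below is a small sketch that verifies the low-VRAM (GGUF) layout; the filenames are the ones listed above, and you would swap in the BF16 names if you took the high-VRAM path.

```python
import os

# Expected layout for the low-VRAM (GGUF) path, relative to the
# ComfyUI root. Swap in the BF16 filenames from the previous
# section if you took the high-VRAM route.
EXPECTED = {
    "models/diffusion_models": "z_image_turbo-Q5_K_S.gguf",
    "models/text_encoders": "Qwen3-4B.i1-Q5_K_S.gguf",
    "models/vae": "ae.safetensors",
}

def check_layout(comfy_root: str) -> list[str]:
    """Return the paths of any expected model files that are missing."""
    missing = []
    for subdir, fname in EXPECTED.items():
        path = os.path.join(comfy_root, subdir, fname)
        if not os.path.isfile(path):
            missing.append(path)
    return missing

if __name__ == "__main__":
    for path in check_layout("ComfyUI"):
        print("missing:", path)
```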
🔽 Step 2: Choose Your Workflow
You have two main options for running Z-Image Turbo: the official barebones workflow, or a feature-packed community workflow. I recommend starting with the community one.
Option A: Amazing Z-Image Workflow v4.0 (Recommended for Features)
This workflow by martin-rizzo comes pre-configured with a style selector (18 styles!), built-in upscaler, refiner, and custom sigma values for the best results.
- Go to the AmazingZImageWorkflow GitHub Repo.
- Download the workflow JSON file that matches your VRAM:
  - amazing-z-image-a_GGUF.json ➡️ Best for 8GB to 12GB VRAM.
  - amazing-z-image-a_SAFETENSORS.json ➡️ Best for 16GB+ VRAM.
- Drag and drop the .json file directly onto your ComfyUI canvas.
Note: This workflow requires the rgthree-comfy node. If you see red nodes, install it via ComfyUI Manager.
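You can check for the required custom nodes from the command line instead of waiting for red nodes to appear on the canvas. The sketch below lists what's installed under custom_nodes; the folder names assumed here are the repositories' default clone names, so adjust them if you renamed the directories.

```python
import os

# Default clone-folder names for the two custom node packs this
# guide uses; ComfyUI-GGUF is only needed for the low-VRAM path.
REQUIRED_NODES = ("rgthree-comfy", "ComfyUI-GGUF")

def missing_custom_nodes(comfy_root: str) -> list[str]:
    """Compare installed custom_nodes folders against the required list."""
    nodes_dir = os.path.join(comfy_root, "custom_nodes")
    installed = set(os.listdir(nodes_dir)) if os.path.isdir(nodes_dir) else set()
    return [n for n in REQUIRED_NODES if n not in installed]
```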
Option B: Official ComfyUI Base Workflow (Recommended for Modding)
If you prefer a clean slate to build your own complex routing, use the official template.
- Download the Official Z-Image-Turbo Workflow JSON.
- Drag and drop into ComfyUI.
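Drag-and-drop is all you need for normal use, but if a workflow refuses to load, inspecting its JSON tells you exactly which node types it expects (and therefore which custom node pack is missing). This sketch handles both shapes a ComfyUI workflow file can take; the sample JSON at the bottom is a hand-made toy, not a real workflow.

```python
import json

def node_types(workflow_json: str) -> set[str]:
    """Collect the node class types referenced by a ComfyUI workflow file.

    Handles both the UI export format ("nodes": [...]) and the API
    format (a flat dict of numbered nodes carrying "class_type").
    """
    data = json.loads(workflow_json)
    if isinstance(data, dict) and "nodes" in data:
        return {n.get("type", "?") for n in data["nodes"]}
    return {v["class_type"] for v in data.values() if isinstance(v, dict)}

# Tiny hand-made example in the UI export format:
sample = '{"nodes": [{"type": "UnetLoaderGGUF"}, {"type": "KSampler"}]}'
print(sorted(node_types(sample)))  # ['KSampler', 'UnetLoaderGGUF']
```

Any type that isn't a built-in ComfyUI node points at a custom node pack you still need to install via ComfyUI Manager.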
🎯 Step 3: Recommended Generation Settings
The beauty of Z-Image Turbo is that it’s fast, but it is also very sensitive to settings. If you use standard SDXL settings, your images will look deep-fried.
Here are the golden rules for Z-Image Turbo:
| Setting | Recommended Value | Why? |
|---|---|---|
| Steps | 8 | The model is distilled. More steps DO NOT equal better quality. 8 is the absolute sweet spot. |
| CFG Scale | 1.5 - 2.0 | Keep it extremely low. Anything above 2.5 usually results in burned, oversaturated images. |
| Sampler | euler | Euler is fast, reliable, and consistent for this specific architecture. |
| Resolution | 1024x1024 | Native resolution. For low VRAM, try 1216x832 if you’re encountering OOM errors. |
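Since the two settings that most often ruin Z-Image output are steps and CFG, it can help to encode the table above as a sanity check. This is a hypothetical helper using the values from this guide, not anything shipped with ComfyUI.

```python
# The golden values from the table above, kept as a reference.
RECOMMENDED = {"steps": 8, "cfg": 1.8, "sampler_name": "euler",
               "width": 1024, "height": 1024}

def check_settings(steps: int, cfg: float) -> list[str]:
    """Warn about the two settings that most often 'deep-fry' Z-Image output."""
    warnings = []
    if steps != 8:
        warnings.append(f"steps={steps}: the model is distilled for exactly 8 steps")
    if cfg > 2.5:
        warnings.append(f"cfg={cfg}: above ~2.5 images burn; stay in 1.5-2.0")
    return warnings

# Feeding it typical SDXL settings flags both problems:
print(check_settings(steps=30, cfg=7.0))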
Important Note on CLIP Settings: When configuring your text encoder node manually, ensure the CLIP type is set to “Lumina 2” to properly load the Qwen 3 4B file.
🧠 Advanced: Setting up ControlNet (Z-Image-Turbo Fun Union)
Want to control poses, depth, or edges? Z-Image Turbo supports a powerful Union ControlNet model.
- Update ComfyUI: Ensure you are on the absolute latest version of ComfyUI.
- Download the ControlNet Model: Grab Z-Image-Turbo-Fun-Controlnet-Union.safetensors (check HuggingFace/CivitAI) and place it in ComfyUI/models/controlnet/.
- Load the Workflow: Download the Official Z-Image-Turbo Fun Union ControlNet Workflow and drag it into ComfyUI.
This single ControlNet model handles multiple conditions natively, saving you from downloading gigabytes of separate models for Canny, Depth, etc.
🛠 Troubleshooting (The “Idiot-Proof” Rescue Guide)
Things go wrong. It happens. Here is how to fix the most common Z-Image issues in ComfyUI:
| Error / Issue | Cause | Solution |
|---|---|---|
| Black Images / Pure Noise | Using BF16 model on incompatible GPU, or using the wrong VAE. | Ensure you are using the correct ae.safetensors VAE. If on low VRAM, switch to the FP8 or GGUF version. |
| ”Missing Node: GGUFModelLoader” | You didn’t install the GGUF reader. | Open ComfyUI Manager, search for ComfyUI-GGUF by city96, install, and restart. |
| Images look deep-fried/overcooked | CFG is too high, or Steps are too high. | Lower your CFG down to 1.5 or 1.8. Set steps exactly to 8. |
| CUDA Out of Memory (OOM) | VRAM overflow during generation or VAE decoding. | Use the 1216x832 ‘Smaller Image Switch’ in Martin Rizzo’s workflow. Ensure PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 is in your .bat file. |
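The PYTORCH_CUDA_ALLOC_CONF fix from the OOM row doesn't have to live in a .bat file; if you launch ComfyUI from a Python script, you can set it there, as long as it happens before torch is first imported. A minimal sketch:

```python
import os

# Must be set BEFORE the first `import torch`, or it has no effect.
# max_split_size_mb:128 makes PyTorch's CUDA caching allocator split
# large blocks, reducing fragmentation so marginal 8GB cards are
# likelier to survive VAE decode.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # safe to import once the env var is in place
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```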
💡 Quick Tips for Better Prompts
Z-Image Turbo has a surprisingly strong understanding of natural language thanks to the Qwen text encoder.
- Do’s ✅: Write in complete sentences. “A cinematic photograph of a futuristic city street at night, neon lights reflecting in puddles, with a glowing hologram sign reading ‘Neurocanvas’.”
- Don’ts ❌: Don’t use SD 1.5 style keyword dumps. A prompt like masterpiece, best quality, ultra detailed, neon city, puddle will actually confuse the prompt enhancer and yield worse results.
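If you batch-generate from old prompt libraries, a quick heuristic can flag SD 1.5-style tag lists before they hit the encoder. This check is entirely my own rule of thumb (short comma-separated fragments suggest a keyword dump), not anything Z-Image or ComfyUI ships:

```python
def looks_like_keyword_dump(prompt: str) -> bool:
    """Rough heuristic: three or more comma-separated fragments
    averaging under three words each suggests an SD 1.5-style tag
    list rather than the natural sentences Qwen handles best."""
    parts = [p.strip() for p in prompt.split(",") if p.strip()]
    if len(parts) < 3:
        return False
    avg_words = sum(len(p.split()) for p in parts) / len(parts)
    return avg_words < 3

print(looks_like_keyword_dump("masterpiece, best quality, ultra detailed, neon city"))  # True
```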
🔗 Useful Links & Credits
If you want to dive deeper into ComfyUI magic or upgrade your setup, check these out:
- Install FLUX in ComfyUI: Complete Setup Guide
- Stable Diffusion Prompting Guide
- ComfyUI ControlNet & Node Extensions Guide
- AmazingZImageWorkflow on GitHub
- Official ComfyUI Docs for Z-Image
🏁 Final Thoughts
Z-Image Turbo represents a massive leap forward for open-source image generation. It bridges the gap between the speed of SDXL-Lightning models and the intricate photorealism of FLUX, without requiring a server farm to run.
By using the Amazing Z-Image Workflow or the official templates, you bypass the frustrating trial-and-error phase of wiring up complex S3-DiT nodes. Got a low VRAM card? The GGUF models keep the dream alive for 8GB GPUs. Update your ComfyUI, grab the workflow, and start prompting!