Add Qwen-Image-Layered support for image decomposition into RGBA layers #302

ZimengXiong · 2025-12-22T16:17:02Z

Add Qwen-Image-Layered support for image decomposition into RGBA layers

This PR adds support for the Qwen-Image-Layered model.

New CLI command: mflux-generate-qwen-layered
Decomposes images into N RGBA layers (default 4)
Supports 4-, 6-, and 8-bit quantization (~29GB with 6-bit vs ~55GB for BF16)
Resolution buckets: 640 and 1024

Implements

RGBA-VAE (4-channel) with 3D temporal convolutions for layer handling
Layer3D RoPE: 3D positional encoding [layer, height, width]
Uses base QwenTransformer with extended RoPE for multi-layer sequences
Condition image encoded with layer_index=-1 for proper decomposition
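As a rough illustration of the [layer, height, width] indexing described above, here is a minimal sketch that builds one position triple per latent token. The function name, the token ordering (output layers first, condition image last), and the zero-based row/column coordinates are all assumptions made for illustration; the actual implementation in this PR may order or center the coordinates differently.

```python
from itertools import product


def layer3d_position_ids(num_layers, grid_h, grid_w, condition_index=-1):
    """Hypothetical helper: one (layer, row, col) triple per latent token.

    Assumption: tokens of the N output layers come first, followed by the
    tokens of the condition image, which is tagged with layer index -1.
    """
    ids = []
    for layer in list(range(num_layers)) + [condition_index]:
        ids.extend((layer, row, col) for row, col in product(range(grid_h), range(grid_w)))
    return ids


# Tiny 2x3 latent grid with 4 output layers plus the condition image.
ids = layer3d_position_ids(num_layers=4, grid_h=2, grid_w=3)
print(len(ids))         # 30 tokens: (4 + 1) * 2 * 3
print(ids[0], ids[-1])  # (0, 0, 0) (-1, 1, 2)
```

Under these assumptions the sequence length grows linearly with the number of layers, so attention cost grows faster than linearly as layers are added.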

Usage

mflux-generate-qwen-layered \
  --image input.png \
  --layers 4 \
  --steps 50 \
  -q 6 \
  --output-dir ./layers

Output: 4 RGBA PNG files (layer_0.png, layer_1.png, etc.) with transparency.
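To check a decomposition, the layers can be stacked back together with the standard "over" operator. The sketch below works on single straight-alpha RGBA pixels with channels in the range 0 to 1, and it assumes layer_0 is the bottom-most layer, which this PR does not state. In practice a library routine such as Pillow's `Image.alpha_composite` would do this for whole images.

```python
def over(top, bottom):
    """Composite one straight-alpha RGBA pixel over another (channels in 0..1)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels fully transparent: colour is undefined, return clear black.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t, b):
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(pixels_bottom_to_top):
    """Stack the same pixel from every layer, bottom layer first (assumed order)."""
    result = (0.0, 0.0, 0.0, 0.0)
    for pixel in pixels_bottom_to_top:
        result = over(pixel, result)
    return result


# Opaque red background under a half-transparent blue layer.
print(flatten([(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 0.5)]))  # (0.5, 0.0, 0.5, 1.0)
```

If the recomposed image differs visibly from the input, the layer order assumed here is the first thing to check.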

Requires local weights from Qwen/Qwen-Image-Layered:

~55GB for full BF16 model
~29GB with 6-bit quantization
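For a sense of scale, here is a back-of-the-envelope estimate of how the BF16 figure would shrink if every weight were quantized. The helper name is hypothetical, and the optional per-group overhead (a 16-bit scale and a 16-bit bias for every 64 weights) is an assumption about the quantization layout, not something this PR states.

```python
def naive_quantized_size_gb(bf16_size_gb, bits, group_size=None, overhead_bits=32):
    """Hypothetical estimate: scale a BF16 size by bits-per-weight / 16.

    If group_size is given, add the assumed per-group metadata
    (overhead_bits per group) to the bits stored for each weight.
    """
    bits_per_weight = float(bits)
    if group_size:
        bits_per_weight += overhead_bits / group_size
    return bf16_size_gb * bits_per_weight / 16


print(naive_quantized_size_gb(55, 6))                 # 20.625
print(naive_quantized_size_gb(55, 6, group_size=64))  # 22.34375
```

Both estimates land below the reported ~29GB. That would be consistent with some components staying in higher precision, or with the figure including runtime memory, but the PR does not say which.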

Closes #299

Quantized weights at https://huggingface.co/zimengxiong/Qwen-Image-Layered-6bit

The model decomposes an input image into semantically disentangled RGBA layers for layer-based editing workflows.

New Files

src/mflux/models/qwen_layered/: Full model implementation
model/qwen_layered_vae/: RGBA-VAE encoder/decoder
model/qwen_layered_transformer/: Layer3D RoPE
weights/: Weight mapping and definitions
variants/i2l/: Image-to-Layers pipeline
cli/: Command-line interface

Documentation

Added comprehensive documentation to README.md including:

TOC entry
CLI argument reference
Usage examples and tips
Memory requirements

Tested on M4 Max 48GB with 6-bit quantization.

filipstrand · 2025-12-22T17:31:13Z

@ZimengXiong Really cool work! Have you compared this implementation directly to Diffusers? For example, for the same initial latent array, does it generate similar looking images?

ZimengXiong · 2025-12-22T17:38:01Z

No, I have not. I haven't gotten Diffusers to work with quantized models yet (I'm limited on RAM, only 48GB), and Diffusers doesn't support true 8/4-bit quantization on Mac. I could compare with [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) using some of the quantized GGUFs out there, like the ComfyUI ones. I'm out right now, will take a look later this week. It's also REALLY slow because the layers exponentially increase compute time: around 45min per 50 iterations on an M4 Max 40c.

filipstrand · 2025-12-22T17:55:16Z

I feel your pain :) Currently I only have 32GB, but I'm waiting for a 256GB M3 Ultra machine which will hopefully arrive in 2 weeks or so. Then I'll be able to try this out properly.

azrahello · 2025-12-23T09:13:50Z

> I feel your pain :) Currently I only have 32GB, but I'm waiting for a 256GB M3 Ultra machine which will hopefully arrive in 2 weeks or so. Then I'll be able to try this out properly.

You should have gotten a few more by my reckoning... :D I can't wait. I suspect there could be many possible improvements for Qwen and Z-Image. You did well to get that cut; it seems like an excellent compromise. I can't figure out whether it's MLX being 'unripe' for handling Qwen-Image and Z-Image, but with ComfyUI the times are more or less similar, with the difference that I can use higher resolutions in the same time and configuration on MPS.

filipstrand · 2025-12-23T12:38:12Z

> with the difference that I can use higher resolutions in the same time and configuration on MPS

@azrahello Interesting, would like to hear from you if the new VAE tiling strategy helps with this once it is released in v0.14.

anthonywu · 2025-12-24T17:58:26Z

Wow @ZimengXiong - amazing contribution!

ZimengXiong added 3 commits December 22, 2025 08:08

feat: Introduce qwen-image-layered model saving

067ba25

Add chunked saving for low memory, add pre-quantized models to README

a255e4f
