Add Qwen-Image-Layered support for image decomposition into RGBA layers #302
Conversation
This PR adds support for the Qwen-Image-Layered model, which decomposes an input image into semantically disentangled RGBA layers for layer-based editing workflows.

## Features
- New CLI command: `mflux-generate-qwen-layered`
- Decomposes images into N RGBA layers (default 4)
- Supports 6-bit quantization for ~29GB memory usage (vs 55GB BF16)
- Resolution buckets: 640 and 1024

## Architecture
- RGBA-VAE (4-channel) with 3D temporal convolutions for layer handling
- Layer3D RoPE: 3D positional encoding `[layer, height, width]` (see the position-id sketch after this description)
- Uses the base QwenTransformer with extended RoPE for multi-layer sequences
- Condition image encoded as `layer_index=-1` for proper decomposition

## New Files
- `src/mflux/models/qwen_layered/` - Full model implementation
  - `model/qwen_layered_vae/` - RGBA-VAE encoder/decoder
  - `model/qwen_layered_transformer/` - Layer3D RoPE
  - `weights/` - Weight mapping and definitions
  - `variants/i2l/` - Image-to-Layers pipeline
  - `cli/` - Command-line interface

## Usage
```sh
mflux-generate-qwen-layered \
  --image input.png \
  --layers 4 \
  --steps 50 \
  -q 6 \
  --output-dir ./layers
```

## Documentation
Added comprehensive documentation to README.md, including:
- TOC entry
- CLI argument reference
- Usage examples and tips
- Memory requirements

Tested on M4 Max 48GB with 6-bit quantization.
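For illustration only, not code from this PR, here is a minimal sketch of how `[layer, height, width]` position ids for Layer3D RoPE might be laid out, with the condition image taking `layer_index = -1` ahead of the N output layers. The function name, the NumPy layout, and the flattening order are all assumptions:

```python
# Illustrative sketch only -- not the PR's actual implementation.
# Builds [layer, height, width] position ids: the condition image is
# assigned layer_index = -1, the N output layers get indices 0..N-1,
# and every layer shares the same flattened spatial grid of latent tokens.
import numpy as np

def layer3d_positions(num_layers: int, height: int, width: int) -> np.ndarray:
    hh, ww = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    grid = np.stack([hh.ravel(), ww.ravel()], axis=-1)            # (H*W, 2) of [h, w]

    chunks = []
    for layer in range(-1, num_layers):                           # -1 = condition image
        layer_col = np.full((grid.shape[0], 1), layer)
        chunks.append(np.concatenate([layer_col, grid], axis=-1)) # (H*W, 3)
    return np.concatenate(chunks, axis=0)                         # ((N+1)*H*W, 3)

positions = layer3d_positions(num_layers=4, height=8, width=8)
print(positions.shape)               # (320, 3)
print(positions[0], positions[64])   # [-1 0 0] (condition), [0 0 0] (first output layer)
```

Presumably each of the three axes then drives its own rotary frequency band when the query/key features are rotated, extending the usual 2D (height, width) image RoPE with a layer axis.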
@ZimengXiong Really cool work! Have you compared this implementation directly to Diffusers? For example, for the same initial latent array, does it generate similar looking images?
No, I have not. I haven't gotten Diffusers to work with quantized models yet (I'm limited on RAM, only 48GB), and Diffusers doesn't support true 8/4-bit quantization on Mac. I could compare against [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) with some of the quantized GGUFs out there, like the ComfyUI ones. I'm out right now, will take a look later this week. It's also really slow because the layers exponentially increase compute time, around 45 minutes per 50 iterations on an M4 Max 40-core.
I feel your pain :) Currently I only have 32GB, but I'm waiting for a 256GB M3 Ultra machine which will hopefully arrive in two weeks or so; then I'll be able to try this out more properly.
You should have gotten a few more by my reckoning :D I can't wait. I suspect there could be many possible improvements for Qwen and Z-Image. You did well to get that cut, it seems like an excellent compromise. I can't figure out whether it's MLX being 'unripe' at handling Qwen-Image and Z-Image, but with ComfyUI the times are more or less similar, with the difference that I can use higher resolutions in the same time and configuration on MPS.
@azrahello Interesting, I would like to hear from you whether the new VAE tiling strategy helps with this once it is released in v0.14.
Wow @ZimengXiong - amazing contribution!
Output: 4 RGBA PNG files (`layer_0.png`, `layer_1.png`, etc.) with transparency.

Requires local weights from `Qwen/Qwen-Image-Layered`.
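One way to fetch them locally, for example with `huggingface_hub` (the target directory below is just an illustrative choice, not something the PR prescribes):

```python
# Hypothetical example: download the Qwen-Image-Layered weights for local use.
# The local_dir value is an arbitrary choice; point it wherever you keep weights.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen-Image-Layered",
    local_dir="./weights/Qwen-Image-Layered",
)
```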
Closes #299
Quantized weights at https://huggingface.co/zimengxiong/Qwen-Image-Layered-6bit
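Since each layer comes out as an RGBA PNG with transparency, a quick way to sanity-check a run is to alpha-composite the layers back into a single image, for example with Pillow. The file names and `./layers` directory below simply follow the usage example above, and the bottom-to-top stacking order is an assumption:

```python
# Illustrative check only: recombine the generated RGBA layers into one image.
# Assumes four layers named layer_0.png .. layer_3.png in the ./layers directory.
from PIL import Image

layers = [Image.open(f"layers/layer_{i}.png").convert("RGBA") for i in range(4)]

composite = layers[0]
for layer in layers[1:]:
    # Assumes layer_0 is the backmost layer and later layers stack on top.
    composite = Image.alpha_composite(composite, layer)

composite.save("layers/composite.png")
```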