@ZimengXiong ZimengXiong commented Dec 22, 2025

Add Qwen-Image-Layered support for image decomposition into RGBA layers

This PR adds support for the Qwen-Image-Layered model.

  • New CLI mflux-generate-qwen-layered
  • Decomposes images into N RGBA layers (default 4)
  • Supports 4, 6, and 8-bit quantization (~29GB with 6-bit vs ~55GB BF16)
  • Resolution buckets: 640 and 1024
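As a rough illustration of how a resolution bucket could map an arbitrary input size to working dimensions, the sketch below keeps the aspect ratio, targets a pixel area of bucket × bucket, and snaps each side to a multiple of 32. The helper name `bucket_dimensions`, the area-preserving rule, and the rounding multiple are all assumptions made for illustration; the resizing logic in this PR may differ.

```python
import math

BUCKETS = (640, 1024)  # the two resolution buckets listed in this PR


def bucket_dimensions(width: int, height: int, bucket: int = 640, multiple: int = 32) -> tuple[int, int]:
    """Hypothetical helper: fit (width, height) to a bucket while keeping the aspect ratio.

    Assumption: the bucket defines a target area of bucket * bucket pixels and
    each side is rounded to the nearest multiple of `multiple`.
    """
    if bucket not in BUCKETS:
        raise ValueError(f"bucket must be one of {BUCKETS}")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    ratio = width / height
    target_area = bucket * bucket
    new_width = math.sqrt(target_area * ratio)
    new_height = new_width / ratio

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(new_width), snap(new_height)
```

Under these assumptions a square input stays at the bucket size, while a 16:9 input in the 1024 bucket becomes wider and shorter with roughly the same pixel count.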

Implements

  • RGBA-VAE (4-channel) with 3D temporal convolutions for layer handling
  • Layer3D RoPE: 3D positional encoding [layer, height, width]
  • Uses base QwenTransformer with extended RoPE for multi-layer sequences
  • Condition image encoded with layer_index=-1 for proper decomposition
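To make the [layer, height, width] indexing concrete, here is a minimal sketch of the position ids such a scheme could assign: every token gets a (layer, row, column) triple, the output layers are numbered from 0, and the condition image uses layer index -1. The function name, the plain 0-based row and column indices, and the choice to place the condition tokens last are assumptions for illustration, not the code in this PR.

```python
from itertools import product


def layer3d_position_ids(num_layers: int, height: int, width: int,
                         condition_layer_index: int = -1) -> list[tuple[int, int, int]]:
    """Hypothetical helper: (layer, row, col) ids for N output layers plus the condition image.

    Assumption: output layers come first (indices 0..N-1) and the condition
    image tokens follow, tagged with `condition_layer_index`.
    """
    if num_layers <= 0 or height <= 0 or width <= 0:
        raise ValueError("num_layers, height and width must be positive")

    layer_indices = list(range(num_layers)) + [condition_layer_index]
    # One id per latent token; a RoPE implementation would turn each axis of
    # these ids into rotation frequencies.
    return list(product(layer_indices, range(height), range(width)))
```

For 4 layers on an h × w latent grid this yields 5 × h × w ids, which is why the transformer sees one longer multi-layer sequence rather than N separate images.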

Usage

mflux-generate-qwen-layered \
  --image input.png \
  --layers 4 \
  --steps 50 \
  -q 6 \
  --output-dir ./layers

Output: 4 RGBA PNG files (layer_0.png, layer_1.png, etc.) with transparency.
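A quick way to sanity-check a decomposition is to composite the layers back together and compare the result with the input. The sketch below applies the standard "over" operator to a single pixel in pure Python. It assumes straight (non-premultiplied) alpha with channels in the range 0.0 to 1.0, and that layer_0 is the bottom layer; the stacking order is an assumption, since it is not stated here.

```python
RGBA = tuple[float, float, float, float]


def over(top: RGBA, bottom: RGBA) -> RGBA:
    """Composite one straight-alpha RGBA pixel over another."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels fully transparent: colour is undefined, return clear.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(pixels_bottom_first: list[RGBA]) -> RGBA:
    """Stack the same pixel from each layer, bottom layer first (assumed order)."""
    result: RGBA = (0.0, 0.0, 0.0, 0.0)
    for pixel in pixels_bottom_first:
        result = over(pixel, result)
    return result
```

Applying `flatten` to every pixel position across layer_0.png, layer_1.png, and so on gives a recomposed image that can be compared against input.png.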

Requires local weights from Qwen/Qwen-Image-Layered:

  • ~55GB for full BF16 model
  • ~29GB with 6-bit quantization
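For a rough feel of how weight size scales with bit width, the sketch below rescales the BF16 figure by bits per weight. It assumes group-wise quantization with a group size of 64 and a 16-bit scale plus a 16-bit bias per group; these values, and the helper name, are assumptions for illustration. The result is only a lower bound for the quantized tensors themselves: any components kept in BF16 add to it, so it will not necessarily match the measured ~29GB.

```python
def estimate_quantized_gb(bf16_gb: float, bits: int, group_size: int = 64,
                          overhead_bits_per_group: int = 32) -> float:
    """Hypothetical back-of-envelope estimate of quantized weight size in GB.

    Assumption: every weight is quantized, and each group of `group_size`
    weights carries `overhead_bits_per_group` bits of scale/bias metadata.
    """
    if bits <= 0 or group_size <= 0:
        raise ValueError("bits and group_size must be positive")
    bits_per_weight = bits + overhead_bits_per_group / group_size
    # BF16 stores 16 bits per weight, so scale the BF16 size proportionally.
    return bf16_gb * bits_per_weight / 16.0


if __name__ == "__main__":
    for bits in (4, 6, 8):
        print(f"{bits}-bit: ~{estimate_quantized_gb(55.0, bits):.1f} GB (lower bound)")
```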

Closes #299

Quantized weights at https://huggingface.co/zimengxiong/Qwen-Image-Layered-6bit

This PR adds support for the Qwen-Image-Layered model, which decomposes an input
image into semantically disentangled RGBA layers for layer-based editing workflows.

## Features
- New CLI command: `mflux-generate-qwen-layered`
- Decomposes images into N RGBA layers (default 4)
- Supports 6-bit quantization for ~29GB memory usage (vs 55GB BF16)
- Resolution buckets: 640 and 1024

## Architecture
- RGBA-VAE (4-channel) with 3D temporal convolutions for layer handling
- Layer3D RoPE: 3D positional encoding [layer, height, width]
- Uses base QwenTransformer with extended RoPE for multi-layer sequences
- Condition image encoded as layer_index=-1 for proper decomposition

## New Files
- `src/mflux/models/qwen_layered/` - Full model implementation
  - `model/qwen_layered_vae/` - RGBA-VAE encoder/decoder
  - `model/qwen_layered_transformer/` - Layer3D RoPE
  - `weights/` - Weight mapping and definitions
  - `variants/i2l/` - Image-to-Layers pipeline
  - `cli/` - Command-line interface

## Usage
```sh
mflux-generate-qwen-layered \
  --image input.png \
  --layers 4 \
  --steps 50 \
  -q 6 \
  --output-dir ./layers
```
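For batch runs it can be convenient to build the same command from Python. The sketch below only assembles the argument list using the flags shown above; the helper name `build_command` and its defaults are illustrative, and no other flags are assumed to exist.

```python
import shlex


def build_command(image: str, output_dir: str, layers: int = 4,
                  steps: int = 50, quantize: int = 6) -> list[str]:
    """Hypothetical helper: argument list for the CLI call shown in this PR."""
    return [
        "mflux-generate-qwen-layered",
        "--image", str(image),
        "--layers", str(layers),
        "--steps", str(steps),
        "-q", str(quantize),
        "--output-dir", str(output_dir),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; file names are placeholders.
    for image in ("input.png",):
        print(shlex.join(build_command(image, "./layers")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it, assuming the CLI is installed and the weights are available locally.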

## Documentation
Added comprehensive documentation to README.md including:
- TOC entry
- CLI argument reference
- Usage examples and tips
- Memory requirements

Tested on M4 Max 48GB with 6-bit quantization.
@filipstrand
Owner

@ZimengXiong Really cool work! Have you compared this implementation directly to Diffusers? For example, for the same initial latent array, does it generate similar looking images?

@ZimengXiong
Author

ZimengXiong commented Dec 22, 2025 via email

@filipstrand
Owner

I feel your pain :) Currently I only have 32GB, but I'm waiting for a 256GB M3 Ultra machine which will hopefully arrive in 2 weeks or so; then I'll be able to try this out properly.

@azrahello
Contributor

> I feel your pain :) Currently I only have 32GB, but I'm waiting for a 256GB M3 Ultra machine which will hopefully arrive in 2 weeks or so; then I'll be able to try this out properly.

You should have gotten a few more by my reckoning.. :D I can't wait. I suspect there is a lot of room for improvement with Qwen and Z-Image. You did well to get that cut; it seems like an excellent compromise. I can't figure out whether it's MLX being 'unripe' for handling Qwen Image and Z-Image, but with ComfyUI the times are more or less similar, with the difference that I can use higher resolutions in the same time and configuration on MPS.

@filipstrand
Owner

> with the difference that I can use higher resolutions in the same time and configuration on MPS

@azrahello Interesting, would like to hear from you if the new VAE tiling strategy helps with this once it is released in v0.14.

@anthonywu
Collaborator

Wow @ZimengXiong - amazing contribution!
