feat: Add Chroma1-HD model support #319
Open

Dfunk55 wants to merge 8 commits into filipstrand:main from Dfunk55:feat/add-chroma-support
Conversation
Add support for the Chroma1-HD model (lodestones/Chroma1-HD), a modified FLUX.1-schnell with a DistilledGuidanceLayer for efficient inference.

Key features:
- DistilledGuidanceLayer: pre-computes 344 modulations upfront
- T5-only text encoding (no CLIP required)
- Support for negative prompts
- 4-bit and 8-bit quantization
- Save/load quantized models with mflux-save

New CLI command: mflux-generate-chroma

Usage:
  mflux-generate-chroma --prompt "a cat" --steps 40 --output cat.png
  mflux-generate-chroma -q 4 --prompt "a dog" --output dog.png

Note: LoRA support not yet implemented for Chroma.
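The pre-computation idea behind the DistilledGuidanceLayer can be sketched in plain NumPy. This is an illustrative stand-in, not mflux's implementation: the `precompute_modulations` name, the sinusoidal embedding, and the random projection weights are all assumptions; only the 344-modulation count comes from the commit. The point is that the (timestep, guidance) conditioning is projected once, up front, so the transformer blocks just index into a table instead of recomputing modulations per block.

```python
import numpy as np

def precompute_modulations(timestep: float, guidance: float,
                           hidden: int = 64, n_mods: int = 344,
                           seed: int = 0) -> np.ndarray:
    """Illustrative stand-in for a distilled guidance layer: embed the
    (timestep, guidance) pair once, then project it into one modulation
    value per site, so blocks index into the table during inference."""
    rng = np.random.default_rng(seed)
    # Sinusoidal embedding of the two conditioning scalars.
    freqs = np.exp(-np.arange(hidden // 4) / (hidden // 4))
    emb = np.concatenate([
        np.sin(timestep * freqs), np.cos(timestep * freqs),
        np.sin(guidance * freqs), np.cos(guidance * freqs),
    ])
    # Random weights stand in for the distilled MLP; computed once.
    w = rng.standard_normal((n_mods, emb.size)) / np.sqrt(emb.size)
    return w @ emb  # shape: (n_mods,)

mods = precompute_modulations(timestep=0.5, guidance=4.0)
assert mods.shape == (344,)
```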
- Create ChromaLoRAMapping with targets for joint and single transformer blocks
- Support BFL/Kohya format LoRA weights with QKV split transforms
- Exclude norm layers (norm1.linear, norm1_context.linear, norm.linear) that don't exist in Chroma's DistilledGuidanceLayer architecture
- Add lora_paths and lora_scales parameters to Chroma class
- Enable --lora-paths and --lora-scales CLI arguments
- Add 16 unit tests for mapping coverage and exclusions

Tested with semiosphere/the_artist_for_chromaHD (684/684 keys matched)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
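The QKV split transform can be illustrated with a minimal NumPy sketch. The `split_qkv_lora_b` helper and its shapes are hypothetical; only the fused-QKV convention of BFL/Kohya-format LoRA weights is from the commit. Since low-rank updates factor as B @ A and the A matrix is shared across the fused projection, only the B matrix needs to be split row-wise into per-projection pieces.

```python
import numpy as np

def split_qkv_lora_b(lora_b: np.ndarray, proj_dim: int) -> dict:
    """Split a fused attention-QKV LoRA 'B' matrix (rows stacked as
    Q | K | V) into separate q/k/v matrices for mapping onto models
    with per-projection to_q/to_k/to_v layers. The shared LoRA 'A'
    matrix needs no splitting."""
    assert lora_b.shape[0] == 3 * proj_dim
    q, k, v = np.split(lora_b, 3, axis=0)
    return {"to_q": q, "to_k": k, "to_v": v}

rank, dim = 16, 3072  # example rank and projection width
fused_b = np.arange(3 * dim * rank, dtype=np.float32).reshape(3 * dim, rank)
parts = split_qkv_lora_b(fused_b, proj_dim=dim)
assert parts["to_q"].shape == (dim, rank)
# Stacking the pieces back reproduces the fused matrix exactly.
assert np.array_equal(
    np.vstack([parts["to_q"], parts["to_k"], parts["to_v"]]), fused_b)
```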
Add support for Meituan's LongCat-Image model (meituan-longcat/LongCat-Image):

- Implement LongCat transformer architecture with 24 joint blocks and 12 single blocks using hidden_size=3072 and num_attention_heads=24
- Add Qwen-based text encoder integration via qwen2_vl tokenizer
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-longcat
- Add comprehensive tests for transformer, weight loading, LoRA, and initializer validation

Model specifications:
- Uses Flow Matching scheduler (no sigma shift)
- 16-channel VAE
- Supports guidance with distilled guidance embedding
- 512 max sequence length

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add support for Black Forest Labs' FLUX.2-schnell model:

- Implement FLUX.2 transformer with 38 double blocks and 58 single blocks
- Add 32-channel VAE with modified scaling factors
- Integrate Mistral3-based text encoder with sliding window attention and 32K max position embeddings
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-flux2
- Add comprehensive tests for VAE, encoder, weight mapping, quantization, and LoRA

Model specifications:
- Uses rectified flow matching scheduler (no sigma shift)
- 32-channel latent space (vs 16 in FLUX.1)
- Mistral3 encoder (vs CLIP + T5 in FLUX.1)
- 256 max sequence length
- Supports 4/8-bit quantization

Co-Authored-By: Claude Opus 4.5 <[email protected]>
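The rectified flow matching forward process mentioned above is a straight-line interpolation between data and noise; "no sigma shift" means the timestep is used as-is rather than remapped by a resolution-dependent shift. A minimal sketch (the function name is an assumption; the formula is the standard rectified-flow interpolant, not code from this PR):

```python
import numpy as np

def rectified_flow_interpolate(x0: np.ndarray, noise: np.ndarray,
                               t: float) -> np.ndarray:
    """Rectified-flow forward process: x_t = (1 - t) * x0 + t * noise.
    The training target velocity is simply (noise - x0), constant
    along each straight path."""
    return (1.0 - t) * x0 + t * noise

x0 = np.ones((4, 4))      # stand-in for a clean latent
noise = np.zeros((4, 4))  # stand-in for Gaussian noise
xt = rectified_flow_interpolate(x0, noise, t=0.25)
assert np.allclose(xt, 0.75)  # a quarter of the way toward the noise
```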
Add support for Tencent's Hunyuan-DiT v1.2 model:

- Implement Hunyuan-DiT transformer architecture with 28 DiT blocks using hidden_size=1408 and num_attention_heads=16
- Add dual text encoder system (Chinese BERT + T5-XXL) via HunyuanPromptEncoder
- Implement DDPM scheduler for diffusion process
- Add num_dit_blocks() method to LoadedWeights for counting Hunyuan-style transformer blocks
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-hunyuan
- Add comprehensive tests for DiT blocks, DDPM scheduler, text encoding, weight loading, and LoRA

Model specifications:
- Uses DDPM scheduler (1000 training steps)
- Supports CFG with Chinese/English prompts
- 256 max sequence length
- Supports 4/8-bit quantization

Co-Authored-By: Claude Opus 4.5 <[email protected]>
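For reference, a textbook linear-beta DDPM schedule looks like the following. The 1000-step count matches the commit; the beta endpoints are the common DDPM defaults and may well differ from Hunyuan-DiT's actual configuration, so treat this as a sketch of the scheduler family, not this PR's code.

```python
import numpy as np

def ddpm_schedule(num_steps: int = 1000,
                  beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear-beta DDPM noise schedule. alpha_bars[t] is the cumulative
    product used to noise a sample: x_t = sqrt(ab) * x0 + sqrt(1-ab) * eps."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alpha_bars

betas, alpha_bars = ddpm_schedule()
assert betas.shape == (1000,)
assert alpha_bars[-1] < alpha_bars[0]  # signal fraction decays over time
```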
Add support for NewBie-AI's NewBie-image model (NewBie-AI/NewBie-image-Exp0.1):

- Implement NextDiT transformer architecture with 36 blocks using hidden_size=2560 and Grouped Query Attention (24 query heads, 8 KV heads)
- Add dual text encoder system:
  - Gemma3-4B-it for semantic understanding (2560 dim)
  - Jina CLIP v2 for image-text alignment (1024 dim)
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-newbie
- Add comprehensive tests for configuration, generation, and LoRA

Model specifications:
- 3.5B parameter model optimized for anime/illustration generation
- Uses Flow Matching scheduler (no sigma shift)
- 16-channel VAE (FLUX.1-dev compatible)
- 512 max sequence length
- Supports 4/8-bit quantization

Co-Authored-By: Claude Opus 4.5 <[email protected]>
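The Grouped Query Attention layout can be sketched as follows. Only the 24-query/8-KV head split is from the commit; the helper name, sequence length, and head dimension are assumptions. Each KV head serves a group of 3 query heads, shrinking KV projections and cache 3x, and expanding KV by repetition makes the attention math identical to standard multi-head attention.

```python
import numpy as np

def gqa_expand_kv(kv: np.ndarray, n_q_heads: int) -> np.ndarray:
    """Repeat each KV head so every query head in a group attends to
    its shared KV head: (n_kv_heads, seq, head_dim) -> (n_q_heads, ...)."""
    n_kv_heads = kv.shape[0]
    assert n_q_heads % n_kv_heads == 0  # query heads split evenly
    group = n_q_heads // n_kv_heads     # 24 // 8 == 3 here
    return np.repeat(kv, group, axis=0)

k = np.zeros((8, 512, 128))  # 8 KV heads, seq len 512, head dim 128
k_expanded = gqa_expand_kv(k, n_q_heads=24)
assert k_expanded.shape == (24, 512, 128)
```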
- Fix save.py: import Hunyuan (main model class) instead of HunyuanDiT (transformer class), which was causing a TypeError
- Fix model_config.py: use FLUX.2-dev (which exists) instead of FLUX.2-schnell (which doesn't exist on HuggingFace)
- Update FLUX.2 aliases and enable guidance support

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The NewBie-image HuggingFace repo only contains text_encoder (Gemma3), not text_encoder_2 (Jina CLIP). The Jina CLIP projection layers exist in the transformer weights, but the encoder itself is loaded separately from jinaai/jina-clip-v2 if needed.

Changes:
- Remove jina_clip_encoder from weight definition components
- Remove jina_clip from tokenizer definitions
- Update download patterns to exclude text_encoder_2
- Make jina_clip_encoder optional in initializer (set to None)
- Skip jina_clip_encoder in weight application if None

This fixes FileNotFoundError when loading NewBie-AI/NewBie-image-Exp0.1.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add support for the Chroma1-HD model (lodestones/Chroma1-HD), a modified FLUX.1-schnell with a DistilledGuidanceLayer for efficient inference.

Key features:
- DistilledGuidanceLayer: pre-computes 344 modulations upfront
- T5-only text encoding (no CLIP required)
- Support for negative prompts
- 4-bit and 8-bit quantization
- Save/load quantized models with mflux-save

New CLI command: mflux-generate-chroma

Usage:
  mflux-generate-chroma --prompt "a cat" --steps 40 --output cat.png
  mflux-generate-chroma -q 4 --prompt "a dog" --output dog.png

Note: LoRA support not yet implemented for Chroma.