LLaDA2.0-Uni ComfyUI Nodes

Custom ComfyUI nodes for LLaDA2.0-Uni, a unified multimodal diffusion language model that supports text-to-image generation, image understanding (VQA), and image editing.

Installation

⚠️ These nodes depend on the encoder/ and decoder/ modules in the project root. Do not copy apps/comfyui in isolation — the full repository must be present and the relative path apps/comfyui must be preserved.

Option 1: Clone + symlink (recommended)

# 1. Clone the full project
git clone https://github.com/inclusionAI/LLaDA2.0-Uni.git

# 2. Symlink into ComfyUI's custom_nodes
cd /path/to/ComfyUI/custom_nodes
ln -s /path/to/LLaDA2.0-Uni/apps/comfyui ./LLaDA2Uni

Option 2: One-line installer

bash /path/to/LLaDA2.0-Uni/apps/comfyui/install.sh /path/to/ComfyUI
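Whichever option you use, the nodes must still sit at apps/comfyui inside a full checkout. The check_node_link helper below is hypothetical and not part of the repository. It resolves the custom_nodes entry and verifies that the repository root contains the encoder/ and decoder/ directories the nodes depend on:

```shell
# Hypothetical helper, not shipped with the repository.
# Checks that a custom_nodes entry resolves to <repo>/apps/comfyui inside a full checkout.
check_node_link() {
  link="$1"
  # Resolve symlinks to the physical path (subshell, so the caller's cwd is untouched)
  real="$(cd -P "$link" 2>/dev/null && pwd -P)" || {
    echo "cannot resolve: $link" >&2
    return 1
  }
  # The node directory must still be <repo>/apps/comfyui
  root="${real%/apps/comfyui}"
  if [ "$root" = "$real" ]; then
    echo "not inside <repo>/apps/comfyui: $real" >&2
    return 1
  fi
  # Project-root modules the nodes import
  for d in encoder decoder; do
    if [ ! -d "$root/$d" ]; then
      echo "missing $root/$d" >&2
      return 1
    fi
  done
  echo "ok: $root"
}
```

For example: check_node_link /path/to/ComfyUI/custom_nodes/LLaDA2Uni. The path is resolved with cd -P and pwd -P rather than readlink -f, which older macOS releases do not support.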

Dependencies

Run these from the repository root, inside the Python environment that runs ComfyUI:

pip install -r apps/comfyui/requirements.txt
pip install flash-attn --no-build-isolation  # optional, recommended

Model Weights

In the Loader node, set the model path to either a HuggingFace repo ID or a local directory:

HuggingFace (auto-download):

inclusionAI/LLaDA2.0-Uni

Local path:

/path/to/LLaDA2.0-Uni
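This README does not say how the Loader tells the two forms apart. A common Hugging Face convention, assumed here and not confirmed from the node source, is that an existing directory is loaded locally and anything else of the form owner/name is treated as a Hub repo ID. The hypothetical classify_model_path helper below mirrors that rule so a value can be sanity-checked before loading:

```shell
# Hypothetical helper: guess how a model_path value would be interpreted.
# ASSUMPTION: an existing directory wins; otherwise "owner/name" is a Hub repo ID.
classify_model_path() {
  if [ -d "$1" ]; then
    echo local
  elif printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$'; then
    echo hub
  else
    echo invalid
    return 1
  fi
}
```

Under this assumption, a mistyped relative path such as models/LLaDA2.0-Uni also matches the owner/name pattern and would be treated as a repo ID, so absolute local paths are the safer choice. To download once and then load locally, the Hugging Face CLI can fetch the repository with huggingface-cli download inclusionAI/LLaDA2.0-Uni --local-dir /path/to/LLaDA2.0-Uni.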

Expected directory layout:

LLaDA2.0-Uni/
├── config.json                       # LLM config
├── model-*.safetensors               # LLM weights
├── tokenizer.json
├── decoder/
│   ├── config.json
│   └── model.safetensors             # diffusion decoder
├── decoder-turbo/
│   ├── config.json
│   └── model.safetensors             # turbo decoder (8-step)
├── vae/
│   └── diffusion_pytorch_model.safetensors
└── image_tokenizer/
    ├── config.json
    ├── preprocessor_config.json
    ├── model.safetensors             # SigLIP-VQ weights
    └── sigvq_embedding.pt
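To check a local download against this layout before pointing the Loader at it, a small shell helper can list what is missing. check_model_dir is a hypothetical name, not part of the repository, and it only checks that the files in the tree above exist (not that they are complete or valid):

```shell
# Hypothetical helper: report files missing from a local weights directory,
# based only on the expected layout in this README.
check_model_dir() {
  dir="$1"
  missing=0
  for f in config.json tokenizer.json \
           decoder/config.json decoder/model.safetensors \
           decoder-turbo/config.json decoder-turbo/model.safetensors \
           vae/diffusion_pytorch_model.safetensors \
           image_tokenizer/config.json image_tokenizer/preprocessor_config.json \
           image_tokenizer/model.safetensors image_tokenizer/sigvq_embedding.pt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # LLM weights are matched by pattern (model-*.safetensors); an unmatched glob stays literal
  shard_found=0
  for f in "$dir"/model-*.safetensors; do
    if [ -f "$f" ]; then
      shard_found=1
      break
    fi
  done
  if [ "$shard_found" -eq 0 ]; then
    echo "missing: model-*.safetensors"
    missing=1
  fi
  return "$missing"
}
```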

Nodes

  • LLaDA2.0_Uni Loader: load the model (Flash Attention or SDPA, optional CPU offload)
  • LLaDA2.0_Uni Text-to-Image: generate VQ image tokens from a text prompt (supports thinking mode)
  • LLaDA2.0_Uni Image Understanding: visual question answering on an input image
  • LLaDA2.0_Uni Image Editing: edit an image with a text instruction
  • LLaDA2.0_Uni Token Decoder: decode VQ tokens to pixels (turbo or normal mode)
  • LLaDA2.0_Uni Unload Model: manually free VRAM

Example Workflows

Text-to-Image

Loader → Text-to-Image → Token Decoder → Preview Image

Image Understanding

Load Image + Loader → Image Understanding → Show Text

Image Editing

Load Image + Loader → Image Editing → Token Decoder → Preview Image

Parameters

Loader

  • model_path: HuggingFace repo ID or local directory
  • attention: flash_attn (recommended) or sdpa
  • dtype: bf16 (recommended) or fp8
  • offload: enable CPU offload for limited VRAM
  • device: cuda or cpu
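Because flash-attn is an optional install, the right attention value depends on whether that package is importable from ComfyUI's interpreter. The hypothetical pick_attention helper below tests exactly that and otherwise falls back to sdpa (PyTorch's scaled dot-product attention). Pass the path of the Python that runs ComfyUI:

```shell
# Hypothetical helper: suggest a value for the Loader's attention input.
# Argument: the Python interpreter that runs ComfyUI (defaults to python3 on PATH).
pick_attention() {
  py="${1:-python3}"
  # flash_attn is the module provided by the optional `pip install flash-attn`
  if "$py" -c 'import flash_attn' >/dev/null 2>&1; then
    echo flash_attn
  else
    echo sdpa
  fi
}
```

A successful import does not guarantee that flash-attn supports the installed GPU, so sdpa remains the fallback if loading with flash_attn fails.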

Text-to-Image

  • prompt: text description of the image
  • width / height: output resolution
  • steps: LLM denoising steps (8–32)
  • cfg_scale: classifier-free guidance scale
  • mode: standard or thinking
  • seed: random seed (-1 = random)
  • block_length: block size for block-wise denoising
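cfg_scale is documented here only as a classifier-free guidance scale. As a reference point, the usual formulation mixes the model's predictions with and without the prompt as follows, where z_cond and z_uncond are the conditional and unconditional logits and s is the guidance scale. This is a general sketch; the exact formula these nodes use is not stated in this README.

```latex
\tilde{z} = z_{\text{uncond}} + s \, \bigl( z_{\text{cond}} - z_{\text{uncond}} \bigr)
```

In this form, s = 1 reproduces the conditional logits and larger values push harder toward the prompt. Conventions differ, though: the original LLaDA reference sampler scales the difference by (s + 1), so there s = 0 already means no guidance. Check the node's default before comparing values across tools.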

Token Decoder

  • decode_mode: decoder-turbo (fast, 8 steps) or normal (50 steps)
  • decoder_steps: number of decoder steps when decode_mode is normal
  • resolution_multiplier: upscale factor (typically 2)
  • unload_after: release decoder VRAM after decoding (set to False to keep the decoder cached for faster repeated decodes)

License

Same as the parent project. See the repository root for details.