Thanks a lot for the wonderful work on VAR! We are excited to introduce SCALAR (Scale-wise Controllable Visual Autoregressive LeARning), a framework for fine-grained control over the outputs of Visual Autoregressive (VAR) models. SCALAR addresses two challenges in controllable VAR-based generation: encoding the control signal efficiently and injecting it into the model effectively. It improves both generation quality and control precision across various tasks.
- Paper Link: https://arxiv.org/abs/2507.19946
Key Features of SCALAR:
- Scale-wise Conditional Decoding: SCALAR leverages a pretrained image encoder to extract rich control signal encodings, which are projected into scale-specific representations and injected into the corresponding layers of the VAR backbone. This design ensures persistent and structurally aligned guidance throughout the generation process.
- Unified Control Alignment: Building on SCALAR, we developed SCALAR-Uni, a unified extension that aligns multiple control modalities into a shared latent space, supporting flexible multi-conditional guidance in a single model.
- Superior Performance: Extensive experiments on ImageNet demonstrate that SCALAR achieves better generation quality and control consistency than existing methods, including CAR, ControlVAR, and ControlAR.
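To make the scale-wise idea above more concrete, here is a minimal NumPy sketch, not the SCALAR implementation. It assumes the control features come from some encoder as an `(H, W, C)` map, that each scale gets its own linear projection, and that injection is a simple addition to the hidden tokens of that scale. The names `area_pool` and `ScaleWiseInjector`, the average pooling, the random weights, and the additive injection are all illustrative assumptions; the paper may use a different mechanism.

```python
import numpy as np


def area_pool(feat: np.ndarray, size: int) -> np.ndarray:
    """Average-pool an (H, W, C) map to (size, size, C).

    Assumes H and W are divisible by `size` (an assumption of this sketch).
    """
    h, w, c = feat.shape
    assert h % size == 0 and w % size == 0, "H and W must be divisible by size"
    # Split each spatial axis into (blocks, block_len) and average inside blocks.
    return feat.reshape(size, h // size, size, w // size, c).mean(axis=(1, 3))


class ScaleWiseInjector:
    """Hypothetical helper: one linear projection per scale, additive injection."""

    def __init__(self, enc_dim: int, model_dim: int, scales, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.scales = tuple(scales)
        # Randomly initialised weights stand in for learned projections.
        self.proj = {
            s: rng.normal(0.0, 0.02, size=(enc_dim, model_dim)) for s in self.scales
        }

    def control_tokens(self, feat: np.ndarray, scale: int) -> np.ndarray:
        pooled = area_pool(feat, scale)              # (s, s, enc_dim)
        tokens = pooled.reshape(scale * scale, -1)   # (s*s, enc_dim)
        return tokens @ self.proj[scale]             # (s*s, model_dim)

    def inject(self, hidden: np.ndarray, feat: np.ndarray, scale: int) -> np.ndarray:
        # Additive injection is an assumption made for illustration only.
        return hidden + self.control_tokens(feat, scale)


# Usage: one control feature map guides every scale of a coarse-to-fine pass.
scales = (1, 2, 4, 8, 16)
feat = np.random.default_rng(1).normal(size=(16, 16, 32))  # stand-in encoder output
injector = ScaleWiseInjector(enc_dim=32, model_dim=64, scales=scales)
hidden_per_scale = [np.zeros((s * s, 64)) for s in scales]  # stand-in VAR hidden states
guided = [injector.inject(h, feat, s) for h, s in zip(hidden_per_scale, scales)]
```

Because the same encoder output is pooled to each scale's resolution before projection, the guidance stays spatially aligned with the token grid at every scale, which is the property the first bullet describes.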
Future Work:
- Explore finer-grained semantic alignment in control feature injection.
- Extend SCALAR to text-to-image generation tasks and validate its effectiveness on diverse datasets.
- Inspire related tasks such as image editing and multimodal generation.
pdxdydz