🚀 Introducing SCALAR: A Scale-wise Controllable VAR Framework for Next-Scale Prediction #173

@ZeroRF

Thanks to the VAR authors for their wonderful work! We are excited to introduce SCALAR (Scale-wise Controllable Visual Autoregressive LeARning), a framework for fine-grained control over the outputs of Visual Autoregressive (VAR) models. SCALAR targets two challenges in controllable VAR-based generation: encoding control signals efficiently and injecting them into the model. It improves both generation quality and control precision across various tasks.

Key Features of SCALAR:

  • Scale-wise Conditional Decoding: SCALAR leverages a pretrained image encoder to extract rich control signal encodings, which are projected into scale-specific representations and injected into the corresponding layers of the VAR backbone. This design ensures persistent and structurally aligned guidance throughout the generation process.
  • Unified Control Alignment: Building on SCALAR, we developed SCALAR-Uni, a unified extension that aligns multiple control modalities into a shared latent space, supporting flexible multi-conditional guidance in a single model.
  • Superior Performance: In extensive experiments on ImageNet, SCALAR achieves better generation quality and control consistency than existing methods, including CAR, ControlVAR, and ControlAR.
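
To make the scale-wise conditional decoding idea concrete, here is a minimal toy sketch in plain Python. It is not the SCALAR implementation. It assumes that the control features are average-pooled to each scale's resolution, passed through a per-scale linear projection, and added to that scale's token features. The names `area_pool`, `project`, and `inject_control` are hypothetical, and the additive injection is an assumption; the post only states that scale-specific representations are injected into the corresponding layers of the VAR backbone.

```python
from typing import List

Grid = List[List[List[float]]]  # feature map indexed as [row][col][channel]


def area_pool(feat: Grid, size: int) -> Grid:
    """Average-pool an H x W x C grid to size x size x C (H and W must be divisible by size)."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    assert h % size == 0 and w % size == 0, "toy sketch: sizes must divide evenly"
    bh, bw = h // size, w // size
    pooled = []
    for i in range(size):
        row = []
        for j in range(size):
            cell = [0.0] * c
            for y in range(i * bh, (i + 1) * bh):
                for x in range(j * bw, (j + 1) * bw):
                    for k in range(c):
                        cell[k] += feat[y][x][k]
            row.append([v / (bh * bw) for v in cell])
        pooled.append(row)
    return pooled


def project(vec: List[float], weight: List[List[float]]) -> List[float]:
    """Linear projection without bias; weight has shape [D][C]."""
    return [sum(w * v for w, v in zip(w_row, vec)) for w_row in weight]


def inject_control(tokens_per_scale: List[Grid], control_feat: Grid,
                   weights_per_scale: List[List[List[float]]]) -> List[Grid]:
    """Add a scale-specific projection of the control features to each scale's tokens.

    Assumption (not from the post): injection is a plain element-wise addition.
    """
    out = []
    for tokens, weight in zip(tokens_per_scale, weights_per_scale):
        size = len(tokens)  # each scale is a size x size token map
        pooled = area_pool(control_feat, size)
        out.append([
            [[t + p for t, p in zip(tokens[i][j], project(pooled[i][j], weight))]
             for j in range(size)]
            for i in range(size)
        ])
    return out


# Usage: a 4 x 4 control map with 2 channels, injected at scales 1, 2, and 4.
control = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned per-scale projection
tokens = [[[[0.0, 0.0] for _ in range(s)] for _ in range(s)] for s in (1, 2, 4)]
conditioned = inject_control(tokens, control, [identity] * 3)
print(conditioned[1][1][1])  # → [2.5, 2.5]
```

Because every scale receives its own pooled and projected copy of the same control features, the guidance stays spatially aligned with the tokens from the coarsest scale to the finest, which is the property the first bullet describes as persistent and structurally aligned guidance.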

Future Work:

  • Explore finer-grained semantic alignment in control feature injection.
  • Extend SCALAR to text-to-image generation tasks and validate its effectiveness on diverse datasets.
  • Inspire related tasks such as image editing and multimodal generation.
