DiT vs. U-Net in GeoWizard's Diffusion Backbone #50

@wyhlovecpp

Thanks a lot for your work. As we understand it, GeoWizard's diffusion backbone extends the original Stable Diffusion model, which uses a U-Net-based architecture.

Given the emergence of more recent DiT-based (Diffusion Transformer) models (e.g., SD3, Flux), which show strong performance in general image generation, we're curious whether the team considered or experimented with these architectures for GeoWizard.

Is there a specific reason why a U-Net backbone was chosen over a DiT, or have there been findings indicating U-Nets are more suitable for this particular 3D geometry estimation task? Any insights into this architectural decision would be valuable.
