Open
Description
Thanks a lot for your work. We noticed that GeoWizard extends the original Stable Diffusion model, which uses a U-Net as its diffusion backbone.
Given the emergence of more recent DiT (Diffusion Transformer) based models (e.g., SD3, Flux), which show strong performance in general image generation, we're curious whether the team considered or experimented with these architectures for GeoWizard.
Is there a specific reason why a U-Net backbone was chosen over a DiT, or have there been findings indicating U-Nets are more suitable for this particular 3D geometry estimation task? Any insights into this architectural decision would be valuable.