It looks like Kontext is properly layered, and the ComfyUI implementation has some smart offloading going on in the background.
I gave it a shot at upscaling with bf16 Kontext to 2048x2656 and it was surprisingly fast. Unfortunately ComfyUI has no Mixture of Diffusers implementation for Flux, and then I saw that you saved the day again and implemented it in Forge. Thanks, btw.
Yet on ComfyUI I get 3.77 s/it, while on Forge I get 8.97 s/it with my 4090. Sage + Triton are installed for both, swap is set to shared, and I tried various GPU memory quotas - this was the best I got. Any idea why there is a roughly 2× difference in inference speed?
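For what it's worth, here is a minimal sketch of how I compare s/it between backends when the UIs report them differently. The `seconds_per_iteration` helper and the dummy step are hypothetical, not part of either codebase; the idea is just to time the same sampler-step callable with warmup excluded:

```python
import time

def seconds_per_iteration(step_fn, n_iters=10, warmup=2):
    """Return mean wall-clock seconds per call of step_fn, skipping warmup runs."""
    for _ in range(warmup):
        step_fn()  # warm up caches / kernel compilation before timing
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()
    return (time.perf_counter() - start) / n_iters

# Dummy workload standing in for one denoising step (assumption, for illustration):
dummy_step = lambda: sum(range(10_000))
print(f"{seconds_per_iteration(dummy_step):.6f} s/it")
```

Run the same harness against both pipelines and the ratio of the two results gives the real slowdown factor, independent of how each UI rounds its progress-bar numbers.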