Skip to content

Latest commit

 

History

History
18 lines (17 loc) · 1.11 KB

File metadata and controls

18 lines (17 loc) · 1.11 KB

Leaderboard (best-so-far, gate-passing only)

rank exp latency (s) psnr (dB) max_abs_diff desc
1 18 4.791 61.9 0.0610 break-free, no coordinate_descent
2 16 4.796 61.9 0.0688 fuse unpatchify into compiled decoder
3 14 4.816 61.9 0.0593 break-free decoder (per-module cache, full fusion)
4 13 6.641 61.3 0.0911 inductor coordinate_descent_tuning
5 10 6.643 61.3 0.0892 compile max-autotune + graph
6 9 6.658 61.2 0.0771 compile decoder for elementwise fusion + graph
7 8 7.419 61.1 0.0667 native-spatial-pad conv (avoid F.pad copy)
8 7 8.002 61.1 0.0667 full-decode CUDA graph
9 6 8.021 61.1 0.0667 bf16 + channels_last + bf16-native upsample
10 4 8.105 61.3 0.0922 bf16 + compile max-autotune-no-cudagraphs
11 3 8.155 61.1 0.0667 bf16 + channels_last_3d eager
12 2 8.257 61.3 0.0844 bf16 + compile decoder
13 1 8.423 61.1 0.0708 bf16 eager
14 0 14.467 126.0 0.0000 baseline fp32 eager (Wan 2.2 VAE)