Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
JuliaFormatter
[JuliaFormatter] reported by reviewdog 🐶
Reactant.jl/benchmark/oceananigans/abernathey_channel.jl (in 4a7db89):
Lines 306 to 308
Lines 317 to 329
Lines 345 to 347
Lines 350 to 352
Lines 356 to 363
Lines 371 to 372
|
This adds a sample Oceananigans script to the benchmark directory for CI runs. I'm not sure how to update the 'benchmark.yml' file so that this script is run. I know the benchmark directory is listed here: but none of the subdirectories are. Also, I attached the script's dependencies in a separate |
Reactant.jl Benchmarks
Details
| Benchmark suite | Current: 9227d57 | Previous: 58e2ca8 | Ratio |
|---|---|---|---|
| DGCNN [3, 128, 256]/reverse/CPU/DefaultAfterEnzyme | 0.398922009 s | 0.398070836 s | 1.00 |
| jacobi_2d [512, 512, 1024]/primal/CPU/Default_manual_vectorized | 0.602496558 s | 0.617874608 s | 0.98 |
| NewtonSchulz [4096 x 4096]/primal/CPU/Default | 2.324246022 s | 2.316913006 s | 1.00 |
| gesummv [4096]/primal/CPU/Default_manual_vectorized | 0.064501904 s | 0.063009809 s | 1.02 |
| bicg [2048, 4096]/primal/CPU/Default_manual_vectorized | 0.000840299 s | 0.000743054 s | 1.13 |
| NewtonSchulz [1024 x 1024]/primal/CPU/StructuredTensors | 0.193011207 s | 0.189669149 s | 1.02 |
| atax [2048]/primal/CPU/Default_manual_vectorized | 0.000504998 s | 0.000425077 s | 1.19 |
| covariance [2048, 2048]/primal/CPU/Default | 0.025072572 s | 0.02273972 s | 1.10 |
| NewtonSchulz [256 x 256]/primal/CPU/Default | 0.00745019 s | 0.008141407 s | 0.92 |
| NewtonSchulz [1024 x 1024]/primal/CPU/Default | 0.05022749 s | 0.050128252 s | 1.00 |
| bloch_rf [128 spins]/reverse/CPU/Default | 0.006759542 s | 0.006765288 s | 1.00 |
| syrk [2048]/primal/CPU/Julia | 36.765276186 s | 34.885234172000004 s | 1.05 |
| 2mm [2048]/primal/CPU/Default_manual_vectorized | 0.021594917 s | 0.023451378 s | 0.92 |
| doitgen [256, 1024, 512]/primal/CPU/Default_manual_vectorized | 0.074364702 s | 0.076964094 s | 0.97 |
| DeepONet ([64, 1024], [1, 128])/reverse/CPU/DefaultAll | 0.0045181 s | 0.004539621 s | 1.00 |
| NewtonSchulz [4096 x 4096]/primal/CPU/StructuredTensors | 7.32011585 s | 7.309533224 s | 1.00 |
| 2mm [2048]/primal/CPU/Default | 0.018274122 s | 0.017374886 s | 1.05 |
| doitgen [256, 1024, 512]/primal/CPU/Default | 0.111491519 s | 0.111187316 s | 1.00 |
| bloch_rf [8192 spins]/reverse/CPU/Default | 0.063466459 s | 0.064630334 s | 0.98 |
| bicg [2048, 4096]/primal/CPU/Julia | 0.055524708000000006 s | 0.053497277 s | 1.04 |
| NewtonSchulz [256 x 256]/primal/CPU/StructuredTensors | 0.009151546 s | 0.010075531 s | 0.91 |
| NewtonSchulz [1024 x 1024]/primal/CPU/StructuredTensors (Only Detection) | 0.150897501 s | 0.151394765 s | 1.00 |
| DGCNN [3, 128, 256]/primal/CPU/Default | 0.084876016 s | 0.084878564 s | 1.00 |
| jacobi_1d [2048, 1024]/primal/CPU/Default_manual_vectorized | 0.007331776 s | 0.006895479 s | 1.06 |
| covariance [2048, 2048]/primal/CPU/Default_manual_vectorized | 0.025170141 s | 0.023050009 s | 1.09 |
| atax [2048]/primal/CPU/Julia | 0.027179259 s | 0.026390141000000002 s | 1.03 |
| syrk [2048]/primal/CPU/Default_manual_vectorized | 0.009459075 s | 0.009623455 s | 0.98 |
| correlation [2048, 2048]/primal/CPU/Default | 0.040496038 s | 0.036684941 s | 1.10 |
| correlation [2048, 2048]/primal/CPU/Julia | 22.956283347000003 s | 22.920312024 s | 1.00 |
| DeepONet ([64, 1024], [1, 128])/primal/CPU/Default | 0.001641677 s | 0.001395163 s | 1.18 |
| atax [2048]/primal/CPU/Default | 0.000503268 s | 0.000499805 s | 1.01 |
| DGCNN [3, 128, 256]/reverse/CPU/DefaultBeforeEnzyme | 0.576965901 s | 0.578087553 s | 1.00 |
| NewtonSchulz [4096 x 4096]/primal/CPU/StructuredTensors (Only Detection) | 5.510028065 s | 5.472108618 s | 1.01 |
| correlation [2048, 2048]/primal/CPU/Default_manual_vectorized | 0.03578499 s | 0.034047847 s | 1.05 |
| 3mm [256, 1024, 2048, 4096]/primal/CPU/Default | 0.006792493 s | 0.006639493 s | 1.02 |
| bloch_rf [8192 spins]/reverse/CPU/Default_NoBatching | 0.160148244 s | 0.15287961 s | 1.05 |
| heat_3d [128, 128, 128, 256]/primal/CPU/Default | 0.6286878 s | 0.696132764 s | 0.90 |
| bloch_rf [16384 spins]/reverse/CPU/Default | 0.124762969 s | 0.121696235 s | 1.03 |
| bloch_rf [16384 spins]/reverse/CPU/Default_Checkpointing | 0.226232159 s | 0.228674484 s | 0.99 |
| heat_3d [128, 128, 128, 256]/primal/CPU/Default_manual_vectorized | 0.635809699 s | 0.64505205 s | 0.99 |
| DGCNN [3, 128, 256]/reverse/CPU/DisableTransposeReshapeAfterEnzyme | 0.415978495 s | 0.416717894 s | 1.00 |
| gemmver [2048]/primal/CPU/Default_manual_vectorized | 0.01021365 s | 0.011180861 s | 0.91 |
| FNO [64, 64, 1, 4]/reverse/CPU/DefaultBeforeEnzyme | 0.159975727 s | 0.156646088 s | 1.02 |
| 2mm [2048]/primal/CPU/Julia | 56.765553326 s | 57.74098089100001 s | 0.98 |
| bloch_rf [128 spins]/reverse/CPU/Default_NoBatching | 0.011326447 s | 0.010269736 s | 1.10 |
| mvt [4096]/primal/CPU/Default_manual_vectorized | 0.007802637 s | 0.007670336 s | 1.02 |
| DGCNN [3, 128, 256]/reverse/CPU/DisableTransposeReshapeBeforeEnzyme | 0.424982611 s | 0.432458673 s | 0.98 |
| gemmver [2048]/primal/CPU/Default | 0.003459995 s | 0.003356295 s | 1.03 |
| gemmver [2048]/primal/CPU/Julia | 0.036043339 s | 0.034191986 s | 1.05 |
| DeepONet ([64, 1024], [1, 128])/reverse/CPU/DefaultBeforeEnzyme | 0.004545098 s | 0.004537273 s | 1.00 |
| bloch_rf [128 spins]/reverse/CPU/Default_NoBatching_Checkpointing | 0.028099068 s | 0.029549591 s | 0.95 |
| bloch_rf [16384 spins]/reverse/CPU/Default_NoBatching | 0.310841634 s | 0.311803138 s | 1.00 |
| gesummv [4096]/primal/CPU/Julia | 0.376272724 s | 0.37708815100000004 s | 1.00 |
| jacobi_1d [2048, 1024]/primal/CPU/Default | 0.006900504 s | 0.006919381 s | 1.00 |
| jacobi_2d [512, 512, 1024]/primal/CPU/Julia | 1.6186772580000002 s | 1.743826184 s | 0.93 |
| NewtonSchulz [1024 x 1024]/primal/CPU/Julia | 0.090992924 s | 0.096990111 s | 0.94 |
| bloch_rf [1024 spins]/reverse/CPU/Default_Checkpointing | 0.02449167 s | 0.024760363 s | 0.99 |
| FNO [64, 64, 1, 4]/reverse/CPU/DefaultAfterEnzyme | 0.170899858 s | 0.168079487 s | 1.02 |
| DGCNN [3, 128, 256]/reverse/CPU/DisableTransposeReshapeAll | 0.429761764 s | 0.432814088 s | 0.99 |
| bloch_rf [128 spins]/reverse/CPU/Julia | 0.005871141000000001 s | 0.006579594 s | 0.89 |
| fdtd_2d [1024, 2048, 256]/primal/CPU/Julia | 28.566648037 s | 26.824709781000003 s | 1.06 |
| bloch_rf [128 spins]/reverse/CPU/Default_Checkpointing | 0.01823615 s | 0.017465801 s | 1.04 |
| DGCNN [3, 128, 256]/reverse/CPU/NoOpt | 0.431072964 s | 0.428403831 s | 1.01 |
| FNO [64, 64, 1, 4]/reverse/CPU/NoOpt | 0.155458936 s | 0.152382598 s | 1.02 |
| gemm [2048, 4096]/primal/CPU/Default | 0.01723814 s | 0.017096556 s | 1.01 |
| jacobi_1d [2048, 1024]/primal/CPU/Julia | 0.0005626380000000001 s | 0.000568096 s | 0.99 |
| mvt [4096]/primal/CPU/Julia | 0.15012155600000002 s | 0.200703134 s | 0.75 |
| bloch_rf [1024 spins]/reverse/CPU/Default | 0.011250229 s | 0.010971558 s | 1.03 |
| bicg [2048, 4096]/primal/CPU/Default | 0.000836915 s | 0.000698753 s | 1.20 |
| doitgen [256, 1024, 512]/primal/CPU/Julia | 267.821386008 s | 376.430725777 s | 0.71 |
| gesummv [4096]/primal/CPU/Default | 0.002210536 s | 0.001836795 s | 1.20 |
| FNO [64, 64, 1, 4]/reverse/CPU/DefaultAll | 0.169654797 s | 0.168534654 s | 1.01 |
| jacobi_2d [512, 512, 1024]/primal/CPU/Default | 0.625673918 s | 0.6296972 s | 0.99 |
| bloch_rf [16384 spins]/reverse/CPU/Julia | 0.546503321 s | 0.5568492260000001 s | 0.98 |
| DeepONet ([64, 1024], [1, 128])/primal/CPU/NoOpt | 0.001420488 s | 0.00144888 s | 0.98 |
| gemm [2048, 4096]/primal/CPU/Julia | 300.773966141 s | 285.31838474200003 s | 1.05 |
| bloch_rf [1024 spins]/reverse/CPU/Julia | 0.038267074000000005 s | 0.038497590000000005 s | 0.99 |
| fdtd_2d [1024, 2048, 256]/primal/CPU/Default | 0.534963059 s | 0.548909331 s | 0.97 |
| DGCNN [3, 128, 256]/primal/CPU/NoOpt | 0.098497035 s | 0.096644299 s | 1.02 |
| fdtd_2d [1024, 2048, 256]/primal/CPU/Default_manual_vectorized | 0.844111791 s | 0.865483869 s | 0.98 |
| covariance [2048, 2048]/primal/CPU/Julia | 22.94589364 s | 22.946513432 s | 1.00 |
| syr2k [2048]/primal/CPU/Default_manual_vectorized | 0.019910847 s | 0.019253731 s | 1.03 |
| DeepONet ([64, 1024], [1, 128])/reverse/CPU/NoOpt | 0.004097683 s | 0.004432937 s | 0.92 |
| DGCNN [3, 128, 256]/reverse/CPU/DefaultAll | 0.371169488 s | 0.369851723 s | 1.00 |
| bloch_rf [8192 spins]/reverse/CPU/Julia | 0.275172095 s | 0.27385626500000004 s | 1.00 |
| bloch_rf [8192 spins]/reverse/CPU/Default_Checkpointing | 0.118576998 s | 0.119207904 s | 0.99 |
| 3mm [256, 1024, 2048, 4096]/primal/CPU/Julia | 15.775128842 s | 15.034657282000001 s | 1.05 |
| NewtonSchulz [256 x 256]/primal/CPU/Julia | 0.0036593940000000003 s | 0.0033585240000000003 s | 1.09 |
| NewtonSchulz [256 x 256]/primal/CPU/StructuredTensors (Only Detection) | 0.008095428 s | 0.009733051 s | 0.83 |
| bloch_rf [16384 spins]/reverse/CPU/Default_NoBatching_Checkpointing | 0.490487018 s | 0.477110539 s | 1.03 |
| syr2k [2048]/primal/CPU/Julia | 39.021939283 s | 39.007653261 s | 1.00 |
| heat_3d [128, 128, 128, 256]/primal/CPU/Julia | 7.8999029830000005 s | 11.672141676 s | 0.68 |
| mvt [4096]/primal/CPU/Default | 0.007581925 s | 0.007498994 s | 1.01 |
| bloch_rf [8192 spins]/reverse/CPU/Default_NoBatching_Checkpointing | 0.236782268 s | 0.235839559 s | 1.00 |
| NewtonSchulz [4096 x 4096]/primal/CPU/Julia | 4.686436541 s | 4.64022502 s | 1.01 |
| FNO [64, 64, 1, 4]/primal/CPU/NoOpt | 0.071988439 s | 0.071582932 s | 1.01 |
| DGCNN [3, 128, 256]/primal/CPU/DisableTransposeReshape | 0.09569326 s | 0.095659247 s | 1.00 |
| DeepONet ([64, 1024], [1, 128])/reverse/CPU/DefaultAfterEnzyme | 0.004559171 s | 0.004409539 s | 1.03 |
| FNO [64, 64, 1, 4]/primal/CPU/Default | 0.070489744 s | 0.070216447 s | 1.00 |
| bloch_rf [1024 spins]/reverse/CPU/Default_NoBatching | 0.025769687 s | 0.023443795 s | 1.10 |
| bloch_rf [1024 spins]/reverse/CPU/Default_NoBatching_Checkpointing | 0.053763732 s | 0.056046882 s | 0.96 |
| gemm [2048, 4096]/primal/CPU/Default_manual_vectorized | 0.019783398 s | 0.020302257 s | 0.97 |
| syrk [2048]/primal/CPU/Default | 0.009323441 s | 0.009327635 s | 1.00 |
| 3mm [256, 1024, 2048, 4096]/primal/CPU/Default_manual_vectorized | 0.006767557 s | 0.006828708 s | 0.99 |
| syr2k [2048]/primal/CPU/Default | 0.018834526 s | 0.019498516 s | 0.97 |
| bloch_rf [16384 spins]/reverse/CUDA/Default_NoBatching_Checkpointing | 0.264304473 s | 0.261457892 s | 1.01 |
| NewtonSchulz [256 x 256]/primal/CUDA/Default | 0.000574662 s | 0.000554159 s | 1.04 |
| bloch_rf [1024 spins]/reverse/CUDA/Default_NoBatching | 0.073729375 s | 0.073598594 s | 1.00 |
| ViT tiny [256, 256, 3, 4]/reverse/CUDA/DisableScatterGatherPadAll | 0.010376032 s | 0.010298557 s | 1.01 |
| gemmver [2048]/primal/CUDA/Default_manual_vectorized | 0.000060768 s | 0.000058257 s | 1.04 |
| ViT tiny [256, 256, 3, 4]/reverse/CUDA/DisableTransposeReshapeAll | 0.01054224 s | 0.0104369 s | 1.01 |
| syr2k [2048]/primal/CUDA/Default | 0.000499565 s | 0.000499709 s | 1.00 |
| FNO [64, 64, 1, 4]/reverse/CUDA/DefaultAll | 0.003151842 s | 0.003121849 s | 1.01 |
| atax [2048]/primal/CUDA/Default_manual_vectorized | 0.000025875 s | 0.000025575 s | 1.01 |
| gesummv [4096]/primal/CUDA/Default_manual_vectorized | 0.000206178 s | 0.000206803 s | 1.00 |
| fdtd_2d [1024, 2048, 256]/primal/CUDA/Default_manual_vectorized | 0.028675672 s | 0.027458502 s | 1.04 |
| bloch_rf [8192 spins]/reverse/CUDA/Default_NoBatching_Checkpointing | 0.261291461 s | 0.256914282 s | 1.02 |
| VGG11 bn=true [224, 224, 3, 4]/primal/CUDA/Default | 0.001917002 s | 0.001904037 s | 1.01 |
| gemm [2048, 4096]/primal/CUDA/Default | 0.000446583 s | 0.000445421 s | 1.00 |
| bloch_rf [16384 spins]/reverse/CUDA/Default | 0.034818093 s | 0.034181084 s | 1.02 |
| bloch_rf [8192 spins]/reverse/CUDA/Default_Checkpointing | 0.147638926 s | 0.146332433 s | 1.01 |
| ViT tiny [256, 256, 3, 4]/primal/CUDA/NoOpt | 0.003158564 s | 0.00314326 s | 1.00 |
| 3mm [256, 1024, 2048, 4096]/primal/CUDA/Default_manual_vectorized | 0.000159101 s | 0.000159924 s | 0.99 |
| jacobi_2d [512, 512, 1024]/primal/CUDA/Default_manual_vectorized | 0.02091506 s | 0.020097093 s | 1.04 |
| NewtonSchulz [1024 x 1024]/primal/CUDA/StructuredTensors | 0.00701279 s | 0.00686823 s | 1.02 |
| ViT tiny [256, 256, 3, 4]/primal/CUDA/Default | 0.002877501 s | 0.002651615 s | 1.09 |
| DGCNN [3, 128, 256]/primal/CUDA/DisableTransposeReshape | 0.001247968 s | 0.001242163 s | 1.00 |
| ViT tiny [256, 256, 3, 4]/reverse/CUDA/DisableScatterGatherAll | 0.010460293 s | 0.01062204 s | 0.98 |
| DGCNN [3, 128, 256]/reverse/CUDA/DisableTransposeReshapeAfterEnzyme | 0.003393965 s | 0.003389311 s | 1.00 |
| bloch_rf [16384 spins]/reverse/CUDA/Default_Checkpointing | 0.145979048 s | 0.146557366 s | 1.00 |
| DeepONet ([64, 1024], [1, 128])/reverse/CUDA/DefaultAll | 0.000527248 s | 0.000575929 s | 0.92 |
| bloch_rf [128 spins]/reverse/CUDA/Default | 0.037041434 s | 0.034459056 s | 1.07 |
| gemm [2048, 4096]/primal/CUDA/Default_manual_vectorized | 0.00044178 s | 0.000442725 s | 1.00 |
| DeepONet ([64, 1024], [1, 128])/primal/CUDA/Default | 0.000162614 s | 0.000213511 s | 0.76 |
| DGCNN [3, 128, 256]/reverse/CUDA/DisableTransposeReshapeAll | 0.003345437 s | 0.003340038 s | 1.00 |
| VGG11 bn=true [224, 224, 3, 4]/reverse/CUDA/DefaultAfterEnzyme | 0.007075102 s | 0.007045761 s | 1.00 |
| NewtonSchulz [4096 x 4096]/primal/CUDA/StructuredTensors (Only Detection) | 0.109337738 s | 0.109336512 s | 1.00 |
| covariance [2048, 2048]/primal/CUDA/Default_manual_vectorized | 0.000262827 s | 0.000263588 s | 1.00 |
| NewtonSchulz [4096 x 4096]/primal/CUDA/StructuredTensors | 0.109759335 s | 0.110324879 s | 0.99 |
| NewtonSchulz [4096 x 4096]/primal/CUDA/Default | 0.043339614 s | 0.043750541 s | 0.99 |
| correlation [2048, 2048]/primal/CUDA/Default | 0.000295359 s | 0.000296845 s | 0.99 |
| 3mm [256, 1024, 2048, 4096]/primal/CUDA/Default | 0.000158985 s | 0.000161757 s | 0.98 |
| doitgen [256, 1024, 512]/primal/CUDA/Default_manual_vectorized | 0.001468615 s | 0.001470557 s | 1.00 |
| jacobi_1d [2048, 1024]/primal/CUDA/Default_manual_vectorized | 0.024257626 s | 0.024096308 s | 1.01 |
| VGG11 bn=true [224, 224, 3, 4]/reverse/CUDA/DefaultBeforeEnzyme | 0.007131914 s | 0.007119492 s | 1.00 |
| DGCNN [3, 128, 256]/primal/CUDA/NoOpt | 0.001266264 s | 0.001263411 s | 1.00 |
| covariance [2048, 2048]/primal/CUDA/Default | 0.000283074 s | 0.000286435 s | 0.99 |
| ViT tiny [256, 256, 3, 4]/reverse/CUDA/DisablePadAll | 0.010433047 s | 0.010435377 s | 1.00 |
| VGG11 bn=true [224, 224, 3, 4]/reverse/CUDA/NoOpt | 0.007182739 s | 0.007157338 s | 1.00 |
| FNO [64, 64, 1, 4]/reverse/CUDA/DefaultAfterEnzyme | 0.003252567 s | 0.003126927 s | 1.04 |
| DGCNN [3, 128, 256]/reverse/CUDA/DefaultAfterEnzyme | 0.00303201 s | 0.003024562 s | 1.00 |
| DGCNN [3, 128, 256]/reverse/CUDA/DefaultAll | 0.003520211 s | 0.003537342 s | 1.00 |
| heat_3d [128, 128, 128, 256]/primal/CUDA/Default | 0.013342762 s | 0.013134791 s | 1.02 |
| mvt [4096]/primal/CUDA/Default_manual_vectorized | 0.000107014 s | 0.000110445 s | 0.97 |
| VGG11 bn=true [224, 224, 3, 4]/reverse/CUDA/DefaultAll | 0.007069624 s | 0.007038347 s | 1.00 |
| doitgen [256, 1024, 512]/primal/CUDA/Default | 0.002241755 s | 0.002241956 s | 1.00 |
| DGCNN [3, 128, 256]/primal/CUDA/Default | 0.000974305 s | 0.000971581 s | 1.00 |
| VGG11 bn=true [224, 224, 3, 4]/primal/CUDA/NoOpt | 0.001939534 s | 0.001926944 s | 1.01 |
| bloch_rf [1024 spins]/reverse/CUDA/Default_NoBatching_Checkpointing | 0.261730497 s | 0.257782645 s | 1.02 |
| DeepONet ([64, 1024], [1, 128])/reverse/CUDA/NoOpt | 0.000634165 s | 0.000644931 s | 0.98 |
| syrk [2048]/primal/CUDA/Default_manual_vectorized | 0.000254876 s | 0.000255133 s | 1.00 |
| gemmver [2048]/primal/CUDA/Default | 0.000044986 s | 0.00004434 s | 1.01 |
| NewtonSchulz [256 x 256]/primal/CUDA/StructuredTensors | 0.001436332 s | 0.001411867 s | 1.02 |
| DGCNN [3, 128, 256]/reverse/CUDA/NoOpt | 0.006571116 s | 0.006575345 s | 1.00 |
| bloch_rf [128 spins]/reverse/CUDA/Default_Checkpointing | 0.147163896 s | 0.151287582 s | 0.97 |
| bicg [2048, 4096]/primal/CUDA/Default | 0.000055485 s | 0.00006569 s | 0.84 |
| DGCNN [3, 128, 256]/reverse/CUDA/DefaultBeforeEnzyme | 0.006508961 s | 0.006538191 s | 1.00 |
| bicg [2048, 4096]/primal/CUDA/Default_manual_vectorized | 0.000053565 s | 0.000054109 s | 0.99 |
| 2mm [2048]/primal/CUDA/Default | 0.000462545 s | 0.000462548 s | 1.00 |
| jacobi_1d [2048, 1024]/primal/CUDA/Default | 0.020627327 s | 0.020134235 s | 1.02 |
| heat_3d [128, 128, 128, 256]/primal/CUDA/Default_manual_vectorized | 0.012847875 s | 0.013123787 s | 0.98 |
| bloch_rf [16384 spins]/reverse/CUDA/Default_NoBatching | 0.083654162 s | 0.082095426 s | 1.02 |
| jacobi_2d [512, 512, 1024]/primal/CUDA/Default | 0.020403413 s | 0.020061832 s | 1.02 |
| correlation [2048, 2048]/primal/CUDA/Default_manual_vectorized | 0.000268405 s | 0.000268627 s | 1.00 |
| bloch_rf [8192 spins]/reverse/CUDA/Default_NoBatching | 0.072826295 s | 0.07273783 s | 1.00 |
| bloch_rf [1024 spins]/reverse/CUDA/Default_Checkpointing | 0.147529895 s | 0.145460278 s | 1.01 |
| DeepONet ([64, 1024], [1, 128])/reverse/CUDA/DefaultBeforeEnzyme | 0.000521865 s | 0.000581555 s | 0.90 |
| 2mm [2048]/primal/CUDA/Default_manual_vectorized | 0.000465735 s | 0.000465898 s | 1.00 |
| ViT tiny [256, 256, 3, 4]/reverse/CUDA/NoOpt | 0.01163171 s | 0.010929207 s | 1.06 |
| FNO [64, 64, 1, 4]/reverse/CUDA/DefaultBeforeEnzyme | 0.003194712 s | 0.003147606 s | 1.01 |
| bloch_rf [1024 spins]/reverse/CUDA/Default | 0.036278792 s | 0.035616641 s | 1.02 |
| NewtonSchulz [1024 x 1024]/primal/CUDA/Default | 0.001847829 s | 0.001829829 s | 1.01 |
| atax [2048]/primal/CUDA/Default | 0.000029872 s | 0.000025562 s | 1.17 |
| gesummv [4096]/primal/CUDA/Default | 0.000108406 s | 0.000106736 s | 1.02 |
| ViT tiny [256, 256, 3, 4]/reverse/CUDA/DefaultAll | 0.010504191 s | 0.010640459 s | 0.99 |
| syr2k [2048]/primal/CUDA/Default_manual_vectorized | 0.0004996 s | 0.000499725 s | 1.00 |
| DGCNN [3, 128, 256]/reverse/CUDA/DisableTransposeReshapeBeforeEnzyme | 0.014071049 s | 0.014599092 s | 0.96 |
| bloch_rf [128 spins]/reverse/CUDA/Default_NoBatching | 0.076232127 s | 0.07602372 s | 1.00 |
| FNO [64, 64, 1, 4]/reverse/CUDA/NoOpt | 0.003336913 s | 0.00336687 s | 0.99 |
| bloch_rf [8192 spins]/reverse/CUDA/Default | 0.035577678 s | 0.034350543 s | 1.04 |
| FNO [64, 64, 1, 4]/primal/CUDA/NoOpt | 0.001149151 s | 0.001131692 s | 1.02 |
| DeepONet ([64, 1024], [1, 128])/reverse/CUDA/DefaultAfterEnzyme | 0.000517677 s | 0.000545863 s | 0.95 |
| NewtonSchulz [256 x 256]/primal/CUDA/StructuredTensors (Only Detection) | 0.001369076 s | 0.001349102 s | 1.01 |
| DeepONet ([64, 1024], [1, 128])/primal/CUDA/NoOpt | 0.00021077 s | 0.000212637 s | 0.99 |
| mvt [4096]/primal/CUDA/Default | 0.000107007 s | 0.000109972 s | 0.97 |
| syrk [2048]/primal/CUDA/Default | 0.000255953 s | 0.000256032 s | 1.00 |
| fdtd_2d [1024, 2048, 256]/primal/CUDA/Default | 0.025073715 s | 0.025553858 s | 0.98 |
| NewtonSchulz [1024 x 1024]/primal/CUDA/StructuredTensors (Only Detection) | 0.005458217 s | 0.005679814 s | 0.96 |
| bloch_rf [128 spins]/reverse/CUDA/Default_NoBatching_Checkpointing | 0.415549826 s | 0.399026405 s | 1.04 |
| FNO [64, 64, 1, 4]/primal/CUDA/Default | 0.00108504 s | 0.00107467 s | 1.01 |
| ViT tiny [256, 256, 3, 4]/primal/TPU/Default | 0.000217708 s | 0.000217818 s | 1.00 |
| NewtonSchulz [4096 x 4096]/primal/TPU/Default | 0.024508124 s | 0.024705897 s | 0.99 |
| bloch_rf [1024 spins]/reverse/TPU/Default_Checkpointing | 0.003438832 s | 0.003438864 s | 1.00 |
| FNO [64, 64, 1, 4]/reverse/TPU/DefaultAll | 0.003093378 s | 0.003093037 s | 1.00 |
| DGCNN [3, 128, 256]/reverse/TPU/DisableTransposeReshapeAll | 0.005004825 s | 0.005005214 s | 1.00 |
| syrk [2048]/primal/TPU/Default_manual_vectorized | 0.000031548 s | 0.000031535 s | 1.00 |
| bloch_rf [16384 spins]/reverse/TPU/Default_NoBatching | 0.047648354 s | 0.047636619 s | 1.00 |
| ViT tiny [256, 256, 3, 4]/primal/TPU/NoOpt | 0.000586544 s | 0.000586181 s | 1.00 |
| FNO [64, 64, 1, 4]/primal/TPU/NoOpt | 0.001138321 s | 0.001138033 s | 1.00 |
| gemm [2048, 4096]/primal/TPU/Default | 0.000072625 s | 0.000072509 s | 1.00 |
| 3mm [256, 1024, 2048, 4096]/primal/TPU/Default_manual_vectorized | 0.000016375 s | 0.000016401 s | 1.00 |
| atax [2048]/primal/TPU/Default_manual_vectorized | 0.000024208 s | 0.000024146 s | 1.00 |
| ViT tiny [256, 256, 3, 4]/reverse/TPU/DisableScatterGatherAll | 0.001698505 s | 0.001698623 s | 1.00 |
| syr2k [2048]/primal/TPU/Default | 0.000058343 s | 0.000058219 s | 1.00 |
| 2mm [2048]/primal/TPU/Default_manual_vectorized | 0.000074958 s | 0.000074929 s | 1.00 |
| FNO [64, 64, 1, 4]/primal/TPU/Default | 0.000953001 s | 0.000953054 s | 1.00 |
| VGG11 bn=true [224, 224, 3, 4]/reverse/TPU/DefaultBeforeEnzyme | 0.004179579 s | 0.004179874 s | 1.00 |
| DeepONet ([64, 1024], [1, 128])/primal/TPU/Default | 0.000006047 s | 0.000006087 s | 0.99 |
| doitgen [256, 1024, 512]/primal/TPU/Default_manual_vectorized | 0.001088949 s | 0.001086157 s | 1.00 |
| ViT tiny [256, 256, 3, 4]/reverse/TPU/DisableScatterGatherPadAll | 0.001697239 s | 0.001697514 s | 1.00 |
| doitgen [256, 1024, 512]/primal/TPU/Default | 0.001579798 s | 0.001579097 s | 1.00 |
| DGCNN [3, 128, 256]/primal/TPU/DisableTransposeReshape | 0.002865498 s | 0.002865397 s | 1.00 |
| bloch_rf [1024 spins]/reverse/TPU/Default | 0.001323008 s | 0.001323031 s | 1.00 |
| FNO [64, 64, 1, 4]/reverse/TPU/DefaultAfterEnzyme | 0.003085456 s | 0.003085515 s | 1.00 |
| bloch_rf [128 spins]/reverse/TPU/Default | 0.000694924 s | 0.000694866 s | 1.00 |
| mvt [4096]/primal/TPU/Default | 0.000045023 s | 0.000045082 s | 1.00 |
| bloch_rf [16384 spins]/reverse/TPU/Default | 0.026236881 s | 0.026240626 s | 1.00 |
| gemmver [2048]/primal/TPU/Default | 0.000036403 s | 0.00003654 s | 1.00 |
| FNO [64, 64, 1, 4]/reverse/TPU/NoOpt | 0.002959794 s | 0.002960119 s | 1.00 |
| syr2k [2048]/primal/TPU/Default_manual_vectorized | 0.000057948 s | 0.000057998 s | 1.00 |
| bloch_rf [8192 spins]/reverse/TPU/Default_NoBatching | 0.033746967 s | 0.033754872 s | 1.00 |
| bloch_rf [8192 spins]/reverse/TPU/Default_Checkpointing | 0.019418776 s | 0.0194229 s | 1.00 |
| fdtd_2d [1024, 2048, 256]/primal/TPU/Default | 0.018833155 s | 0.018833106 s | 1.00 |
| covariance [2048, 2048]/primal/TPU/Default_manual_vectorized | 0.000047606 s | 0.000047646 s | 1.00 |
| heat_3d [128, 128, 128, 256]/primal/TPU/Default | 0.26175386 s | 0.261753897 s | 1.00 |
| bloch_rf [128 spins]/reverse/TPU/Default_NoBatching_Checkpointing | 0.017732083 s | 0.01791452 s | 0.99 |
| NewtonSchulz [256 x 256]/primal/TPU/StructuredTensors (Only Detection) | 0.000019731 s | 0.000019735 s | 1.00 |
| gemmver [2048]/primal/TPU/Default_manual_vectorized | 0.000037325 s | 0.000037471 s | 1.00 |
| VGG11 bn=true [224, 224, 3, 4]/reverse/TPU/NoOpt | 0.004056119 s | 0.004055074 s | 1.00 |
| DGCNN [3, 128, 256]/reverse/TPU/DefaultBeforeEnzyme | 0.004752407 s | 0.004751814 s | 1.00 |
| bicg [2048, 4096]/primal/TPU/Default_manual_vectorized | 0.000023459 s | 0.000023456 s | 1.00 |
| NewtonSchulz [1024 x 1024]/primal/TPU/StructuredTensors | 0.00022266 s | 0.0002227 s | 1.00 |
| ViT tiny [256, 256, 3, 4]/reverse/TPU/DisablePadAll | 0.00169829 s | 0.001697866 s | 1.00 |
| bloch_rf [1024 spins]/reverse/TPU/Default_NoBatching_Checkpointing | 0.018718537 s | 0.019394475 s | 0.97 |
| 3mm [256, 1024, 2048, 4096]/primal/TPU/Default | 0.000016374 s | 0.000016327 s | 1.00 |
| jacobi_1d [2048, 1024]/primal/TPU/Default | 0.008143506 s | 0.008144584 s | 1.00 |
| DGCNN [3, 128, 256]/reverse/TPU/DefaultAfterEnzyme | 0.004670825 s | 0.00467102 s | 1.00 |
| ViT tiny [256, 256, 3, 4]/reverse/TPU/DisableTransposeReshapeAll | 0.00143745 s | 0.001437695 s | 1.00 |
| syrk [2048]/primal/TPU/Default | 0.000030314 s | 0.000030265 s | 1.00 |
| correlation [2048, 2048]/primal/TPU/Default_manual_vectorized | 0.000053189 s | 0.000053206 s | 1.00 |
| bloch_rf [128 spins]/reverse/TPU/Default_Checkpointing | 0.001658265 s | 0.001658484 s | 1.00 |
| DGCNN [3, 128, 256]/primal/TPU/NoOpt | 0.002867399 s | 0.002867601 s | 1.00 |
| bloch_rf [16384 spins]/reverse/TPU/Default_NoBatching_Checkpointing | 0.035670511 s | 0.03566848 s | 1.00 |
| NewtonSchulz [1024 x 1024]/primal/TPU/StructuredTensors (Only Detection) | 0.000222784 s | 0.000222713 s | 1.00 |
| VGG11 bn=true [224, 224, 3, 4]/primal/TPU/Default | 0.000813696 s | 0.000813646 s | 1.00 |
| NewtonSchulz [256 x 256]/primal/TPU/Default | 0.000018987 s | 0.000018965 s | 1.00 |
| jacobi_1d [2048, 1024]/primal/TPU/Default_manual_vectorized | 0.005707607 s | 0.005708007 s | 1.00 |
| FNO [64, 64, 1, 4]/reverse/TPU/DefaultBeforeEnzyme | 0.003093487 s | 0.003093686 s | 1.00 |
| bloch_rf [8192 spins]/reverse/TPU/Default_NoBatching_Checkpointing | 0.024888531 s | 0.024891187 s | 1.00 |
| DeepONet ([64, 1024], [1, 128])/reverse/TPU/DefaultBeforeEnzyme | 0.000027333 s | 0.00002735 s | 1.00 |
| VGG11 bn=true [224, 224, 3, 4]/reverse/TPU/DefaultAfterEnzyme | 0.004180059 s | 0.004180592 s | 1.00 |
| gesummv [4096]/primal/TPU/Default | 0.000087436 s | 0.000087412 s | 1.00 |
| covariance [2048, 2048]/primal/TPU/Default | 0.000051871 s | 0.000051776 s | 1.00 |
| gesummv [4096]/primal/TPU/Default_manual_vectorized | 0.000087393 s | 0.000087481 s | 1.00 |
| ViT tiny [256, 256, 3, 4]/reverse/TPU/NoOpt | 0.002020693 s | 0.002020436 s | 1.00 |
| DGCNN [3, 128, 256]/reverse/TPU/NoOpt | 0.005226905 s | 0.005227528 s | 1.00 |
| NewtonSchulz [1024 x 1024]/primal/TPU/Default | 0.00022076 s | 0.00022078 s | 1.00 |
| bicg [2048, 4096]/primal/TPU/Default | 0.00002348 s | 0.000023427 s | 1.00 |
| fdtd_2d [1024, 2048, 256]/primal/TPU/Default_manual_vectorized | 0.027117804 s | 0.027117819 s | 1.00 |
| jacobi_2d [512, 512, 1024]/primal/TPU/Default_manual_vectorized | 0.02207921 s | 0.022079209 s | 1.00 |
| jacobi_2d [512, 512, 1024]/primal/TPU/Default | 0.0266761 s | 0.026676079 s | 1.00 |
| gemm [2048, 4096]/primal/TPU/Default_manual_vectorized | 0.000072719 s | 0.000072801 s | 1.00 |
| ViT tiny [256, 256, 3, 4]/reverse/TPU/DefaultAll | 0.001698392 s | 0.001698055 s | 1.00 |
| NewtonSchulz [4096 x 4096]/primal/TPU/StructuredTensors (Only Detection) | 0.024553534 s | 0.02476967 s | 0.99 |
| 2mm [2048]/primal/TPU/Default | 0.000086594 s | 0.000086593 s | 1.00 |
| NewtonSchulz [256 x 256]/primal/TPU/StructuredTensors | 0.000019743 s | 0.000019746 s | 1.00 |
| bloch_rf [16384 spins]/reverse/TPU/Default_Checkpointing | 0.085420627 s | 0.08541824 s | 1.00 |
| DGCNN [3, 128, 256]/primal/TPU/Default | 0.002349152 s | 0.002349392 s | 1.00 |
| DeepONet ([64, 1024], [1, 128])/reverse/TPU/DefaultAfterEnzyme | 0.000027326 s | 0.00002731 s | 1.00 |
| DeepONet ([64, 1024], [1, 128])/primal/TPU/NoOpt | 0.000006327 s | 0.000006281 s | 1.01 |
| atax [2048]/primal/TPU/Default | 0.000024192 s | 0.000024112 s | 1.00 |
| DGCNN [3, 128, 256]/reverse/TPU/DisableTransposeReshapeAfterEnzyme | 0.005179774 s | 0.005180388 s | 1.00 |
| DGCNN [3, 128, 256]/reverse/TPU/DefaultAll | 0.004673113 s | 0.004672252 s | 1.00 |
| DeepONet ([64, 1024], [1, 128])/reverse/TPU/NoOpt | 0.000027424 s | 0.000027524 s | 1.00 |
| VGG11 bn=true [224, 224, 3, 4]/reverse/TPU/DefaultAll | 0.004180245 s | 0.004180778 s | 1.00 |
| DeepONet ([64, 1024], [1, 128])/reverse/TPU/DefaultAll | 0.000027259 s | 0.000027323 s | 1.00 |
| VGG11 bn=true [224, 224, 3, 4]/primal/TPU/NoOpt | 0.000861191 s | 0.000861071 s | 1.00 |
| DGCNN [3, 128, 256]/reverse/TPU/DisableTransposeReshapeBeforeEnzyme | 0.005004828 s | 0.005004994 s | 1.00 |
| mvt [4096]/primal/TPU/Default_manual_vectorized | 0.000045565 s | 0.000045534 s | 1.00 |
| bloch_rf [128 spins]/reverse/TPU/Default_NoBatching | 0.006970248 s | 0.006971372 s | 1.00 |
| bloch_rf [1024 spins]/reverse/TPU/Default_NoBatching | 0.007860035 s | 0.007846746 s | 1.00 |
| NewtonSchulz [4096 x 4096]/primal/TPU/StructuredTensors | 0.024573923 s | 0.024616768 s | 1.00 |
| bloch_rf [8192 spins]/reverse/TPU/Default | 0.016541352 s | 0.016542545 s | 1.00 |
| heat_3d [128, 128, 128, 256]/primal/TPU/Default_manual_vectorized | 0.261753842 s | 0.261753882 s | 1.00 |
| correlation [2048, 2048]/primal/TPU/Default | 0.000055706 s | 0.000055745 s | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
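For reading the Ratio column: the rows are consistent with Ratio being the current time divided by the previous time, rounded to two decimals, so values above 1 mean the current commit is slower. That is inferred from the numbers in the table, not from the workflow's configuration. A minimal sketch, with `benchmark_ratio` as a hypothetical helper name:

```julia
# Hypothetical helper: ratio of current to previous timing, rounded to two
# decimals. Assumes Ratio = current / previous, as inferred from the table.
benchmark_ratio(current, previous) = round(current / previous; digits=2)

# Two rows copied from the benchmark table: (name, current [s], previous [s]).
rows = [
    ("bicg [2048, 4096]/primal/CPU/Default", 0.000836915, 0.000698753),
    ("mvt [4096]/primal/CPU/Julia", 0.15012155600000002, 0.200703134),
]

for (name, current, previous) in rows
    ratio = benchmark_ratio(current, previous)
    trend = ratio > 1 ? "slower" : "faster or equal"
    println(name, ": ", ratio, " (", trend, ")")
end
```

These two rows reproduce the 1.20 and 0.75 shown in the table.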
|
Additionally, checkpointing is set for this for loop: To enable/disable checkpointing you need to change the |
f6ba7a4 to cb73cf3
|
da19e01 to 0c7728a
|
@jlk9 seems like a setup error? |
|
That's a screwup from me |
|
|
function loop!(model)
    Δt = model.clock.last_Δt
    @trace mincut = true checkpointing = true track_numbers = false for i in 1:100
Potentially for a follow-up: @avik-pal, we should try testing checkpointing on vs. off, and also optimizations on vs. off.
fa36b03 to 3e6cdda
3e6cdda to c8e8494
|
@avik-pal I think the benchmark has too many timesteps in the autodiff run. Locally having 10 steps in |
This adds an Oceananigans model run to the Reactant benchmark.