
Adding an Oceananigans benchmark #2362

Open
jlk9 wants to merge 11 commits into main from jlk9/oceananigans-benchmark


@jlk9 (Collaborator) commented Feb 6, 2026

This adds an Oceananigans model run to the Reactant benchmark.

github-actions bot (Contributor) left a comment

Remaining comments, which could not be posted as review comments because of the GitHub rate limit:

JuliaFormatter

[JuliaFormatter] reported by reviewdog 🐶

@trace mincut = true checkpointing = true track_numbers = false for i = 1:100


[JuliaFormatter] reported by reviewdog 🐶

function run_reentrant_channel_model!(model, Tᵢ, Sᵢ, u_wind_stress, v_wind_stress, temp_flux)


[JuliaFormatter] reported by reviewdog 🐶

function estimate_tracer_error(model, initial_temperature, initial_salinity, u_wind_stress, v_wind_stress, temp_flux, Δz, mld)
    run_reentrant_channel_model!(model, initial_temperature, initial_salinity, u_wind_stress, v_wind_stress, temp_flux)


[JuliaFormatter] reported by reviewdog 🐶

zonal_transport = (model.velocities.u[x_midpoint,1:Ny,1:Nz] .* model.grid.Δyᵃᶜᵃ) .* Δz


[JuliaFormatter] reported by reviewdog 🐶

function differentiate_tracer_error(model, Tᵢ, Sᵢ, u_wind_stress, v_wind_stress, temp_flux, Δz, mld,
                                    dmodel, dTᵢ, dSᵢ, du_wind_stress, dv_wind_stress, dtemp_flux, dΔz, dmld)
    dedν = autodiff(set_strong_zero(Enzyme.ReverseWithPrimal),
                    estimate_tracer_error, Active,
                    Duplicated(model, dmodel),
                    Duplicated(Tᵢ, dTᵢ),
                    Duplicated(Sᵢ, dSᵢ),
                    Duplicated(u_wind_stress, du_wind_stress),
                    Duplicated(v_wind_stress, dv_wind_stress),
                    Duplicated(temp_flux, dtemp_flux),
                    Duplicated(Δz, dΔz),
                    Duplicated(mld, dmld))


[JuliaFormatter] reported by reviewdog 🐶

grid = make_grid(architecture, Nx, Ny, Nz, z_faces)
model = build_model(grid, Δt₀, parameters)
T_flux = T_flux_init(model.grid, parameters)


[JuliaFormatter] reported by reviewdog 🐶

Tᵢ, Sᵢ = temperature_salinity_init(model.grid, parameters)
mld = Field{Center, Center, Nothing}(model.grid) # Not used for now
Δz = Reactant.ConcreteRArray(Δz)


[JuliaFormatter] reported by reviewdog 🐶

dmodel = Enzyme.make_zero(model)
dTᵢ = Field{Center, Center, Center}(model.grid)
dSᵢ = Field{Center, Center, Center}(model.grid)
du_wind_stress = Field{Face, Center, Nothing}(model.grid)
dv_wind_stress = Field{Center, Face, Nothing}(model.grid)
dT_flux = Field{Center, Center, Nothing}(model.grid)
dmld = Field{Center, Center, Nothing}(model.grid)
dΔz = Enzyme.make_zero(Δz)


[JuliaFormatter] reported by reviewdog 🐶

rspinup_reentrant_channel_model! = @compile raise_first=true raise=true sync=true spinup_reentrant_channel_model!(model, Tᵢ, Sᵢ, u_wind_stress, v_wind_stress, T_flux)


[JuliaFormatter] reported by reviewdog 🐶

rdifferentiate_tracer_error = @compile raise_first=true raise=true sync=true differentiate_tracer_error(model, Tᵢ, Sᵢ, u_wind_stress, v_wind_stress, T_flux, Δz, mld,
    dmodel, dTᵢ, dSᵢ, du_wind_stress, dv_wind_stress, dT_flux, dΔz, dmld)


[JuliaFormatter] reported by reviewdog 🐶

dedν = rdifferentiate_tracer_error(model, Tᵢ, Sᵢ, u_wind_stress, v_wind_stress, T_flux, Δz, mld, dmodel, dTᵢ, dSᵢ, du_wind_stress, dv_wind_stress, dT_flux, dΔz, dmld)


@jlk9 (Collaborator, Author) commented Feb 6, 2026

This adds a sample Oceananigans script to the benchmark directory for CI runs. I'm not sure how to update `benchmark.yml` so that CI runs this script. The benchmark directory is listed here:

https://github.com/EnzymeAD/Reactant.jl/blob/9227d57f248250440ef77a25d954d9a3b629e1a2/.github/workflows/benchmark.yml#L16C1-L23C1

but none of its subdirectories are.

Also, I put the script's dependencies in a separate Project.toml in the new benchmark/oceananigans subdirectory. If they should go into the Project.toml in the benchmark folder itself instead, let me know.

@avik-pal @wsmoses

github-actions bot (Contributor) left a comment

Reactant.jl Benchmarks

Columns: benchmark suite, time at current commit 9227d57, time at previous commit 58e2ca8, ratio (current / previous)
DGCNN [3, 128, 256]/reverse/CPU/DefaultAfterEnzyme 0.398922009 s 0.398070836 s 1.00
jacobi_2d [512, 512, 1024]/primal/CPU/Default_manual_vectorized 0.602496558 s 0.617874608 s 0.98
NewtonSchulz [4096 x 4096]/primal/CPU/Default 2.324246022 s 2.316913006 s 1.00
gesummv [4096]/primal/CPU/Default_manual_vectorized 0.064501904 s 0.063009809 s 1.02
bicg [2048, 4096]/primal/CPU/Default_manual_vectorized 0.000840299 s 0.000743054 s 1.13
NewtonSchulz [1024 x 1024]/primal/CPU/StructuredTensors 0.193011207 s 0.189669149 s 1.02
atax [2048]/primal/CPU/Default_manual_vectorized 0.000504998 s 0.000425077 s 1.19
covariance [2048, 2048]/primal/CPU/Default 0.025072572 s 0.02273972 s 1.10
NewtonSchulz [256 x 256]/primal/CPU/Default 0.00745019 s 0.008141407 s 0.92
NewtonSchulz [1024 x 1024]/primal/CPU/Default 0.05022749 s 0.050128252 s 1.00
bloch_rf [128 spins]/reverse/CPU/Default 0.006759542 s 0.006765288 s 1.00
syrk [2048]/primal/CPU/Julia 36.765276186 s 34.885234172000004 s 1.05
2mm [2048]/primal/CPU/Default_manual_vectorized 0.021594917 s 0.023451378 s 0.92
doitgen [256, 1024, 512]/primal/CPU/Default_manual_vectorized 0.074364702 s 0.076964094 s 0.97
DeepONet ([64, 1024], [1, 128])/reverse/CPU/DefaultAll 0.0045181 s 0.004539621 s 1.00
NewtonSchulz [4096 x 4096]/primal/CPU/StructuredTensors 7.32011585 s 7.309533224 s 1.00
2mm [2048]/primal/CPU/Default 0.018274122 s 0.017374886 s 1.05
doitgen [256, 1024, 512]/primal/CPU/Default 0.111491519 s 0.111187316 s 1.00
bloch_rf [8192 spins]/reverse/CPU/Default 0.063466459 s 0.064630334 s 0.98
bicg [2048, 4096]/primal/CPU/Julia 0.055524708000000006 s 0.053497277 s 1.04
NewtonSchulz [256 x 256]/primal/CPU/StructuredTensors 0.009151546 s 0.010075531 s 0.91
NewtonSchulz [1024 x 1024]/primal/CPU/StructuredTensors (Only Detection) 0.150897501 s 0.151394765 s 1.00
DGCNN [3, 128, 256]/primal/CPU/Default 0.084876016 s 0.084878564 s 1.00
jacobi_1d [2048, 1024]/primal/CPU/Default_manual_vectorized 0.007331776 s 0.006895479 s 1.06
covariance [2048, 2048]/primal/CPU/Default_manual_vectorized 0.025170141 s 0.023050009 s 1.09
atax [2048]/primal/CPU/Julia 0.027179259 s 0.026390141000000002 s 1.03
syrk [2048]/primal/CPU/Default_manual_vectorized 0.009459075 s 0.009623455 s 0.98
correlation [2048, 2048]/primal/CPU/Default 0.040496038 s 0.036684941 s 1.10
correlation [2048, 2048]/primal/CPU/Julia 22.956283347000003 s 22.920312024 s 1.00
DeepONet ([64, 1024], [1, 128])/primal/CPU/Default 0.001641677 s 0.001395163 s 1.18
atax [2048]/primal/CPU/Default 0.000503268 s 0.000499805 s 1.01
DGCNN [3, 128, 256]/reverse/CPU/DefaultBeforeEnzyme 0.576965901 s 0.578087553 s 1.00
NewtonSchulz [4096 x 4096]/primal/CPU/StructuredTensors (Only Detection) 5.510028065 s 5.472108618 s 1.01
correlation [2048, 2048]/primal/CPU/Default_manual_vectorized 0.03578499 s 0.034047847 s 1.05
3mm [256, 1024, 2048, 4096]/primal/CPU/Default 0.006792493 s 0.006639493 s 1.02
bloch_rf [8192 spins]/reverse/CPU/Default_NoBatching 0.160148244 s 0.15287961 s 1.05
heat_3d [128, 128, 128, 256]/primal/CPU/Default 0.6286878 s 0.696132764 s 0.90
bloch_rf [16384 spins]/reverse/CPU/Default 0.124762969 s 0.121696235 s 1.03
bloch_rf [16384 spins]/reverse/CPU/Default_Checkpointing 0.226232159 s 0.228674484 s 0.99
heat_3d [128, 128, 128, 256]/primal/CPU/Default_manual_vectorized 0.635809699 s 0.64505205 s 0.99
DGCNN [3, 128, 256]/reverse/CPU/DisableTransposeReshapeAfterEnzyme 0.415978495 s 0.416717894 s 1.00
gemmver [2048]/primal/CPU/Default_manual_vectorized 0.01021365 s 0.011180861 s 0.91
FNO [64, 64, 1, 4]/reverse/CPU/DefaultBeforeEnzyme 0.159975727 s 0.156646088 s 1.02
2mm [2048]/primal/CPU/Julia 56.765553326 s 57.74098089100001 s 0.98
bloch_rf [128 spins]/reverse/CPU/Default_NoBatching 0.011326447 s 0.010269736 s 1.10
mvt [4096]/primal/CPU/Default_manual_vectorized 0.007802637 s 0.007670336 s 1.02
DGCNN [3, 128, 256]/reverse/CPU/DisableTransposeReshapeBeforeEnzyme 0.424982611 s 0.432458673 s 0.98
gemmver [2048]/primal/CPU/Default 0.003459995 s 0.003356295 s 1.03
gemmver [2048]/primal/CPU/Julia 0.036043339 s 0.034191986 s 1.05
DeepONet ([64, 1024], [1, 128])/reverse/CPU/DefaultBeforeEnzyme 0.004545098 s 0.004537273 s 1.00
bloch_rf [128 spins]/reverse/CPU/Default_NoBatching_Checkpointing 0.028099068 s 0.029549591 s 0.95
bloch_rf [16384 spins]/reverse/CPU/Default_NoBatching 0.310841634 s 0.311803138 s 1.00
gesummv [4096]/primal/CPU/Julia 0.376272724 s 0.37708815100000004 s 1.00
jacobi_1d [2048, 1024]/primal/CPU/Default 0.006900504 s 0.006919381 s 1.00
jacobi_2d [512, 512, 1024]/primal/CPU/Julia 1.6186772580000002 s 1.743826184 s 0.93
NewtonSchulz [1024 x 1024]/primal/CPU/Julia 0.090992924 s 0.096990111 s 0.94
bloch_rf [1024 spins]/reverse/CPU/Default_Checkpointing 0.02449167 s 0.024760363 s 0.99
FNO [64, 64, 1, 4]/reverse/CPU/DefaultAfterEnzyme 0.170899858 s 0.168079487 s 1.02
DGCNN [3, 128, 256]/reverse/CPU/DisableTransposeReshapeAll 0.429761764 s 0.432814088 s 0.99
bloch_rf [128 spins]/reverse/CPU/Julia 0.005871141000000001 s 0.006579594 s 0.89
fdtd_2d [1024, 2048, 256]/primal/CPU/Julia 28.566648037 s 26.824709781000003 s 1.06
bloch_rf [128 spins]/reverse/CPU/Default_Checkpointing 0.01823615 s 0.017465801 s 1.04
DGCNN [3, 128, 256]/reverse/CPU/NoOpt 0.431072964 s 0.428403831 s 1.01
FNO [64, 64, 1, 4]/reverse/CPU/NoOpt 0.155458936 s 0.152382598 s 1.02
gemm [2048, 4096]/primal/CPU/Default 0.01723814 s 0.017096556 s 1.01
jacobi_1d [2048, 1024]/primal/CPU/Julia 0.0005626380000000001 s 0.000568096 s 0.99
mvt [4096]/primal/CPU/Julia 0.15012155600000002 s 0.200703134 s 0.75
bloch_rf [1024 spins]/reverse/CPU/Default 0.011250229 s 0.010971558 s 1.03
bicg [2048, 4096]/primal/CPU/Default 0.000836915 s 0.000698753 s 1.20
doitgen [256, 1024, 512]/primal/CPU/Julia 267.821386008 s 376.430725777 s 0.71
gesummv [4096]/primal/CPU/Default 0.002210536 s 0.001836795 s 1.20
FNO [64, 64, 1, 4]/reverse/CPU/DefaultAll 0.169654797 s 0.168534654 s 1.01
jacobi_2d [512, 512, 1024]/primal/CPU/Default 0.625673918 s 0.6296972 s 0.99
bloch_rf [16384 spins]/reverse/CPU/Julia 0.546503321 s 0.5568492260000001 s 0.98
DeepONet ([64, 1024], [1, 128])/primal/CPU/NoOpt 0.001420488 s 0.00144888 s 0.98
gemm [2048, 4096]/primal/CPU/Julia 300.773966141 s 285.31838474200003 s 1.05
bloch_rf [1024 spins]/reverse/CPU/Julia 0.038267074000000005 s 0.038497590000000005 s 0.99
fdtd_2d [1024, 2048, 256]/primal/CPU/Default 0.534963059 s 0.548909331 s 0.97
DGCNN [3, 128, 256]/primal/CPU/NoOpt 0.098497035 s 0.096644299 s 1.02
fdtd_2d [1024, 2048, 256]/primal/CPU/Default_manual_vectorized 0.844111791 s 0.865483869 s 0.98
covariance [2048, 2048]/primal/CPU/Julia 22.94589364 s 22.946513432 s 1.00
syr2k [2048]/primal/CPU/Default_manual_vectorized 0.019910847 s 0.019253731 s 1.03
DeepONet ([64, 1024], [1, 128])/reverse/CPU/NoOpt 0.004097683 s 0.004432937 s 0.92
DGCNN [3, 128, 256]/reverse/CPU/DefaultAll 0.371169488 s 0.369851723 s 1.00
bloch_rf [8192 spins]/reverse/CPU/Julia 0.275172095 s 0.27385626500000004 s 1.00
bloch_rf [8192 spins]/reverse/CPU/Default_Checkpointing 0.118576998 s 0.119207904 s 0.99
3mm [256, 1024, 2048, 4096]/primal/CPU/Julia 15.775128842 s 15.034657282000001 s 1.05
NewtonSchulz [256 x 256]/primal/CPU/Julia 0.0036593940000000003 s 0.0033585240000000003 s 1.09
NewtonSchulz [256 x 256]/primal/CPU/StructuredTensors (Only Detection) 0.008095428 s 0.009733051 s 0.83
bloch_rf [16384 spins]/reverse/CPU/Default_NoBatching_Checkpointing 0.490487018 s 0.477110539 s 1.03
syr2k [2048]/primal/CPU/Julia 39.021939283 s 39.007653261 s 1.00
heat_3d [128, 128, 128, 256]/primal/CPU/Julia 7.8999029830000005 s 11.672141676 s 0.68
mvt [4096]/primal/CPU/Default 0.007581925 s 0.007498994 s 1.01
bloch_rf [8192 spins]/reverse/CPU/Default_NoBatching_Checkpointing 0.236782268 s 0.235839559 s 1.00
NewtonSchulz [4096 x 4096]/primal/CPU/Julia 4.686436541 s 4.64022502 s 1.01
FNO [64, 64, 1, 4]/primal/CPU/NoOpt 0.071988439 s 0.071582932 s 1.01
DGCNN [3, 128, 256]/primal/CPU/DisableTransposeReshape 0.09569326 s 0.095659247 s 1.00
DeepONet ([64, 1024], [1, 128])/reverse/CPU/DefaultAfterEnzyme 0.004559171 s 0.004409539 s 1.03
FNO [64, 64, 1, 4]/primal/CPU/Default 0.070489744 s 0.070216447 s 1.00
bloch_rf [1024 spins]/reverse/CPU/Default_NoBatching 0.025769687 s 0.023443795 s 1.10
bloch_rf [1024 spins]/reverse/CPU/Default_NoBatching_Checkpointing 0.053763732 s 0.056046882 s 0.96
gemm [2048, 4096]/primal/CPU/Default_manual_vectorized 0.019783398 s 0.020302257 s 0.97
syrk [2048]/primal/CPU/Default 0.009323441 s 0.009327635 s 1.00
3mm [256, 1024, 2048, 4096]/primal/CPU/Default_manual_vectorized 0.006767557 s 0.006828708 s 0.99
syr2k [2048]/primal/CPU/Default 0.018834526 s 0.019498516 s 0.97
bloch_rf [16384 spins]/reverse/CUDA/Default_NoBatching_Checkpointing 0.264304473 s 0.261457892 s 1.01
NewtonSchulz [256 x 256]/primal/CUDA/Default 0.000574662 s 0.000554159 s 1.04
bloch_rf [1024 spins]/reverse/CUDA/Default_NoBatching 0.073729375 s 0.073598594 s 1.00
ViT tiny [256, 256, 3, 4]/reverse/CUDA/DisableScatterGatherPadAll 0.010376032 s 0.010298557 s 1.01
gemmver [2048]/primal/CUDA/Default_manual_vectorized 0.000060768 s 0.000058257 s 1.04
ViT tiny [256, 256, 3, 4]/reverse/CUDA/DisableTransposeReshapeAll 0.01054224 s 0.0104369 s 1.01
syr2k [2048]/primal/CUDA/Default 0.000499565 s 0.000499709 s 1.00
FNO [64, 64, 1, 4]/reverse/CUDA/DefaultAll 0.003151842 s 0.003121849 s 1.01
atax [2048]/primal/CUDA/Default_manual_vectorized 0.000025875 s 0.000025575 s 1.01
gesummv [4096]/primal/CUDA/Default_manual_vectorized 0.000206178 s 0.000206803 s 1.00
fdtd_2d [1024, 2048, 256]/primal/CUDA/Default_manual_vectorized 0.028675672 s 0.027458502 s 1.04
bloch_rf [8192 spins]/reverse/CUDA/Default_NoBatching_Checkpointing 0.261291461 s 0.256914282 s 1.02
VGG11 bn=true [224, 224, 3, 4]/primal/CUDA/Default 0.001917002 s 0.001904037 s 1.01
gemm [2048, 4096]/primal/CUDA/Default 0.000446583 s 0.000445421 s 1.00
bloch_rf [16384 spins]/reverse/CUDA/Default 0.034818093 s 0.034181084 s 1.02
bloch_rf [8192 spins]/reverse/CUDA/Default_Checkpointing 0.147638926 s 0.146332433 s 1.01
ViT tiny [256, 256, 3, 4]/primal/CUDA/NoOpt 0.003158564 s 0.00314326 s 1.00
3mm [256, 1024, 2048, 4096]/primal/CUDA/Default_manual_vectorized 0.000159101 s 0.000159924 s 0.99
jacobi_2d [512, 512, 1024]/primal/CUDA/Default_manual_vectorized 0.02091506 s 0.020097093 s 1.04
NewtonSchulz [1024 x 1024]/primal/CUDA/StructuredTensors 0.00701279 s 0.00686823 s 1.02
ViT tiny [256, 256, 3, 4]/primal/CUDA/Default 0.002877501 s 0.002651615 s 1.09
DGCNN [3, 128, 256]/primal/CUDA/DisableTransposeReshape 0.001247968 s 0.001242163 s 1.00
ViT tiny [256, 256, 3, 4]/reverse/CUDA/DisableScatterGatherAll 0.010460293 s 0.01062204 s 0.98
DGCNN [3, 128, 256]/reverse/CUDA/DisableTransposeReshapeAfterEnzyme 0.003393965 s 0.003389311 s 1.00
bloch_rf [16384 spins]/reverse/CUDA/Default_Checkpointing 0.145979048 s 0.146557366 s 1.00
DeepONet ([64, 1024], [1, 128])/reverse/CUDA/DefaultAll 0.000527248 s 0.000575929 s 0.92
bloch_rf [128 spins]/reverse/CUDA/Default 0.037041434 s 0.034459056 s 1.07
gemm [2048, 4096]/primal/CUDA/Default_manual_vectorized 0.00044178 s 0.000442725 s 1.00
DeepONet ([64, 1024], [1, 128])/primal/CUDA/Default 0.000162614 s 0.000213511 s 0.76
DGCNN [3, 128, 256]/reverse/CUDA/DisableTransposeReshapeAll 0.003345437 s 0.003340038 s 1.00
VGG11 bn=true [224, 224, 3, 4]/reverse/CUDA/DefaultAfterEnzyme 0.007075102 s 0.007045761 s 1.00
NewtonSchulz [4096 x 4096]/primal/CUDA/StructuredTensors (Only Detection) 0.109337738 s 0.109336512 s 1.00
covariance [2048, 2048]/primal/CUDA/Default_manual_vectorized 0.000262827 s 0.000263588 s 1.00
NewtonSchulz [4096 x 4096]/primal/CUDA/StructuredTensors 0.109759335 s 0.110324879 s 0.99
NewtonSchulz [4096 x 4096]/primal/CUDA/Default 0.043339614 s 0.043750541 s 0.99
correlation [2048, 2048]/primal/CUDA/Default 0.000295359 s 0.000296845 s 0.99
3mm [256, 1024, 2048, 4096]/primal/CUDA/Default 0.000158985 s 0.000161757 s 0.98
doitgen [256, 1024, 512]/primal/CUDA/Default_manual_vectorized 0.001468615 s 0.001470557 s 1.00
jacobi_1d [2048, 1024]/primal/CUDA/Default_manual_vectorized 0.024257626 s 0.024096308 s 1.01
VGG11 bn=true [224, 224, 3, 4]/reverse/CUDA/DefaultBeforeEnzyme 0.007131914 s 0.007119492 s 1.00
DGCNN [3, 128, 256]/primal/CUDA/NoOpt 0.001266264 s 0.001263411 s 1.00
covariance [2048, 2048]/primal/CUDA/Default 0.000283074 s 0.000286435 s 0.99
ViT tiny [256, 256, 3, 4]/reverse/CUDA/DisablePadAll 0.010433047 s 0.010435377 s 1.00
VGG11 bn=true [224, 224, 3, 4]/reverse/CUDA/NoOpt 0.007182739 s 0.007157338 s 1.00
FNO [64, 64, 1, 4]/reverse/CUDA/DefaultAfterEnzyme 0.003252567 s 0.003126927 s 1.04
DGCNN [3, 128, 256]/reverse/CUDA/DefaultAfterEnzyme 0.00303201 s 0.003024562 s 1.00
DGCNN [3, 128, 256]/reverse/CUDA/DefaultAll 0.003520211 s 0.003537342 s 1.00
heat_3d [128, 128, 128, 256]/primal/CUDA/Default 0.013342762 s 0.013134791 s 1.02
mvt [4096]/primal/CUDA/Default_manual_vectorized 0.000107014 s 0.000110445 s 0.97
VGG11 bn=true [224, 224, 3, 4]/reverse/CUDA/DefaultAll 0.007069624 s 0.007038347 s 1.00
doitgen [256, 1024, 512]/primal/CUDA/Default 0.002241755 s 0.002241956 s 1.00
DGCNN [3, 128, 256]/primal/CUDA/Default 0.000974305 s 0.000971581 s 1.00
VGG11 bn=true [224, 224, 3, 4]/primal/CUDA/NoOpt 0.001939534 s 0.001926944 s 1.01
bloch_rf [1024 spins]/reverse/CUDA/Default_NoBatching_Checkpointing 0.261730497 s 0.257782645 s 1.02
DeepONet ([64, 1024], [1, 128])/reverse/CUDA/NoOpt 0.000634165 s 0.000644931 s 0.98
syrk [2048]/primal/CUDA/Default_manual_vectorized 0.000254876 s 0.000255133 s 1.00
gemmver [2048]/primal/CUDA/Default 0.000044986 s 0.00004434 s 1.01
NewtonSchulz [256 x 256]/primal/CUDA/StructuredTensors 0.001436332 s 0.001411867 s 1.02
DGCNN [3, 128, 256]/reverse/CUDA/NoOpt 0.006571116 s 0.006575345 s 1.00
bloch_rf [128 spins]/reverse/CUDA/Default_Checkpointing 0.147163896 s 0.151287582 s 0.97
bicg [2048, 4096]/primal/CUDA/Default 0.000055485 s 0.00006569 s 0.84
DGCNN [3, 128, 256]/reverse/CUDA/DefaultBeforeEnzyme 0.006508961 s 0.006538191 s 1.00
bicg [2048, 4096]/primal/CUDA/Default_manual_vectorized 0.000053565 s 0.000054109 s 0.99
2mm [2048]/primal/CUDA/Default 0.000462545 s 0.000462548 s 1.00
jacobi_1d [2048, 1024]/primal/CUDA/Default 0.020627327 s 0.020134235 s 1.02
heat_3d [128, 128, 128, 256]/primal/CUDA/Default_manual_vectorized 0.012847875 s 0.013123787 s 0.98
bloch_rf [16384 spins]/reverse/CUDA/Default_NoBatching 0.083654162 s 0.082095426 s 1.02
jacobi_2d [512, 512, 1024]/primal/CUDA/Default 0.020403413 s 0.020061832 s 1.02
correlation [2048, 2048]/primal/CUDA/Default_manual_vectorized 0.000268405 s 0.000268627 s 1.00
bloch_rf [8192 spins]/reverse/CUDA/Default_NoBatching 0.072826295 s 0.07273783 s 1.00
bloch_rf [1024 spins]/reverse/CUDA/Default_Checkpointing 0.147529895 s 0.145460278 s 1.01
DeepONet ([64, 1024], [1, 128])/reverse/CUDA/DefaultBeforeEnzyme 0.000521865 s 0.000581555 s 0.90
2mm [2048]/primal/CUDA/Default_manual_vectorized 0.000465735 s 0.000465898 s 1.00
ViT tiny [256, 256, 3, 4]/reverse/CUDA/NoOpt 0.01163171 s 0.010929207 s 1.06
FNO [64, 64, 1, 4]/reverse/CUDA/DefaultBeforeEnzyme 0.003194712 s 0.003147606 s 1.01
bloch_rf [1024 spins]/reverse/CUDA/Default 0.036278792 s 0.035616641 s 1.02
NewtonSchulz [1024 x 1024]/primal/CUDA/Default 0.001847829 s 0.001829829 s 1.01
atax [2048]/primal/CUDA/Default 0.000029872 s 0.000025562 s 1.17
gesummv [4096]/primal/CUDA/Default 0.000108406 s 0.000106736 s 1.02
ViT tiny [256, 256, 3, 4]/reverse/CUDA/DefaultAll 0.010504191 s 0.010640459 s 0.99
syr2k [2048]/primal/CUDA/Default_manual_vectorized 0.0004996 s 0.000499725 s 1.00
DGCNN [3, 128, 256]/reverse/CUDA/DisableTransposeReshapeBeforeEnzyme 0.014071049 s 0.014599092 s 0.96
bloch_rf [128 spins]/reverse/CUDA/Default_NoBatching 0.076232127 s 0.07602372 s 1.00
FNO [64, 64, 1, 4]/reverse/CUDA/NoOpt 0.003336913 s 0.00336687 s 0.99
bloch_rf [8192 spins]/reverse/CUDA/Default 0.035577678 s 0.034350543 s 1.04
FNO [64, 64, 1, 4]/primal/CUDA/NoOpt 0.001149151 s 0.001131692 s 1.02
DeepONet ([64, 1024], [1, 128])/reverse/CUDA/DefaultAfterEnzyme 0.000517677 s 0.000545863 s 0.95
NewtonSchulz [256 x 256]/primal/CUDA/StructuredTensors (Only Detection) 0.001369076 s 0.001349102 s 1.01
DeepONet ([64, 1024], [1, 128])/primal/CUDA/NoOpt 0.00021077 s 0.000212637 s 0.99
mvt [4096]/primal/CUDA/Default 0.000107007 s 0.000109972 s 0.97
syrk [2048]/primal/CUDA/Default 0.000255953 s 0.000256032 s 1.00
fdtd_2d [1024, 2048, 256]/primal/CUDA/Default 0.025073715 s 0.025553858 s 0.98
NewtonSchulz [1024 x 1024]/primal/CUDA/StructuredTensors (Only Detection) 0.005458217 s 0.005679814 s 0.96
bloch_rf [128 spins]/reverse/CUDA/Default_NoBatching_Checkpointing 0.415549826 s 0.399026405 s 1.04
FNO [64, 64, 1, 4]/primal/CUDA/Default 0.00108504 s 0.00107467 s 1.01
ViT tiny [256, 256, 3, 4]/primal/TPU/Default 0.000217708 s 0.000217818 s 1.00
NewtonSchulz [4096 x 4096]/primal/TPU/Default 0.024508124 s 0.024705897 s 0.99
bloch_rf [1024 spins]/reverse/TPU/Default_Checkpointing 0.003438832 s 0.003438864 s 1.00
FNO [64, 64, 1, 4]/reverse/TPU/DefaultAll 0.003093378 s 0.003093037 s 1.00
DGCNN [3, 128, 256]/reverse/TPU/DisableTransposeReshapeAll 0.005004825 s 0.005005214 s 1.00
syrk [2048]/primal/TPU/Default_manual_vectorized 0.000031548 s 0.000031535 s 1.00
bloch_rf [16384 spins]/reverse/TPU/Default_NoBatching 0.047648354 s 0.047636619 s 1.00
ViT tiny [256, 256, 3, 4]/primal/TPU/NoOpt 0.000586544 s 0.000586181 s 1.00
FNO [64, 64, 1, 4]/primal/TPU/NoOpt 0.001138321 s 0.001138033 s 1.00
gemm [2048, 4096]/primal/TPU/Default 0.000072625 s 0.000072509 s 1.00
3mm [256, 1024, 2048, 4096]/primal/TPU/Default_manual_vectorized 0.000016375 s 0.000016401 s 1.00
atax [2048]/primal/TPU/Default_manual_vectorized 0.000024208 s 0.000024146 s 1.00
ViT tiny [256, 256, 3, 4]/reverse/TPU/DisableScatterGatherAll 0.001698505 s 0.001698623 s 1.00
syr2k [2048]/primal/TPU/Default 0.000058343 s 0.000058219 s 1.00
2mm [2048]/primal/TPU/Default_manual_vectorized 0.000074958 s 0.000074929 s 1.00
FNO [64, 64, 1, 4]/primal/TPU/Default 0.000953001 s 0.000953054 s 1.00
VGG11 bn=true [224, 224, 3, 4]/reverse/TPU/DefaultBeforeEnzyme 0.004179579 s 0.004179874 s 1.00
DeepONet ([64, 1024], [1, 128])/primal/TPU/Default 0.000006047 s 0.000006087 s 0.99
doitgen [256, 1024, 512]/primal/TPU/Default_manual_vectorized 0.001088949 s 0.001086157 s 1.00
ViT tiny [256, 256, 3, 4]/reverse/TPU/DisableScatterGatherPadAll 0.001697239 s 0.001697514 s 1.00
doitgen [256, 1024, 512]/primal/TPU/Default 0.001579798 s 0.001579097 s 1.00
DGCNN [3, 128, 256]/primal/TPU/DisableTransposeReshape 0.002865498 s 0.002865397 s 1.00
bloch_rf [1024 spins]/reverse/TPU/Default 0.001323008 s 0.001323031 s 1.00
FNO [64, 64, 1, 4]/reverse/TPU/DefaultAfterEnzyme 0.003085456 s 0.003085515 s 1.00
bloch_rf [128 spins]/reverse/TPU/Default 0.000694924 s 0.000694866 s 1.00
mvt [4096]/primal/TPU/Default 0.000045023 s 0.000045082 s 1.00
bloch_rf [16384 spins]/reverse/TPU/Default 0.026236881 s 0.026240626 s 1.00
gemmver [2048]/primal/TPU/Default 0.000036403 s 0.00003654 s 1.00
FNO [64, 64, 1, 4]/reverse/TPU/NoOpt 0.002959794 s 0.002960119 s 1.00
syr2k [2048]/primal/TPU/Default_manual_vectorized 0.000057948 s 0.000057998 s 1.00
bloch_rf [8192 spins]/reverse/TPU/Default_NoBatching 0.033746967 s 0.033754872 s 1.00
bloch_rf [8192 spins]/reverse/TPU/Default_Checkpointing 0.019418776 s 0.0194229 s 1.00
fdtd_2d [1024, 2048, 256]/primal/TPU/Default 0.018833155 s 0.018833106 s 1.00
covariance [2048, 2048]/primal/TPU/Default_manual_vectorized 0.000047606 s 0.000047646 s 1.00
heat_3d [128, 128, 128, 256]/primal/TPU/Default 0.26175386 s 0.261753897 s 1.00
bloch_rf [128 spins]/reverse/TPU/Default_NoBatching_Checkpointing 0.017732083 s 0.01791452 s 0.99
NewtonSchulz [256 x 256]/primal/TPU/StructuredTensors (Only Detection) 0.000019731 s 0.000019735 s 1.00
gemmver [2048]/primal/TPU/Default_manual_vectorized 0.000037325 s 0.000037471 s 1.00
VGG11 bn=true [224, 224, 3, 4]/reverse/TPU/NoOpt 0.004056119 s 0.004055074 s 1.00
DGCNN [3, 128, 256]/reverse/TPU/DefaultBeforeEnzyme 0.004752407 s 0.004751814 s 1.00
bicg [2048, 4096]/primal/TPU/Default_manual_vectorized 0.000023459 s 0.000023456 s 1.00
NewtonSchulz [1024 x 1024]/primal/TPU/StructuredTensors 0.00022266 s 0.0002227 s 1.00
ViT tiny [256, 256, 3, 4]/reverse/TPU/DisablePadAll 0.00169829 s 0.001697866 s 1.00
bloch_rf [1024 spins]/reverse/TPU/Default_NoBatching_Checkpointing 0.018718537 s 0.019394475 s 0.97
3mm [256, 1024, 2048, 4096]/primal/TPU/Default 0.000016374 s 0.000016327 s 1.00
jacobi_1d [2048, 1024]/primal/TPU/Default 0.008143506 s 0.008144584 s 1.00
DGCNN [3, 128, 256]/reverse/TPU/DefaultAfterEnzyme 0.004670825 s 0.00467102 s 1.00
ViT tiny [256, 256, 3, 4]/reverse/TPU/DisableTransposeReshapeAll 0.00143745 s 0.001437695 s 1.00
syrk [2048]/primal/TPU/Default 0.000030314 s 0.000030265 s 1.00
correlation [2048, 2048]/primal/TPU/Default_manual_vectorized 0.000053189 s 0.000053206 s 1.00
bloch_rf [128 spins]/reverse/TPU/Default_Checkpointing 0.001658265 s 0.001658484 s 1.00
DGCNN [3, 128, 256]/primal/TPU/NoOpt 0.002867399 s 0.002867601 s 1.00
bloch_rf [16384 spins]/reverse/TPU/Default_NoBatching_Checkpointing 0.035670511 s 0.03566848 s 1.00
NewtonSchulz [1024 x 1024]/primal/TPU/StructuredTensors (Only Detection) 0.000222784 s 0.000222713 s 1.00
VGG11 bn=true [224, 224, 3, 4]/primal/TPU/Default 0.000813696 s 0.000813646 s 1.00
NewtonSchulz [256 x 256]/primal/TPU/Default 0.000018987 s 0.000018965 s 1.00
jacobi_1d [2048, 1024]/primal/TPU/Default_manual_vectorized 0.005707607 s 0.005708007 s 1.00
FNO [64, 64, 1, 4]/reverse/TPU/DefaultBeforeEnzyme 0.003093487 s 0.003093686 s 1.00
bloch_rf [8192 spins]/reverse/TPU/Default_NoBatching_Checkpointing 0.024888531 s 0.024891187 s 1.00
DeepONet ([64, 1024], [1, 128])/reverse/TPU/DefaultBeforeEnzyme 0.000027333 s 0.00002735 s 1.00
VGG11 bn=true [224, 224, 3, 4]/reverse/TPU/DefaultAfterEnzyme 0.004180059 s 0.004180592 s 1.00
gesummv [4096]/primal/TPU/Default 0.000087436 s 0.000087412 s 1.00
covariance [2048, 2048]/primal/TPU/Default 0.000051871 s 0.000051776 s 1.00
gesummv [4096]/primal/TPU/Default_manual_vectorized 0.000087393 s 0.000087481 s 1.00
ViT tiny [256, 256, 3, 4]/reverse/TPU/NoOpt 0.002020693 s 0.002020436 s 1.00
DGCNN [3, 128, 256]/reverse/TPU/NoOpt 0.005226905 s 0.005227528 s 1.00
NewtonSchulz [1024 x 1024]/primal/TPU/Default 0.00022076 s 0.00022078 s 1.00
bicg [2048, 4096]/primal/TPU/Default 0.00002348 s 0.000023427 s 1.00
fdtd_2d [1024, 2048, 256]/primal/TPU/Default_manual_vectorized 0.027117804 s 0.027117819 s 1.00
jacobi_2d [512, 512, 1024]/primal/TPU/Default_manual_vectorized 0.02207921 s 0.022079209 s 1.00
jacobi_2d [512, 512, 1024]/primal/TPU/Default 0.0266761 s 0.026676079 s 1.00
gemm [2048, 4096]/primal/TPU/Default_manual_vectorized 0.000072719 s 0.000072801 s 1.00
ViT tiny [256, 256, 3, 4]/reverse/TPU/DefaultAll 0.001698392 s 0.001698055 s 1.00
NewtonSchulz [4096 x 4096]/primal/TPU/StructuredTensors (Only Detection) 0.024553534 s 0.02476967 s 0.99
2mm [2048]/primal/TPU/Default 0.000086594 s 0.000086593 s 1.00
NewtonSchulz [256 x 256]/primal/TPU/StructuredTensors 0.000019743 s 0.000019746 s 1.00
bloch_rf [16384 spins]/reverse/TPU/Default_Checkpointing 0.085420627 s 0.08541824 s 1.00
DGCNN [3, 128, 256]/primal/TPU/Default 0.002349152 s 0.002349392 s 1.00
DeepONet ([64, 1024], [1, 128])/reverse/TPU/DefaultAfterEnzyme 0.000027326 s 0.00002731 s 1.00
DeepONet ([64, 1024], [1, 128])/primal/TPU/NoOpt 0.000006327 s 0.000006281 s 1.01
atax [2048]/primal/TPU/Default 0.000024192 s 0.000024112 s 1.00
DGCNN [3, 128, 256]/reverse/TPU/DisableTransposeReshapeAfterEnzyme 0.005179774 s 0.005180388 s 1.00
DGCNN [3, 128, 256]/reverse/TPU/DefaultAll 0.004673113 s 0.004672252 s 1.00
DeepONet ([64, 1024], [1, 128])/reverse/TPU/NoOpt 0.000027424 s 0.000027524 s 1.00
VGG11 bn=true [224, 224, 3, 4]/reverse/TPU/DefaultAll 0.004180245 s 0.004180778 s 1.00
DeepONet ([64, 1024], [1, 128])/reverse/TPU/DefaultAll 0.000027259 s 0.000027323 s 1.00
VGG11 bn=true [224, 224, 3, 4]/primal/TPU/NoOpt 0.000861191 s 0.000861071 s 1.00
DGCNN [3, 128, 256]/reverse/TPU/DisableTransposeReshapeBeforeEnzyme 0.005004828 s 0.005004994 s 1.00
mvt [4096]/primal/TPU/Default_manual_vectorized 0.000045565 s 0.000045534 s 1.00
bloch_rf [128 spins]/reverse/TPU/Default_NoBatching 0.006970248 s 0.006971372 s 1.00
bloch_rf [1024 spins]/reverse/TPU/Default_NoBatching 0.007860035 s 0.007846746 s 1.00
NewtonSchulz [4096 x 4096]/primal/TPU/StructuredTensors 0.024573923 s 0.024616768 s 1.00
bloch_rf [8192 spins]/reverse/TPU/Default 0.016541352 s 0.016542545 s 1.00
heat_3d [128, 128, 128, 256]/primal/TPU/Default_manual_vectorized 0.261753842 s 0.261753882 s 1.00
correlation [2048, 2048]/primal/TPU/Default 0.000055706 s 0.000055745 s 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@jlk9 (Collaborator, Author) commented Feb 6, 2026

Additionally, checkpointing is enabled for this `for` loop:
https://github.com/EnzymeAD/Reactant.jl/blob/9227d57f248250440ef77a25d954d9a3b629e1a2/benchmark/oceananigans/abernathey_channel.jl#L280C1-L286C4

To enable or disable checkpointing, change the `checkpointing = true` setting in that loop's `@trace` annotation.
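For readers comparing the two settings: in reverse-mode AD, checkpointing a loop trades memory for recomputation. Instead of storing the state entering every iteration, it keeps only periodic snapshots and re-runs each segment from its snapshot during the backward sweep. The toy below illustrates that trade-off in plain Julia with a hypothetical scalar `model_step` standing in for one model time step. It is a minimal sketch of uniform checkpointing under those assumptions, not Reactant's implementation of the `checkpointing` option, and every name in it is made up for illustration.

```julia
# Hypothetical scalar stand-in for one model time step: x_{n+1} = x_n - DT * x_n^2.
const DT = 0.01
model_step(x) = x - DT * x^2
dmodel_step(x) = 1 - 2 * DT * x   # derivative of model_step with respect to x

# Reverse sweep that stores every intermediate state (no checkpointing).
function grad_store_all(x0, nsteps)
    states = Vector{Float64}(undef, nsteps)
    x = x0
    for n in 1:nsteps
        states[n] = x              # state entering step n
        x = model_step(x)
    end
    adj = 1.0                      # seed: d x_N / d x_N
    for n in nsteps:-1:1
        adj *= dmodel_step(states[n])
    end
    return x, adj                  # final state and d x_N / d x_0
end

# Reverse sweep that stores only the state entering every k-th step,
# then recomputes each segment from its checkpoint on the way back.
function grad_checkpointed(x0, nsteps, k)
    checkpoints = Float64[]
    x = x0
    for n in 1:nsteps
        (n - 1) % k == 0 && push!(checkpoints, x)
        x = model_step(x)
    end
    adj = 1.0
    for seg in length(checkpoints):-1:1
        lo = (seg - 1) * k + 1
        hi = min(seg * k, nsteps)
        segment = Vector{Float64}(undef, hi - lo + 1)   # at most k recomputed states at once
        xc = checkpoints[seg]
        for i in eachindex(segment)
            segment[i] = xc
            xc = model_step(xc)
        end
        for i in length(segment):-1:1
            adj *= dmodel_step(segment[i])
        end
    end
    return x, adj
end
```

With `nsteps = 100` and `k = 10`, the checkpointed version holds 10 snapshots plus at most 10 recomputed states instead of 100 stored states, at the cost of running every step forward twice. Both functions multiply the same per-step derivatives in the same order, so they return identical results.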

avik-pal force-pushed the jlk9/oceananigans-benchmark branch 4 times, most recently from f6ba7a4 to cb73cf3, February 11, 2026 03:03
@wsmoses (Member) commented Feb 11, 2026

- Run `import Pkg; Pkg.add("Chairmarks")` to install the Chairmarks package.

avik-pal force-pushed the jlk9/oceananigans-benchmark branch 4 times, most recently from da19e01 to 0c7728a, February 11, 2026 11:48
@wsmoses (Member) commented Feb 11, 2026

@jlk9 seems like a setup error?


ERROR: LoadError: MethodError: no method matching Duplicated(::Reactant.TracedRArray{Float64, 2}, ::Matrix{Float64})
The type `Duplicated` exists, but no method is defined for this combination of argument types when trying to construct it.

Closest candidates are:
  Duplicated(::T1, ::T1) where T1
   @ EnzymeCore ~/.julia/packages/EnzymeCore/RpjpI/src/EnzymeCore.jl:68
  Duplicated(::T1, ::T1, ::Bool) where T1
   @ EnzymeCore ~/.julia/packages/EnzymeCore/RpjpI/src/EnzymeCore.jl:68
  Duplicated(::T1, ::T1) where T1<:SubArray
   @ EnzymeCore ~/.julia/packages/EnzymeCore/RpjpI/src/EnzymeCore.jl:69
  ...

Stacktrace:
  [1] macro expansion
    @ /__w/Reactant.jl/Reactant.jl/src/utils.jl:0 [inlined]
  [2] call_with_reactant(::Reactant.EnsureReturnType{Union{}}, ::Type{Duplicated}, ::Reactant.TracedRArray{Float64, 2}, ::Matrix{Float64})
    @ Reactant /__w/Reactant.jl/Reactant.jl/src/utils.jl:1177
  [3] call_with_reactant(::typeof(Main.AbernatheyChannel.differentiate_tracer_error), 
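For context on the error: Enzyme's `Duplicated(primal, shadow)` only has constructors where both arguments share one type (`Duplicated(::T1, ::T1) where T1`, as the candidate list shows), so a traced `Reactant.TracedRArray{Float64, 2}` primal paired with a plain `Matrix{Float64}` shadow is rejected. The sketch below uses ordinary Julia arrays and a hypothetical `loss` function in place of `estimate_tracer_error`; it shows the usual pattern of deriving the shadow from the primal with `Enzyme.make_zero` so the types match. It is not the fix applied in this PR. Under Reactant the analogous step would presumably be to build each shadow from the already-converted array, as `dΔz = Enzyme.make_zero(Δz)` does in the formatter suggestions.

```julia
using Enzyme

# Hypothetical scalar loss standing in for estimate_tracer_error.
loss(x) = sum(abs2, x)

x = [1.0, 2.0, 3.0]
dx = Enzyme.make_zero(x)   # shadow derived from x, so typeof(dx) == typeof(x)

# Reverse-mode AD: the gradient of loss (2 .* x) is accumulated into the shadow dx.
autodiff(Reverse, loss, Active, Duplicated(x, dx))

# A shadow of a different array type triggers the same kind of MethodError:
# Duplicated(x, zeros(Float32, 3))
```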

@avik-pal (Collaborator):

That's a screwup on my end.


function loop!(model)
    Δt = model.clock.last_Δt
    @trace mincut = true checkpointing = true track_numbers = false for i in 1:100
Review comment (Member):

Potentially for a follow-up: @avik-pal, we should try benchmarking with checkpointing on vs. off, and with optimizations on vs. off.

avik-pal force-pushed the jlk9/oceananigans-benchmark branch 2 times, most recently from fa36b03 to 3e6cdda, February 11, 2026 15:06
@jlk9 (Collaborator, Author) commented Feb 12, 2026

@avik-pal I think the autodiff run in the benchmark uses too many timesteps. Locally, `loop!` passes with 10 steps but fails with 100 (the value currently in this PR). I'm testing whether 10 steps passes on the reduced benchmark.


Labels

None yet

Projects

None yet


4 participants