Skip to content

Clean up unnecessary splatting #4383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 53 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
fd00e3f
remove the args...
simone-silvestri Apr 10, 2025
613a39c
remove more splatting
simone-silvestri Apr 10, 2025
611c667
remove this
simone-silvestri Apr 10, 2025
7cae280
cleanup val
simone-silvestri Apr 10, 2025
af01250
add back implicit step
simone-silvestri Apr 10, 2025
e80e73e
this is not splatting
simone-silvestri Apr 10, 2025
ece82ea
more cleaning
simone-silvestri Apr 10, 2025
5bb66b1
remove more vals
simone-silvestri Apr 10, 2025
0b6c669
Merge branch 'main' into ss/fix-performance-2
simone-silvestri Apr 10, 2025
62f5e23
remove more vals
simone-silvestri Apr 10, 2025
6dbc155
Merge branch 'ss/fix-performance-2' of github.com:CliMA/Oceananigans.…
simone-silvestri Apr 10, 2025
50abc3c
remove more Vals
simone-silvestri Apr 10, 2025
8c12679
fix a potential bug
simone-silvestri Apr 11, 2025
c7e604b
Merge remote-tracking branch 'origin/main' into ss/fix-performance-2
simone-silvestri Apr 11, 2025
65e1f22
try it out like this
simone-silvestri Apr 11, 2025
e8ddf7f
restart from scratch
simone-silvestri Apr 15, 2025
7a1dbb2
Merge branch 'main' into ss/fix-performance-2
simone-silvestri Apr 15, 2025
eea3be0
also this one
simone-silvestri Apr 15, 2025
1e642e0
benchmark performance
simone-silvestri Apr 15, 2025
e063730
add the benchmark
simone-silvestri Apr 15, 2025
7947a47
continue later on
simone-silvestri Apr 15, 2025
68fc30c
issues
simone-silvestri Apr 15, 2025
bbc2c56
remove one dispatch
simone-silvestri Apr 16, 2025
cbf7dd5
go ahead
simone-silvestri Apr 16, 2025
3d6a671
mask also the implicit coefficient
simone-silvestri Apr 16, 2025
d45f26e
more cleanup
simone-silvestri Apr 16, 2025
955d1ac
remove more args...
simone-silvestri Apr 16, 2025
4e5c4d9
remove more splatting
simone-silvestri Apr 16, 2025
4328f29
more cleanup
simone-silvestri Apr 16, 2025
e2d6444
remove more splatting
simone-silvestri Apr 16, 2025
3d3b3e1
remove more splatting
simone-silvestri Apr 16, 2025
f50bac4
more cleanup
simone-silvestri Apr 16, 2025
63963ee
remove more useless splatting
simone-silvestri Apr 16, 2025
158e78e
remove more splatting
simone-silvestri Apr 16, 2025
95ebad2
remove more splatting
simone-silvestri Apr 16, 2025
051ce23
back to correctness
simone-silvestri Apr 16, 2025
8a37641
bugfix
simone-silvestri Apr 16, 2025
92eb5f9
remove more splatting
simone-silvestri Apr 16, 2025
e1a84df
advection to scheme
simone-silvestri Apr 16, 2025
da7871c
advection to scheme
simone-silvestri Apr 16, 2025
47b3f06
Merge branch 'main' into ss/fix-performance-2
simone-silvestri Apr 16, 2025
fcbabf4
add the benchmark
simone-silvestri Apr 16, 2025
b07b4e5
FMA
simone-silvestri Apr 16, 2025
1506eed
muladd
simone-silvestri Apr 16, 2025
7ffec65
Merge remote-tracking branch 'origin/ss/fused-multiply-add' into ss/f…
simone-silvestri Apr 16, 2025
6f7ed98
test it
simone-silvestri Apr 16, 2025
ad7a051
Merge branch 'main' into ss/fix-performance-2
simone-silvestri Apr 17, 2025
d8027a4
Merge remote-tracking branch 'origin/main' into ss/fix-performance-2
simone-silvestri Apr 26, 2025
db07dfd
bugfix
simone-silvestri Apr 26, 2025
ff09ee4
Merge branch 'main' into ss/fix-performance-2
simone-silvestri Apr 29, 2025
cbe65fa
Merge branch 'main' into ss/fix-performance-2
simone-silvestri May 8, 2025
bac4892
Merge branch 'main' into ss/fix-performance-2
simone-silvestri May 22, 2025
a7d1dcc
Merge branch 'main' into ss/fix-performance-2
navidcy May 23, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
197 changes: 94 additions & 103 deletions benchmark/Manifest.toml

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion benchmark/Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@ DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
FFTW = "7a1cc6ca-52ef-59f5-83cd-3a7055c09341"
IterativeSolvers = "42fd0dbc-a981-5370-80f2-aaf504508153"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
KrylovPreconditioners = "45d422c2-293f-44ce-8315-2cb988662dec"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"
Expand Down
60 changes: 25 additions & 35 deletions benchmark/benchmark_hydrostatic_model.jl
Original file line number Diff line number Diff line change
Expand Up @@ -3,16 +3,16 @@ push!(LOAD_PATH, joinpath(@__DIR__, ".."))
using Benchmarks

using Oceananigans.TimeSteppers: update_state!
using Oceananigans.Diagnostics: accurate_cell_advection_timescale

using BenchmarkTools
using CUDA
using Oceananigans
using Oceananigans.Models.HydrostaticFreeSurfaceModels: DiagonallyDominantInversePreconditioner
using Statistics
using Oceananigans.Solvers

# Problem size
Nx = 256
Ny = 128
Nx = 512
Ny = 256

function set_divergent_velocity!(model)
# Create a divergent velocity
Expand All @@ -33,31 +33,23 @@ function set_divergent_velocity!(model)

return nothing
end

random_vector = - 0.5 .* rand(Nx, Ny) .- 0.5
bottom_height(arch) = GridFittedBottom(Oceananigans.on_architecture(arch, random_vector))
rgrid(arch) = RectilinearGrid(arch, size=(Nx, Ny, 1), extent=(1, 1, 1), halo = (3, 3, 3))
lgrid(arch) = LatitudeLongitudeGrid(arch, size=(Nx, Ny, 1), longitude=(-180, 180), latitude=(-80, 80), z=(-1, 0), halo = (3, 3, 3))

grids = Dict(
(CPU, :RectilinearGrid) => RectilinearGrid(CPU(), size=(Nx, Ny, 1), extent=(1, 1, 1)),
(CPU, :LatitudeLongitudeGrid) => LatitudeLongitudeGrid(CPU(), size=(Nx, Ny, 1), longitude=(-180, 180), latitude=(-80, 80), z=(-1, 0), precompute_metrics=true),
(GPU, :RectilinearGrid) => RectilinearGrid(GPU(), size=(Nx, Ny, 1), extent=(1, 1, 1)),
(GPU, :LatitudeLongitudeGrid) => LatitudeLongitudeGrid(GPU(), size=(Nx, Ny, 1), longitude=(-160, 160), latitude=(-80, 80), z=(-1, 0), precompute_metrics=true),
(CPU, :RectilinearGrid) => rgrid(CPU()),
(CPU, :LatitudeLongitudeGrid) => lgrid(CPU()),
(CPU, :ImmersedRecGrid) => ImmersedBoundaryGrid(rgrid(CPU()), bottom_height(GPU())),
(CPU, :ImmersedLatGrid) => ImmersedBoundaryGrid(lgrid(CPU()), bottom_height(GPU())),
(GPU, :RectilinearGrid) => rgrid(GPU()),
(GPU, :LatitudeLongitudeGrid) => lgrid(GPU()),
(GPU, :ImmersedRecGrid) => ImmersedBoundaryGrid(rgrid(GPU()), bottom_height(GPU())),
(GPU, :ImmersedLatGrid) => ImmersedBoundaryGrid(lgrid(CPU()), bottom_height(GPU()))
)

# Cubed sphere cases maybe worth considering eventually
#=
using DataDeps

ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

dd = DataDep("cubed_sphere_510_grid",
"Conformal cubed sphere grid with 510×510 grid points on each face",
"https://engaging-web.mit.edu/~alir/cubed_sphere_grids/cs510/cubed_sphere_510_grid.jld2"
)

DataDeps.register(dd)

# (CPU, :ConformalCubedSphereGrid) => ConformalCubedSphereGrid(datadep"cubed_sphere_510_grid/cubed_sphere_510_grid.jld2", Nz=1, z=(-1, 0)),
# (GPU, :ConformalCubedSphereGrid) => ConformalCubedSphereGrid(datadep"cubed_sphere_510_grid/cubed_sphere_510_grid.jld2", Nz=1, z=(-1, 0), architecture=GPU()),
=#

free_surfaces = Dict(
:ExplicitFreeSurface => ExplicitFreeSurface(),
:PCGImplicitFreeSurface => ImplicitFreeSurface(solver_method = :PreconditionedConjugateGradient),
Expand All @@ -74,11 +66,11 @@ function benchmark_hydrostatic_model(Arch, grid_type, free_surface_type)
grid = grids[(Arch, grid_type)]

model = HydrostaticFreeSurfaceModel(; grid,
momentum_advection = VectorInvariant(),
free_surface = free_surfaces[free_surface_type])
momentum_advection = VectorInvariant(),
free_surface = free_surfaces[free_surface_type])

set_divergent_velocity!(model)
Δt = accurate_cell_advection_timescale(grid, model.velocities) / 2
Δt = Oceananigans.Advection.cell_advection_timescale(grid, model.velocities) / 2
time_step!(model, Δt) # warmup

trial = @benchmark begin
Expand All @@ -90,15 +82,13 @@ end

# Benchmark parameters

#architectures = has_cuda() ? [GPU] : [CPU]
architectures = [CPU]
architectures = has_cuda() ? [GPU, CPU] : [CPU]

grid_types = [
:RectilinearGrid,
:LatitudeLongitudeGrid,
# Uncomment when OrthogonalSphericalShellGrids of any size can be built natively without loading from file:
# :OrthogonalSphericalShellGrid,
# :ConformalCubedSphereGrid
:RectilinearGrid,
:LatitudeLongitudeGrid,
:ImmersedRecGrid,
:ImmersedLatGrid,
]

free_surface_types = collect(keys(free_surfaces))
Expand Down
128 changes: 128 additions & 0 deletions benchmark/benchmark_performance_issue.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
push!(LOAD_PATH, joinpath(@__DIR__, ".."))

using Benchmarks

using Oceananigans.TimeSteppers: update_state!
using BenchmarkTools
using CUDA
using Oceananigans
using Oceananigans.Operators
using Oceananigans.TurbulenceClosures
using Statistics
using Oceananigans.Solvers
using SeawaterPolynomials.TEOS10
using NVTX

CUDA.device!(2)

function ocean_benchmark(grid, closure)
momentum_advection = nothing # WENOVectorInvariant()
tracer_advection = WENO(order=7)
buoyancy = SeawaterBuoyancy(equation_of_state=TEOS10EquationOfState())
coriolis = nothing # HydrostaticSphericalCoriolis()
free_surface = nothing # SplitExplicitFreeSurface(grid; substeps=70)

model = HydrostaticFreeSurfaceModel(; grid,
momentum_advection,
tracer_advection,
buoyancy,
coriolis,
closure,
free_surface,
tracers = (:T, :S, :e))

@info "Model is built"
return model
end

function benchmark_hydrostatic_model(Arch, grid_type, closure_type)

grid = grids[(Arch, grid_type)]
model = ocean_benchmark(grid, closures[closure_type])

T = 0.0001 .* rand(size(model.grid)) .+ 20
S = 0.0001 .* rand(size(model.grid)) .+ 35

set!(model.tracers.T, T)
set!(model.tracers.S, S)

set!(model.velocities.u, (x, y, z) -> 1e-6 * rand())
set!(model.velocities.v, (x, y, z) -> 1e-6 * rand())

Δt = 1
for _ in 1:30
time_step!(model, Δt) # warmup
end

# Make sure we do not have any NaN or Inf values anywhere
fields = Oceananigans.fields(model)
for (key, field) in zip(propertynames(fields), fields)
arr = Array(interior(field))
@assert all(isfinite.(arr)) || @show "Nan in $key"
@assert all(Array(arr) .< 1e10) || @show "Inf in $key"
@show key, extrema(arr)
end

for _ in 1:10
Oceananigans.TimeSteppers.compute_tendencies!(model, [])
end

# NVTX.@range "compute tendencies" begin
trial = @benchmark begin
Oceananigans.TimeSteppers.compute_tendencies!($model, [])
end samples = 100

return trial
end

# Problem size
Nx = 200
Ny = 200
Nz = 50

random_vector = - 5000 .* rand(Nx, Ny)

bottom_height(arch) = GridFittedBottom(Oceananigans.on_architecture(arch, random_vector))
lgrid(arch) = LatitudeLongitudeGrid(arch, size=(Nx, Ny, Nz),
longitude=(0, 360),
latitude=(-75, 75),
z=collect(range(-5000, 0, length=51)),
halo=(7, 7, 7))

grids = Dict(
(CPU, :LatitudeLongitudeGrid) => lgrid(CPU()),
(CPU, :ImmersedLatGrid) => ImmersedBoundaryGrid(lgrid(CPU()), bottom_height(CPU()); active_cells_map=true),
(GPU, :LatitudeLongitudeGrid) => lgrid(GPU()),
(GPU, :ImmersedLatGrid) => ImmersedBoundaryGrid(lgrid(GPU()), bottom_height(GPU()); active_cells_map=true),
)

closures = Dict(
:DiffImplicit => VerticalScalarDiffusivity(TurbulenceClosures.VerticallyImplicitTimeDiscretization(), ν=1e-5 , κ=1.0),
:DiffExplicit => VerticalScalarDiffusivity(ν=1e-5, κ=1e-5),
:CATKEExplicit => Oceananigans.TurbulenceClosures.TKEBasedVerticalDiffusivities.CATKEVerticalDiffusivity(TurbulenceClosures.ExplicitTimeDiscretization()),
:CATKEImplicit => Oceananigans.TurbulenceClosures.TKEBasedVerticalDiffusivities.CATKEVerticalDiffusivity(),
)

# Benchmark parameters

architectures = has_cuda() ? [GPU] : [CPU]

grid_types = [
# :LatitudeLongitudeGrid,
:ImmersedLatGrid,
]

closure_types = collect(keys(closures))

# Run and summarize benchmarks
# print_system_info()
suite = run_benchmarks(benchmark_hydrostatic_model; architectures, grid_types, closure_types)
df = benchmarks_dataframe(suite)
@show df2 = sort(df)

for _ in 1:5
suite = run_benchmarks(benchmark_hydrostatic_model; architectures, grid_types, closure_types)
@show df2 = sort(df)
end

benchmarks_pretty_table(df, title="Hydrostatic model benchmarks")
24 changes: 12 additions & 12 deletions src/Advection/flux_form_advection.jl
Original file line number Diff line number Diff line change
Expand Up @@ -46,18 +46,18 @@ Adapt.adapt_structure(to, scheme::FluxFormAdvection{N, FT}) where {N, FT} =
Adapt.adapt(to, scheme.y),
Adapt.adapt(to, scheme.z))

@inline _advective_tracer_flux_x(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_tracer_flux_x(i, j, k, grid, advection.x, args...)
@inline _advective_tracer_flux_y(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_tracer_flux_y(i, j, k, grid, advection.y, args...)
@inline _advective_tracer_flux_z(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_tracer_flux_z(i, j, k, grid, advection.z, args...)
@inline _advective_tracer_flux_x(i, j, k, grid, scheme::FluxFormAdvection, U, c) = _advective_tracer_flux_x(i, j, k, grid, scheme.x, U, c)
@inline _advective_tracer_flux_y(i, j, k, grid, scheme::FluxFormAdvection, V, c) = _advective_tracer_flux_y(i, j, k, grid, scheme.y, V, c)
@inline _advective_tracer_flux_z(i, j, k, grid, scheme::FluxFormAdvection, W, c) = _advective_tracer_flux_z(i, j, k, grid, scheme.z, W, c)

@inline _advective_momentum_flux_Uu(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_momentum_flux_Uu(i, j, k, grid, advection.x, args...)
@inline _advective_momentum_flux_Vu(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_momentum_flux_Vu(i, j, k, grid, advection.y, args...)
@inline _advective_momentum_flux_Wu(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_momentum_flux_Wu(i, j, k, grid, advection.z, args...)
@inline _advective_momentum_flux_Uu(i, j, k, grid, scheme::FluxFormAdvection, U, u) = _advective_momentum_flux_Uu(i, j, k, grid, scheme.x, U, u)
@inline _advective_momentum_flux_Vu(i, j, k, grid, scheme::FluxFormAdvection, V, u) = _advective_momentum_flux_Vu(i, j, k, grid, scheme.y, V, u)
@inline _advective_momentum_flux_Wu(i, j, k, grid, scheme::FluxFormAdvection, W, u) = _advective_momentum_flux_Wu(i, j, k, grid, scheme.z, W, u)

@inline _advective_momentum_flux_Uv(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_momentum_flux_Uv(i, j, k, grid, advection.x, args...)
@inline _advective_momentum_flux_Vv(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_momentum_flux_Vv(i, j, k, grid, advection.y, args...)
@inline _advective_momentum_flux_Wv(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_momentum_flux_Wv(i, j, k, grid, advection.z, args...)
@inline _advective_momentum_flux_Uv(i, j, k, grid, scheme::FluxFormAdvection, U, v) = _advective_momentum_flux_Uv(i, j, k, grid, scheme.x, U, v)
@inline _advective_momentum_flux_Vv(i, j, k, grid, scheme::FluxFormAdvection, V, v) = _advective_momentum_flux_Vv(i, j, k, grid, scheme.y, V, v)
@inline _advective_momentum_flux_Wv(i, j, k, grid, scheme::FluxFormAdvection, W, v) = _advective_momentum_flux_Wv(i, j, k, grid, scheme.z, W, v)

@inline _advective_momentum_flux_Uw(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_momentum_flux_Uw(i, j, k, grid, advection.x, args...)
@inline _advective_momentum_flux_Vw(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_momentum_flux_Vw(i, j, k, grid, advection.y, args...)
@inline _advective_momentum_flux_Ww(i, j, k, grid, advection::FluxFormAdvection, args...) = _advective_momentum_flux_Ww(i, j, k, grid, advection.z, args...)
@inline _advective_momentum_flux_Uw(i, j, k, grid, scheme::FluxFormAdvection, U, w) = _advective_momentum_flux_Uw(i, j, k, grid, scheme.x, U, w)
@inline _advective_momentum_flux_Vw(i, j, k, grid, scheme::FluxFormAdvection, V, w) = _advective_momentum_flux_Vw(i, j, k, grid, scheme.y, V, w)
@inline _advective_momentum_flux_Ww(i, j, k, grid, scheme::FluxFormAdvection, W, w) = _advective_momentum_flux_Ww(i, j, k, grid, scheme.z, W, w)
Loading