Tests for Distributed architecture with AnelasticDynamics#441
Tests for Distributed architecture with AnelasticDynamics#441glwagner wants to merge 29 commits into
Conversation
Add true multi-rank (mpiexec -n 4) tests that validate AtmosphereModel across Partition(4,1,1), Partition(2,2,1), and Partition(1,4,1) using JLD2Writer/FieldTimeSeries for distributed output comparison. Fixes required for distributed support: - Use DistributedFourierTridiagonalPoissonSolver for multi-rank grids with FullyConnected topology - Skip MPI halo communication for column fields (reference state profiles) via only_local_halos=true - Add architecture(::AtmosphereModel) so JLD2Writer adds rank suffixes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CompressibleDynamics requires hydrostatically balanced density initialization and a small fixed Δt (acoustic CFL limit), so the test helpers are refactored to dispatch on dynamics_type for initial conditions, time stepping, and MPI script generation. Tests cover single-rank inline comparison and multi-rank MPI (mpiexec -n 4) with all three partition layouts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ogies AtmosphereModel validates that periodic dimensions have halo >= 2, but Oceananigans also requires halo <= size. Tests with size=1 in periodic x/y dimensions cannot satisfy both constraints. Fix by increasing grid sizes from 1 to 2 in horizontal dimensions, or switching to Flat topology for truly 1D column tests (dcmip2016_kessler, diagnostics hydrostatic pressure). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Use escape_string() when interpolating file paths into generated Julia scripts so backslashes in Windows temp paths are not parsed as escape sequences. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use (Flat, Flat, Bounded) topology for the single-cell parcel grid to satisfy the minimum halo >= 2 requirement for periodic dimensions added in this PR. A stationary parcel has no spatial extent in x/y, so Flat is the correct topology. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Ok, this takes a lot to run, 30-40 minutes on average. How about we run it in a single separate job, maybe only Julia v1.12 only on Ubuntu?
There was a problem hiding this comment.
let's review the tests carefully and only keep exactly what is needed
There was a problem hiding this comment.
ok i think i made the tests a lot cheaper (one was doing 1e4 iteration). That said I think a separate job is a good idea.
The temp directory from mktempdir() is cleaned up automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove validate_velocity_interpolation_halo and MINIMUM_VELOCITY_INTERPOLATION_HALO (the algorithm no longer requires halo >= 2 for periodic dimensions) - Remove only_local_halos=true from fill_halo_regions! calls on Flat column fields (Flat dimensions don't have MPI communication anyway) - Remove is_column_field helper (no longer needed) - Revert test grid sizes from (2, 2, ...) back to (1, 1, ...) - Revert stationary_parcel_model.jl grid back to original Periodic topology Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove all single-rank Distributed(Partition(1,1)) tests (Section A) since they don't actually test MPI communication - Keep only multi-rank MPI tests launched via mpiexec - Cap all simulations at stop_iteration=100 instead of stop_time to avoid expensive runs (the full physics test was doing ~10k iterations) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add only_local_halos=true to fill_halo_regions! calls on reference state
column fields (Field{Nothing, Nothing, Center}). Oceananigans' distributed
halo communication does not handle Flat-located dimensions correctly,
causing BoundsError when running with --check-bounds=yes on distributed
grids. This affects AnelasticDynamics reference density, pressure, and
temperature fields.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts: - atmosphere_model.jl: keep both architecture() method and _timestepper_uses_dynamics - update_atmosphere_model_state.jl: take main's extended periodic diagnostic indices - reference_states.jl: take main's generalized hydrostatic profiles with only_local_halos - test/Project.toml: include both MPI and Logging dependencies Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@giordano the distributed tests pass locally, but I am not sure where to add them for CI purposes. How should we add the distributed tests? |
|
I'll take a look tomorrow. |
Not for me: etc for loads of tests. For reference: |
|
I see an identical error on the main branch, too (using a case I'm working on - not a test) with a There's some messages printing on top of each other due to MPI, but that |
|
In my particular case, the error is thrown at this line: Looking at the callback stack, it looks like the call stack leaves Just for completeness, and |
|
Sorry - I just saw glw/distributed-tests addressed this issue using I do see that 0526c73 explicitly removed |
- Remove fill_halo_regions!(dynamics.density) and fill_halo_regions!(prognostic_fields(model.formulation)) from compute_auxiliary_dynamics_variables! for CompressibleDynamics. Both fields are prognostic and already filled by the async prognostic fill in update_state!, synchronized by the momentum fill in compute_velocities!. - Remove fill_halo_regions!(model.formulation) from compute_auxiliary_thermodynamic_variables!. The formulation (potential temperature density) is a prognostic field already included in the async prognostic fill. Eliminates 3 redundant halo fills per RK3 stage (9 per time step), reducing distributed communication overhead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
I think this is a legit bug this is fixed by CliMA/Oceananigans.jl#5434 |
For CompressibleDynamics, temperature is recomputed from the equation of state in compute_auxiliary_dynamics_variables! which fills T halos. The earlier fill in compute_auxiliary_thermodynamic_variables! is redundant and wastes one fill_halo_regions! per RK3 stage. Add _fill_thermodynamic_halos! dispatch: default fills T halos, CompressibleDynamics method is a no-op. Saves 3 halo fills per timestep (1 per stage × 3 stages). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CompressibleDynamics do not work yet.