-
Notifications
You must be signed in to change notification settings - Fork 236
(0.96.28) Small improvements to NaNChecker #4470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good! thanks!
bunch of tests fail, e.g., https://buildkite.com/clima/oceananigans/builds/23183#01969c41-533f-4867-bc39-028df30cd953/6-37 Is it because the fallback NaNChecker(fields) = NaNChecker(fields, false) # default was removed? Or perhaps because the fallback has an arg and now it became kwarg? |
I think because somebody is using that constructor. I think its better to have just one constructor so I will try to fix the errors |
@taimoorsohail, should work now and test will (hopefully) pass |
Tests are still failing - having a look now |
I haven't been able to figure out why these tests are failing - other than that they don't seem to be related to the NaN checker changes as far as I can tell. |
This https://buildkite.com/clima/oceananigans/builds/23277#0196b2e0-8d99-4d3d-a91d-1a4e8c0275e9/18-1245 might seem unrelated but following the error stack trace it does originate from the nanchecker https://buildkite.com/clima/oceananigans/builds/23277#0196b2e0-8d99-4d3d-a91d-1a4e8c0275e9/18-1273 I'll have a look now. |
@glwagner, seems like hasnan breaks for MultiRegion (probably also Distributed?) fields? Here's a mwe: using Oceananigans
grid = RectilinearGrid(CPU(), size = (64, 3, 1), x = (0, 100e3), y = (0, 100e3), z = (-400, 0))
regions = 2
mrg = MultiRegionGrid(grid, partition = XPartition(regions))
model = HydrostaticFreeSurfaceModel(grid = mrg)
simulation = Simulation(model; Δt=0.1, stop_iteration = 10)
run!(simulation) julia> run!(simulation)
[2025/05/09 12:12:48.874] INFO Initializing simulation...
ERROR: MethodError: no method matching interior(::MultiRegionObject{…}, ::Tuple{…}, ::MultiRegionGrid{…}, ::MultiRegionObject{…})
Closest candidates are:
interior(::Field, ::Any...)
@ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Fields/field.jl:407
interior(::FieldTimeSeries, ::Any...)
@ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/OutputReaders/field_time_series.jl:724
interior(::OffsetArray, ::Any, ::Any, ::Any)
@ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Fields/field.jl:406
...
Stacktrace:
[1] interior(f::Field{…})
@ Oceananigans.Fields ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Fields/field.jl:405
[2] any!(f::Function, r::Field{…}, a::Field{…}; condition::Nothing, mask::Bool, kwargs::@Kwargs{})
@ Oceananigans.Fields ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Fields/field.jl:682
[3] any!(f::Function, r::Field{…}, a::Field{…})
@ Oceananigans.Fields ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Fields/field.jl:673
[4] hasnan(field::Field{…}, checker::Oceananigans.Models.NaNChecker{…})
@ Oceananigans.Models ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Models/nan_checker.jl:40
[5] (::Oceananigans.Models.NaNChecker{…})(simulation::Simulation{…})
@ Oceananigans.Models ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Models/nan_checker.jl:49
[6] (::Callback{…})(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Simulations/callback.jl:15
[7] initialize!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Simulations/run.jl:235
[8] time_step!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Simulations/run.jl:136
[9] run!(sim::Simulation{…}; pickup::Bool)
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Simulations/run.jl:105
[10] run!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Simulations/run.jl:92
[11] top-level scope
@ REPL[75]:1
Some type information was truncated. Use `show(err)` to see complete types. |
Wondering if we need Oceananigans.jl/src/Models/nan_checker.jl Line 40 in 9454ebc
|
Actually on second thought I'm wondering how would this work on distributed architectures. Do we need to be calling |
shouldn't that work under the hood? same with distributed. otherwise users cannot reduce either without |
I think it should work after #4497 |
Is this ready to merge once tests pass? now that #4520 is merged? |
No. First #4497 needs to get in |
Thanks! |
Inspired by a discussion with @navidcy and @taimoorsohail, this PR implements two small improvements to the
NaNChecker
:I may add a few more improvements here, because more ideas are coming to mind...