@hellkite500
Contributor

@hellkite500 hellkite500 commented Sep 17, 2025

This feature allows the model engine to attempt dynamic mass balance introspection of BMI modules via a specified protocol. The protocol dictates the BMI variable names representing cumulative mass balance variables, which can be inspected at any point after bmi_initialize() is called.

It is assumed that these variables balance mass within the model, e.g.

balance = mass_in - mass_out - mass_stored - mass_leaked

where

  • mass_in is the model's cumulative mass in (e.g. precipitation)
  • mass_out is the model's cumulative mass out (e.g. overland flow)
  • mass_stored is the model's currently stored mass (e.g. soil moisture reservoirs)
  • mass_leaked is cumulative mass leaked from the domain which is known loss to the model (e.g. deep groundwater)

When balance is greater than a configurable threshold (provided as a formulation parameter in the realization file), the engine either prints a warning or throws an error, depending on the configuration.
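
As a rough illustration of the balance equation and threshold test described above, here is a minimal C++ sketch. The struct and function names are hypothetical and are not taken from the ngen source. Comparing the magnitude of the residual (rather than the signed value) is an assumption based on the "how far from 0" wording of the tolerance parameter.

```cpp
#include <cmath>

// Hypothetical container for the four cumulative quantities named above.
struct MassBalanceTerms {
    double mass_in;      // cumulative mass entering the model
    double mass_out;     // cumulative mass leaving the model
    double mass_stored;  // mass currently held in storage
    double mass_leaked;  // cumulative known loss from the domain
};

// Residual of the balance equation; ideally zero for a conservative model.
inline double mass_balance(const MassBalanceTerms& t) {
    return t.mass_in - t.mass_out - t.mass_stored - t.mass_leaked;
}

// Assumption: the residual is compared by magnitude against the tolerance.
inline bool within_tolerance(const MassBalanceTerms& t, double tolerance) {
    return std::abs(mass_balance(t)) <= tolerance;
}
```

For example, terms of 10, 4, 5, and 1 (in, out, stored, leaked) give a residual of exactly zero and pass any non-negative tolerance.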

Additions

Configurable mass balance protocol implementation. The following object definition in a formulation's params object is used to configure the mass balance implementation.

"mass_balance":  {
    "fatal": true, 
    "tolerance": 1.0e-17,
    "check": true,
    "frequency": -1
}

Where

  • fatal, bool
    When true, a fatal exception is thrown when mass balance tolerance is violated.
    When false, a mass balance warning capturing details of the error is printed, but the simulation continues.
  • tolerance, float
    How far from 0 the mass balance can get before it is considered an error, i.e. balance > tolerance will trigger a mass balance error/warning.
  • check, bool
    Toggle the mass balance checking on or off
  • frequency, int
    How often, in number of timesteps, to check the mass balance in the model engine. If frequency is set to -1, then it will only be checked at the last time step.
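
To make the interaction of these four parameters concrete, here is a hedged C++ sketch. `MassBalanceConfig`, `should_check`, and `evaluate` are hypothetical names, not ngen APIs. The member defaults are copied from the example object above and are not necessarily the engine's defaults. The 1-based step numbering and the handling of a frequency of 0 are assumptions.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the "mass_balance" JSON object shown above.
struct MassBalanceConfig {
    bool   fatal     = true;
    double tolerance = 1.0e-17;
    bool   check     = true;
    int    frequency = -1;
};

// Decide whether a check should run at a given step.
// Assumptions: steps are 1-based, -1 means "last step only", and
// a frequency of 0 (or any other negative value) disables checking.
inline bool should_check(const MassBalanceConfig& cfg, int step, int total_steps) {
    if (!cfg.check) return false;
    if (cfg.frequency == -1) return step == total_steps;
    if (cfg.frequency <= 0) return false;
    return step % cfg.frequency == 0;
}

// Returns an empty string when the balance is within tolerance.
// Otherwise throws when fatal is true, or returns a warning message.
inline std::string evaluate(const MassBalanceConfig& cfg, double balance) {
    if (std::abs(balance) <= cfg.tolerance) return "";
    const std::string msg =
        "mass balance check failed: balance = " + std::to_string(balance);
    if (cfg.fatal) throw std::runtime_error(msg);
    return msg;
}
```

With the example configuration, a 10-step run would be checked only at step 10. Setting frequency to 3 would check steps 3, 6, and 9 under the assumptions above.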

Changes

  • BREAKING: require boost>=1.86.0
  • Layer update now calls check_mass_balance upon every update.

Testing

  1. Tests added for BMI_C formulation and BMI_Multi formulation
  2. Standalone tests of the protocol via the protocols container using the CPP test model through a direct adapter

Notes

  • THIS IS AN OPTIONAL PROTOCOL. If a BMI model doesn't implement the protocol, this is detected upon initialization and a message is provided, but mass balance checks will not be attempted.

Todos

  • Some consideration of unit conversion could be added. Right now a sanity check exists which causes an integration error
    to occur if the mass balance units aren't all the same, but no conversion is attempted.
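
The unit sanity check mentioned in this todo could look roughly like the following C++ sketch. `check_units_match` is a hypothetical helper, not the function used in the PR. It compares unit strings literally, so equivalent spellings such as "m" and "meters" would be reported as a mismatch.

```cpp
#include <array>
#include <optional>
#include <string>

// Returns an error message if the four unit strings differ,
// std::nullopt otherwise. Plain string comparison only; no
// unit conversion is attempted.
inline std::optional<std::string> check_units_match(
    const std::array<std::string, 4>& units) {
    for (const auto& u : units) {
        if (u != units[0]) {
            return "mass balance units differ: '" + units[0] + "' vs '" + u + "'";
        }
    }
    return std::nullopt;
}
```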

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Target Environment support

  • Linux
  • MacOS

@hellkite500
Contributor Author

Force pushed to fix failing bmi multi test (was missing a test config file). MacOS tests seem to be queued indefinitely, but I'll note that I ran tests locally on macos.

@aaraney
Member

aaraney commented Sep 19, 2025

Just adding to the above description. I'm probably off a little here, so please correct me where I'm wrong :).

To comply with the mass balance protocol, a BMI module must expose the following "variables" to ngen.

# "name" -- type
"ngen::mass_in" -- double
"ngen::mass_out" -- double
"ngen::mass_stored" -- double
"ngen::mass_leaked" -- double

These variables must be available to ngen over the following bmi interfaces. It is not required for these "variables" to be exposed over get_output_var_names.

get_value
get_value_ptr
get_var_units
get_var_type

ngen will query the module for the above names via get_value and then get_var_units, and use the return value (BMI_SUCCESS / BMI_FAILURE) to determine whether the module implements the protocol.
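
A hedged C++ sketch of that detection step follows. `ToyModule` is a hypothetical stand-in that models only the two calls used for detection, with simplified signatures that differ from the real BMI headers. `supports_mass_balance` is likewise an illustrative name rather than the ngen implementation. The status values follow the BMI C convention of 0 for success and 1 for failure.

```cpp
#include <array>
#include <map>
#include <string>

// Status codes following the BMI C convention.
constexpr int BMI_SUCCESS = 0;
constexpr int BMI_FAILURE = 1;

// Hypothetical stand-in for a BMI module (simplified signatures).
struct ToyModule {
    std::map<std::string, double> values;
    std::map<std::string, std::string> units;

    int get_value(const std::string& name, double* dest) const {
        auto it = values.find(name);
        if (it == values.end()) return BMI_FAILURE;
        *dest = it->second;
        return BMI_SUCCESS;
    }
    int get_var_units(const std::string& name, std::string* dest) const {
        auto it = units.find(name);
        if (it == units.end()) return BMI_FAILURE;
        *dest = it->second;
        return BMI_SUCCESS;
    }
};

// The four variable names listed above.
const std::array<std::string, 4> kProtocolNames = {
    "ngen::mass_in", "ngen::mass_out", "ngen::mass_stored", "ngen::mass_leaked"};

// A module is treated as supporting the protocol only if every
// name can be read via get_value and reports units.
inline bool supports_mass_balance(const ToyModule& m) {
    for (const auto& name : kProtocolNames) {
        double value = 0.0;
        std::string unit;
        if (m.get_value(name, &value) != BMI_SUCCESS) return false;
        if (m.get_var_units(name, &unit) != BMI_SUCCESS) return false;
    }
    return true;
}
```

Because detection relies only on return codes, a module that does not know these names simply reports failure and is skipped, which matches the optional nature of the protocol.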

aaraney
aaraney previously approved these changes Oct 24, 2025
@hellkite500 hellkite500 force-pushed the bmi-mass-balance branch 5 times, most recently from 0dc6ed3 to ddc4c2b Compare October 24, 2025 19:50
aaraney
aaraney previously approved these changes Oct 24, 2025
    -DNGEN_WITH_SQLITE:BOOL=${{ inputs.use_sqlite }} \
-   -DNGEN_WITH_MPI:BOOL=${{ inputs.use_mpi }} -S .
+   -DNGEN_WITH_MPI:BOOL=${{ inputs.use_mpi }} \
+   -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -S .
Member

This is here to make pybind's cmake happy. See pybind/pybind11#5593

- name: Run Unit Tests
  run: |
    . .venv/bin/activate
    export ASAN_OPTIONS=${ASAN_OPTIONS}:detect_odr_violation=0
Member

See 08d8379 for explanation.

@hellkite500
Contributor Author

hellkite500 commented Oct 24, 2025

The last couple commits here have all been around trying to ensure the address sanitizer on macos using Clang 17 is behaving as expected by

  1. Trying to ensure all libs/code under test are instrumented (2be7792)

  2. Ignoring the potential python overflow well outside the ngen realm of responsibility (c3c1a80)

Despite these efforts, I cannot get a clean bill of health for a few of the macos test cases, and the failures seem to be nondeterministic.

test_bmi_c, test_bmi_cpp, and test_bmi_multi all fail, but not every time, with or without the previous changes.

One final thing we may want to try is to use a dynamically linked asan instead of the statically linked one (the default for clang), which some indications say may help avoid the (possibly) false positives coming from the linked/dlopened libs.

Since we don't see the gcc sanitizer on ubuntu runners complaining, I'm inclined to stop pushing to this PR trying to fix up all the CI, move that to its own issue/PR, and let this merge.

@aaraney Thanks for the reviews, sorry about invalidating them so many times. One more for good measure?

@hellkite500
Contributor Author

hellkite500 commented Oct 27, 2025

Canceled the testing and validation workflow after the ubuntu tests ran, as the macos tests will never run and will eventually time out anyway. #913 looks to address the mac runners.

Member

@aaraney aaraney left a comment

Looks great! Thanks for working through this, @hellkite500!

@hellkite500 hellkite500 merged commit c7ab4c8 into NOAA-OWP:master Oct 27, 2025
10 of 21 checks passed
auto NgenMassBalance::initialize(const ModelPtr& model, const Properties& properties) -> expected<void, ProtocolError>
{
    //Ensure the model is capable of mass balance using the protocol
    check_support(model).or_else( error_or_warning );
Member

Coming back to this, I think we should check model support only after reading the configuration. Otherwise, this will raise an exception for models that don't support the mass balance checker and aren't configured to use it.

Member

fixed in #916
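
The reordering suggested in this thread can be sketched as below. This is only an illustration of "read configuration first, then check support" and is not the change made in #916. `Config`, `Model`, and this `initialize` are hypothetical, simplified stand-ins for the types in the hunk above, and `std::optional` replaces the `expected` return type to keep the sketch self-contained.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for the real types.
struct Config { bool check = false; };
struct Model  { bool implements_protocol = false; };

// Returns an error message on failure, std::nullopt on success.
inline std::optional<std::string> initialize(const Model& model, const Config& cfg) {
    // Read the configuration first: when checking is disabled or not
    // configured, a model without protocol support must not fail here.
    if (!cfg.check) return std::nullopt;

    // Only now verify that the model supports the protocol.
    if (!model.implements_protocol) {
        return std::string("model does not implement the mass balance protocol");
    }
    return std::nullopt;
}
```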
