[limit_n_basins mode] [opt mem] [opt runtime] add mode to allow training and validation on practically any dataset size. #246
Conversation
The exclude-basins logic needs to run once the dataset's dataset object is available (i.e. after `load_basins` is called).
with contextlib.suppress(AttributeError):
    del self._num_samples
with contextlib.suppress(AttributeError):
    del self._per_basin_target_stds
Can you merge all lines to use a single "with" clause?
I want it to be safe; otherwise, if one line fails, the remaining dels are skipped.
I considered looping over attribute names with `self.__dict__.pop(name, None)`, but something like that is harder to read, doesn't link the symbols, and limits search. This way, `self.<field name>` is findable.
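The trade-off discussed above can be demonstrated with a minimal sketch (a hypothetical `Cache` class, not the PR's actual dataset code): with a single `with contextlib.suppress(...)` wrapping several dels, the first missing attribute aborts the whole block, while one suppress per del clears each attribute independently.

```python
import contextlib


class Cache:
    def __init__(self):
        self.b = 2  # note: attribute `a` is deliberately never set

    def clear_single_with(self):
        # A single suppress block: the first failing del exits the block,
        # so later dels are skipped.
        with contextlib.suppress(AttributeError):
            del self.a  # raises AttributeError -> block exits here
            del self.b  # never reached

    def clear_per_line(self):
        # One suppress per del: each attribute is cleared independently.
        with contextlib.suppress(AttributeError):
            del self.a
        with contextlib.suppress(AttributeError):
            del self.b


c = Cache()
c.clear_single_with()
print(hasattr(c, 'b'))  # True: `b` survived because `del self.a` failed first

c = Cache()
c.clear_per_line()
print(hasattr(c, 'b'))  # False: `b` was cleared
```

This is why the review reply keeps one `with` clause per del rather than merging them.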
LOGGER.debug('# forecast dataset init complete (%s)', self._period)

self._dataset_all = self._dataset
del self._dataset
I want `self._dataset` to be undefined, for safety and clarity below. Another option is to rename `dataset` to `dataset_all` all the way up, or to assess whether it can stay a plain local (without `self.`) until the end, but that would pollute this PR with a refactor; I could add a TODO.
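The rename-then-delete pattern described above can be sketched as follows (a hypothetical minimal class, not the PR's actual dataset code): after the move, any stale access to the old attribute raises `AttributeError` instead of silently reading the full dataset.

```python
class ForecastDataset:
    def __init__(self, dataset):
        self._dataset = dataset  # full dataset, built during init

    def freeze(self):
        # Keep the full data under a new name, then delete the old
        # attribute so any later use of self._dataset fails loudly.
        self._dataset_all = self._dataset
        del self._dataset


ds = ForecastDataset(dataset=[1, 2, 3])
ds.freeze()
print(hasattr(ds, '_dataset'))  # False: stale accesses now raise
print(ds._dataset_all)          # [1, 2, 3]
```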
LOGGER.debug(
    'Dataset size: %f MB (%s)',
    self._dataset.nbytes / 1024**2,
    self._period,
Why print `self._period` in all the debug lines?
It helped me spot it faster as a standalone (contextless) line. I can remove it, though.
omrishefi left a comment
Let's review the changes together
    del self._per_basin_target_stds

def load_basins(self, basins: list[str] | None = None) -> None:
    self._data_cache: dict[str, xr.DataArray] = {}
self.basins = [e for e in self.basins if e not in exclude_basins]
if cfg.limit_n_basins < 1:
    self.dataset.load_basins()
self.basins = self._calc_and_apply_excluded_basins(self.basins)
This seems like a change of logic. Let's not pass `self.basins` to the function.
    f'Could not resolve the following module parts for finetuning: {unresolved_modules}'
)

def init_loader(
Please split this into smaller PRs.
`train`: Using e.g. `limit_n_basins: 100` in the config results in materializing into memory the data of only up to 100 basins during training. Data is freed once an epoch is complete.

NOTE: This results in shorter epochs, but users can adjust `save_weights_every` and `max_updates_per_epoch` to match their selection.

`validation`: During validation, support is partial at this time, to keep this PR focused on training, since training is the main memory bottleneck and validation isn't usually done over e.g. 46 yrs of data. When enabled (i.e. specified and positive), validation data is loaded only during validation phases, and only for up to `validate_n_random_basins` basins if specified.

`test`/`infer`: These modes load all data at this time.

---

`limit_n_basins` allows training on datasets spanning e.g. 16k basins over 46 yrs using roughly 6 GB of memory while training.

NOTE: Memory spikes from multimet's `__init__` and `compute()` calls are handled separately.

---

A complementary mode is `lazy_load`, which lowers memory usage even further (e.g. to 2-3 GB), though at a significant cost in runtime.
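To tie the options above together, here is a hypothetical config sketch. Only `limit_n_basins`, `validate_n_random_basins`, `save_weights_every`, `max_updates_per_epoch`, and `lazy_load` are named in this PR; the YAML layout and the specific values are assumptions for illustration.

```yaml
# Keep at most 100 basins' worth of data materialized in memory during
# training; data is freed once an epoch completes.
limit_n_basins: 100

# Epochs get shorter under limit_n_basins, so adjust these to taste.
save_weights_every: 5
max_updates_per_epoch: 1000

# During validation, load data only for up to this many random basins.
validate_n_random_basins: 10

# Optional: trade runtime for even lower memory usage (e.g. 2-3 GB).
lazy_load: false
```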