Correctly track train / validation losses #485


Merged — 9 commits merged into dev from track-losses on Jun 1, 2025

Conversation

@LarsKue (Contributor) commented May 22, 2025

This PR:

Future TODO:

  • We have to re-enable tracking of custom metrics by keeping a keras.metrics.Mean tracker object on the approximator for each custom metric the user passes. We also need to call update_state on each of those trackers in the compute_metrics or train/test_step method (see the sketch below).
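
A rough sketch of that idea (hypothetical names, not the current bayesflow API): one Mean tracker per user-supplied metric, updated on every batch.

import keras

class TrackedApproximator(keras.Model):
    """Illustrative only, not the bayesflow implementation."""

    def __init__(self, custom_metric_fns, **kwargs):
        super().__init__(**kwargs)
        self.custom_metric_fns = custom_metric_fns
        # One running-mean tracker per user-supplied metric.
        self.custom_trackers = {
            name: keras.metrics.Mean(name=name) for name in custom_metric_fns
        }

    def update_custom_trackers(self, targets, predictions):
        # Would be called from compute_metrics (or train_step / test_step) for every batch.
        for name, fn in self.custom_metric_fns.items():
            self.custom_trackers[name].update_state(fn(targets, predictions))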

@vpratz Do you think you have capacity to take care of the TODO?

@LarsKue self-assigned this May 22, 2025
@LarsKue added the "fix" label (Pull request that fixes a bug) May 22, 2025
@Copilot (Copilot AI) left a comment


Pull Request Overview

This PR refines loss tracking by simplifying metric computation and ensuring that the validation loss is correctly tracked.

  • Updated continuous approximator to return only the computed loss.
  • Modified torch and tensorflow approximators to update the loss tracker during the validation step.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

  • bayesflow/approximators/continuous_approximator.py — Simplified compute_metrics by removing nested metrics.
  • bayesflow/approximators/backend_approximators/torch_approximator.py — Updated test_step to update the loss tracker.
  • bayesflow/approximators/backend_approximators/tensorflow_approximator.py — Updated test_step to update the loss tracker.
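
For intuition, a minimal sketch of what updating a loss tracker in test_step can look like in a Keras 3 model (illustrative class, names, and loss; not the actual bayesflow code):

import keras

class ExampleApproximator(keras.Model):
    """Illustrative only."""

    def __init__(self, network, **kwargs):
        super().__init__(**kwargs)
        self.network = network
        # Running mean over validation batches; Keras resets it before each evaluation.
        self.loss_tracker = keras.metrics.Mean(name="loss")

    def test_step(self, data):
        x, y = data
        y_pred = self.network(x, training=False)
        loss = keras.ops.mean(keras.losses.mean_squared_error(y, y_pred))
        # Without this update, the reported val_loss would reflect only the last batch.
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}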

codecov bot commented May 22, 2025

@elseml mentioned this pull request May 23, 2025
@vpratz (Collaborator) commented May 23, 2025

Do I read it correctly that this PR does not modify the shuffling and therefore does not fix this aspect of #481?

Regarding the TODO: I'm not familiar with this part of the code base yet, so I cannot really judge how complicated it is and when I will find the time, but I can try to take a closer look at it in the coming weeks.

@stefanradev93 (Contributor) commented

Shall we still merge this PR, as it provides a pretty important fix, and open another one pertaining to general metrics and aggregation?

@vpratz (Collaborator) commented May 24, 2025

I'm still a bit confused by the description. Could you quickly re-explain which problem this PR addresses? I'm currently unable to judge whether the fix is worth a temporary regression until we get to the TODO.

@stefanradev93 (Contributor) commented

It fixes the validation loss being computed and displayed from only the last validation batch. Additionally, it removes the duplicate printing of loss / inference_net/loss and val_loss / val_inference_net/loss.

@vpratz (Collaborator) commented May 24, 2025

Ahh, thanks a lot! I will do a proper review in the next few days, then. Or, if you have already thoroughly tested the changes, you could also merge right away and open an issue regarding the tracking of custom metrics.

@vpratz (Collaborator) commented May 27, 2025

I now understand better what is going on, and the changes look good to me. I will try to add the trackers for the other metrics today, and let you know if I encounter any difficulties...

@vpratz (Collaborator) commented May 27, 2025

Quick question @LarsKue @stefanradev93: The individual losses were removed in this PR. I would add them as metrics when more than one loss is present (not when there is only one, to avoid a useless duplicate), so that they are tracked individually. This will also display them, which I think is desirable. Do you agree, or would you rather not display that information?
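
A possible shape of that condition (illustrative only; names are not the bayesflow API):

import keras

def build_loss_trackers(loss_names):
    # Always track the total loss; only track individual losses when there is
    # more than one, since a single entry would just duplicate the total.
    trackers = {"loss": keras.metrics.Mean(name="loss")}
    if len(loss_names) > 1:
        for name in loss_names:
            trackers[name] = keras.metrics.Mean(name=name)
    return trackers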

@vpratz (Collaborator) commented May 27, 2025

Short update: I have succeeded in tracking the metrics, but the serialization for custom inference_metrics and summary_metrics seems to be broken (at least in TensorFlow) and might need some restructuring to work nicely. It might make sense to move this to a separate PR, though; I'm currently checking whether the issues are separate or interdependent...

@stefanradev93 (Contributor) commented

> Quick question @LarsKue @stefanradev93: The individual losses were removed in this PR. I would add them as metrics when more than one loss is present (not when there is only one, to avoid a useless duplicate), so that they are tracked individually. This will also display them, which I think is desirable. Do you agree, or would you rather not display that information?

I think this makes sense!

@vpratz removed their request for review May 27, 2025 15:20
@vpratz (Collaborator) commented May 27, 2025

@LarsKue @stefanradev93 I think the changes are ready for review; the issues regarding custom metrics are (as far as I can tell) not related to the changes in this PR.

@vpratz (Collaborator) commented May 29, 2025

I was able to resolve this properly. The reason for the order dependency was taking an unweighted mean of means: in the aggregated metrics, average values obtained from batches of size 32 had the same weight as values obtained from the final batch of size 2.
By setting the sample_weight argument of Metric.update_state to the batch size, we obtain the correct averages. This requires calculating the batch size from the data at hand, which is approximator-specific, so I outsourced this calculation into a private method that each approximator has to override. If you have different design ideas for this, please let me know.
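
To illustrate the weighting (a standalone sketch using the batch sizes from the reproduction below, not bayesflow code):

import keras
import numpy as np

# Per-batch mean losses from batches of sizes 32, 32 and 2 (66 samples, batch_size=32).
batch_losses = [1.0, 1.0, 4.0]
batch_sizes = [32, 32, 2]

# Unweighted mean of means: (1 + 1 + 4) / 3 = 2.0, which over-weights the tiny last batch.
print(np.mean(batch_losses))

# Weighting each update by the batch size recovers the per-sample mean:
# (32*1 + 32*1 + 2*4) / 66 ≈ 1.09
tracker = keras.metrics.Mean(name="loss")
for loss, n in zip(batch_losses, batch_sizes):
    tracker.update_state(loss, sample_weight=n)
print(float(tracker.result()))

The per-approximator helper could then look roughly like this (hypothetical data key, for illustration only):

def _batch_size_from_data(self, data: dict) -> int:
    # Take the leading axis of one of the data tensors, e.g. the inference variables.
    return keras.ops.shape(data["inference_variables"])[0]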

This fixes #481 and supersedes that aspect of #482.
@LarsKue @stefanradev93 Could you please take another look?

Note: the tqdm progress bar will report wrong values for the training loss, as it aggregates the per-batch values naively; the value stored in history will be correct, though.

Code to reproduce
import bayesflow as bf
import keras
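
# A learning rate of 0 keeps the network weights fixed, so the reported losses
# should stay constant across epochs.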

workflow = bf.BasicWorkflow(
    inference_network=bf.networks.CouplingFlow(subnet_kwargs={"dropout": 0.0}),
    inference_variables=["parameters"],
    inference_conditions=["observables"],
    simulator=bf.simulators.GaussianMixture(),
    initial_learning_rate=0.0,
    standardize=[]
)
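
# 66 samples with batch_size=32 produce batches of sizes 32, 32, and 2; the small
# final batch exposes the unweighted mean-of-means bug.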

training_data = workflow.simulate(66)
validation_data = workflow.simulate(66)
history = workflow.fit_offline(
    data=training_data,
    epochs=3,
    batch_size=32,
    validation_data=validation_data,
)
print(history.history)

@stefanradev93 (Contributor) left a comment

Hi Valentin and Lars, I buffed up the docs for the new changes. I also decided to repeat _batch_size_from_data in the backend-specific classes to avoid confusion, since the method is already explicitly needed in these classes. Other than that, I think this PR is ready to merge.

@stefanradev93 merged commit 996a700 into dev on Jun 1, 2025
8 of 9 checks passed
@stefanradev93 deleted the track-losses branch June 1, 2025 14:32