Bug description
With EMAWeightAveraging and a delayed start (e.g. update_starting_at_step: 1000), validation still swaps pl_module and _average_model on every validation epoch (e.g. validation every 100 steps), even while self._average_model.n_averaged == 0. At that point _average_model is only the deepcopy(pl_module) made in setup() and has never been updated, so validation runs on the initial weights instead of the trained ones. As a result, retrieval metrics logged before update_starting_at_step are near 0.
Example for EMA init:
use_buffers: false
decay: 0.999
update_every_n_steps: 1
update_starting_at_step: 1000
update_starting_at_epoch: -1
Suggested fix: in WeightAveraging on_validation_epoch_start / on_validation_epoch_end, only call _swap_models when self._average_model.n_averaged > 0.
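A minimal sketch of the proposed guard, using toy stand-ins rather than the real Lightning classes. ToyAveragedModel and ToyWeightAveraging are hypothetical names invented for illustration; only the hook names, _swap_models, _average_model and n_averaged come from the description above. The actual callback internals may differ.

```python
class ToyAveragedModel:
    """Hypothetical stand-in for the callback's _average_model."""

    def __init__(self, weights):
        self.weights = list(weights)  # copy made once, like deepcopy(pl_module) in setup()
        self.n_averaged = 0           # stays 0 until the first averaging update


class ToyWeightAveraging:
    """Hypothetical stand-in for the callback, reduced to the swap logic."""

    def __init__(self, initial_weights):
        self.current_weights = list(initial_weights)  # plays the role of pl_module
        self._average_model = ToyAveragedModel(initial_weights)
        self.swap_count = 0

    def _swap_models(self):
        # Exchange the training weights with the averaged weights.
        avg = self._average_model
        self.current_weights, avg.weights = avg.weights, self.current_weights
        self.swap_count += 1

    def on_validation_epoch_start(self):
        # Proposed guard: skip the swap while no average has been accumulated.
        if self._average_model.n_averaged > 0:
            self._swap_models()

    def on_validation_epoch_end(self):
        # Same guard, so the swap back only happens if the swap in happened.
        if self._average_model.n_averaged > 0:
            self._swap_models()
```

With this guard, validation before update_starting_at_step runs on the weights being trained, and the swap only takes effect once n_averaged is greater than 0. Because n_averaged does not change during validation, the two hooks always agree on whether to swap.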
What version are you seeing the problem on?
v2.6
Reproduced in studio
No response
How to reproduce the bug
Error messages and logs
No response
Environment
No response
More info
No response
cc @ethanwharris