
EMA: validation swaps models before the first update_parameters #21724

@Martovark

Description


Bug description

With EMAWeightAveraging and a delayed start (e.g. update_starting_at_step: 1000), validation still swaps pl_module and _average_model at every validation epoch (e.g. validation every 100 steps), even while self._average_model.n_averaged == 0. At that point _average_model is only the deepcopy(pl_module) made in setup() and has never been updated, so validation evaluates the weights as they were at setup() instead of the current ones. As a result, retrieval metrics reported before update_starting_at_step are near 0.
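To make the timeline concrete, here is a toy simulation of the behavior described above. It is not Lightning code: `ToyModule` and `simulate_unguarded` are hypothetical names, a single scalar stands in for the model parameters, each training step simply adds 1.0 to it, and the ordering "update the average, then validate" within a step is an assumption. The first average update copies the live weights, assuming the same convention as PyTorch's `AveragedModel`.

```python
import copy


class ToyModule:
    """Hypothetical stand-in for a LightningModule holding one scalar weight."""

    def __init__(self, weight: float) -> None:
        self.weight = weight


def simulate_unguarded(total_steps: int, val_every: int, start_step: int, decay: float = 0.999):
    model = ToyModule(weight=0.0)   # weights at setup() time
    average = copy.deepcopy(model)  # copy taken in setup(), like _average_model
    n_averaged = 0
    validated = []                  # weight actually evaluated by each validation run
    for step in range(1, total_steps + 1):
        model.weight += 1.0         # pretend every optimizer step improves the model
        if step >= start_step:
            if n_averaged == 0:
                average.weight = model.weight  # first update copies the live weights
            else:
                average.weight = decay * average.weight + (1 - decay) * model.weight
            n_averaged += 1
        if step % val_every == 0:
            # Unguarded behavior from the issue: swap even while n_averaged == 0.
            model.weight, average.weight = average.weight, model.weight
            validated.append(model.weight)
            model.weight, average.weight = average.weight, model.weight  # swap back
    return validated, model.weight


validated, final_weight = simulate_unguarded(total_steps=300, val_every=100, start_step=250)
print(validated)  # the first two runs evaluate the untouched setup() copy (0.0)
```

The first two validation runs report the setup-time weight even though the live model has trained for 100 and 200 steps, which mirrors the near-zero metrics seen before update_starting_at_step.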

Example EMAWeightAveraging init arguments:

use_buffers: false
decay: 0.999
update_every_n_steps: 1
update_starting_at_step: 1000
update_starting_at_epoch: -1
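A minimal sketch of what these arguments imply for n_averaged, assuming updates begin once the step reaches update_starting_at_step and then repeat every update_every_n_steps steps (the exact interaction with update_starting_at_epoch is not modeled here). `ema_history` is a hypothetical helper, and the first update copying the live weights follows the convention of PyTorch's `AveragedModel`, assumed here to apply. The EMA rule is average = decay * average + (1 - decay) * value.

```python
def ema_history(params, decay=0.999, start_step=1000, every_n=1):
    """Track a scalar EMA with a delayed start (hypothetical sketch, assumed semantics).

    params[step] is the live parameter value after optimizer step `step`.
    Returns (n_averaged, average) after every step; average is None before the first update.
    """
    average = None
    n_averaged = 0
    history = []
    for step, value in enumerate(params):
        # Assumption: update once step >= start_step, then every `every_n` steps.
        if step >= start_step and (step - start_step) % every_n == 0:
            if n_averaged == 0:
                average = value  # first update copies the live weights
            else:
                average = decay * average + (1 - decay) * value  # EMA update
            n_averaged += 1
        history.append((n_averaged, average))
    return history


history = ema_history([float(s) for s in range(2000)])
print(history[999])   # n_averaged is still 0: nothing has been averaged yet
print(history[1000])  # first update: the average is a copy of the live value
```

With decay 0.999, the weights copied at the first update still carry a factor of 0.999^k after k further updates (roughly 0.37 after 1000 updates), so the average changes slowly once updates begin.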

Suggested fix: in WeightAveraging on_validation_epoch_start / on_validation_epoch_end, only call _swap_models when self._average_model.n_averaged > 0.
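The suggested guard can be sketched outside Lightning as follows. `GuardedSwapSketch` is a hypothetical stand-in, not Lightning's WeightAveraging class: scalars replace the model and the averaged copy, `n_averaged` mirrors `self._average_model.n_averaged`, and the real hooks additionally receive `trainer` and `pl_module`. It records whether the start hook swapped, so the end hook always swaps back exactly when needed.

```python
class GuardedSwapSketch:
    """Hypothetical stand-in for WeightAveraging's validation hooks with the proposed guard."""

    def __init__(self, live_weight: float, average_weight: float, n_averaged: int = 0) -> None:
        self.live_weight = live_weight        # stands in for pl_module's parameters
        self.average_weight = average_weight  # stands in for _average_model's parameters
        self.n_averaged = n_averaged          # mirrors _average_model.n_averaged
        self._swapped = False                 # did on_validation_epoch_start swap?

    def _swap_models(self) -> None:
        # Exchange live and averaged weights in place.
        self.live_weight, self.average_weight = self.average_weight, self.live_weight

    def on_validation_epoch_start(self) -> None:
        # Only evaluate the averaged weights once they have been updated at least once.
        self._swapped = self.n_averaged > 0
        if self._swapped:
            self._swap_models()

    def on_validation_epoch_end(self) -> None:
        # Swap back only if the start hook swapped, so the two calls always pair up.
        if self._swapped:
            self._swap_models()
            self._swapped = False


callback = GuardedSwapSketch(live_weight=5.0, average_weight=0.0, n_averaged=0)
callback.on_validation_epoch_start()
print(callback.live_weight)  # validation sees the live weights while n_averaged == 0
callback.on_validation_epoch_end()
```

Re-checking `n_averaged > 0` in the end hook would also work as long as no update happens during validation; the flag is one way to keep the start and end swaps balanced without relying on that.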

What version are you seeing the problem on?

v2.6

Reproduced in studio

No response

How to reproduce the bug

Error messages and logs

No response

Environment

No response

More info

No response

cc @ethanwharris

Labels: bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.6.x
