
Commit 321454a

gitttt-1234 and claude authored
Fix plateau detection to use absolute threshold mode (#2469)
Changed plateau detection from relative to absolute threshold mode to match PyTorch's ReduceLROnPlateau behavior. This fixes inconsistencies in early stopping and learning rate scheduling.

Changes:
- Updated monitor.py plateau check from relative (< best * (1 - delta)) to absolute (< best - delta) mode
- Changed training_editor_form.yaml default min_delta from 1e-6 to 1e-8
- Updated all training profile configs to use threshold_mode='abs' with threshold=1e-6
- Reduced early stopping patience from 20 to 10 epochs
- Updated code comments to clarify absolute threshold usage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
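To see why the mode matters, write $L_t$ for the validation loss at epoch $t$, $L^{*}$ for the best loss so far and $\delta$ for the threshold (`min_delta` / `threshold`). Rearranging the relative and absolute comparisons from the commit message shows the minimum decrease each one demands; the values $L^{*} = 0.002$ and $\delta = 10^{-6}$ are illustrative only, not taken from the commit:

$$
\begin{aligned}
\text{rel:}\quad L_t < L^{*}(1-\delta) &\iff L^{*} - L_t > \delta L^{*} = 2\times 10^{-9},\\
\text{abs:}\quad L_t < L^{*} - \delta &\iff L^{*} - L_t > \delta = 10^{-6}.
\end{aligned}
$$

In relative mode the required decrease scales with the loss itself and becomes vanishingly small as the loss approaches zero, whereas absolute mode keeps it fixed at $\delta$ regardless of the loss scale.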
1 parent ed0635c commit 321454a

11 files changed (+23 / -24 lines)

sleap/config/training_editor_form.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -487,7 +487,7 @@ optimization:
   label: Stop Training on Plateau
   name: trainer_config.early_stopping.stop_training_on_plateau
   type: bool
-- default: 1e-6
+- default: 1e-8
   help: Minimum absolute decrease in the loss in order to consider an epoch as not in a plateau.
   label: Plateau Min. Delta
   name: trainer_config.early_stopping.min_delta
```
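The form's help text defines `min_delta` as a minimum *absolute* decrease. The sketch below assumes the convention used by common early-stopping callbacks: count consecutive epochs that fail the absolute test and stop once that count reaches the patience value. `EarlyStopper` is a hypothetical name, not a SLEAP class, and its defaults simply mirror the new form default (1e-8) and the centroid profile's new patience (10).

```python
import math
from dataclasses import dataclass


@dataclass
class EarlyStopper:
    """Hypothetical early-stopping helper (not SLEAP's actual class).

    An epoch counts as an improvement only if the validation loss falls more
    than `min_delta` below the best loss seen so far (absolute threshold).
    """

    min_delta: float = 1e-8
    patience: int = 10
    best: float = math.inf
    wait: int = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # sufficient absolute improvement: reset counter
            self.wait = 0
        else:
            self.wait += 1  # plateau epoch
        return self.wait >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopper(min_delta=1e-8, patience=10)
    losses = [0.5, 0.4, 0.3] + [0.3] * 10  # loss stalls after epoch 3
    for epoch, loss in enumerate(losses, start=1):
        if stopper.step(loss):
            print(f"stopping after epoch {epoch}")  # prints "stopping after epoch 13"
            break
```

Because the test is `val_loss < best - min_delta`, a decrease smaller than `min_delta` still counts as a plateau epoch, which is exactly what "absolute" means here.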

sleap/gui/widgets/monitor.py

Lines changed: 3 additions & 4 deletions
```diff
@@ -670,7 +670,7 @@ def reset(
                 corresponds to.
             plateau_patience: Number of epochs to wait in plateau before stopping.
             plateau_min_delta: Minimum change in validation loss to be considered
-                significant.
+                significant (absolute threshold).
         """
         self.canvas = LossPlot(
             width=5,
@@ -935,11 +935,10 @@ def _check_messages(
                 self.best_epoch_loss = self.last_epoch_val_loss
 
             if self.plateau_min_delta is not None:
-                # plateau check according to `rel` thrsh mode in torch.
+                # Plateau check using absolute threshold mode in torch
                 is_better = (
                     self.last_epoch_val_loss
-                    < self.best_epoch_loss
-                    * (1.0 - self.plateau_min_delta)
+                    < self.best_epoch_loss - self.plateau_min_delta
                 )
             else:
                 is_better = (
```
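The before/after expressions in `_check_messages` can be pinned down with a small regression test. This sketch lifts the two comparisons into hypothetical free functions (`is_better_rel`, `is_better_abs`, not part of SLEAP) so they can be exercised without the GUI; the loss values are illustrative. It shows the modes disagree in both directions: for small losses the absolute check is stricter, for large losses it is more lenient.

```python
def is_better_rel(last_loss: float, best_loss: float, min_delta: float) -> bool:
    """Old check (removed in this commit): relative threshold."""
    return last_loss < best_loss * (1.0 - min_delta)


def is_better_abs(last_loss: float, best_loss: float, min_delta: float) -> bool:
    """New check (added in this commit): absolute threshold."""
    return last_loss < best_loss - min_delta


def test_small_losses_diverge():
    # A 1e-8 drop exceeds delta * best (2e-9) but not delta (1e-6).
    best, delta = 0.002, 1e-6
    last = best - 1e-8
    assert is_better_rel(last, best, delta)
    assert not is_better_abs(last, best, delta)


def test_large_losses_diverge():
    # A 5e-6 drop exceeds delta (1e-6) but not delta * best (1e-5).
    best, delta = 10.0, 1e-6
    last = best - 5e-6
    assert not is_better_rel(last, best, delta)
    assert is_better_abs(last, best, delta)


def test_large_drop_counts_in_both_modes():
    best, delta = 0.002, 1e-6
    last = best - 1e-4
    assert is_better_rel(last, best, delta)
    assert is_better_abs(last, best, delta)


if __name__ == "__main__":
    test_small_losses_diverge()
    test_large_losses_diverge()
    test_large_drop_counts_in_both_modes()
    print("ok")
```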

sleap/training_profiles/baseline.centroid.yaml

Lines changed: 3 additions & 3 deletions
```diff
@@ -126,15 +126,15 @@ trainer_config:
   lr_scheduler:
     step_lr: null
     reduce_lr_on_plateau:
-      threshold: 1.0e-08
-      threshold_mode: rel
+      threshold: 1.0e-06
+      threshold_mode: abs
       cooldown: 3
       patience: 5
       factor: 0.5
       min_lr: 1.0e-08
   early_stopping:
     min_delta: 1.0e-08
-    patience: 20
+    patience: 10
     stop_training_on_plateau: true
   online_hard_keypoint_mining:
     online_mining: false
```
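The keys under `reduce_lr_on_plateau` share their names with arguments of PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` (`threshold`, `threshold_mode`, `cooldown`, `patience`, `factor`, `min_lr`); that they are passed through unchanged is an assumption here. Below is a dependency-free sketch of the stepping rule PyTorch documents for `mode="min"`, with defaults taken from the centroid profile after this commit. `PlateauScheduler` is hypothetical, not SLEAP or PyTorch code, and it omits PyTorch's `eps` guard on tiny LR updates.

```python
import math
from dataclasses import dataclass


@dataclass
class PlateauScheduler:
    """Hypothetical sketch of ReduceLROnPlateau-style logic (mode="min")."""

    lr: float
    threshold: float = 1e-6
    threshold_mode: str = "abs"
    cooldown: int = 3
    patience: int = 5
    factor: float = 0.5
    min_lr: float = 1e-8
    best: float = math.inf
    num_bad_epochs: int = 0
    cooldown_counter: int = 0

    def _is_better(self, current: float) -> bool:
        if self.threshold_mode == "rel":
            return current < self.best * (1.0 - self.threshold)
        return current < self.best - self.threshold  # "abs"

    def step(self, val_loss: float) -> float:
        """Update state with one epoch's loss and return the (possibly reduced) LR."""
        if self._is_better(val_loss):
            self.best = val_loss
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1

        if self.cooldown_counter > 0:
            # Bad epochs are ignored while cooling down after a reduction.
            self.cooldown_counter -= 1
            self.num_bad_epochs = 0

        # The LR drops only once the bad-epoch count *exceeds* patience.
        if self.num_bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.cooldown_counter = self.cooldown
            self.num_bad_epochs = 0
        return self.lr


if __name__ == "__main__":
    sched = PlateauScheduler(lr=1e-3)
    for epoch in range(1, 17):
        print(epoch, sched.step(1.0))  # constant loss: LR halves at epochs 7 and 16
```

With these settings a flat loss triggers the first halving on the sixth consecutive non-improving epoch, and the three-epoch cooldown delays the next one until epoch 16.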

sleap/training_profiles/baseline.multi_class_bottomup.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -122,7 +122,7 @@ trainer_config:
     step_lr: null
     reduce_lr_on_plateau:
       threshold: 1.0e-06
-      threshold_mode: rel
+      threshold_mode: abs
       cooldown: 3
       patience: 5
       factor: 0.5
```

sleap/training_profiles/baseline.multi_class_topdown.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -124,7 +124,7 @@ trainer_config:
     step_lr: null
     reduce_lr_on_plateau:
       threshold: 1.0e-06
-      threshold_mode: rel
+      threshold_mode: abs
       cooldown: 3
       patience: 5
       factor: 0.5
```

sleap/training_profiles/baseline_large_rf.bottomup.yaml

Lines changed: 3 additions & 3 deletions
```diff
@@ -132,10 +132,10 @@ trainer_config:
   lr_scheduler:
     step_lr: null
     reduce_lr_on_plateau:
-      threshold: 1.0e-08
-      threshold_mode: rel
+      threshold: 1.0e-06
+      threshold_mode: abs
       cooldown: 3
-      patience: 8
+      patience: 5
       factor: 0.5
       min_lr: 1.0e-08
   early_stopping:
```

sleap/training_profiles/baseline_large_rf.single.yaml

Lines changed: 2 additions & 2 deletions
```diff
@@ -126,8 +126,8 @@ trainer_config:
   lr_scheduler:
     step_lr: null
     reduce_lr_on_plateau:
-      threshold: 1.0e-05
-      threshold_mode: rel
+      threshold: 1.0e-06
+      threshold_mode: abs
       cooldown: 3
       patience: 5
       factor: 0.5
```

sleap/training_profiles/baseline_large_rf.topdown.yaml

Lines changed: 2 additions & 2 deletions
```diff
@@ -128,8 +128,8 @@ trainer_config:
   lr_scheduler:
     step_lr: null
     reduce_lr_on_plateau:
-      threshold: 1.0e-08
-      threshold_mode: rel
+      threshold: 1.0e-06
+      threshold_mode: abs
       cooldown: 3
       patience: 5
       factor: 0.5
```

sleap/training_profiles/baseline_medium_rf.bottomup.yaml

Lines changed: 3 additions & 3 deletions
```diff
@@ -132,10 +132,10 @@ trainer_config:
   lr_scheduler:
     step_lr: null
     reduce_lr_on_plateau:
-      threshold: 1.0e-08
-      threshold_mode: rel
+      threshold: 1.0e-06
+      threshold_mode: abs
       cooldown: 3
-      patience: 8
+      patience: 5
       factor: 0.5
       min_lr: 1.0e-08
   early_stopping:
```

sleap/training_profiles/baseline_medium_rf.single.yaml

Lines changed: 2 additions & 2 deletions
```diff
@@ -126,8 +126,8 @@ trainer_config:
   lr_scheduler:
     step_lr: null
     reduce_lr_on_plateau:
-      threshold: 1.0e-08
-      threshold_mode: rel
+      threshold: 1.0e-06
+      threshold_mode: abs
       cooldown: 3
       patience: 5
       factor: 0.5
```
