Search before asking
- I have searched the RF-DETR issues and found no similar feature requests.
Description
When using pretrain_weights for transfer learning, epoch 0 evaluates the pretrained model
before any training occurs. BestMetricHolder and EarlyStoppingCallback both treat this
evaluation as a valid candidate for "best model".
If the target dataset is harder or smaller than the pretraining dataset, the pretrained mAP
at epoch 0 may be higher than the mAP of any subsequent trained epoch, causing the saved
checkpoint_best_total.pth to always be the untrained pretrained weights and early stopping
to trigger prematurely.
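A minimal sketch of the failure mode. This is a simplified, hypothetical stand-in for the tracker in util/utils.py, not the actual RF-DETR implementation:

```python
class BestMetricSingle:
    """Simplified best-metric tracker illustrating the issue."""

    def __init__(self, init_res=0.0):
        self.best_res = init_res
        self.best_epoch = -1

    def update(self, new_res, epoch):
        # Epoch 0 (the pretrained-weights evaluation) is treated
        # like any other epoch, so it can become the "best".
        if new_res > self.best_res:
            self.best_res = new_res
            self.best_epoch = epoch
            return True
        return False


tracker = BestMetricSingle()
# Epoch 0: pretrained COCO weights score high on the new dataset...
tracker.update(0.84, 0)  # True -> checkpoint_best_total.pth saved
# ...then mAP dips while the model adapts to the new domain.
tracker.update(0.41, 1)  # False
tracker.update(0.58, 2)  # False -> "best" stays the untrained model
```

Unless a later epoch beats 0.84, the saved "best" checkpoint is the untrained pretrained model, and a patience-based early stop counts every one of those epochs against the run.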
Proposed solution: Add a skip_best_epochs (or warmup_best_epochs) parameter that
excludes the first N epochs from best-model tracking and early stopping evaluation.
For example, skip_best_epochs=3 would:
- Prevent `BestMetricSingle.update()` from recording a new best during epochs 0–2
- Prevent `EarlyStoppingCallback` from counting patience during epochs 0–2
- Reset baselines at epoch `skip_best_epochs` so the first eligible epoch becomes the initial reference point
Note: the existing warmup_epochs parameter only affects the LR schedule (linear warmup
before cosine annealing) and does not influence best-model selection or early stopping.
Use case
Industrial transfer learning: we fine-tune RF-DETR (pretrained on COCO) on a domain-specific
dataset with fewer classes and different image characteristics. The pretrained model achieves
~0.84 mAP on epoch 0 (before any training), but training on the new domain initially drops
mAP before recovering. Without skipping early epochs, the "best" checkpoint is always the
untrained pretrained model, and early stopping halts training before the model can adapt.
This affects anyone using pretrain_weights for transfer learning on datasets where the
pretrained model's initial evaluation score is artificially high relative to the training
trajectory. A skip_best_epochs parameter would let users control when best-model tracking
begins, ensuring the saved checkpoint reflects actual training progress.
Additional
Proposed solution:
Add a skip_best_epochs (or warmup_best_epochs) integer parameter to RFDETRConfig
(default 0 for backward compatibility). During training:
- `BestMetricSingle.update(new_res, epoch)` returns `False` for `epoch < skip_best_epochs`, so no checkpoint is saved as "best" during warmup
- `EarlyStoppingCallback.update(epoch)` skips patience counting for `epoch < skip_best_epochs`
- At `epoch == skip_best_epochs`, `best_res` is reset to `init_res` (0.0 or -1.0) so the
first eligible epoch establishes a fair baseline from actual training, not from the pretrained
evaluation
This would be a small change in BestMetricSingle.update() (util/utils.py) and
EarlyStoppingCallback.update() (util/early_stopping.py), plus the config parameter.
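A rough sketch of what the change could look like. All class names, signatures, and defaults below are assumptions based on this issue's description, not the actual RF-DETR code:

```python
class BestMetricSingle:
    """Best-metric tracker with a proposed skip_best_epochs warmup window."""

    def __init__(self, init_res=0.0, skip_best_epochs=0):
        self.init_res = init_res
        self.best_res = init_res
        self.best_epoch = -1
        self.skip_best_epochs = skip_best_epochs

    def update(self, new_res, epoch):
        if epoch < self.skip_best_epochs:
            return False  # ignore warmup evaluations entirely
        if epoch == self.skip_best_epochs:
            # Reset the baseline so the first eligible epoch is compared
            # against init_res, not against the pretrained evaluation.
            self.best_res = self.init_res
        if new_res > self.best_res:
            self.best_res = new_res
            self.best_epoch = epoch
            return True
        return False


class EarlyStoppingCallback:
    """Early stopping that skips patience counting during warmup epochs."""

    def __init__(self, patience=5, skip_best_epochs=0):
        self.patience = patience
        self.skip_best_epochs = skip_best_epochs
        self.best = None
        self.counter = 0

    def update(self, value, epoch):
        if epoch < self.skip_best_epochs:
            return False  # no patience counting during warmup
        if self.best is None or value > self.best:
            self.best = value
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience  # True -> stop training


# With skip_best_epochs=3, the epoch-0 pretrained evaluation is ignored
# and epoch 3 establishes the baseline from actual training:
tracker = BestMetricSingle(skip_best_epochs=3)
tracker.update(0.84, 0)  # False: pretrained evaluation ignored
tracker.update(0.58, 3)  # True: first eligible epoch becomes the best
```

A `skip_best_epochs=0` default keeps current behavior, so existing configs are unaffected.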
Note: `freeze_encoder` also does not address this; the issue is in metric tracking, not in
weight updates.
Are you willing to submit a PR?
- Yes I'd like to help by submitting a PR!