Callback support for finetune recipe #805

@anubhutivyas

Description

Is your feature request related to a problem? Please describe.

We are integrating Customizer with Automodel for finetuning and need support for reporting training metrics.

Previously, when we used NeMo for training, this was straightforward because NeMo is built on PyTorch Lightning, which has native callback support. We simply added a NeMoCustomizerCallback to report training progress to our API.

With Automodel, however, my understanding is that it doesn't use PyTorch Lightning, so we can't simply hook in our callback. The simplest approach I can find is to subclass [TrainFinetuneRecipeForNextTokenPrediction](https://github.com/NVIDIA-NeMo/Automodel/blob/main/nemo_automodel/recipes/llm/train_ft.py#L853) and override its setup(), log_train_metrics(), and log_val_metrics() methods to call our callback, but that doesn't seem like the right solution.

Describe the solution you'd like

Could you add a callback mechanism similar to PyTorch Lightning's callbacks? Ideally with hooks for:

  • on_train_start (after setup)
  • on_train_batch_end (after each optimizer step)
  • on_validation_end (after validation)
  • on_save_checkpoint (when checkpoint is saved)
  • on_exception (on training failure)

This would help us maintain a cleaner integration with Customizer.
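To make the request concrete, here is a minimal sketch of what such a mechanism could look like. All names below (`TrainerCallback`, `CallbackList`, `fire`) are hypothetical illustrations, not existing Automodel APIs:

```python
from typing import Any, Protocol


class TrainerCallback(Protocol):
    """Hypothetical callback interface mirroring the hooks listed above."""

    def on_train_start(self, trainer: Any) -> None: ...
    def on_train_batch_end(self, trainer: Any, metrics: dict) -> None: ...
    def on_validation_end(self, trainer: Any, metrics: dict) -> None: ...
    def on_save_checkpoint(self, trainer: Any, path: str) -> None: ...
    def on_exception(self, trainer: Any, exc: BaseException) -> None: ...


class CallbackList:
    """Dispatches a hook invocation to every registered callback."""

    def __init__(self, callbacks=None):
        self.callbacks = list(callbacks or [])

    def fire(self, hook: str, *args, **kwargs) -> None:
        for cb in self.callbacks:
            fn = getattr(cb, hook, None)
            # Callbacks may implement only the hooks they care about;
            # missing hooks are silently skipped.
            if callable(fn):
                fn(*args, **kwargs)
```

The recipe would then call, for example, `self.callbacks.fire("on_train_batch_end", self, metrics)` after each optimizer step, and users could register callbacks via the recipe config or constructor.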

Describe alternatives you've considered

Subclass [TrainFinetuneRecipeForNextTokenPrediction](https://github.com/NVIDIA-NeMo/Automodel/blob/main/nemo_automodel/recipes/llm/train_ft.py#L853) and override its setup(), log_train_metrics(), and log_val_metrics() methods to call our callback.
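A rough sketch of that workaround follows. The stand-in base class and the `reporter` callable are illustrative only; the real method signatures in Automodel may differ:

```python
class _RecipeBase:
    """Stand-in for TrainFinetuneRecipeForNextTokenPrediction (hypothetical)."""

    def setup(self):
        pass

    def log_train_metrics(self, metrics):
        pass


class CustomizerRecipe(_RecipeBase):
    """Overrides lifecycle methods to report progress to an external API."""

    def __init__(self, reporter):
        # `reporter` is assumed to be a callable that posts events to the
        # Customizer API, e.g. reporter(event_name, payload).
        self.reporter = reporter

    def setup(self):
        super().setup()
        self.reporter("train_start", {})

    def log_train_metrics(self, metrics):
        super().log_train_metrics(metrics)
        self.reporter("train_metrics", metrics)
```

The drawback is that every internal method we override becomes an implicit API contract: any refactor of the recipe's private methods silently breaks the integration, which is why first-class callback hooks would be preferable.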



Labels

enhancement (New feature or request)
