Description
Is your feature request related to a problem? Please describe.
Add a dependency in MCore so that it can call the corresponding NVRx routines for checkpointing.
We are in the middle of migrating the checkpointing routines in dist_checkpointing.
Tag the @mcore-oncall to get oncall's attention to this issue.
Describe the solution you'd like
CI pipeline jobs and any checkpointing routines that depend on the migrated routines will run them only when NVRx is installed. If NVRx is not installed, they will run the corresponding torch.distributed.checkpoint routines instead.
Async checkpointing will be enabled only when NVRx is installed.
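A rough sketch of the selection logic described above, to make the proposal concrete. This is not the MCore implementation: the helper names (`is_module_available`, `select_checkpoint_backend`) are hypothetical, and the import name `nvidia_resiliency_ext` for NVRx is an assumption that should be checked against the installed package.

```python
import importlib.util

# Assumed import name for NVRx; adjust if the installed package differs.
NVRX_MODULE = "nvidia_resiliency_ext"


def is_module_available(name: str) -> bool:
    """Return True if `name` can be found in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for missing parent packages or broken specs.
        return False


def select_checkpoint_backend(async_requested: bool,
                              nvrx_module: str = NVRX_MODULE):
    """Hypothetical helper: pick the checkpoint backend and async setting.

    Returns (backend, async_enabled). The NVRx-backed routines are chosen
    only when NVRx is installed; otherwise the torch.distributed.checkpoint
    routines are used, and async checkpointing stays disabled.
    """
    have_nvrx = is_module_available(nvrx_module)
    backend = "nvrx" if have_nvrx else "torch.distributed.checkpoint"
    async_enabled = async_requested and have_nvrx
    return backend, async_enabled


if __name__ == "__main__":
    backend, async_enabled = select_checkpoint_backend(async_requested=True)
    print(f"backend={backend}, async_enabled={async_enabled}")
```

CI jobs could use the same check to skip NVRx-dependent tests when the package is missing, rather than failing at import time.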