
How to convert checkpoint from fsdp_dtensor to torch_dist or huggingface in megatron-fsdp mode? #2805

@zhujian19891203

Description


Thanks to the Megatron-LM project for providing the Megatron-FSDP parallel mode; we have achieved excellent results with it on internal small-GBS SFT tasks. For example, on a ~200B model similar to DeepSeek V2 (GBS=128, 32K context or larger), the m-fsdp mode is roughly twice as fast as the n-D parallel mode, because it avoids the high pipeline bubble rate that PP incurs at small GBS.

As we know, m-fsdp saves checkpoints in the fsdp_dtensor format, but our downstream evaluation tasks consume torch_dist or HF checkpoints. I know that tools/checkpoint/checkpoint_inspector.py can convert checkpoints from torch_dist to fsdp_dtensor, but is there a tool for the reverse direction, or one that converts to HF format?
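
For context, what I have in mind is roughly the sketch below. It assumes (and this is only an assumption on my part) that the fsdp_dtensor checkpoint directory is a standard torch.distributed.checkpoint (DCP) save, so it could first be consolidated into a single torch.save file and then remapped to torch_dist or HF parameter names in a separate, model-specific step. The paths are placeholders.

```python
# Minimal sketch, assuming the fsdp_dtensor checkpoint is a plain
# torch.distributed.checkpoint (DCP) directory. Paths are placeholders.
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

dcp_dir = "/path/to/ckpt/iter_0001000"   # fsdp_dtensor (DCP) checkpoint directory
out_path = "/path/to/consolidated.pt"    # single-file torch.save output

# Consolidate the sharded DCP checkpoint into one torch.save file.
dcp_to_torch_save(dcp_dir, out_path)

# The resulting state dict still uses Megatron parameter names; mapping it to
# torch_dist or HF layouts would need an additional, model-specific renaming step.
state_dict = torch.load(out_path, map_location="cpu")
print(list(state_dict.keys())[:10])
```

Even if this consolidation works, the key-renaming and weight-resharding step for HF is the part I have not found a tool for.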

I have searched the Megatron-LM project, the Megatron-Bridge project, and the internet, but have not found such a method or tool. Can anyone help or offer some suggestions?

Related issues or PRs:

  1. [Dev] docs(megatron-fsdp): add Megatron-FSDP user guide #2397
  2. [QUESTION] Does custom_fsdp model support finetuned from a non-fsdp checkpoint #1578
  3. Support Megatron FSDP / fsdp_dtensor checkpoints for exporting to HF NVIDIA-NeMo/Megatron-Bridge#1211

By the way, could someone please review the two PRs I submitted?

  1. [dev][checkpoint] Add checkpoint heterogeneity conversion feature #2613
  2. [dev][fsdp_dtensor] Improve --convert-torch-dist-to-fsdp-dtensor option robustness in checkpoint_inspector.py file  #2612
