
Commit b3e7678

Authored by tardis-key, mengchengTang, and Shangwei-Li
[megatron] feat: support discrete profiling for mindspeed (#4271)
### What does this PR do?

1. Add a `role` argument to the `DistProfiler.annotate` calls in `megatron_workers.py` (and the matching calls in `fsdp_workers.py`).
2. Update the documentation. #4206

> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: mengchengTang <[email protected]>
Co-authored-by: Shangwei-Li <[email protected]>
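The change itself is small: each `@DistProfiler.annotate(...)` decorator on the worker methods gains a `role` argument (for example `actor_update`, `rollout_generate`, `compute_values`), so a profiler running in discrete mode can attribute each collection window to the worker stage that produced it. The snippet below is a self-contained toy sketch of that idea; `ToyDiscreteProfiler` and its timing logic are invented for illustration and are not verl's actual `DistProfiler` implementation.

```python
# Toy illustration only -- not verl's DistProfiler. It sketches the idea behind the
# change: a role label attached to an annotated worker method lets a discrete-mode
# profiler open and close a separate, labeled collection window per call instead of
# recording one end-to-end trace.
import functools
import time


class ToyDiscreteProfiler:
    """Hypothetical stand-in that times each annotated call under its role label."""

    enabled = True

    @classmethod
    def annotate(cls, color: str = "white", role: str = "default"):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                if not cls.enabled:
                    return func(*args, **kwargs)
                start = time.perf_counter()  # open a discrete window for this call
                try:
                    return func(*args, **kwargs)
                finally:
                    elapsed = time.perf_counter() - start  # close the window
                    print(f"[{role}] ({color}) took {elapsed:.4f}s")
            return wrapper
        return decorator


class Worker:
    @ToyDiscreteProfiler.annotate(color="red", role="actor_update")
    def update_actor(self, data):
        time.sleep(0.01)  # stand-in for the real training step
        return data


Worker().update_actor({"batch": 1})
```

In the real code the annotation sits under `@register(...)` and, for the Megatron workers, `@GPUMemoryLogger(...)`, as the diffs below show.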
1 parent 71a6eb6 commit b3e7678

File tree

4 files changed (+15, -15 lines)


docs/ascend_tutorial/ascend_profiling_en.rst

Lines changed: 2 additions & 2 deletions

@@ -1,10 +1,10 @@
-Data collection based on FSDP backend on Ascend devices(en)
+Performance data collection based on FSDP or MindSpeed(Megatron) on Ascend devices(en)
 ==========================================================================================
 
 Last updated: 08/14/2025.
 
 This is a tutorial for data collection using the GRPO or DAPO algorithm
-based on FSDP on Ascend devices.
+based on FSDP or MindSpeed(Megatron) on Ascend devices.
 
 Configuration
 -------------

docs/ascend_tutorial/ascend_profiling_zh.rst

Lines changed: 3 additions & 3 deletions

@@ -1,11 +1,11 @@
-Data collection based on FSDP backend on Ascend devices(zh)
+Performance data collection based on FSDP or MindSpeed(Megatron) on Ascend devices(zh)
 ====================================
 
-在昇腾设备上基于FSDP后端进行数据采集
+在昇腾设备上基于FSDP或MindSpeed(Megatron)后端进行性能数据采集
 
 Last updated: 08/14/2025.
 
-这是一份在昇腾设备上基于FSDP后端使用GRPO或DAPO算法进行数据采集的教程
+这是一份在昇腾设备上基于FSDP或MindSpeed(Megatron)后端,使用GRPO或DAPO算法进行数据采集的教程
 
 配置
 ----

verl/workers/fsdp_workers.py

Lines changed: 3 additions & 3 deletions

@@ -1486,7 +1486,7 @@ def init_model(self):
         )
 
     @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="critic"))
-    @DistProfiler.annotate(color="cyan")
+    @DistProfiler.annotate(color="cyan", role="compute_values")
     def compute_values(self, data: DataProto):
         if self._is_offload_param:
             load_fsdp_model_to_gpu(self.critic_module)
@@ -1506,7 +1506,7 @@ def compute_values(self, data: DataProto):
         return output
 
     @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="critic"))
-    @DistProfiler.annotate(color="pink")
+    @DistProfiler.annotate(color="pink", role="critic_update")
     def update_critic(self, data: DataProto):
         if self._is_offload_param:
             load_fsdp_model_to_gpu(self.critic_module)
@@ -1874,7 +1874,7 @@ def _switch_chat_template(self, data: DataProto):
         return DataProto.from_dict(rm_inputs)
 
     @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="reward"))
-    @DistProfiler.annotate(color="brown")
+    @DistProfiler.annotate(color="brown", role="compute_rm_score")
     def compute_rm_score(self, data: DataProto):
         import itertools
 
verl/workers/megatron_workers.py

Lines changed: 7 additions & 7 deletions

@@ -724,7 +724,7 @@ async def trainer_mode(self):
 
     @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor"))
     @GPUMemoryLogger(role="update_actor", logger=logger)
-    @DistProfiler.annotate(color="red")
+    @DistProfiler.annotate(color="red", role="actor_update")
     def update_actor(self, data: DataProto):
         assert self._is_actor
         if self._is_offload_param:
@@ -767,7 +767,7 @@ def update_actor(self, data: DataProto):
 
     @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="rollout"))
     @GPUMemoryLogger(role="generate_sequences", logger=logger)
-    @DistProfiler.annotate(color="red")
+    @DistProfiler.annotate(color="red", role="rollout_generate")
     def generate_sequences(self, prompts: DataProto):
         assert self._is_rollout
         prompts = prompts.to(get_device_name())
@@ -817,7 +817,7 @@ def generate_sequences(self, prompts: DataProto):
 
     @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor"))
     @GPUMemoryLogger(role="compute_ref_log_prob", logger=logger)
-    @DistProfiler.annotate(color="olive")
+    @DistProfiler.annotate(color="olive", role="ref_compute_log_prob")
     def compute_ref_log_prob(self, data: DataProto):
         assert self._is_ref
         if self._ref_is_offload_param:
@@ -839,7 +839,7 @@ def compute_ref_log_prob(self, data: DataProto):
 
     @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor"))
     @GPUMemoryLogger(role="compute_log_prob", logger=logger)
-    @DistProfiler.annotate(color="blue")
+    @DistProfiler.annotate(color="blue", role="actor_compute_log_prob")
     def compute_log_prob(self, data: DataProto):
         assert self._is_actor
         if self._is_offload_param:
@@ -1207,7 +1207,7 @@ def init_model(self):
         )
 
     @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="critic"))
-    @DistProfiler.annotate(color="cyan")
+    @DistProfiler.annotate(color="cyan", role="compute_values")
     def compute_values(self, data: DataProto):
         micro_batch_size = self.config.ppo_micro_batch_size_per_gpu
         data.meta_info["micro_batch_size"] = micro_batch_size
@@ -1224,7 +1224,7 @@ def compute_values(self, data: DataProto):
         return output
 
     @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="critic"))
-    @DistProfiler.annotate(color="pink")
+    @DistProfiler.annotate(color="pink", role="critic_update")
     def update_critic(self, data: DataProto):
         data = data.to(get_device_id())
 
@@ -1448,7 +1448,7 @@ def init_model(self):
     # TODO: reward model use itself tokenizer instead of sft tokenizer
    # the input_ids, responses, attention_mask and position_ids may be different!
    @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="reward"))
-    @DistProfiler.annotate(color="brown")
+    @DistProfiler.annotate(color="brown", role="compute_rm_score")
     def compute_rm_score(self, data: DataProto):
         data.meta_info["micro_batch_size"] = self.config.micro_batch_size_per_gpu
         data.meta_info["max_token_len"] = self.config.forward_max_token_len_per_gpu
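Because each annotated call now carries a stable role label, the per-call profiles from a discrete run can be aggregated by stage, and the same labels (`compute_values`, `critic_update`, `compute_rm_score`, ...) are used in both the FSDP and Megatron workers. Below is a minimal post-processing sketch; the `(role, seconds)` record format is invented for illustration and is not the output format of the Ascend/MindSpeed profiler.

```python
# Hypothetical aggregation of discrete, role-labeled profiling records.
from collections import defaultdict

records = [  # (role, seconds) pairs, e.g. parsed from per-call profiler dumps
    ("actor_update", 12.4),
    ("rollout_generate", 30.1),
    ("actor_compute_log_prob", 4.2),
    ("ref_compute_log_prob", 4.0),
    ("rollout_generate", 29.7),
    ("actor_update", 12.9),
]

totals = defaultdict(float)
counts = defaultdict(int)
for role, seconds in records:
    totals[role] += seconds
    counts[role] += 1

# Print a per-role summary, largest total time first.
for role in sorted(totals, key=totals.get, reverse=True):
    print(f"{role:>24}: total {totals[role]:6.1f}s over {counts[role]} call(s)")
```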
