WIP: Megatron backend support in critic models #169

taoluo · 2025-09-02T15:59:32Z

Summary

- Adding Megatron backend support for critic models (WIP)
- Created comprehensive design document outlining the implementation approach
- Established configuration file for testing the new backend

Description

This PR introduces the design and planning phase for adding Megatron
backend support to critic models, enabling them to leverage Megatron's
advanced parallelism features (tensor, pipeline, context, and expert
parallel) alongside the existing DeepSpeed backend.
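
For readers less familiar with how these parallel dimensions combine, the following is a minimal sketch of the usual Megatron-style accounting, in which the GPUs not consumed by tensor, pipeline, and context parallelism form the data-parallel dimension. The function name and arguments are illustrative and are not part of this PR. Expert parallelism is deliberately left out because how it is laid out depends on the Megatron-Core version in use.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the data-parallel size implied by a world size and model-parallel sizes.

    Hypothetical helper for illustration only; it is not an API of this repository.
    """
    if min(world_size, tp, pp, cp) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # 16 GPUs with 2-way tensor, pipeline, and context parallelism.
    print(data_parallel_size(16, tp=2, pp=2, cp=2))
```

A critic configured this way would see each batch split across the resulting data-parallel replicas, which is worth keeping in mind when comparing values against the DeepSpeed backend.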

What's included:

- Design document (docs/critic_megatron_backend_design_final.md): complete technical design with:
  - McaValueModel class implementation details
  - Model provider integration strategy
  - Data flow comparison between backends
  - Minimal 3-phase implementation plan
- Test configuration (examples/docs_examples/example_ppo_megatron_critic.yaml): PPO config for testing the Megatron critic

What's coming next:

- Implement McaValueModel class in mcore_adapter
- Update default_value_model_provider to support Megatron
- Add integration tests with CriticWorker
- Validate distributed training features

Status

🚧 Work in Progress - This PR currently contains only the design
documentation and configuration. Implementation will follow based on the
outlined plan.

cc: @PanAndy it would be great if you can review the design doc before implementation begins, thanks

CLAassistant · 2025-09-04T21:16:45Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ taoluo
❌ kemurukagami
You have signed the CLA already but the status is still pending? Let us recheck it.

liu-zichen · 2025-09-03T03:06:59Z

docs_roll/docs/English/DesignImplementation/critic_megatron_backend_design.md

+
+logger = get_logger(__name__)
+
+class McaValueModel(McaGPTModel):


Some model classes don’t instantiate McaGPTModel directly; they subclass it (e.g., Qwen2VLModel). If we introduce an McaValueModel to provide a value head, we’d have to create a matching ValueModel variant for every new model class. Would it be better to add a value_head option in McaModelConfig and build the value-head capability into the McaGPTModel base class, or to add a post-init hook that runs after model init?
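
The trade-off raised in this comment can be illustrated with a small, framework-free sketch. All class names below are hypothetical stand-ins (they are not the real McaModelConfig, McaGPTModel, or Qwen2VLModel), and the only point is that a flag read by the base class is inherited by every subclass without a per-model ValueModel variant.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical stand-in for McaModelConfig with only the fields needed here."""
    hidden_size: int = 8
    vocab_size: int = 32
    use_value_head: bool = False


class BaseModel:
    """Hypothetical stand-in for the McaGPTModel base class."""

    def __init__(self, config: ModelConfig):
        self.config = config
        # The base class decides the output width once, so subclasses inherit it.
        self.output_dim = 1 if config.use_value_head else config.vocab_size


class VisionLanguageModel(BaseModel):
    """Hypothetical subclass playing the role of a model such as Qwen2VLModel."""

    def __init__(self, config: ModelConfig):
        super().__init__(config)
        self.has_vision_tower = True


# The same subclass serves as actor or critic depending only on the config flag.
critic = VisionLanguageModel(ModelConfig(use_value_head=True))
actor = VisionLanguageModel(ModelConfig())
print(critic.output_dim, actor.output_dim)
```

With a dedicated value-model subclass instead, each model family would need its own value variant, which is the maintenance cost the comment is pointing at.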

- Add use_value_head config field to McaModelConfig
- Create ValueHeadWrapper class with weight property
- Replace output_layer with value head when use_value_head=True
- Set share_embeddings_and_output_weights=False for value models
- Filter value_head weights from missing_keys during checkpoint loading
- Initialize value head weights to 0.01 for testing parity
- Enable value head for CriticWorker in MegatronStrategy
- Remove unused McaValueModel class
- Update design documentation
- Add test pipeline and configs for critic comparison

Co-Authored-By: Claude <[email protected]>
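
The missing_keys filtering step mentioned above could look roughly like the sketch below. It assumes the value head's parameters live under a `value_head.` prefix; the actual key names used in this PR are not shown in the conversation, so both the prefix and the helper name are assumptions made for illustration.

```python
from typing import Iterable, List, Sequence, Tuple


def split_missing_keys(
    missing_keys: Iterable[str],
    expected_prefixes: Sequence[str] = ("value_head.",),
) -> Tuple[List[str], List[str]]:
    """Separate keys that are expected to be absent from genuinely missing ones.

    A GPT checkpoint has no value head, so its weights are always reported as
    missing. Those keys are expected; anything else signals a real problem.
    The default prefix is an assumption, not the PR's confirmed key name.
    """
    expected: List[str] = []
    unexpected: List[str] = []
    for key in missing_keys:
        # Match the prefix at the top level or nested under a parent module.
        is_expected = any(
            key.startswith(prefix) or ("." + prefix) in key
            for prefix in expected_prefixes
        )
        (expected if is_expected else unexpected).append(key)
    return expected, unexpected


if __name__ == "__main__":
    reported = ["value_head.weight", "decoder.layers.0.mlp.weight"]
    ok, problems = split_missing_keys(reported)
    print("expected:", ok)
    print("unexpected:", problems)
```

Keeping the unexpected list separate, rather than silently dropping all missing keys, means a checkpoint that really lacks backbone weights still fails loudly.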

taoluo and others added 4 commits September 2, 2025 11:50

docs: add Megatron critic backend design with implementation plan

26903f7

Add design document for Megatron backend support in critic models, including McaValueModel class implementation, model provider integration, and minimal 3-phase implementation plan.

feat: add initial implementation according to design doc

d3218fe

fix: handle missing value_head weights when loading McaValueModel from GPT checkpoints

d5b1952

fix: GPTModel.forward() expects callable output_layer, got None initially

09d6a90

kemurukagami and others added 4 commits September 6, 2025 22:40

fix: typo

71c0ab0

debug: megatron critic

b561433

fix: update reward, value, and target KL clipping parameters; enhance critic value comparison and logging

cff7e3e

test: limit critic training to 5 steps for comparison purposes

44422e1

taoluo force-pushed the crtic_megatron branch from 3dfebb3 to 44422e1 Compare September 7, 2025 23:57

liu-zichen reviewed Sep 8, 2025

kemurukagami force-pushed the crtic_megatron branch from b85945c to 52104d4 Compare September 8, 2025 16:09

kemurukagami force-pushed the crtic_megatron branch from 52104d4 to dc5ec44 Compare September 8, 2025 16:13

chore: cleanup refactor test cases

752e325

4 participants

