WIP: Megatron backend support in critic models #169
base: main
Add design document for Megatron backend support in critic models, including McaValueModel class implementation, model provider integration, and minimal 3-phase implementation plan.
…m GPT checkpoints
3dfebb3 to 44422e1
    logger = get_logger(__name__)

    class McaValueModel(McaGPTModel):
Some model classes don’t instantiate McaGPTModel directly; they subclass it (e.g., Qwen2VLModel). If we introduce an McaValueModel to provide a value head, we’d have to create a matching ValueModel variant for every new model class. Would it be better to add a value_head option in McaModelConfig and build the value-head capability into the McaGPTModel base class, or to add a post-init hook that runs after model init?
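The config-flag alternative raised in this comment can be sketched roughly as follows. This is a minimal, framework-free illustration, not the project's code: `ModelConfig`, `BaseModel`, `VisionLanguageModel`, `output_dim`, and `_post_init` are hypothetical stand-ins for `McaModelConfig`, `McaGPTModel`, a subclass such as `Qwen2VLModel`, and whatever hook the maintainers might choose.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical stand-in for McaModelConfig."""
    hidden_size: int = 8
    vocab_size: int = 32
    use_value_head: bool = False  # assumed flag name, taken from the review suggestion


class BaseModel:
    """Hypothetical stand-in for McaGPTModel."""

    def __init__(self, config: ModelConfig):
        self.config = config
        # Default language-model head projects hidden states to the vocabulary.
        self.output_dim = config.vocab_size
        self._post_init()

    def _post_init(self) -> None:
        # Hook that runs after construction; swaps in a scalar value head on request.
        if self.config.use_value_head:
            self.output_dim = 1


class VisionLanguageModel(BaseModel):
    """Stand-in for a subclass such as Qwen2VLModel; it needs no value-model variant."""


critic = VisionLanguageModel(ModelConfig(use_value_head=True))
actor = VisionLanguageModel(ModelConfig())
print(critic.output_dim, actor.output_dim)
```

Because the switch lives in the base class, every subclass picks up the value head through its config, which is the maintenance benefit the comment argues for.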
b85945c to 52104d4
- Add use_value_head config field to McaModelConfig
- Create ValueHeadWrapper class with weight property
- Replace output_layer with value head when use_value_head=True
- Set share_embeddings_and_output_weights=False for value models
- Filter value_head weights from missing_keys during checkpoint loading
- Initialize value head weights to 0.01 for testing parity
- Enable value head for CriticWorker in MegatronStrategy
- Remove unused McaValueModel class
- Update design documentation
- Add test pipeline and configs for critic comparison

Co-Authored-By: Claude <[email protected]>
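The checkpoint-loading step in this commit message (ignoring value-head weights that a pretrained GPT checkpoint cannot contain) might look something like the sketch below. The helper name `filter_value_head_keys` and the key names in the usage example are assumptions made for illustration; the actual parameter names in the PR may differ.

```python
def filter_value_head_keys(missing_keys, component="value_head"):
    """Return the missing keys that do not belong to the value head.

    A key is treated as a value-head parameter when one of its dot-separated
    components equals `component` (an assumed naming convention).
    """
    return [key for key in missing_keys if component not in key.split(".")]


# Hypothetical keys reported as missing after loading a GPT checkpoint.
missing = [
    "decoder.final_layernorm.weight",
    "value_head.weight",
    "value_head.bias",
]
remaining = filter_value_head_keys(missing)
print(remaining)
```

Anything left in `remaining` is a genuinely missing parameter and should still be reported, while the freshly initialized value head is expected to be absent from the checkpoint.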
52104d4 to dc5ec44
Summary
approach
Description
This PR introduces the design and planning phase for adding Megatron
backend support to critic models, enabling them to use Megatron's
parallelism features (tensor, pipeline, context, and expert
parallelism) alongside the existing DeepSpeed backend.
What's included:
- docs/critic_megatron_backend_design_final.md: complete technical design
- examples/docs_examples/example_ppo_megatron_critic.yaml: PPO config for testing the Megatron critic
What's coming next:
Status
🚧 Work in Progress - This PR currently contains only the design
documentation and configuration. Implementation will follow based on the
outlined plan.
cc: @PanAndy it would be great if you can review the design doc before implementation begins, thanks