
Conversation

Xiaoming-AMD (Collaborator) commented on Oct 24, 2025:

Summary

This PR adds support for dynamically overriding TorchTitan model parameters from the CLI
(e.g., --model.n_layers=4) during Primus training.

Key Changes

  • Added _split_known_unknown() to the parser to separate known overrides from unknown ones (see the parsing sketch after this list).
  • Forwarded unknown overrides (e.g., model.*) to the TorchTitan trainer as extra_args.
  • Added patch_titan_train_spec() to modify TorchTitan's model configuration dynamically (see the patching sketch under Example Usage):
    • Supports the nested form {"model": {"n_layers": 4}}
    • Enforces strict "model." prefix validation
    • Raises clear errors for missing or invalid fields
  • The Megatron trainer explicitly rejects unregistered overrides.
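
A minimal sketch of the known/unknown split, assuming an argparse-based Primus parser; the helper name _split_known_unknown comes from this PR, but the body below is illustrative rather than the actual implementation:

```python
import argparse


def _split_known_unknown(parser: argparse.ArgumentParser, argv: list[str]):
    """Separate overrides the parser knows about from leftovers.

    argparse.parse_known_args() already returns the parsed namespace plus
    any arguments it did not recognize, so unknown overrides such as
    --model.n_layers=4 fall through untouched.
    """
    return parser.parse_known_args(argv)


# Hypothetical usage: only --exp is registered, so the model override
# survives as an "unknown" argument and can be forwarded as extra_args.
parser = argparse.ArgumentParser()
parser.add_argument("--exp")
known, unknown = _split_known_unknown(parser, ["--exp=a.yaml", "--model.n_layers=4"])
print(unknown)  # ['--model.n_layers=4']
```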

Example Usage

EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-BF16-pretrain.yaml bash examples/run_pretrain.sh --model.n_layers=4
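
And a hedged sketch of the patching side referenced in Key Changes, assuming a dataclass-style model config; ModelConfig and _to_nested are hypothetical stand-ins, and patch_titan_train_spec mirrors the PR's hook name but not necessarily its real signature:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical stand-in for TorchTitan's model configuration."""

    n_layers: int = 32


def _to_nested(extra_args: list[str]) -> dict:
    """Normalize ["--model.n_layers=4"] into {"model": {"n_layers": "4"}}."""
    nested: dict = {}
    for arg in extra_args:
        key, _, value = arg.lstrip("-").partition("=")
        section, _, field = key.partition(".")
        nested.setdefault(section, {})[field] = value
    return nested


def patch_titan_train_spec(model_config: ModelConfig, overrides: dict) -> None:
    """Apply nested overrides, enforcing the strict "model." prefix."""
    for section, fields in overrides.items():
        if section != "model":
            raise ValueError(f"Unsupported section {section!r}; only 'model.*' overrides are allowed")
        for field, value in fields.items():
            if not hasattr(model_config, field):
                raise AttributeError(f"Unknown model field {field!r}")
            # Naive type coercion for the sketch: reuse the current value's type.
            current = getattr(model_config, field)
            setattr(model_config, field, type(current)(value))


cfg = ModelConfig()
patch_titan_train_spec(cfg, _to_nested(["--model.n_layers=4"]))
assert cfg.n_layers == 4
```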

Motivation

Previously, TorchTitan's model configuration was static and not exposed externally.
This patch allows Primus users to quickly experiment with model structure or scale
without editing Titan source configs.

wenxie-amd (Contributor) left a comment:

LGTM

Xiaoming-AMD merged commit 83748bb into main on Oct 24, 2025 (3 checks passed).
Xiaoming-AMD deleted the feature/torchtitan/add-model-patch-hook branch on Oct 27, 2025.