
Conversation

Xiaoming-AMD (Collaborator) commented on Oct 24, 2025:

Summary

This PR adds support for dynamically overriding TorchTitan model parameters from the CLI
(e.g., --model.n_layers=4) during Primus training.

Key Changes

  • Added _split_known_unknown() to the parser to separate known overrides from unknown ones (see the parsing sketch after this list).
  • Forwarded unknown overrides (e.g., model.*) to the TorchTitan trainer as extra_args.
  • Added patch_titan_train_spec() to modify TorchTitan's model configuration dynamically (see the patching sketch under Example Usage):
    • Supports the nested form {"model": {"n_layers": 4}}
    • Enforces strict "model." prefix validation
    • Raises clear errors for missing or invalid fields
  • The Megatron trainer explicitly rejects unregistered overrides.
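
A minimal sketch of the known/unknown split, assuming an argparse-based Primus parser; the helper name _split_known_unknown comes from this PR, but the body below is illustrative rather than the actual implementation:

```python
import argparse


def _split_known_unknown(parser: argparse.ArgumentParser, argv: list[str]):
    """Separate overrides the parser knows about from leftovers.

    argparse.parse_known_args() already returns the parsed namespace plus
    any arguments it did not recognize, so unknown overrides such as
    --model.n_layers=4 fall through untouched.
    """
    return parser.parse_known_args(argv)


# Hypothetical usage: only --exp is registered, so the model override
# survives as an "unknown" argument and can be forwarded as extra_args.
parser = argparse.ArgumentParser()
parser.add_argument("--exp")
known, unknown = _split_known_unknown(parser, ["--exp=a.yaml", "--model.n_layers=4"])
print(unknown)  # ['--model.n_layers=4']
```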

Example Usage

EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-BF16-pretrain.yaml bash examples/run_pretrain.sh --model.n_layers=4
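
And a hedged sketch of the patching side referenced in Key Changes, assuming a dataclass-style model config; ModelConfig and _to_nested are hypothetical stand-ins, and patch_titan_train_spec mirrors the PR's hook name but not necessarily its real signature:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical stand-in for TorchTitan's model configuration."""

    n_layers: int = 32


def _to_nested(extra_args: list[str]) -> dict:
    """Normalize ["--model.n_layers=4"] into {"model": {"n_layers": "4"}}."""
    nested: dict = {}
    for arg in extra_args:
        key, _, value = arg.lstrip("-").partition("=")
        section, _, field = key.partition(".")
        nested.setdefault(section, {})[field] = value
    return nested


def patch_titan_train_spec(model_config: ModelConfig, overrides: dict) -> None:
    """Apply nested overrides, enforcing the strict "model." prefix."""
    for section, fields in overrides.items():
        if section != "model":
            raise ValueError(f"Unsupported section {section!r}; only 'model.*' overrides are allowed")
        for field, value in fields.items():
            if not hasattr(model_config, field):
                raise AttributeError(f"Unknown model field {field!r}")
            # Naive type coercion for the sketch: reuse the current value's type.
            current = getattr(model_config, field)
            setattr(model_config, field, type(current)(value))


cfg = ModelConfig()
patch_titan_train_spec(cfg, _to_nested(["--model.n_layers=4"]))
assert cfg.n_layers == 4
```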

Motivation

Previously, TorchTitan's model configuration was static and not exposed externally.
This patch allows Primus users to quickly experiment with model structure or scale
without editing Titan source configs.

wenxie-amd (Contributor) left a comment:

LGTM

Xiaoming-AMD merged commit 83748bb into main on Oct 24, 2025 (3 checks passed).
Xiaoming-AMD deleted the feature/torchtitan/add-model-patch-hook branch on Oct 27, 2025.