
Conversation


@Xiaoming-AMD Xiaoming-AMD commented Oct 30, 2025

This PR rolls back the integrated TorchTitan backend to 99c0cb2 (20250907), restoring compatibility with the current ROCm 7.0 stack and the Primus-Turbo extension.
The update also refines TorchTitan trainer unit tests and attention modules to ensure stable end-to-end behavior under the new baseline.

Xiaoming-AMD and others added 15 commits October 30, 2025 04:16
Enhanced patch_titan_train_spec to support multi-level nested model overrides
(e.g., "model.moe_args.num_experts" or {"model": {"moe_args": {"num_experts": 16}}}).
Added recursive attribute assignment with dataclass/dict awareness and improved
error messages and logging for better traceability.
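For context, a minimal sketch of the nested-override mechanism this commit describes (function names such as apply_nested_override and flatten_overrides are illustrative placeholders, not the actual patch_titan_train_spec API):

```python
from dataclasses import is_dataclass


def apply_nested_override(obj, key_path, value):
    """Apply a dotted override such as "model.moe_args.num_experts" -> 16.

    Hypothetical sketch of the recursive assignment described above; the real
    helper in Primus may differ in naming, logging, and error handling.
    """
    keys = key_path.split(".")
    for key in keys[:-1]:
        # Descend into dicts by key and into dataclasses/objects by attribute.
        obj = obj[key] if isinstance(obj, dict) else getattr(obj, key)
    last = keys[-1]
    if isinstance(obj, dict):
        obj[last] = value
    elif is_dataclass(obj) or hasattr(obj, last):
        setattr(obj, last, value)
    else:
        raise AttributeError(f"cannot apply override '{key_path}': '{last}' not found")


def flatten_overrides(overrides, prefix=""):
    """Turn {"model": {"moe_args": {"num_experts": 16}}} into dotted (path, value) pairs."""
    for key, value in overrides.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten_overrides(value, path)
        else:
            yield path, value
```

With helpers like these, both override forms from the commit message reduce to the same dotted assignment, e.g. apply_nested_override(spec, "model.moe_args.num_experts", 16).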
…AMD-AIG-AIMA/Primus into fix/torchtitan/patch-checkpoint-wrapper
…TorchTitan trainer

Re-enabled the DeepSeek-V3 16B and 671B unit tests.
Added explicit CLI overrides to disable PrimusTurbo and grouped expert matmul
for consistent test behavior across CI environments. Also updated the other
TorchTitan trainer tests to include the PrimusTurbo flag for clarity and reproducibility.
…on module

Condensed multi-line torch.split and apply_rotary_emb calls into single-line
expressions for improved readability and consistency with surrounding code.
No functional change.
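For illustration, a hedged sketch of the condensed single-line style referred to above (the apply_rotary_emb definition, tensor names, and head-dimension split here are assumptions, not taken from the TorchTitan attention module):

```python
import torch


# Hypothetical stand-in for the attention module's rotary-embedding helper;
# the real apply_rotary_emb in TorchTitan may have a different signature.
def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_c * freqs_cis).flatten(-2).type_as(x)


# Condensed single-line calls, as described in the commit message
# (batch/head/seq sizes and the 64+32 split are illustrative only):
q = torch.randn(2, 8, 16, 64 + 32)
freqs_cis = torch.polar(torch.ones(16, 16), torch.rand(16, 16))
q_nope, q_rope = torch.split(q, [64, 32], dim=-1)
q_rope = apply_rotary_emb(q_rope, freqs_cis)
```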
@Xiaoming-AMD Xiaoming-AMD changed the title fix(titan): add checkpoint_wrapper patch and unit test to ignore unsupported kwargs (early_stop) refactor(torchtitan): rollback Titan to 99c0cb2(20250907) and stabilize trainer UTs Oct 31, 2025
@Xiaoming-AMD Xiaoming-AMD merged commit 1e2e1b1 into main Oct 31, 2025
3 checks passed