refactor: leverage MCore TransformerConfig.__post_init__ validation for config fields set in _apply_performance_config #2291

@terrykong

Description

Context

When _apply_performance_config() (and similar _apply_* functions in nemo_rl/models/megatron/setup.py) set fields on model_cfg, they do so via plain attribute assignment after the TransformerConfig dataclass has already been constructed and __post_init__ has run. This means MCore's built-in validation (e.g., checking recompute_modules against the allowed set) is bypassed for user-supplied values.

Call chain

  1. MegatronPolicyWorker.__init__ → validate_and_set_config()
  2. validate_and_set_config() → setup_model_config()
  3. setup_model_config() calls ConfigContainer.from_yaml() → __post_init__ runs here
  4. _apply_performance_config() mutates model_cfg via attribute assignment — __post_init__ does not re-run
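
The bypass in step 4 can be reproduced with a toy dataclass. This is a minimal sketch, not MCore code: `ToyTransformerConfig` and `ALLOWED_RECOMPUTE_MODULES` are hypothetical stand-ins, and the allowed set shown is illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical allowed set; the real list lives in MCore and may differ.
ALLOWED_RECOMPUTE_MODULES = {"core_attn", "moe", "mlp"}


@dataclass
class ToyTransformerConfig:
    """Hypothetical stand-in for a config that validates in __post_init__."""

    recompute_modules: list = field(default_factory=lambda: ["core_attn"])

    def __post_init__(self):
        # Runs once, at construction time only.
        unknown = set(self.recompute_modules) - ALLOWED_RECOMPUTE_MODULES
        if unknown:
            raise ValueError(f"unknown recompute_modules: {sorted(unknown)}")


def construction_rejects_typo() -> bool:
    """Passing the typo to the constructor triggers validation."""
    try:
        ToyTransformerConfig(recompute_modules=["mopee"])
    except ValueError:
        return True
    return False


def assignment_accepts_typo() -> bool:
    """Assigning the typo after construction skips validation entirely."""
    cfg = ToyTransformerConfig()
    cfg.recompute_modules = ["mopee"]  # __post_init__ does not re-run
    return cfg.recompute_modules == ["mopee"]
```

Both helpers return `True`: the same bad value is rejected at construction but accepted when assigned afterwards, which mirrors what happens to fields set by the _apply_* functions.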

Problem

A typo in a user config (e.g., recompute_modules: ["mopee"]) is silently accepted. MCore's runtime checks match module names via string comparison, so an unrecognized name is a silent no-op — the user thinks they're saving memory but nothing is actually recomputed.

Proposal

Consider refactoring the config initialization flow so that MCore's __post_init__ validation runs after all _apply_* mutations are complete. Options include:

  • Constructing the TransformerConfig from a merged dict (user overrides applied first, then construct once)
  • Re-running __post_init__() after all mutations (need to verify safety — it also sets defaults)
  • Extracting MCore's validation into a callable utility and invoking it post-mutation
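
One possible shape for the first two options, sketched on a toy dataclass rather than the real TransformerConfig: the standard library's `dataclasses.replace()` builds a new instance through `__init__`, so `__post_init__` runs again on the merged values. `ToyTransformerConfig`, `ALLOWED_RECOMPUTE_MODULES`, and `apply_overrides_validated` are hypothetical names, and whether this is safe for MCore's actual `__post_init__` (which also derives defaults) is exactly the open question noted above.

```python
from dataclasses import dataclass, field, replace

# Hypothetical allowed set; the real list lives in MCore and may differ.
ALLOWED_RECOMPUTE_MODULES = {"core_attn", "moe", "mlp"}


@dataclass
class ToyTransformerConfig:
    """Hypothetical stand-in for a config that validates in __post_init__."""

    recompute_modules: list = field(default_factory=lambda: ["core_attn"])

    def __post_init__(self):
        unknown = set(self.recompute_modules) - ALLOWED_RECOMPUTE_MODULES
        if unknown:
            raise ValueError(f"unknown recompute_modules: {sorted(unknown)}")


def apply_overrides_validated(cfg: ToyTransformerConfig, overrides: dict) -> ToyTransformerConfig:
    """Collect all overrides first, then construct once so validation runs.

    replace() calls __init__ on a new object, which re-runs __post_init__.
    The original cfg is left untouched.
    """
    return replace(cfg, **overrides)
```

With this pattern, an _apply_* style function would accumulate its changes in a dict and return the rebuilt config instead of mutating in place; a typo such as `["mopee"]` then raises `ValueError` at the point where the overrides are applied.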

This would eliminate the need to duplicate validation logic in NeMo-RL and ensure any future MCore validation additions are automatically picked up.

Related

Surfaced during review of PR #2280 (selective activation checkpointing).
