
Fix scheduler/DDP/wandb reliability regressions and add tests#1716

Open
erdoganxarda wants to merge 1 commit into junyanz:master from erdoganxarda:fix/reliability-guards

@erdoganxarda

Summary

This PR fixes three reliability issues and adds regression tests.

Why

  • An invalid --lr_policy value returned a NotImplementedError instance instead of raising it.
  • In DDP mode, the --norm validation logic and its error message contradicted each other.
  • A missing wandb package caused an import failure even when --use_wandb was not enabled.
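The first issue is easy to miss because returning an exception instance is not an error in Python: the caller simply receives an object and carries on. A minimal sketch of the difference, using hypothetical `make_scheduler_buggy`/`make_scheduler_fixed` helpers (not the repository's actual functions) that return a plain string in place of a real torch scheduler:

```python
def make_scheduler_buggy(lr_policy):
    # Hypothetical stand-in for a scheduler factory; a real one builds torch schedulers.
    if lr_policy == "linear":
        return "linear-scheduler"
    # Bug: this *returns* the exception object, so nothing is raised here.
    return NotImplementedError(f"learning rate policy [{lr_policy}] is not implemented")


def make_scheduler_fixed(lr_policy):
    if lr_policy == "linear":
        return "linear-scheduler"
    # Fix: raising stops execution at the point of misconfiguration.
    raise NotImplementedError(f"learning rate policy [{lr_policy}] is not implemented")


sched = make_scheduler_buggy("typo")
print(type(sched).__name__)  # prints NotImplementedError, yet execution continues

try:
    make_scheduler_fixed("typo")
except NotImplementedError as exc:
    print("raised:", exc)  # prints "raised: learning rate policy [typo] is not implemented"
```

With the buggy version the exception object only surfaces later, typically as an unrelated AttributeError when the "scheduler" is first stepped.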

What Changed

  • Raise NotImplementedError for unknown LR policies.
  • In DDP mode, explicitly reject --norm batch and provide a clear error message.
  • Make wandb import optional; only raise if --use_wandb is requested and wandb is missing.
  • Added regression tests for all three cases.
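The DDP and wandb guards described above can be sketched as follows. The function names `require_wandb` and `check_ddp_norm` are hypothetical and the error messages are illustrative; the PR's actual code and wording may differ. The pattern is: import the optional dependency defensively at module load, and only fail when the feature that needs it is requested.

```python
from types import ModuleType
from typing import Optional

# Optional dependency: loading this module must not fail when wandb is absent.
try:
    import wandb
except ImportError:
    wandb = None


def require_wandb(use_wandb: bool) -> Optional[ModuleType]:
    """Return the wandb module only when --use_wandb is set; fail only then if it is missing."""
    if not use_wandb:
        return None
    if wandb is None:
        raise ImportError("--use_wandb was set but the wandb package is not installed")
    return wandb


def check_ddp_norm(use_ddp: bool, norm: str) -> None:
    """Reject --norm batch under DDP with a message that states exactly what the check tests."""
    if use_ddp and norm == "batch":
        raise ValueError("--norm batch is not supported in DDP mode; choose a different --norm value")
```

Keeping the condition and the message in one small function makes it hard for the two to drift apart again, which is the contradiction the PR describes.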

Validation

  • python -m compileall tests models util (a syntax check only).
  • The full pytest suite was not run because pytest and torch were unavailable in this environment.
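Since the regression tests have not been run here, a dependency-free sketch of what such tests might check is shown below. All three functions are hypothetical, simplified stand-ins for the fixed behavior, not the repository's code. The wandb case uses a standard-library trick: a `None` entry in `sys.modules` makes `import wandb` raise ImportError, so a missing package can be simulated even on machines where wandb is installed.

```python
import sys
import unittest
from unittest import mock


def get_scheduler(lr_policy):
    # Hypothetical, simplified factory mirroring the fixed behavior.
    if lr_policy in ("linear", "step"):
        return lr_policy
    raise NotImplementedError(f"learning rate policy [{lr_policy}] is not implemented")


def check_ddp_norm(use_ddp, norm):
    # Hypothetical DDP guard: only batch norm is rejected, and only under DDP.
    if use_ddp and norm == "batch":
        raise ValueError("--norm batch is not supported in DDP mode")


def load_wandb(use_wandb):
    # Import lazily so a missing wandb only matters when it is requested.
    if not use_wandb:
        return None
    try:
        import wandb
    except ImportError as exc:
        raise ImportError("--use_wandb requires the wandb package") from exc
    return wandb


class ReliabilityGuardTests(unittest.TestCase):
    def test_unknown_lr_policy_raises(self):
        with self.assertRaises(NotImplementedError):
            get_scheduler("no-such-policy")

    def test_ddp_rejects_batch_norm(self):
        with self.assertRaises(ValueError):
            check_ddp_norm(True, "batch")
        check_ddp_norm(True, "instance")  # other norms still pass under DDP
        check_ddp_norm(False, "batch")    # non-DDP runs are unaffected

    def test_wandb_optional_when_disabled(self):
        # A None entry in sys.modules makes `import wandb` raise ImportError.
        with mock.patch.dict(sys.modules, {"wandb": None}):
            self.assertIsNone(load_wandb(False))
            with self.assertRaises(ImportError):
                load_wandb(True)
```

Saved as a `test_*.py` module, this runs under `python -m unittest`, and pytest also collects `unittest.TestCase` subclasses.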

Risk / Compatibility

  • No intended behavior change for valid configs.
  • DDP now fails fast with a clearer error for unsupported --norm batch.
