[iris] Ignore unknown inherited constraint fields #4168
Claude finished @dlwh's task in 1m 39s. PR Review: [iris] Ignore unknown inherited constraint fields
The Fix (commit
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 39d1c86213
    outer_learning_rate=config.outer_learning_rate,
    outer_optimizer=config.outer_optimizer,
    outer_momentum=config.outer_momentum,
    model_sync_mix=config.model_sync_mix,
Remove unsupported DiLoCo config argument
Passing model_sync_mix into DiLoCoSyncConfig(...) will raise TypeError at runtime because that dataclass does not define a model_sync_mix field. Any run_elastic_budget_compare execution that reaches the elastic branch will fail before launching training, so the new benchmark path is currently unusable.
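A minimal sketch of the failure mode described above. `SyncConfig` is a hypothetical stand-in for `DiLoCoSyncConfig`, and its fields and defaults are assumptions; the point is only that a dataclass's generated `__init__` rejects any keyword that is not a declared field.

```python
from dataclasses import dataclass


@dataclass
class SyncConfig:
    """Hypothetical stand-in for DiLoCoSyncConfig; fields and defaults are assumed."""

    outer_learning_rate: float = 0.7
    outer_momentum: float = 0.9


def try_build(**kwargs):
    """Return (config, None) on success or (None, error message) on TypeError."""
    try:
        return SyncConfig(**kwargs), None
    except TypeError as exc:
        return None, str(exc)


# A keyword that matches a declared field is accepted.
ok_config, ok_error = try_build(outer_learning_rate=0.5)

# A keyword with no matching field fails before any training code runs.
bad_config, bad_error = try_build(outer_learning_rate=0.5, model_sync_mix=0.1)
```

Under this reading, the fix is either to drop the argument at the call site or to add the field to the dataclass, depending on whether the setting is meant to be supported.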
    config.train_config.trainer,
    id=worker_run_id,
    elastic=elastic,
    tracker=_tracker_for_elastic_worker(config.train_config.trainer.tracker, logical_run_id, worker_id),
    )
Isolate per-worker checkpoint paths in elastic workers
This worker assignment rewrites trainer.id per worker but reuses the same checkpointer settings for every worker. With the default launch flow (impute_run_id_from_output_path=True), workers keep a shared checkpoint base path, so separate jobs can write the same checkpoint directories/steps concurrently; that creates nondeterministic overwrites and can corrupt resume state for multi-worker elastic runs.
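One way to address this, sketched under assumptions: derive a distinct checkpoint base path for each worker at the same point where `trainer.id` is rewritten. `CheckpointerConfig`, `base_path`, and `isolate_checkpointer` are hypothetical names used for illustration; the real checkpointer config may be shaped differently.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointerConfig:
    """Hypothetical checkpointer settings; only the base path matters here."""

    base_path: str


def isolate_checkpointer(cfg: CheckpointerConfig, worker_id: int) -> CheckpointerConfig:
    """Return a copy of cfg whose base path is unique to the given worker."""
    # Plain string joining keeps URL-style prefixes such as "gs://" intact.
    worker_path = f"{cfg.base_path.rstrip('/')}/worker-{worker_id}"
    return dataclasses.replace(cfg, base_path=worker_path)


shared = CheckpointerConfig(base_path="gs://bucket/run/checkpoints/")
per_worker = [isolate_checkpointer(shared, worker_id) for worker_id in range(3)]
```

With a separate subdirectory per worker, concurrent jobs no longer write the same checkpoint steps, and resume logic can locate each worker's state by its id.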
Ignore unknown fields when reconstructing inherited constraints in iris.cluster.client.get_job_info(), so inherited IRIS_JOB_CONSTRAINTS entries with newer schema fields like mode do not crash startup. Add a regression test for the environment parsing path.
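The idea can be sketched roughly as follows. This is not the actual `iris.cluster.client.get_job_info()` code: the `Constraint` dataclass, its field names, and the assumption that `IRIS_JOB_CONSTRAINTS` holds a JSON list of objects are all hypothetical. The sketch filters each entry down to the fields the local schema knows before constructing the object.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Constraint:
    """Hypothetical constraint schema as known to an older client."""

    key: str
    op: str
    value: Optional[str] = None


def constraints_from_env(raw: str) -> List[Constraint]:
    """Parse a JSON list of constraint objects, dropping unrecognized fields."""
    known = {f.name for f in dataclasses.fields(Constraint)}
    return [
        Constraint(**{k: v for k, v in entry.items() if k in known})
        for entry in json.loads(raw)
    ]


# An inherited entry written by a newer schema that also carries "mode".
raw_env = '[{"key": "region", "op": "EQ", "value": "us-central1", "mode": "preferred"}]'
parsed = constraints_from_env(raw_env)
```

Filtering against `dataclasses.fields` keeps the parser forward compatible: newer writers can add fields without breaking older readers, at the cost of silently discarding information the reader does not understand.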
Fixes #4167