feat: add framework-level validation for conflicting env vars across …#3237
feat: add framework-level validation for conflicting env vars across …#3237krishdef7 wants to merge 1 commit intokubeflow:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
🎉 Welcome to the Kubeflow Trainer! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
1976685 to
9ce322c
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds framework-level validation to detect conflicts between environment variables managed by different runtime plugins (MPI, Torch) and between plugin-reserved envs and user-provided envs in a TrainJob.
Changes:
- Introduces
EnvVarsReserverPlugininterface for plugins to expose reserved environment variable names - Implements the interface on MPI and Torch plugins to return their respective reserved env var sets
- Adds cross-plugin env conflict detection in the framework's validation logic
- Adds user-env conflict validation to the MPI plugin (mirroring existing Torch pattern)
- Adds comprehensive test coverage for MPI env validation
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/runtime/framework/interface.go | Defines new EnvVarsReserverPlugin interface for exposing reserved env var names |
| pkg/runtime/framework/core/framework.go | Implements cross-plugin env conflict detection in RunCustomValidationPlugins |
| pkg/runtime/framework/plugins/mpi/mpi.go | Implements EnvVarsReserverPlugin and adds user-env validation |
| pkg/runtime/framework/plugins/torch/torch.go | Implements EnvVarsReserverPlugin interface |
| pkg/constants/constants.go | Defines MPIRunReservedEnvNames set with OpenMPI reserved env vars |
| pkg/runtime/framework/plugins/mpi/mpi_test.go | Adds tests for MPI reserved env validation |
|
Hi @jinchihe @kuizhiqing, Adding a quick concrete example to clarify the validation: If a user configures:
This currently passes validation but can lead to runtime issues. With this PR, validation fails early with a clear error indicating the conflicting env. Happy to adjust the validation scope or placement if needed. |
|
Hi @jinchihe @kuizhiqing, happy to make any changes based on your feedback. Let me know if the approach or placement needs adjustment. |
|
Addressed Copilot feedback and added concrete MPI/Torch conflict example above. Let me know if any changes are needed, otherwise happy for /lgtm. |
d211923 to
d62bb77
Compare
|
Rebased onto latest master and resolved merge conflicts. CI checks are passing. Let me know if any adjustments are needed. |
3887381 to
d232599
Compare
|
@jinchihe @kuizhiqing, Rebased to fix DCO sign-off and removed an accidentally included commit from another branch. The PR now contains only the env conflict validation changes across 6 files. DCO is green. Happy for review when you get a chance |
98165ed to
9fff59c
Compare
|
@andreyvelich @akshaychitneni, this PR implements the env conflict validation from the TODO you left in #2308. Cleaned up a dead NumProcPerNode validation block in this latest push. All CI passing, happy for review when you get a chance. |
…s plugins Add cross-plugin environment variable conflict detection to prevent runtime issues when plugins share reserved env names: - Add MPIRunReservedEnvNames constant for OpenMPI reserved env vars - Add EnvVarsReserverPlugin interface for plugins to expose reserved env names - Implement ReservedEnvVarNames() on MPI and Torch plugins - Add user-env conflict validation to MPI plugin (mirrors existing Torch pattern) - Add cross-plugin conflict detection in RunCustomValidationPlugins - Add unit tests for MPI env validation Closes kubeflow#3147 Signed-off-by: krishdef7 <gargkrish06@gmail.com>
9fff59c to
b10a47c
Compare
What this PR does / why we need it
Adds framework-level validation to detect conflicts between environment variables managed by different runtime plugins (MPI, Torch), and between plugin-reserved envs and user-provided envs in a TrainJob.
Changes
MPIRunReservedEnvNamesconstant for OpenMPI reserved env varsEnvVarsReserverPlugininterface for plugins to expose reserved env var namesReservedEnvVarNames()on MPI and Torch pluginsRunCustomValidationPluginsWhich issue(s) this PR fixes
Closes #3147