feat(runtimes): Add XGBoost runtime (KEP-2598) #3200
Krishna-kg732 wants to merge 3 commits into kubeflow:master
Pull request overview
Adds an initial XGBoost runtime plugin scaffold to the Trainer V2 runtime framework (per KEP-2598), along with the API wiring and constants needed to support a future Rabit env var injection implementation.
Changes:
- Introduces an `xgboost` runtime plugin scaffold implementing `EnforceMLPolicyPlugin` (stubbed behavior for now).
- Extends the TrainingRuntime API (`MLPolicySource`) with an `xgboost` policy source and updates the “only one policy” validation rule.
- Adds XGBoost/Rabit-related env var constants and registers the plugin in the runtime plugin registry (and updates PlainML fallback guard).
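The “only one policy” rule is enforced in the API through a validation rule on `MLPolicySource`; its exact expression is not shown on this page. As a rough illustration of the intent only, the same check can be sketched in plain Go. The struct below is a simplified, hypothetical stand-in for the real API type, and `Torch` and `MPI` are assumed sibling policy sources alongside the `JAX` and `XGBoost` sources mentioned in this PR.

```go
package main

// MLPolicySource is a simplified, hypothetical stand-in for the API type.
// The real struct uses typed pointer fields and declarative validation;
// empty-struct pointers are used here only to model "set" versus "unset".
type MLPolicySource struct {
	Torch   *struct{} // assumed sibling policy source
	MPI     *struct{} // assumed sibling policy source
	JAX     *struct{}
	XGBoost *struct{}
}

// configuredPolicies counts how many policy sources are set.
func configuredPolicies(s MLPolicySource) int {
	n := 0
	for _, set := range []bool{
		s.Torch != nil,
		s.MPI != nil,
		s.JAX != nil,
		s.XGBoost != nil,
	} {
		if set {
			n++
		}
	}
	return n
}

// validateExclusive reports whether at most one policy source is configured.
func validateExclusive(s MLPolicySource) bool {
	return configuredPolicies(s) <= 1
}
```

Under this sketch, a runtime that sets both `JAX` and `XGBoost` would be rejected, while one that sets only `XGBoost` (or none) would pass.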
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/runtime/framework/plugins/xgboost/xgboost.go | New XGBoost plugin scaffold (EnforceMLPolicy stub + plugin name/factory). |
| pkg/runtime/framework/plugins/registry.go | Registers the XGBoost plugin in the plugin factory registry. |
| pkg/runtime/framework/plugins/plainml/plainml.go | Ensures PlainML no-ops when XGBoost (and JAX) ML policy sources are configured. |
| pkg/constants/constants.go | Adds Rabit/XGBoost env var constants + reserved env name set. |
| pkg/apis/trainer/v1alpha1/trainingruntime_types.go | Adds XGBoostMLPolicySource + MLPolicySource.XGBoost, and updates ML policy exclusivity validation. |
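The reserved env name set listed for `pkg/constants/constants.go` exists so that user-supplied environment variables cannot collide with the ones the plugin manages. A minimal sketch of how such a set might back a validation check follows; the variable and function names here are hypothetical and are not taken from the PR, and the four `DMLC_*` names are the ones the PR description says the plugin injects.

```go
package main

import (
	"fmt"
	"sort"
)

// reservedEnvNames is a hypothetical stand-in for the reserved env name set.
// It lists the four DMLC_* variables named in the PR description.
var reservedEnvNames = map[string]struct{}{
	"DMLC_TRACKER_URI":  {},
	"DMLC_TRACKER_PORT": {},
	"DMLC_TASK_ID":      {},
	"DMLC_NUM_WORKER":   {},
}

// findReservedEnv returns the user-supplied env names that are reserved,
// sorted so that error messages are deterministic.
func findReservedEnv(userEnv []string) []string {
	var hits []string
	for _, name := range userEnv {
		if _, ok := reservedEnvNames[name]; ok {
			hits = append(hits, name)
		}
	}
	sort.Strings(hits)
	return hits
}

// validateUserEnv rejects any user env list that sets a reserved name.
func validateUserEnv(userEnv []string) error {
	if hits := findReservedEnv(userEnv); len(hits) > 0 {
		return fmt.Errorf("reserved env names must not be set: %v", hits)
	}
	return nil
}
```

A map keyed by name keeps the lookup constant-time and makes it cheap to extend the set if more managed variables are added later.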
Signed-off-by: Krishna-kg732 <2405732@kiit.ac.in>
What this PR does
Implements the XGBoost runtime plugin for Kubeflow Trainer V2, as proposed in KEP-2598. This plugin enables distributed XGBoost training using Rabit/Collective coordination by automatically injecting DMLC environment variables into trainer containers.
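To make the injection concrete, here is a minimal sketch of how the four `DMLC_*` variables could be assembled for one worker. It is not the plugin's code: the `EnvVar` type, the function names, and the parameters are hypothetical, and it assumes that `DMLC_NUM_WORKER` is the node count multiplied by the workers per node, with one worker per GPU or one per node on CPU as described under Changes.

```go
package main

import "strconv"

// EnvVar is a hypothetical, simplified stand-in for a container env entry.
type EnvVar struct {
	Name  string
	Value string
}

// workersPerNode follows the rule stated in the PR description:
// one worker per GPU, or a single worker per node when no GPU is requested.
func workersPerNode(gpusPerNode int) int {
	if gpusPerNode > 0 {
		return gpusPerNode
	}
	return 1
}

// dmlcEnv builds the DMLC_* variables for a single worker.
// Assumption: the total worker count is numNodes * workersPerNode, and
// taskID is supplied by the caller; how the real plugin derives the task
// ID is not shown in this PR description.
func dmlcEnv(trackerHost string, trackerPort, numNodes, gpusPerNode, taskID int) []EnvVar {
	total := numNodes * workersPerNode(gpusPerNode)
	return []EnvVar{
		{Name: "DMLC_TRACKER_URI", Value: trackerHost},
		{Name: "DMLC_TRACKER_PORT", Value: strconv.Itoa(trackerPort)},
		{Name: "DMLC_TASK_ID", Value: strconv.Itoa(taskID)},
		{Name: "DMLC_NUM_WORKER", Value: strconv.Itoa(total)},
	}
}
```

For example, under these assumptions two nodes with four GPUs each would yield `DMLC_NUM_WORKER=8`, while two CPU-only nodes would yield `DMLC_NUM_WORKER=2`.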
Changes
New Files
- `pkg/runtime/framework/plugins/xgboost/xgboost.go`: Plugin implementing `EnforceMLPolicyPlugin` and `CustomValidationPlugin`. Injects `DMLC_TRACKER_URI`, `DMLC_TRACKER_PORT`, `DMLC_TASK_ID`, `DMLC_NUM_WORKER` env vars and auto-derives `numWorkersPerNode` from GPU resources (1 worker per GPU, or 1 per node for CPU).
- `pkg/runtime/framework/plugins/xgboost/xgboost_test.go`: Unit tests covering `EnforceMLPolicy` (nil guards, single/multi-node CPU, GPU resources, numNodes override) and `Validate` (reserved `DMLC_*` env name rejection).
Modified Files
- `pkg/apis/trainer/v1alpha1/trainingruntime_types.go`: Added `XGBoostMLPolicySource` struct, `XGBoost` field to `MLPolicySource`, and updated CEL mutual exclusion validation rule.
- `pkg/constants/constants.go`: Added XGBoost/Rabit constants and `XGBoostReservedEnvNames` set.
- `pkg/runtime/framework/plugins/registry.go`: Registered the XGBoost plugin.
- `pkg/runtime/framework/plugins/plainml/plainml.go`: Added XGBoost to the PlainML fallback guard.
- `pkg/runtime/framework/core/framework_test.go`: Updated `TestNew` to include XGBoost in expected plugin lists.
- `pkg/util/testing/wrapper.go`: Added `XGBoostPolicy()` test helper.
How was this tested?
- `go test ./pkg/runtime/framework/plugins/xgboost/...` ✅ (9 test cases)
- `go test ./pkg/runtime/framework/core/ -run TestNew` ✅
- `go test ./pkg/runtime/framework/plugins/...` ✅ (all plugins pass)
TODO (follow-up PRs)
/kind feature
/area runtime