Component
webhook
Problem Statement
When a GPU sharing strategy is misconfigured, the validation error message does not point to the actual cause. For example, when the MPS strategy is set but the MPSSupport feature gate is not enabled, the error is:
unknown GPU sharing strategy: MPS
This message doesn't tell you why validation failed or what to do about it.
I personally spent time checking whether mps (lowercase) should be used, looking for whitespace issues in my config, and checking the MPS control daemon before finally realizing the feature gate wasn't enabled 😄
Similarly, for an unknown time-slice interval, the error doesn't tell you what valid values are:
unknown time-slice interval: InvalidInterval
Proposed Solution
When a known strategy is used but its feature gate is not enabled, the error should say so clearly:
"MPS" is selected as the GPU sharing strategy, but the "MPSSupport" feature gate is not enabled
When an unknown strategy or interval is used, the error should list the supported values:
unknown GPU sharing strategy: foo, supported GPU sharing strategies: TimeSlicing, MPS
or
unknown time-slice interval: InvalidInterval, supported time-slice intervals: Default, Short, Medium, Long
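A rough sketch of how the validation could produce these messages. The names here (`strategy`, `supportedStrategies`, `validateStrategy`, `validateInterval`, and the `gateEnabled` callback) are hypothetical and not taken from the existing webhook code, and the sketch assumes TimeSlicing needs no feature gate:

```go
package main

import (
	"fmt"
	"strings"
)

// strategy pairs a GPU sharing strategy with the feature gate it needs.
// Hypothetical type: the real webhook may model this differently.
type strategy struct {
	name string
	gate string // empty means no feature gate is required (assumption)
}

// Kept as ordered slices so the "supported ..." lists are deterministic.
var supportedStrategies = []strategy{
	{name: "TimeSlicing"},
	{name: "MPS", gate: "MPSSupport"},
}

var supportedIntervals = []string{"Default", "Short", "Medium", "Long"}

// validateStrategy separates "known strategy, gate disabled" from
// "unknown strategy". gateEnabled stands in for the real feature-gate lookup.
func validateStrategy(name string, gateEnabled func(gate string) bool) error {
	names := make([]string, 0, len(supportedStrategies))
	for _, s := range supportedStrategies {
		if s.name == name {
			if s.gate != "" && !gateEnabled(s.gate) {
				return fmt.Errorf("%q is selected as the GPU sharing strategy, but the %q feature gate is not enabled", s.name, s.gate)
			}
			return nil
		}
		names = append(names, s.name)
	}
	return fmt.Errorf("unknown GPU sharing strategy: %s, supported GPU sharing strategies: %s",
		name, strings.Join(names, ", "))
}

// validateInterval lists the accepted values when the interval is unknown.
func validateInterval(interval string) error {
	for _, v := range supportedIntervals {
		if v == interval {
			return nil
		}
	}
	return fmt.Errorf("unknown time-slice interval: %s, supported time-slice intervals: %s",
		interval, strings.Join(supportedIntervals, ", "))
}

func main() {
	allGatesOff := func(string) bool { return false }
	fmt.Println(validateStrategy("MPS", allGatesOff))
	fmt.Println(validateStrategy("foo", allGatesOff))
	fmt.Println(validateInterval("InvalidInterval"))
}
```

Checking the known-strategy case before falling through to the "unknown" error is what keeps a disabled feature gate from being reported as an unknown strategy.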
Alternatives Considered
No response
Scope
Small: CLI flag, config option, minor behavior change
Upstream Kubernetes Dependencies
No response
Additional Context
No response