docs: Add User Documentation for GPU Limiter and Scale-to-Zero Features #638
Open
ev-shindin wants to merge 7 commits into llm-d:main
…ero features

Add comprehensive user documentation for three scaling features:

- GPU Limiter (Experimental): Resource-aware scaling that constrains autoscaling decisions based on actual GPU availability. Documents the greedy-by-saturation allocation algorithm and per-accelerator-type tracking.
- Scale to Zero: Automatic scaling of idle model deployments to zero replicas after a configurable retention period. Includes namespace-scoped configuration for multi-environment deployments.
- Scale from Zero (placeholder): Documents the complementary feature that automatically scales models back up when requests arrive.

All documents follow a consistent structure with overview, configuration, prerequisites, example scenarios, troubleshooting, and best practices.
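The commit message above names a greedy-by-saturation allocation with per-accelerator-type tracking. The Go sketch below shows one minimal way such a limiter could work, assuming that "greedy" means the most saturated model gets first claim on the free GPUs of its own accelerator type. The type `ModelRequest`, the function `LimitReplicas`, and the accelerator labels are hypothetical and are not taken from the project's code.

```go
package main

import (
	"fmt"
	"sort"
)

// ModelRequest is a hypothetical limiter input: the replica count an upstream
// stage wants for one model, plus what each replica consumes.
type ModelRequest struct {
	Name            string
	Accelerator     string  // accelerator type label, e.g. "A100" (illustrative)
	GPUsPerReplica  int     // GPUs one replica occupies
	CurrentReplicas int     // replicas already running (their GPUs are in use)
	DesiredReplicas int     // replicas requested by the upstream stage
	Saturation      float64 // higher = more saturated = served first
}

// LimitReplicas grants scale-ups greedily, most saturated model first, while
// tracking free GPUs separately for each accelerator type. Scale-downs pass
// through unchanged; this sketch does not return freed GPUs to the pool.
func LimitReplicas(reqs []ModelRequest, freeGPUs map[string]int) map[string]int {
	ordered := append([]ModelRequest(nil), reqs...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Saturation > ordered[j].Saturation
	})

	// Copy the pool so the caller's map is not mutated.
	free := make(map[string]int, len(freeGPUs))
	for acc, n := range freeGPUs {
		free[acc] = n
	}

	granted := make(map[string]int, len(reqs))
	for _, r := range ordered {
		extra := r.DesiredReplicas - r.CurrentReplicas
		if extra <= 0 || r.GPUsPerReplica <= 0 {
			// Scale-down, no change, or no GPU demand: accept as-is.
			granted[r.Name] = r.DesiredReplicas
			continue
		}
		// Grant only as many new replicas as this accelerator type can still host.
		if affordable := free[r.Accelerator] / r.GPUsPerReplica; extra > affordable {
			extra = affordable
		}
		free[r.Accelerator] -= extra * r.GPUsPerReplica
		granted[r.Name] = r.CurrentReplicas + extra
	}
	return granted
}

func main() {
	out := LimitReplicas([]ModelRequest{
		{Name: "model-a", Accelerator: "A100", GPUsPerReplica: 2, CurrentReplicas: 1, DesiredReplicas: 4, Saturation: 0.9},
		{Name: "model-b", Accelerator: "A100", GPUsPerReplica: 2, CurrentReplicas: 1, DesiredReplicas: 3, Saturation: 0.5},
		{Name: "model-c", Accelerator: "H100", GPUsPerReplica: 1, CurrentReplicas: 1, DesiredReplicas: 2, Saturation: 0.7},
	}, map[string]int{"A100": 4, "H100": 8})
	fmt.Println(out["model-a"], out["model-b"], out["model-c"]) // 3 1 2
}
```

In this example the two A100 models compete for four free GPUs: model-a is more saturated, so it takes all four, and model-b stays at one replica, while model-c is unaffected because it draws from a separate H100 pool. That matches the "each model limited by its GPU type availability" outcome described in the docs.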
Update documentation to reference the new GPU Limiter and Scale-to-Zero features consistently across all user-facing docs:

- README.md: Add features to Key Features section and User Guide links
- docs/README.md: Add links to GPU Limiter, Scale-to-Zero, Scale-from-Zero
- Helm Chart README: Add configuration sections for both features with examples showing how to enable via Helm values and ConfigMaps
- Helm values.yaml: Improve comments for limitedMode and scaleToZero options with references to documentation
- config/samples: Add saturation-scaling-config.yaml sample with enableLimiter example
lionelvillard (Collaborator) left a comment:

LGTM. Added a few comments.
Review context:

    Result: Each model limited by its GPU type availability
    ```

    ## Troubleshooting
Collaborator: We now have a troubleshooting guide. Can you move this section over there?
Author (Collaborator): I moved this to saturation-scaling-config.md. Or should it be in another file?
- scale-to-zero: clarify retention_period is unrelated to HPA stabilizationWindowSeconds
- scale-to-zero: remove cold start limitation (applies to scale-from-zero, not scale-to-zero)
- scale-to-zero: remove broken links to removed scale-from-zero.md
- gpu-limiter: move troubleshooting section to saturation-scaling-config.md
- saturation-scaling-config: add GPU Limiter Issues troubleshooting subsection
Review context:

    ### Scaling Pipeline

    ```
Collaborator: I would make this pipeline more generic. Essentially, there is one pipeline block before the GPU limiter and one after it. The block before computes a desired number of replicas, and the saturation analyzer is just one example of such a block.
lionelvillard previously approved these changes on Mar 10, 2026
Signed-off-by: Lionel Villard <villard@us.ibm.com>
Summary
Adds comprehensive user-facing documentation for the GPU Limiter and Scale-to-Zero features, along with cross-references throughout existing documentation to ensure discoverability.
Changes

New Documentation Files

- docs/user-guide/gpu-limiter.md
- docs/user-guide/scale-to-zero.md
- docs/user-guide/scale-from-zero.md
- config/samples/saturation-scaling-config.yaml (enableLimiter: true)

Updated Documentation

- README.md
- docs/README.md
- charts/workload-variant-autoscaler/README.md
- charts/workload-variant-autoscaler/values.yaml (limitedMode and scaleToZero with doc references)

Documentation Content
GPU Limiter (Experimental)
The GPU Limiter documentation covers:
- saturation-scaling-config ConfigMap with enableLimiter: true

Scale-to-Zero
The Scale-to-Zero documentation covers:
- model-scale-to-zero-config ConfigMap

Scale-from-Zero (Placeholder)
Basic placeholder documentation noting:
Helm Chart Updates
Added configuration sections to chart README:
Improved values.yaml comments:
Related Issues
Checklist