Updating the replicas for kserve #738
The first hunk relaxes the CRD validation on `Replicas` in the `NIMServiceSpec` API type from a minimum of 1 to a minimum of 0, so a NIMService can be scaled down to zero replicas:

```diff
@@ -123,7 +123,7 @@ type NIMServiceSpec struct {
 	Scale         Autoscaling `json:"scale,omitempty"`
 	SchedulerName string      `json:"schedulerName,omitempty"`
 	Metrics       Metrics     `json:"metrics,omitempty"`
-	// +kubebuilder:validation:Minimum=1
+	// +kubebuilder:validation:Minimum=0
 	Replicas *int32 `json:"replicas,omitempty"`
 	UserID   *int64 `json:"userID,omitempty"`
 	GroupID  *int64 `json:"groupID,omitempty"`
```
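The `+kubebuilder:validation:Minimum=0` marker is what controller-gen turns into a `minimum: 0` bound in the generated CRD schema, so the API server now admits `spec.replicas: 0` at admission time. A minimal sketch of what that permits, using a stand-in struct rather than the operator's real API type:

```go
package main

import (
	"fmt"

	"k8s.io/utils/ptr"
)

// nimServiceSpec is a stand-in for the relevant slice of the real
// NIMServiceSpec type in the k8s-nim-operator API package.
type nimServiceSpec struct {
	// +kubebuilder:validation:Minimum=0
	Replicas *int32 `json:"replicas,omitempty"`
}

func main() {
	// Previously (Minimum=1) the API server rejected replicas: 0 at
	// admission time; after this change the value is accepted, letting
	// a user spin the service down without deleting the NIMService.
	spec := nimServiceSpec{Replicas: ptr.To[int32](0)}
	fmt.Println(*spec.Replicas) // 0
}
```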
The second hunk restructures the scaling logic in `GetInferenceServiceParams`. Instead of a single if/else keyed on autoscaling, the KServe standard (raw deployment) path now has an explicit branch for the autoscaling-disabled case, which zeroes out the HPA-related parameters:

```diff
@@ -1638,26 +1638,32 @@ func (n *NIMService) GetInferenceServiceParams(
 	delete(params.PodAnnotations, utils.NvidiaAnnotationParentSpecHashKey)
 
 	// Set template spec
-	if !n.IsAutoScalingEnabled() || !utils.IsKServeStandardDeploymentMode(deploymentMode) {
-		params.MinReplicas = n.GetReplicas()
-	} else {
-		params.Annotations[kserveconstants.AutoscalerClass] = string(kserveconstants.AutoscalerClassHPA)
-
-		minReplicas, maxReplicas, metric, metricType, target := n.GetInferenceServiceHPAParams()
-		if minReplicas != nil {
-			params.MinReplicas = minReplicas
-		}
-		if maxReplicas > 0 {
-			params.MaxReplicas = ptr.To[int32](maxReplicas)
-		}
-		if metric != "" {
-			params.ScaleMetric = metric
-		}
-		if metricType != "" {
-			params.ScaleMetricType = metricType
-		}
-		if target > 0 {
-			params.ScaleTarget = ptr.To(target)
-		}
+	if utils.IsKServeStandardDeploymentMode(deploymentMode) {
+		if n.IsAutoScalingEnabled() {
+			params.Annotations[kserveconstants.AutoscalerClass] = string(kserveconstants.AutoscalerClassHPA)
+
+			minReplicas, maxReplicas, metric, metricType, target := n.GetInferenceServiceHPAParams()
+			if minReplicas != nil {
+				params.MinReplicas = minReplicas
+			}
+			if maxReplicas > 0 {
+				params.MaxReplicas = ptr.To[int32](maxReplicas)
+			}
+			if metric != "" {
+				params.ScaleMetric = metric
+			}
+			if metricType != "" {
+				params.ScaleMetricType = metricType
+			}
+			if target > 0 {
+				params.ScaleTarget = ptr.To(target)
+			}
+		} else {
+			params.MinReplicas = ptr.To[int32](0)
+			params.MaxReplicas = ptr.To[int32](0)
+			params.ScaleMetric = ""
+			params.ScaleMetricType = ""
+			params.ScaleTarget = nil
+		}
 	}
```
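To make the resulting behavior easier to scan, here is a condensed, hypothetical sketch of the branch structure as it reads in the diff above (helper names taken from the diff; this is not the operator's actual code):

```go
package main

import "fmt"

// describeScaling condenses the branch structure of the new code.
// standardMode stands in for utils.IsKServeStandardDeploymentMode,
// autoscaling for n.IsAutoScalingEnabled.
func describeScaling(standardMode, autoscaling bool) string {
	if !standardMode {
		// Serverless/Knative: this block no longer touches the HPA
		// params; min/max are conveyed via annotations (see the review
		// thread below).
		return "serverless: HPA params left unset"
	}
	if autoscaling {
		return "standard + autoscaling: set HPA autoscaler class and copy HPA params"
	}
	return "standard, autoscaling off: zero MinReplicas/MaxReplicas, clear scale metric and target"
}

func main() {
	for _, c := range []struct{ std, hpa bool }{
		{true, true}, {true, false}, {false, true}, {false, false},
	} {
		fmt.Printf("standard=%-5v autoscaling=%-5v -> %s\n",
			c.std, c.hpa, describeScaling(c.std, c.hpa))
	}
}
```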
|
Collaborator (on `params.MinReplicas = ptr.To[int32](0)`): For serverless/knative mode, if auto-scaling is enabled, shouldn't we be setting …

Author: For serverless/knative mode, just passing the min/max annotation under the …
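For context on the author's reply: in KServe serverless mode, min/max scale reach Knative through revision annotations rather than through the HPA params above. A hedged illustration of the annotation keys in question (these are the standard Knative autoscaling annotations, not values taken from this PR):

```go
package main

import "fmt"

func main() {
	// Standard Knative autoscaling annotations; in serverless mode
	// KServe conveys min/max scale to the revision through these keys.
	annotations := map[string]string{
		"autoscaling.knative.dev/min-scale": "0", // "0" permits scale-to-zero
		"autoscaling.knative.dev/max-scale": "2",
	}
	fmt.Println(annotations)
}
```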
Collaborator: For non-kserve NIMs, is it expected to be able to deploy it with 0 replicas?

Collaborator: Based on the linked issue, it looks like the user wants to spin it down to `0` to optimize GPU usage, since we don't have scale-to-zero support yet. @visheshtanksale, if we want to support this then we need to avoid trying to set the model status as well, and set the status to `NotReady` or introduce a new state.

Author: Both for standalone and kserve `inferencePlatform`, when the NIMService is scaled down to zero the status is set to `NotReady`. I think this state is good enough.

Collaborator: Okay, I remember now. We had already added support to avoid querying the model details if the deployment is scaled down. This change makes sense with that context.
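A minimal sketch of the kind of guard the last two comments describe, with invented names for illustration (the operator's actual status handling differs):

```go
package main

import "fmt"

// Hypothetical status values standing in for the operator's real ones.
const (
	statusReady    = "Ready"
	statusNotReady = "NotReady"
)

// updateStatus sketches the behavior discussed above: when the
// NIMService is scaled to zero, skip querying model details and
// report NotReady instead of treating the empty deployment as an error.
func updateStatus(replicas, readyReplicas int32) string {
	if replicas == 0 {
		// Scaled down: there are no pods to query, so don't probe
		// model details at all.
		return statusNotReady
	}
	if readyReplicas < replicas {
		return statusNotReady
	}
	// All replicas ready: safe to query model details and report Ready.
	return statusReady
}

func main() {
	fmt.Println(updateStatus(0, 0)) // NotReady: scaled to zero
	fmt.Println(updateStatus(2, 2)) // Ready
}
```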