Updating the replicas for kserve #738
base: main
Conversation
visheshtanksale commented Jan 22, 2026
- Fixes #710 (Add support for scaling down NIMServices to 0 replicas)
- Not adding replicas on InferenceService for Knative/serverless deployment mode
Signed-off-by: Vishesh Tanksale <vtanksale@nvidia.com>
/ok to test a64260a
```diff
 	SchedulerName string  `json:"schedulerName,omitempty"`
 	Metrics       Metrics `json:"metrics,omitempty"`
-	// +kubebuilder:validation:Minimum=1
+	// +kubebuilder:validation:Minimum=0
```
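For context, a minimal sketch of what the marker change implies on the spec. The Replicas field name and pointer type are assumptions based on the discussion below, not the repo's verbatim code:

```go
// Abridged sketch of the NIMService spec change. Replicas and its *int32
// type are assumed from the review discussion; Metrics is the operator's
// own type shown in the diff context above.
type NIMServiceSpec struct {
	SchedulerName string  `json:"schedulerName,omitempty"`
	Metrics       Metrics `json:"metrics,omitempty"`

	// Replicas may now be 0, allowing a NIMService to be scaled down
	// entirely to free GPUs while scale-to-zero autoscaling is unsupported.
	// +kubebuilder:validation:Minimum=0
	Replicas *int32 `json:"replicas,omitempty"`
}
```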
For non-kserve NIMs, is it expected to be able to deploy it with 0 replicas?
Based on the linked issue, it looks like the user wants to spin it down to 0 to optimize GPU usage, since we don't have scale-to-zero support yet. @visheshtanksale if we want to support this, then we need to avoid trying to set the model status as well, and either set the status to NotReady or introduce a new state.
For both the standalone and kserve inferencePlatform, when the NIMService is scaled down to zero the status is set to NotReady. I think this state is good enough.
Okay I remember now. We had already added support to avoid querying the model details if the deployment is scaled down. This change makes sense with that context.
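A rough sketch of the behavior being described here, with all names illustrative rather than the operator's actual API:

```go
// Sketch: skip the model-detail query and surface NotReady when the service
// is deliberately scaled to zero. Illustrative names, not the real code.
func syncModelStatus(n *NIMService) {
	if r := n.GetReplicas(); r != nil && *r == 0 {
		// No pods to probe; report NotReady rather than an error state.
		n.Status.State = "NotReady"
		return
	}
	// Normal path: query model details from the running pods and set the
	// status to Ready or Failed accordingly.
}
```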
```go
	}
}

_ = deploymentMode // Use the variable to avoid unused warning
```
This is not used anyway; we can remove it.
Removed
```go
		params.ScaleTarget = ptr.To(target)
	}
} else {
	params.MinReplicas = ptr.To[int32](0)
```
For serverless/knative mode, if auto-scaling is enabled, shouldn't we be setting predictor.minReplicas to either scale.minReplicas or the annotation value?
For serverless/knative mode, just passing the min/max annotations under spec.predictor.annotations is sufficient.
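Concretely, that likely amounts to stamping the standard Knative autoscaling annotations onto the predictor. The annotation keys are Knative's; the variable and values here are illustrative:

```go
// Sketch: in serverless mode, scaling bounds travel as Knative autoscaling
// annotations on spec.predictor.annotations instead of replica fields.
predictorAnnotations := map[string]string{
	"autoscaling.knative.dev/min-scale": "0", // permits scale-to-zero
	"autoscaling.knative.dev/max-scale": "3",
}
```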
Signed-off-by: Vishesh Tanksale <vtanksale@nvidia.com>
```go
if n.IsAutoScalingEnabled() {
	return n.Spec.Scale.HPA.MinReplicas
}
return n.Spec.Replicas
```
What's the expected behavior when replicas is nil here? For Deployments, if replicas is nil, it defaults to 1:

```
GROUP:      apps
KIND:       Deployment
VERSION:    v1

FIELD: replicas <integer>

DESCRIPTION:
    Number of desired pods. This is a pointer to distinguish between explicit
    zero and not specified. Defaults to 1.
```

I'm assuming we want to keep this behavior here as well.
The replicas field on the NIMService can stay nil; the corresponding Deployment will then default to 1.
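Putting the two comments together, the getter likely reads like this sketch, with the *int32 types assumed and the nil pass-through being the point:

```go
// Sketch of the getter per the diff above; the *int32 types are assumed.
// A nil result flows into the generated Deployment unchanged, where
// Kubernetes then applies its server-side default of 1 replica.
func (n *NIMService) GetReplicas() *int32 {
	if n.IsAutoScalingEnabled() {
		return n.Spec.Scale.HPA.MinReplicas
	}
	return n.Spec.Replicas // may be nil; the Deployment defaults it to 1
}
```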
```go
if !n.IsAutoScalingEnabled() {
	params.Replicas = n.GetReplicas()
}
```
```go
params.Replicas = n.GetReplicas()
```

It's probably good practice to set this to the HPA minReplicas if autoscaling is enabled.
Fixed
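Since GetReplicas already returns the HPA minimum when autoscaling is enabled (see the diff above), the fix presumably drops the guard; a sketch, not the merged code:

```go
// Sketch of the post-fix assignment: with GetReplicas folding in the HPA
// minimum, the conditional can likely go away. The merged change may differ.
params.Replicas = n.GetReplicas()
```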
Signed-off-by: Vishesh Tanksale <vtanksale@nvidia.com>