Conversation

@visheshtanksale (Collaborator):

Signed-off-by: Vishesh Tanksale <vtanksale@nvidia.com>
@copy-pr-bot (bot) commented Jan 22, 2026:

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@visheshtanksale (Collaborator, Author):

/ok to test a64260a

SchedulerName string `json:"schedulerName,omitempty"`
Metrics Metrics `json:"metrics,omitempty"`
- // +kubebuilder:validation:Minimum=1
+ // +kubebuilder:validation:Minimum=0
Collaborator:

For non-kserve NIMs, is it expected to be able to deploy it with 0 replicas?

@shivamerla (Collaborator) commented Jan 22, 2026:


Based on the linked issue, it looks like the user wants to spin the deployment down to 0 to optimize GPU usage, since we don't have scale-to-zero support yet. @visheshtanksale, if we want to support this, then we also need to avoid trying to set the model status, and either set the status to NotReady or introduce a new state.

@visheshtanksale (Collaborator, Author):

For both the standalone and kserve inferencePlatform modes, when the NIMService is scaled down to zero the status is set to NotReady. I think this state is good enough.

Collaborator:

Okay I remember now. We had already added support to avoid querying the model details if the deployment is scaled down. This change makes sense with that context.
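The thread above hinges on distinguishing an explicit scale-to-zero from an unset replica count once the kubebuilder minimum is relaxed to 0. A minimal sketch of that check, assuming pointer-typed replicas as in apps/v1 (the helper names `isScaledToZero` and `ptrTo` are illustrative, not the operator's actual API):

```go
package main

import "fmt"

// isScaledToZero reports whether the user explicitly requested zero
// replicas. A nil pointer means "unset" (Kubernetes defaults it to 1),
// so only an explicit 0 counts as a deliberate scale-down.
func isScaledToZero(replicas *int32) bool {
	return replicas != nil && *replicas == 0
}

// ptrTo is a tiny stand-in for k8s.io/utils/ptr.To, kept local so the
// sketch is self-contained.
func ptrTo(v int32) *int32 { return &v }

func main() {
	fmt.Println(isScaledToZero(nil))      // false: unset, will default to 1
	fmt.Println(isScaledToZero(ptrTo(0))) // true: explicit scale-down
	fmt.Println(isScaledToZero(ptrTo(3))) // false: running normally
}
```

With a check like this, the controller can skip the model-detail query and report NotReady only when the zero was deliberate.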

}
}

_ = deploymentMode // Use the variable to avoid unused warning
Collaborator:

This is not used anywhere, so we can remove it.

@visheshtanksale (Collaborator, Author):

Removed

params.ScaleTarget = ptr.To(target)
}
} else {
params.MinReplicas = ptr.To[int32](0)
Collaborator:

For serverless/knative mode, if auto-scaling is enabled, shouldn't we be setting predictor.minReplicas to either scale.minReplicas or the annotation value?

@visheshtanksale (Collaborator, Author):

For serverless/knative mode, passing the min/max annotations under spec.predictor.annotations is sufficient.
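The annotation-based approach described here can be sketched as follows. The `autoscaling.knative.dev/min-scale` and `max-scale` keys are the documented Knative autoscaling annotations, but the helper name and exact wiring into the operator are assumptions:

```go
package main

import (
	"fmt"
	"strconv"
)

// knativeScaleAnnotations builds the Knative autoscaling annotations
// that serverless mode would carry under spec.predictor.annotations.
// Nil bounds are omitted so Knative applies its own defaults.
func knativeScaleAnnotations(minReplicas, maxReplicas *int32) map[string]string {
	annotations := map[string]string{}
	if minReplicas != nil {
		annotations["autoscaling.knative.dev/min-scale"] = strconv.Itoa(int(*minReplicas))
	}
	if maxReplicas != nil {
		annotations["autoscaling.knative.dev/max-scale"] = strconv.Itoa(int(*maxReplicas))
	}
	return annotations
}

func main() {
	minR, maxR := int32(1), int32(4)
	fmt.Println(knativeScaleAnnotations(&minR, &maxR))
}
```

This keeps the serverless path declarative: Knative's autoscaler reads the bounds from the annotations, so no predictor.minReplicas field needs to be set separately.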

Signed-off-by: Vishesh Tanksale <vtanksale@nvidia.com>

if n.IsAutoScalingEnabled() {
return n.Spec.Scale.HPA.MinReplicas
}
return n.Spec.Replicas
Collaborator:

What's the expected behavior when replicas is nil here? For deployments, if replicas is nil, it defaults to 1.

GROUP:      apps
KIND:       Deployment
VERSION:    v1

FIELD: replicas <integer>


DESCRIPTION:
    Number of desired pods. This is a pointer to distinguish between explicit
    zero and not specified. Defaults to 1.

I'm assuming we want to keep this behavior here as well.

@visheshtanksale (Collaborator, Author):

The replicas field on NIMService can stay nil; the corresponding Deployment will then default to 1.
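The nil-defaulting semantics agreed above mirror apps/v1 Deployment behavior and can be sketched as (the helper name `effectiveReplicas` is illustrative, not the operator's code):

```go
package main

import "fmt"

// effectiveReplicas mirrors apps/v1 Deployment semantics: a nil
// pointer means "not specified" and defaults to 1, while an explicit
// 0 is honored as a deliberate scale-down.
func effectiveReplicas(replicas *int32) int32 {
	if replicas == nil {
		return 1
	}
	return *replicas
}

func main() {
	zero, three := int32(0), int32(3)
	fmt.Println(effectiveReplicas(nil))    // 1: unset defaults to one
	fmt.Println(effectiveReplicas(&zero))  // 0: explicit scale-down
	fmt.Println(effectiveReplicas(&three)) // 3: explicit count
}
```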

Comment on lines 1093 to 1095
if !n.IsAutoScalingEnabled() {
params.Replicas = n.GetReplicas()
}
Collaborator:

Suggested change
params.Replicas = n.GetReplicas()

It's probably good practice to set this to the HPA minReplicas if autoscaling is enabled.

@visheshtanksale (Collaborator, Author):

Fixed
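The fix agreed in this thread can be sketched as: when autoscaling is enabled, seed the rendered workload with the HPA floor instead of the static replica count (the function and parameter names here are illustrative, not the operator's exact API):

```go
package main

import "fmt"

// desiredReplicas picks the initial replica count for the rendered
// workload: the HPA minimum when autoscaling is enabled, otherwise the
// user-specified count, falling back to the Deployment default of 1.
func desiredReplicas(autoscaling bool, hpaMin, replicas *int32) int32 {
	if autoscaling && hpaMin != nil {
		return *hpaMin
	}
	if replicas != nil {
		return *replicas
	}
	return 1
}

func main() {
	two, five := int32(2), int32(5)
	fmt.Println(desiredReplicas(true, &two, &five))  // 2: HPA floor wins
	fmt.Println(desiredReplicas(false, &two, &five)) // 5: static count
	fmt.Println(desiredReplicas(false, nil, nil))    // 1: default
}
```

Seeding with the HPA minimum avoids a brief flap where the controller creates the Deployment at the static count and the HPA immediately rescales it.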

Signed-off-by: Vishesh Tanksale <vtanksale@nvidia.com>