Allow custom scheduler to be specified in NIMService and NIMPipeline #489

Merged
shengnuo merged 3 commits into NVIDIA:main from shengnuo:nim-custom-scheduler
May 16, 2025

@shengnuo shengnuo commented May 12, 2025

This PR allows NIMs to be scheduled with a custom scheduler.

  • A custom scheduler can be specified in .spec.services[].spec.schedulerName for a NIMPipeline, and in .spec.schedulerName for a NIMService.
  • Defaults to default-scheduler if .schedulerName is unspecified.
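
The two points above can be sketched in Go. This is a minimal illustration, not the operator's actual code: the types are trimmed-down, hypothetical stand-ins for the real CRD and pod specs, and `resolveSchedulerName` and `buildPodSpec` are names invented for this example.

```go
package main

import "fmt"

// defaultSchedulerName is the name of the stock Kubernetes scheduler.
const defaultSchedulerName = "default-scheduler"

// NIMServiceSpec is a hypothetical stand-in for the real spec; only the
// field discussed in this PR is modeled.
type NIMServiceSpec struct {
	SchedulerName string `json:"schedulerName,omitempty"`
}

// NIMPipelineService models one entry of .spec.services[] (assumed shape).
type NIMPipelineService struct {
	Name string         `json:"name"`
	Spec NIMServiceSpec `json:"spec"`
}

// PodSpec is a stand-in for the Kubernetes pod spec, reduced to one field.
type PodSpec struct {
	SchedulerName string `json:"schedulerName,omitempty"`
}

// resolveSchedulerName returns the configured scheduler, or the default
// scheduler when the field is left empty.
func resolveSchedulerName(spec NIMServiceSpec) string {
	if spec.SchedulerName == "" {
		return defaultSchedulerName
	}
	return spec.SchedulerName
}

// buildPodSpec copies the resolved scheduler name onto the pod spec.
func buildPodSpec(spec NIMServiceSpec) PodSpec {
	return PodSpec{SchedulerName: resolveSchedulerName(spec)}
}

func main() {
	// A pipeline carries one spec per service, so each service can name
	// its own scheduler or leave the field empty.
	services := []NIMPipelineService{
		{Name: "meta-llama3-1b-instruct", Spec: NIMServiceSpec{SchedulerName: "volcano"}},
		{Name: "meta-llama3-8b-instruct"},
	}
	for _, svc := range services {
		fmt.Printf("%s -> %s\n", svc.Name, buildPodSpec(svc.Spec).SchedulerName)
	}
}
```

Because the scheduler name lives on each service's spec, services in the same pipeline can use different schedulers.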

Scheduled with Volcano

$ kubectl get nimpipeline llama3-1b-pipeline
NAME                 STATUS   AGE
llama3-1b-pipeline   Ready    31m
$ kubectl get nimpipeline llama3-1b-pipeline -o json | jq '.spec.services[].spec.schedulerName'
{
  "type": "volcano"
}
$ kubectl get pods meta-llama3-1b-instruct-74cc4c5c9b-dtkdn -o json | jq '.spec.schedulerName'
"volcano"

Describing the created NIM pod shows that it was scheduled by Volcano:

Events:
  Type     Reason     Age                 From     Message
  ----     ------     ----                ----     -------
  Normal   Scheduled  32m                 volcano  Successfully assigned nemo/meta-llama3-1b-instruct-74cc4c5c9b-dtkdn to nim-operator-9vg0z43

Scheduled with default-scheduler

If no scheduler is specified, the NIM Operator falls back to default-scheduler.

$ kubectl get nimservice meta-llama3-8b-instruct -o json | jq '.spec.scheduler'
null

Describing the NIM pod shows that it was scheduled by default-scheduler:

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  12m                   default-scheduler  Successfully assigned nemo/meta-llama3-8b-instruct-6d95cd589b-qp96s to nim-operator-9vg0z43


copy-pr-bot Bot commented May 12, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shengnuo shengnuo force-pushed the nim-custom-scheduler branch 5 times, most recently from ba431f6 to 9156d33 Compare May 12, 2025 19:33
Comment thread api/apps/v1alpha1/nimservice_types.go Outdated

@visheshtanksale visheshtanksale left a comment

Is there any value in having a global config that sets the scheduler for all pods created by the NIM Operator?
Having the scheduler on the CRDs is necessary, but it is not a value that is likely to change across the objects created by the NIM Operator, so a global config would make it easier to use.
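
One way to picture the suggestion is a precedence order: per-resource value first, then an operator-wide default, then default-scheduler. The Go sketch below is purely illustrative. `OperatorConfig`, `DefaultSchedulerName`, and `effectiveScheduler` are hypothetical names, and no such global setting is part of this PR.

```go
package main

import "fmt"

// fallbackScheduler is used when neither level sets a scheduler.
const fallbackScheduler = "default-scheduler"

// OperatorConfig is a hypothetical operator-wide configuration; it does
// not exist in this PR and is shown only to illustrate the idea.
type OperatorConfig struct {
	DefaultSchedulerName string
}

// effectiveScheduler picks the first non-empty value in precedence order:
// the per-resource schedulerName, then the global default.
func effectiveScheduler(perResource string, cfg OperatorConfig) string {
	for _, candidate := range []string{perResource, cfg.DefaultSchedulerName} {
		if candidate != "" {
			return candidate
		}
	}
	return fallbackScheduler
}

func main() {
	cfg := OperatorConfig{DefaultSchedulerName: "volcano"}
	fmt.Println(effectiveScheduler("", cfg))               // global default applies
	fmt.Println(effectiveScheduler("custom", cfg))         // per-resource value wins
	fmt.Println(effectiveScheduler("", OperatorConfig{})) // nothing set anywhere
}
```

Under this ordering the CRD field still wins when set, so a global default would not remove the need for the per-resource field.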

@shengnuo shengnuo force-pushed the nim-custom-scheduler branch 3 times, most recently from 2cf6f6e to ff6291f Compare May 16, 2025 15:10
shengnuo added 2 commits May 16, 2025 11:32
Signed-off-by: Sheng Lin <shelin@nvidia.com>
@shengnuo shengnuo force-pushed the nim-custom-scheduler branch from ff6291f to 827617b Compare May 16, 2025 15:32

@shivamerla shivamerla left a comment

LGTM

@shengnuo shengnuo merged commit 927e686 into NVIDIA:main May 16, 2025
9 checks passed
varunrsekar pushed a commit to varunrsekar/k8s-nim-operator that referenced this pull request May 20, 2025
…ler (NVIDIA#489)

Signed-off-by: Sheng Lin <shelin@nvidia.com>

Update manifests

Signed-off-by: Sheng Lin <shelin@nvidia.com>
varunrsekar pushed a commit that referenced this pull request May 20, 2025
…ler (#489)

Signed-off-by: Sheng Lin <shelin@nvidia.com>

Update manifests

Signed-off-by: Sheng Lin <shelin@nvidia.com>
4 participants