Support basic multi-node NIMs with LeaderWorkerSet #528

Merged
shivamerla merged 5 commits into NVIDIA:main from shengnuo:lws
Jun 30, 2025
@shengnuo
Collaborator

@shengnuo shengnuo commented Jun 10, 2025

This PR adds early support for multi-node NIMs using LeaderWorkerSet.

If .multiNode is set (non-nil) in NIMService.spec, a LeaderWorkerSet is created for the NIMService.
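
The branching described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the type names MultiNodeSpec and NIMServiceSpec, the Size field, the workloadKind helper, and the "Deployment" fallback for the single-node case are all assumptions made for the example.

```go
package main

import "fmt"

// MultiNodeSpec is a hypothetical stand-in for the multi-node settings.
// The real field set in the PR may differ.
type MultiNodeSpec struct {
	Size int // hypothetical: number of pods in each leader/worker group
}

// NIMServiceSpec is a simplified, hypothetical subset of NIMService.spec.
type NIMServiceSpec struct {
	// MultiNode is a pointer so that "not specified" is represented by nil.
	MultiNode *MultiNodeSpec
}

// workloadKind returns the kind of workload a reconciler might create.
// Assumption: a single-node NIMService falls back to a Deployment.
func workloadKind(spec NIMServiceSpec) string {
	if spec.MultiNode != nil {
		return "LeaderWorkerSet"
	}
	return "Deployment"
}

func main() {
	single := NIMServiceSpec{}
	multi := NIMServiceSpec{MultiNode: &MultiNodeSpec{Size: 2}}

	fmt.Println(workloadKind(single))
	fmt.Println(workloadKind(multi))
}
```

Using a pointer for the optional block keeps existing single-node specs unchanged: omitting .multiNode leaves the field nil, so only specs that opt in take the LeaderWorkerSet path.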

@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shengnuo shengnuo force-pushed the lws branch 5 times, most recently from a707333 to fa43dfb Compare June 16, 2025 19:58
@shengnuo shengnuo changed the title DRAFT: Support basic multi-node NIMs with LeaderWorkerSet Support basic multi-node NIMs with LeaderWorkerSet Jun 16, 2025
Comment thread api/apps/v1alpha1/nimservice_types.go Outdated
Comment thread api/apps/v1alpha1/nimservice_types.go Outdated
Comment thread api/apps/v1alpha1/nimservice_types.go
Comment thread api/apps/v1alpha1/nimservice_types.go Outdated
Comment thread api/apps/v1alpha1/nimservice_types.go
Comment thread internal/controller/nimservice_controller.go Outdated
Comment thread internal/controller/platform/standalone/nimservice.go Outdated
Comment thread internal/controller/platform/standalone/nimservice.go
Comment thread internal/controller/platform/standalone/nimservice.go Outdated
Comment thread internal/controller/platform/standalone/nimservice.go
Comment thread internal/controller/platform/standalone/nimservice.go Outdated
Comment thread internal/controller/platform/standalone/nimservice.go Outdated
@shengnuo shengnuo force-pushed the lws branch 8 times, most recently from 1d036ba to b0cfe4f Compare June 18, 2025 03:33
Comment thread api/apps/v1alpha1/nimservice_types.go Outdated
Comment thread api/apps/v1alpha1/nimservice_types.go Outdated
@shengnuo shengnuo force-pushed the lws branch 5 times, most recently from 4dce616 to 601009f Compare June 25, 2025 17:18
Comment thread config/samples/nim/llm/nimservice.yaml Outdated
Comment thread internal/controller/platform/standalone/nimservice.go Outdated
Comment thread internal/controller/platform/standalone/nimservice.go Outdated
Collaborator

@varunrsekar varunrsekar left a comment

Overall looks good to me. Just some more api comments...

Comment thread api/apps/v1alpha1/nimservice_types.go Outdated
Comment thread api/apps/v1alpha1/nimservice_types.go
Comment thread api/apps/v1alpha1/nimservice_types.go Outdated
Comment thread internal/controller/platform/standalone/nimservice.go Outdated
@shengnuo shengnuo force-pushed the lws branch 11 times, most recently from d977bbb to 0356f36 Compare June 30, 2025 16:25
Signed-off-by: Sheng Lin <shelin@nvidia.com>
shivamerla
shivamerla previously approved these changes Jun 30, 2025
@shivamerla shivamerla requested a review from varunrsekar June 30, 2025 18:14
varunrsekar
varunrsekar previously approved these changes Jun 30, 2025
Collaborator

@varunrsekar varunrsekar left a comment

Everything else lgtm. Please fix the pending comment before merge in a follow-up

Comment thread internal/controller/platform/standalone/nimservice.go
shengnuo added 4 commits June 30, 2025 15:13
Signed-off-by: Sheng Lin <shelin@nvidia.com>
Signed-off-by: Sheng Lin <shelin@nvidia.com>
Signed-off-by: Sheng Lin <shelin@nvidia.com>
Signed-off-by: Sheng Lin <shelin@nvidia.com>
@shivamerla shivamerla merged commit b3631f3 into NVIDIA:main Jun 30, 2025
9 checks passed
3 participants