Prevent same physical device for pods in a podgroup with vGPU #5049

Open
goyalankit wants to merge 4 commits into volcano-sh:master from goyalankit:prevent-same-device-id

Conversation

@goyalankit
Copy link

What type of PR is this?

Feature

What this PR does / why we need it:

This PR adds a feature to prevent the same GPUDevice from being assigned to two pods in the same PodGroup. This is needed to avoid NCCL communication failures between two pods colocated on the same physical device.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

This feature is behind a feature flag and has been tested with a local minikube instance.

Testing

Built a custom scheduler image from the branch and deployed it with VGPUEnable: true and
VGPUPodGroupDeviceSpread: true. Faked two vGPU devices on the minikube node via
volcano.sh/node-vgpu-register and volcano.sh/node-vgpu-handshake annotations, and patched the node's
allocatable resources accordingly. Created a PodGroup with minMember: 2 and two pods each requesting one
vGPU.

With VGPUPodGroupDeviceSpread: true: the scheduler assigned the two pods to different GPU devices
(GPU-fake-...0001 and GPU-fake-...0002).

With VGPUPodGroupDeviceSpread: false: the scheduler assigned both pods to the same GPU device
(GPU-fake-...0002), confirming the flag correctly gates the behavior.
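The gating behavior verified above can be sketched as a simple device filter. Everything below is a stand-in modeled on the PR description: the `GPUDevice`/`GPUUsage` types and `deviceHasPodFromSameGroup` mirror names from the PR, while `filterDevices` is a hypothetical helper added for illustration; the real logic lives in `pkg/scheduler/api/devices/nvidia/vgpu`.

```go
package main

import "fmt"

// Stand-in types mirroring the PR's vGPU structures (illustrative only;
// the real definitions live in pkg/scheduler/api/devices/nvidia/vgpu).
type GPUUsage struct {
	PodGroupKey string
	UsedMem     uint
	UsedCore    uint
}

type GPUDevice struct {
	ID     string
	PodMap map[string]*GPUUsage
}

// deviceHasPodFromSameGroup reports whether a device already hosts a pod
// from the given PodGroup with non-zero usage, as described in the PR.
func deviceHasPodFromSameGroup(gd *GPUDevice, currentKey string) bool {
	if currentKey == "" {
		return false
	}
	for _, usage := range gd.PodMap {
		if usage.PodGroupKey == currentKey && (usage.UsedMem > 0 || usage.UsedCore > 0) {
			return true
		}
	}
	return false
}

// filterDevices (hypothetical helper) keeps only devices still eligible
// for a pod from podGroupKey when the spread flag is enabled.
func filterDevices(devices []*GPUDevice, podGroupKey string, spreadEnabled bool) []*GPUDevice {
	if !spreadEnabled {
		return devices // flag off: behave exactly as before
	}
	var eligible []*GPUDevice
	for _, d := range devices {
		if !deviceHasPodFromSameGroup(d, podGroupKey) {
			eligible = append(eligible, d)
		}
	}
	return eligible
}

func main() {
	devices := []*GPUDevice{
		{ID: "GPU-0001", PodMap: map[string]*GPUUsage{"uid1": {PodGroupKey: "ns/my-pg", UsedMem: 1000}}},
		{ID: "GPU-0002", PodMap: map[string]*GPUUsage{}},
	}
	// With the flag on, only the unused device remains eligible.
	for _, d := range filterDevices(devices, "ns/my-pg", true) {
		fmt.Println(d.ID) // prints "GPU-0002"
	}
}
```

This matches the test result above: with the flag on, the second pod of the group cannot land on GPU-0001.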

Does this PR introduce a user-facing change?

Support `VGPUPodGroupDeviceSpread` to prevent the same physical device from being assigned to two pods from the same pod group.

Copilot AI review requested due to automatic review settings · February 18, 2026 23:00
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign monokaix for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot
Contributor

Welcome @goyalankit! It looks like this is your first PR to volcano-sh/volcano 🎉

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 18, 2026
@gemini-code-assist

Summary of Changes

Hello @goyalankit, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the vGPU scheduling capabilities by preventing multiple pods from the same PodGroup from being assigned to the same physical GPU device. This feature, controlled by a new flag, addresses potential NCCL communication failures in distributed training workloads by ensuring better device spread for co-located pods. The changes involve updating core data structures, adding new utility functions, and integrating the new spread logic into the scheduler's predicate functions.

Highlights

  • New Feature Flag: Introduced a new feature flag, VGPUPodGroupDeviceSpread, to control the behavior of preventing pods from the same PodGroup from sharing a vGPU device.
  • Scheduler Logic Enhancement: Implemented logic within the vGPU scheduler to ensure that when VGPUPodGroupDeviceSpread is enabled, pods belonging to the same PodGroup are not scheduled on the same physical GPU device.
  • Data Structure Update: Extended the GPUUsage data structure to include a PodGroupKey field, allowing the scheduler to track the PodGroup association for each pod using a vGPU.
  • New Utility Functions: Added getPodGroupKey and deviceHasPodFromSameGroup utility functions to facilitate the identification of PodGroups and checking for existing pods from the same group on a device.
  • Documentation and Testing: Updated user documentation to reflect the new feature flag and added comprehensive unit tests to validate the new scheduling logic and utility functions.
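The `getPodGroupKey` helper listed in the highlights can be sketched as an annotation lookup with a fallback. The annotation string values below are assumptions standing in for the canonical `KubeGroupNameAnnotationKey` and `VolcanoGroupNameAnnotationKey` constants in `volcano.sh/apis/pkg/apis/scheduling/v1beta1`, and the signature is simplified to plain Go values rather than a `*v1.Pod`.

```go
package main

import "fmt"

// Assumed annotation keys; the PR uses the canonical constants from
// volcano.sh/apis/pkg/apis/scheduling/v1beta1 instead of string literals.
const (
	kubeGroupNameAnnotationKey    = "scheduling.k8s.io/group-name"
	volcanoGroupNameAnnotationKey = "volcano.sh/group-name"
)

// getPodGroupKey sketches deriving a namespace-qualified PodGroup key
// from pod annotations: prefer the Kubernetes key, fall back to the
// Volcano key, and return "" when the pod belongs to no PodGroup.
func getPodGroupKey(namespace string, annotations map[string]string) string {
	name := annotations[kubeGroupNameAnnotationKey]
	if name == "" {
		name = annotations[volcanoGroupNameAnnotationKey]
	}
	if name == "" {
		return ""
	}
	return namespace + "/" + name
}

func main() {
	fmt.Println(getPodGroupKey("ns", map[string]string{kubeGroupNameAnnotationKey: "my-pg"})) // "ns/my-pg"
	fmt.Println(getPodGroupKey("ns", nil) == "")                                             // true: no group, no key
}
```

Returning "" for ungrouped pods is what lets `deviceHasPodFromSameGroup` short-circuit and leave non-PodGroup workloads unaffected.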


Changelog
  • docs/user-guide/how_to_use_volcano_vgpu.md
    • Documented the new deviceshare.VGPUPodGroupDeviceSpread configuration option.
  • pkg/scheduler/api/devices/nvidia/vgpu/device_info.go
    • Added PodGroupKey field to the GPUUsage struct.
    • Updated addResource and addToPodMap methods to populate the PodGroupKey for assigned pods.
  • pkg/scheduler/api/devices/nvidia/vgpu/device_info_test.go
    • Added a test case TestAddResourceSetsPodGroupKey to verify correct PodGroupKey assignment.
  • pkg/scheduler/api/devices/nvidia/vgpu/type.go
    • Declared the new global boolean variable VGPUPodGroupDeviceSpread.
  • pkg/scheduler/api/devices/nvidia/vgpu/utils.go
    • Imported v1beta1 for PodGroup annotations.
    • Implemented getPodGroupKey function to extract the PodGroup identifier from pod annotations.
    • Implemented deviceHasPodFromSameGroup function to check for existing pods from the same PodGroup on a GPU device.
    • Integrated the VGPUPodGroupDeviceSpread logic into checkNodeGPUSharingPredicateAndScore to prevent scheduling conflicts.
  • pkg/scheduler/api/devices/nvidia/vgpu/utils_test.go
    • Added comprehensive test cases for getPodGroupKey and deviceHasPodFromSameGroup.
  • pkg/scheduler/plugins/deviceshare/deviceshare.go
    • Defined VGPUPodGroupDeviceSpread as a new constant.
    • Enabled the VGPUPodGroupDeviceSpread feature flag in the enablePredicate function.
Activity
  • The author tested the feature locally using a minikube instance with faked vGPU devices.
  • Testing confirmed that with VGPUPodGroupDeviceSpread: true, pods from the same PodGroup were assigned to different GPU devices.
  • Testing also confirmed that with VGPUPodGroupDeviceSpread: false, pods from the same PodGroup could be assigned to the same GPU device, validating the flag's control.
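The changelog's deviceshare wiring can be sketched as copying a boolean plugin argument into the vgpu package flag. The package-level variable and the map-based argument lookup below are simplifications; the real plugin uses Volcano's `framework.Arguments` helpers inside `enablePredicate`.

```go
package main

import "fmt"

// Stand-in for the package-level flag declared in
// pkg/scheduler/api/devices/nvidia/vgpu/type.go.
var vgpuPodGroupDeviceSpread bool

// Plugin-argument key, as documented in the changelog.
const vgpuPodGroupDeviceSpreadKey = "deviceshare.VGPUPodGroupDeviceSpread"

// enablePredicate sketches how the deviceshare plugin could copy the
// boolean argument from the scheduler configuration into the vgpu
// package flag. The plain map stands in for framework.Arguments.
func enablePredicate(args map[string]interface{}) {
	if v, ok := args[vgpuPodGroupDeviceSpreadKey].(bool); ok {
		vgpuPodGroupDeviceSpread = v
	}
	// Absent or non-boolean arguments leave the default (false),
	// so existing configurations keep their current behavior.
}

func main() {
	enablePredicate(map[string]interface{}{vgpuPodGroupDeviceSpreadKey: true})
	fmt.Println(vgpuPodGroupDeviceSpread) // true
}
```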

Signed-off-by: Ankit Goyal <ankit3goyal@gmail.com>
@goyalankit goyalankit force-pushed the prevent-same-device-id branch from a08617d to 19b373b on February 18, 2026 23:03
Contributor

Copilot AI left a comment


Pull request overview

Adds an optional vGPU scheduling constraint (behind deviceshare.VGPUPodGroupDeviceSpread) to avoid assigning the same physical GPUDevice to multiple pods from the same PodGroup, preventing intra-device NCCL communication failures.

Changes:

  • Introduces a new scheduler/plugin argument deviceshare.VGPUPodGroupDeviceSpread and wires it to the vgpu package feature flag.
  • Tracks a per-pod PodGroupKey in vGPU device usage and skips devices already used by the same PodGroup during allocation.
  • Adds/updates unit tests and documents the new configuration option.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/scheduler/plugins/deviceshare/deviceshare.go Adds new plugin argument constant and plumbs it into vgpu.VGPUPodGroupDeviceSpread.
pkg/scheduler/api/devices/nvidia/vgpu/type.go Adds global feature flag for podgroup device spreading.
pkg/scheduler/api/devices/nvidia/vgpu/utils.go Implements PodGroup key extraction + device filtering logic gated by the feature flag.
pkg/scheduler/api/devices/nvidia/vgpu/device_info.go Persists PodGroupKey onto GPUUsage entries when pods are added/tracked.
pkg/scheduler/api/devices/nvidia/vgpu/utils_test.go Adds unit tests for new helper functions (currently contains a compile issue).
pkg/scheduler/api/devices/nvidia/vgpu/device_info_test.go Adds test ensuring PodGroupKey is set when adding resources.
docs/user-guide/how_to_use_volcano_vgpu.md Documents the new optional scheduler argument.


Comment on lines 196 to 214
gd: &GPUDevice{PodMap: map[string]*GPUUsage{"uid1": {PodGroupKey: "ns/pg", UsedMem: 1000}}},
currentKey: "",
want: false,
},
{
name: "no pod from same group",
gd: &GPUDevice{PodMap: map[string]*GPUUsage{"uid1": {PodGroupKey: "ns/other", UsedMem: 1000}}},
currentKey: "ns/my-pg",
want: false,
},
{
name: "same group with non-zero usage",
gd: &GPUDevice{PodMap: map[string]*GPUUsage{"uid1": {PodGroupKey: "ns/my-pg", UsedMem: 1000, UsedCore: 50}}},
currentKey: "ns/my-pg",
want: true,
},
{
name: "same group but zero usage (released)",
gd: &GPUDevice{PodMap: map[string]*GPUUsage{"uid1": {PodGroupKey: "ns/my-pg", UsedMem: 0, UsedCore: 0}}},

Copilot AI Feb 18, 2026


This test won't compile: PodMap is a map[string]*GPUUsage, but the literals use struct values ({...}) instead of pointers. Use &GPUUsage{...} (or change the map value type) for each entry to satisfy the declared type.

Suggested change (each map value written as an explicit &GPUUsage{...} pointer):
gd: &GPUDevice{PodMap: map[string]*GPUUsage{"uid1": &GPUUsage{PodGroupKey: "ns/pg", UsedMem: 1000}}},
currentKey: "",
want: false,
},
{
name: "no pod from same group",
gd: &GPUDevice{PodMap: map[string]*GPUUsage{"uid1": &GPUUsage{PodGroupKey: "ns/other", UsedMem: 1000}}},
currentKey: "ns/my-pg",
want: false,
},
{
name: "same group with non-zero usage",
gd: &GPUDevice{PodMap: map[string]*GPUUsage{"uid1": &GPUUsage{PodGroupKey: "ns/my-pg", UsedMem: 1000, UsedCore: 50}}},
currentKey: "ns/my-pg",
want: true,
},
{
name: "same group but zero usage (released)",
gd: &GPUDevice{PodMap: map[string]*GPUUsage{"uid1": &GPUUsage{PodGroupKey: "ns/my-pg", UsedMem: 0, UsedCore: 0}}},

Comment on lines 181 to 190
func deviceHasPodFromSameGroup(gd *GPUDevice, currentKey string) bool {
if currentKey == "" {
return false
}
for _, usage := range gd.PodMap {
if usage.PodGroupKey == currentKey && (usage.UsedMem > 0 || usage.UsedCore > 0) {
return true
}
}
return false

Copilot AI Feb 18, 2026


deviceHasPodFromSameGroup dereferences gd.PodMap entries as *GPUUsage without checking for gd == nil or usage == nil. Since PodMap is a map[string]*GPUUsage, a nil value would panic here. Add defensive checks (e.g., return false when gd == nil, and skip nil usage entries).
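A defensive variant along the lines of this suggestion could look as follows. This is a sketch, not the PR's final code, and the `GPUUsage`/`GPUDevice` types are minimal stand-ins for the real vgpu package types.

```go
package main

import "fmt"

// Minimal stand-ins for the PR's vGPU types.
type GPUUsage struct {
	PodGroupKey string
	UsedMem     uint
	UsedCore    uint
}

type GPUDevice struct {
	PodMap map[string]*GPUUsage
}

// deviceHasPodFromSameGroup with the defensive checks suggested above:
// a nil device or nil map entry is treated as "no conflict" instead of
// causing a nil-pointer dereference.
func deviceHasPodFromSameGroup(gd *GPUDevice, currentKey string) bool {
	if gd == nil || currentKey == "" {
		return false
	}
	for _, usage := range gd.PodMap {
		if usage == nil {
			continue // defensively skip nil entries
		}
		if usage.PodGroupKey == currentKey && (usage.UsedMem > 0 || usage.UsedCore > 0) {
			return true
		}
	}
	return false
}

func main() {
	gd := &GPUDevice{PodMap: map[string]*GPUUsage{
		"released": nil, // would panic without the nil check
		"active":   {PodGroupKey: "ns/pg", UsedMem: 1024},
	}}
	fmt.Println(deviceHasPodFromSameGroup(nil, "ns/pg")) // false, no panic
	fmt.Println(deviceHasPodFromSameGroup(gd, "ns/pg"))  // true
}
```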


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the VGPUPodGroupDeviceSpread feature, aiming to prevent multiple pods from the same PodGroup from being scheduled onto the same physical GPU device, which helps avoid NCCL communication issues in distributed workloads. However, the implementation introduces a significant race condition in the PodMap structure. This PodMap is shared across scheduling snapshots and is accessed and modified concurrently without synchronization, which can lead to scheduler panics and Denial of Service. Additionally, there is a minor redundancy in the state management logic that could be optimized.

Comment on lines +230 to 234
PodGroupKey: getPodGroupKey(pod),
}
}

gsdevice.PodMap[podUID].UsedMem += deviceused.Usedmem


Severity: high

The map write to gsdevice.PodMap (lines 230-234) is not synchronized and can occur concurrently with map iterations in deviceHasPodFromSameGroup (introduced in utils.go), leading to a runtime panic and Denial of Service vulnerability. Concurrent map writes and iterations in Go are not thread-safe. While PodGroupKey is initialized within the if !ok block at line 233, the overall access to gsdevice.PodMap requires thread-safe synchronization, especially since GPUDevices is shared and updated asynchronously.

Comment on lines 185 to 189
for _, usage := range gd.PodMap {
if usage.PodGroupKey == currentKey && (usage.UsedMem > 0 || usage.UsedCore > 0) {
return true
}
}


Severity: high

The deviceHasPodFromSameGroup function iterates over gd.PodMap without any synchronization. This map is shared across snapshots and is concurrently modified by AddResource, SubResource, and Allocate calls (which can be triggered by asynchronous event handlers). Concurrent map iteration and write operations in Go result in a runtime panic, which would crash the scheduler and cause a Denial of Service. To fix this, ensure that PodMap is deep-copied when creating a snapshot in getGPUDeviceSnapShot, or use a mutex to synchronize access.
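The first option this comment suggests, deep-copying `PodMap` when building the snapshot, could be sketched as below. The function name `snapshotDevice` is hypothetical (the PR's snapshot path is `getGPUDeviceSnapShot`), and the types are minimal stand-ins.

```go
package main

import "fmt"

// Minimal stand-ins for the PR's vGPU types.
type GPUUsage struct {
	PodGroupKey string
	UsedMem     uint
	UsedCore    uint
}

type GPUDevice struct {
	ID     string
	PodMap map[string]*GPUUsage
}

// snapshotDevice deep-copies PodMap so the scheduling snapshot never
// shares map storage (or GPUUsage pointers) with the live device state
// that event handlers mutate concurrently. A mutex around all PodMap
// access would be the alternative fix.
func snapshotDevice(src *GPUDevice) *GPUDevice {
	if src == nil {
		return nil
	}
	dst := &GPUDevice{ID: src.ID, PodMap: make(map[string]*GPUUsage, len(src.PodMap))}
	for uid, usage := range src.PodMap {
		if usage == nil {
			continue
		}
		u := *usage // copy the struct, not the pointer
		dst.PodMap[uid] = &u
	}
	return dst
}

func main() {
	live := &GPUDevice{ID: "GPU-0001", PodMap: map[string]*GPUUsage{
		"uid1": {PodGroupKey: "ns/pg", UsedMem: 1000},
	}}
	snap := snapshotDevice(live)
	live.PodMap["uid1"].UsedMem = 0          // mutate the live state
	fmt.Println(snap.PodMap["uid1"].UsedMem) // snapshot unaffected: 1000
}
```

With the snapshot isolated this way, `deviceHasPodFromSameGroup` only ever iterates a map that no other goroutine writes.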

@goyalankit goyalankit changed the title Prevent same device Prevent same physical device for vGPU Feb 18, 2026
@goyalankit goyalankit changed the title Prevent same physical device for vGPU Prevent same physical device for pods in a podgroup with vGPU Feb 18, 2026
Signed-off-by: Ankit Goyal <ankit3goyal@gmail.com>
Replace locally defined podGroupAnnotationKey and volcanoPodGroupAnnotation
constants with the canonical KubeGroupNameAnnotationKey and
VolcanoGroupNameAnnotationKey from volcano.sh/apis/pkg/apis/scheduling/v1beta1.

Signed-off-by: Ankit Goyal <ankit3goyal@gmail.com>
Signed-off-by: Ankit Goyal <ankit3goyal@gmail.com>
@goyalankit goyalankit force-pushed the prevent-same-device-id branch from ed1f400 to 5aee881 on February 18, 2026 23:41