Auto-Create DRA Resources #609

Merged
varunrsekar merged 1 commit into NVIDIA:main from varunrsekar:autoassign-gpus
Aug 16, 2025

@varunrsekar varunrsekar commented Aug 12, 2025

This PR adds new API fields for auto-creating DRA (Dynamic Resource Allocation) ResourceClaims and ResourceClaimTemplates.

At a minimum, the following input is required:

spec:
  draResources:
  - claimSpec:
      devices:
      - name: gpu

This generates the following ResourceClaim spec:

spec:
  devices:
    requests:
    - exactly:
        allocationMode: ExactCount
        count: 1
        deviceClassName: gpu.nvidia.com
        selectors:
        - cel:
            expression: device.driver == "gpu.nvidia.com"
      name: gpu
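
The expansion from the minimal input to the generated request can be sketched as follows. This is only an illustration in Python, not the operator's code: the function names `build_request` and `build_claim_spec` are hypothetical, and the default values (`gpu.nvidia.com` for both the device class and the driver, `ExactCount` with a count of 1) are inferred from the example output above rather than taken from the implementation.

```python
# Hypothetical sketch: defaults are inferred from the example in this PR
# description, not copied from the operator's source.
DEFAULT_DEVICE_CLASS = "gpu.nvidia.com"
DEFAULT_DRIVER = "gpu.nvidia.com"


def build_request(device: dict) -> dict:
    """Expand one claimSpec.devices[] entry into a ResourceClaim request."""
    device_class = device.get("deviceClassName", DEFAULT_DEVICE_CLASS)
    driver = device.get("driverName", DEFAULT_DRIVER)
    # The driver selector is always emitted, per the notes in this PR.
    driver_selector = {"cel": {"expression": f'device.driver == "{driver}"'}}
    return {
        "name": device["name"],
        "exactly": {
            "allocationMode": "ExactCount",
            "count": 1,
            "deviceClassName": device_class,
            "selectors": [driver_selector],
        },
    }


def build_claim_spec(devices: list) -> dict:
    """Wrap the expanded requests in the spec.devices.requests structure."""
    return {"devices": {"requests": [build_request(d) for d in devices]}}
```

Calling `build_claim_spec([{"name": "gpu"}])` yields a dictionary with the same shape as the generated spec shown above.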

For finer control over the generated claim, more fields can be set:

spec:
  draResources:
  - claimSpec:
      isTemplate: true
      devices:
      - name: gpu
        deviceClassName: gpu.nvidia.com
        driverName: gpu.nvidia.com
        matchAttributes:
        - key: index
          op: NotEqual
          value:
            intValue: 1
        - key: driverVersion
          op: GreaterThanOrEqual
          value:
            versionValue: "550.127.8"
        - key: productName
          op: Equal
          value:
            stringValue: "NVIDIA A100-PCIE-40GB"
        matchCapacity:
        - key: memory
          value: 40Gi

Because isTemplate is true, this generates the following ResourceClaimTemplate spec:

spec:
  devices:
    requests:
    - exactly:
        allocationMode: ExactCount
        count: 1
        deviceClassName: gpu.nvidia.com
        selectors:
        - cel:
            expression: device.driver == "gpu.nvidia.com"
        - cel:
            expression: device.attributes["gpu.nvidia.com"].index != 1
        - cel:
            expression: (device.attributes["gpu.nvidia.com"].driverVersion).compareTo(semver("550.127.8")) >= 0
        - cel:
            expression: device.attributes["gpu.nvidia.com"].productName == "NVIDIA A100-PCIE-40GB"
        - cel:
            expression: (device.capacity["gpu.nvidia.com"].memory).compareTo(quantity("40Gi")) == 0
      name: gpu
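
The mapping from `matchAttributes` and `matchCapacity` entries to CEL selectors can be sketched roughly as below. This is a Python illustration under stated assumptions, not the operator's implementation: the helper names `attribute_expression` and `capacity_expression` are hypothetical, only the three operators that appear in the example are mapped, and the capacity comparison is fixed to equality because that is the only form the example output shows.

```python
import json

# Only the operators that appear in the example above are mapped here;
# whether the API supports others is not stated in this PR description.
OPERATORS = {"Equal": "==", "NotEqual": "!=", "GreaterThanOrEqual": ">="}


def attribute_expression(driver: str, match: dict) -> str:
    """Render one matchAttributes entry as a CEL expression string."""
    op = OPERATORS[match["op"]]
    key = match["key"]
    value = match["value"]
    path = f'device.attributes["{driver}"].{key}'
    if "versionValue" in value:
        # Versions are compared through semver(...).compareTo, as in the example.
        version = value["versionValue"]
        return f'({path}).compareTo(semver("{version}")) {op} 0'
    if "stringValue" in value:
        # json.dumps adds the surrounding double quotes and escapes the string.
        return f"{path} {op} {json.dumps(value['stringValue'])}"
    if "intValue" in value:
        return f"{path} {op} {value['intValue']}"
    raise ValueError(f"unsupported value in matchAttributes entry: {value!r}")


def capacity_expression(driver: str, match: dict) -> str:
    """Render one matchCapacity entry as an equality check on a quantity."""
    key = match["key"]
    quantity = match["value"]
    path = f'device.capacity["{driver}"].{key}'
    return f'({path}).compareTo(quantity("{quantity}")) == 0'
```

Feeding the four entries from the input above through these helpers reproduces the four attribute and capacity expressions in the generated template; the `device.driver` selector is added separately.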

Notes:

  1. The CEL expression device.driver == <spec.draResources[].claimSpec.devices[].driverName> is always added.
  2. The label nim-operator.nvidia.com/dra-auto-generated="true" is always added.
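
Since the label is always present, auto-generated objects can be told apart from hand-written ones. The snippet below is a small Python sketch of that check on an object already loaded as a dictionary; the helper names `is_auto_generated` and `label_selector` are hypothetical and not part of the operator.

```python
# Label key and value taken from note 2 above.
AUTO_GENERATED_LABEL = "nim-operator.nvidia.com/dra-auto-generated"


def is_auto_generated(obj: dict) -> bool:
    """Return True if a ResourceClaim(Template) dict carries the label."""
    metadata = obj.get("metadata") or {}
    labels = metadata.get("labels") or {}
    return labels.get(AUTO_GENERATED_LABEL) == "true"


def label_selector() -> str:
    """Build an equality-based label selector string for list queries."""
    return f"{AUTO_GENERATED_LABEL}=true"
```

The string returned by `label_selector()` uses the standard Kubernetes `key=value` selector form, so it can be passed wherever a label selector is accepted.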

copy-pr-bot bot commented Aug 12, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment thread internal/shared/resourceclaims.go Outdated
Comment thread internal/shared/resourceclaims.go Outdated
Comment thread api/apps/v1alpha1/common_types.go Outdated
@varunrsekar varunrsekar force-pushed the autoassign-gpus branch 3 times, most recently from 788d8cf to 25810e8 Compare August 15, 2025 22:42
@varunrsekar varunrsekar marked this pull request as ready for review August 15, 2025 22:42
Signed-off-by: Varun Ramachandra Sekar <vsekar@nvidia.com>

@shivamerla shivamerla left a comment

Great to see this! Thanks, @varunrsekar

@varunrsekar varunrsekar merged commit dee5311 into NVIDIA:main Aug 16, 2025
9 checks passed
@varunrsekar varunrsekar deleted the autoassign-gpus branch August 16, 2025 00:09