Add support for NVIDIA MIG #4418

Open

wants to merge 2 commits into develop

Conversation

@piyush-jena (Contributor) commented on Feb 27, 2025

Issue number:
Fixes: #4406, #4252

Description of changes:

Adds support for configuring NVIDIA Multi-Instance GPU (MIG) partitioning through new settings (settings.kubelet-device-plugins.nvidia.device-partitioning-strategy and settings.kubelet-device-plugins.nvidia.mig.profile), along with the corresponding settings migrations for upgrade and downgrade.

Testing done:

Migration Testing

The transcripts below show the new settings being rejected on v1.32.0, accepted after upgrading to v1.34.0, and rejected again after rolling back to v1.32.0.

v1.32.0

bash-5.1# apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "cacc4ce9",
    "pretty_name": "Bottlerocket OS 1.32.0 (aws-k8s-1.29-nvidia)",
    "variant_id": "aws-k8s-1.29-nvidia",
    "version_id": "1.32.0"
  }
}
bash-5.1# apiclient set settings.kubelet-device-plugins.nvidia.device-partitioning-strategy="mig"
Failed to change settings: Failed PATCH request to '/settings/keypair?tx=apiclient-set-vELpjamiDtvsQslp': Status 400 when PATCHing /settings/keypair?tx=apiclient-set-vELpjamiDtvsQslp: Unable to match your input to the data model.  We may not have enough type information.  Please try the --json input form.  Cause: Error during deserialization: unknown field `device-partitioning-strategy`, expected one of `pass-device-specs`, `device-id-strategy`, `device-list-strategy`, `device-sharing-strategy`, `time-slicing` at line 1 column 67
bash-5.1# apiclient apply <<EOF
> [settings.kubelet-device-plugins.nvidia.mig.profile]
> "a100.40gb"="7"
> "h100.80gb"="4"
> "g100.80gb"="1g.5gb"
> EOF
Failed to apply settings: Failed to PATCH settings from '-' to '/settings?tx=apiclient-apply-jnaXTAbi9QIbJm0M': Status 400 when PATCHing /settings?tx=apiclient-apply-jnaXTAbi9QIbJm0M: Json deserialize error: unknown field `mig`, expected one of `pass-device-specs`, `device-id-strategy`, `device-list-strategy`, `device-sharing-strategy`, `time-slicing` at line 1 column 42
bash-5.1# updog check-update -a --json
[
  {
    "variant": "aws-k8s-1.29-nvidia",
    "arch": "x86_64",
    "version": "1.34.0",
    "max_version": "1.34.0",
    "waves": {
      "0": "2025-02-27T03:28:57.496827355Z",
      "20": "2025-02-27T06:28:57.496827355Z",
      "102": "2025-02-28T02:28:57.496827355Z",
      "307": "2025-03-01T02:28:57.496827355Z",
      "819": "2025-03-03T02:28:57.496827355Z",
      "1228": "2025-03-04T02:28:57.496827355Z",
      "1843": "2025-03-05T02:28:57.496827355Z"
    },
    "images": {
      "boot": "bottlerocket-aws-k8s-1.29-nvidia-x86_64-1.34.0-2ac59cde-boot.ext4.lz4",
      "root": "bottlerocket-aws-k8s-1.29-nvidia-x86_64-1.34.0-2ac59cde-root.ext4.lz4",
      "hash": "bottlerocket-aws-k8s-1.29-nvidia-x86_64-1.34.0-2ac59cde-root.verity.lz4"
    }
  }
]
bash-5.1# updog update -i 1.34.0 -r -n
Starting update to 1.34.0
Reboot scheduled for Thu 2025-02-27 02:37:43 UTC, use 'shutdown -c' to cancel.
Update applied: aws-k8s-1.29-nvidia 1.34.0

Upgrade to v1.34.0

bash-5.1# apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "2ac59cde",
    "pretty_name": "Bottlerocket OS 1.34.0 (aws-k8s-1.29-nvidia)",
    "variant_id": "aws-k8s-1.29-nvidia",
    "version_id": "1.34.0"
  }
}
bash-5.1# apiclient set settings.kubelet-device-plugins.nvidia.device-partitioning-strategy="mig"
bash-5.1# apiclient apply <<EOF
> [settings.kubelet-device-plugins.nvidia.mig.profile]
> "a100.40gb"="7"
> "h100.80gb"="4"
> "g100.80gb"="1g.5gb"
> EOF
bash-5.1# signpost rollback-to-inactive
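
Not captured in the transcript above: before the rollback, the accepted values can also be read back as a sanity check, for example (output not shown):

bash-5.1# apiclient get settings.kubelet-device-plugins.nvidia.mig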

Downgrade to v1.32.0

[ssm-user@control]$ apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "cacc4ce9",
    "pretty_name": "Bottlerocket OS 1.32.0 (aws-k8s-1.29-nvidia)",
    "variant_id": "aws-k8s-1.29-nvidia",
    "version_id": "1.32.0"
  }
}
[ssm-user@control]$ apiclient set settings.kubelet-device-plugins.nvidia.device-partitioning-strategy="mig"
Failed to change settings: Failed PATCH request to '/settings/keypair?tx=apiclient-set-3ypsF3ruTrXjCzNq': Status 400 when PATCHing /settings/keypair?tx=apiclient-set-3ypsF3ruTrXjCzNq: Unable to match your input to the data model.  We may not have enough type information.  Please try the --json input form.  Cause: Error during deserialization: unknown field `device-partitioning-strategy`, expected one of `pass-device-specs`, `device-id-strategy`, `device-list-strategy`, `device-sharing-strategy`, `time-slicing` at line 1 column 67
[ssm-user@control]$ apiclient apply <<EOF
> [settings.kubelet-device-plugins.nvidia.mig.profile]
> "a100.40gb"="7"
> "h100.80gb"="4"
> "g100.80gb"="1g.5gb"
> EOF
Failed to apply settings: Failed to PATCH settings from '-' to '/settings?tx=apiclient-apply-yGwq7oN11jDcLKqn': Status 400 when PATCHing /settings?tx=apiclient-apply-yGwq7oN11jDcLKqn: Json deserialize error: unknown field `mig`, expected one of `pass-device-specs`, `device-id-strategy`, `device-list-strategy`, `device-sharing-strategy`, `time-slicing` at line 1 column 42

Feature Testing

Settings

bash-5.1# apiclient set settings.kubelet-device-plugins.nvidia.device-partitioning-strategy="mig"
bash-5.1# apiclient apply <<EOF
> [settings.kubelet-device-plugins.nvidia.mig.profile]
> "a100.40gb"="7"
> "h100.80gb"="4"
> "g100.80gb"="1g.5gb"
> EOF
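
For reference, the same configuration can also be expressed as a single TOML snippet (for example, supplied as instance user data at launch). This is a sketch assembled from the settings keys exercised above, not an excerpt from this PR:

[settings.kubelet-device-plugins.nvidia]
device-partitioning-strategy = "mig"

[settings.kubelet-device-plugins.nvidia.mig.profile]
"a100.40gb" = "7"
"h100.80gb" = "4"
"g100.80gb" = "1g.5gb"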

kubectl describe node output

Capacity:
  cpu:                96
  ephemeral-storage:  418812624Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1176277888Ki
  nvidia.com/gpu:     56
  pods:               737
Allocatable:
  cpu:                95690m
  ephemeral-storage:  384903971816
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1167612800Ki
  nvidia.com/gpu:     56
  pods:               737
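
Assuming the node above is a P4-class instance with 8 x A100 40GB GPUs (consistent with the linked issues), the advertised capacity matches the "a100.40gb" = "7" profile: 8 GPUs x 7 MIG partitions per GPU = 56 nvidia.com/gpu devices.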

nvidia-k8s-device-plugin status

bash-5.1# systemctl status nvidia-k8s-device-plugin
● nvidia-k8s-device-plugin.service - Start NVIDIA kubernetes device plugin
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/nvidia-k8s-device-plugin.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf
             /etc/systemd/system/nvidia-k8s-device-plugin.service.d
             └─exec-start.conf
     Active: active (running) since Thu 2025-02-27 02:34:58 UTC; 15min ago
   Main PID: 14870 (nvidia-device-p)
      Tasks: 21 (limit: 629145)
     Memory: 61.7M
        CPU: 16.438s
     CGroup: /system.slice/nvidia-k8s-device-plugin.service
             └─14870 /usr/bin/nvidia-device-plugin --config-file=/etc/nvidia-k8s-device-plugin/settings.yaml

Feb 27 02:34:59 ip-192-168-155-1.us-west-2.compute.internal nvidia-device-plugin[14870]:   },
Feb 27 02:34:59 ip-192-168-155-1.us-west-2.compute.internal nvidia-device-plugin[14870]:   "sharing": {
Feb 27 02:34:59 ip-192-168-155-1.us-west-2.compute.internal nvidia-device-plugin[14870]:     "timeSlicing": {}
Feb 27 02:34:59 ip-192-168-155-1.us-west-2.compute.internal nvidia-device-plugin[14870]:   },
Feb 27 02:34:59 ip-192-168-155-1.us-west-2.compute.internal nvidia-device-plugin[14870]:   "imex": {}
Feb 27 02:34:59 ip-192-168-155-1.us-west-2.compute.internal nvidia-device-plugin[14870]: }
Feb 27 02:34:59 ip-192-168-155-1.us-west-2.compute.internal nvidia-device-plugin[14870]: I0227 02:34:58.722298   14870 main.go:356] Retrieving plugins.
Feb 27 02:35:05 ip-192-168-155-1.us-west-2.compute.internal nvidia-device-plugin[14870]: I0227 02:35:05.440287   14870 server.go:195] Starting GRPC server fo…om/gpu'
Feb 27 02:35:05 ip-192-168-155-1.us-west-2.compute.internal nvidia-device-plugin[14870]: I0227 02:35:05.446465   14870 server.go:139] Starting to serve 'nvid…pu.sock
Feb 27 02:35:05 ip-192-168-155-1.us-west-2.compute.internal nvidia-device-plugin[14870]: I0227 02:35:05.451689   14870 server.go:146] Registered device plugi…Kubelet
Hint: Some lines were ellipsized, use -l to show in full.
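
The rendered config passed via --config-file (/etc/nvidia-k8s-device-plugin/settings.yaml) is not reproduced in this transcript. As a rough sketch only, assuming the Bottlerocket settings map onto the upstream device plugin config file format, and that device-partitioning-strategy = "mig" corresponds to the plugin's "single" MIG strategy (MIG slices are advertised as plain nvidia.com/gpu above), the file would look something like:

version: v1
flags:
  migStrategy: "single"
  plugin:
    # mirrors the apiclient settings shown later in this description:
    # pass-device-specs, device-list-strategy, device-id-strategy
    passDeviceSpecs: true
    deviceListStrategy: "volume-mounts"
    deviceIDStrategy: "index"
sharing:
  # empty in this test; populated when device-sharing-strategy = "time-slicing"
  timeSlicing: {}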

Workload test

[fedora@ip-172-31-48-208 kubernetes]$ cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nvidia-cuda-workload-deployment-1
spec:
  replicas: 20
  selector:
    matchLabels:
      app: vector-add
  template:
    metadata:
      labels:
        app: vector-add
    spec:
      containers:
      - name: vector-add
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
        resources:
          limits:
              nvidia.com/gpu: 1
EOF
deployment.apps/nvidia-cuda-workload-deployment-1 created
[fedora@ip-172-31-48-208 kubernetes]$ kubectl get pods
NAME                                                READY   STATUS              RESTARTS   AGE
nvidia-cuda-workload-deployment-1-c9548dfc4-2wxw2   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-44wmq   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-5j8xd   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-5zldn   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-69gfp   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-9nnqf   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-b74mp   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-bdbwf   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-brvrt   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-dfsnd   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-dhddn   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-dtxnz   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-fwcmn   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-gp86x   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-k4sjd   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-mqwkt   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-p5bt7   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-wl5z8   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-z9vxr   0/1     ContainerCreating   0          6s
nvidia-cuda-workload-deployment-1-c9548dfc4-zhls8   0/1     ContainerCreating   0          6s
[fedora@ip-172-31-48-208 kubernetes]$ kubectl get pods
NAME                                                READY   STATUS      RESTARTS   AGE
nvidia-cuda-workload-deployment-1-c9548dfc4-2wxw2   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-44wmq   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-5j8xd   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-5zldn   0/1     Completed   0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-69gfp   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-9nnqf   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-b74mp   0/1     Completed   0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-bdbwf   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-brvrt   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-dfsnd   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-dhddn   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-dtxnz   0/1     Completed   0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-fwcmn   0/1     Completed   0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-gp86x   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-k4sjd   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-mqwkt   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-p5bt7   0/1     Completed   0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-wl5z8   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-z9vxr   1/1     Running     0          29s
nvidia-cuda-workload-deployment-1-c9548dfc4-zhls8   1/1     Running     0          29s
[fedora@ip-172-31-48-208 kubernetes]$ kubectl logs nvidia-cuda-workload-deployment-1-c9548dfc4-5j8xd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
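
With 56 MIG devices advertised and each replica requesting nvidia.com/gpu: 1, all 20 replicas of the deployment can be scheduled onto the single node at once, which is what the transition from ContainerCreating to Running/Completed above shows.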

MIG + Time-slicing kubectl describe node output (gpus = 8, partitions = 4, replicas = 4)

bash-5.1# apiclient get settings.kubelet-device-plugins.nvidia
{
  "settings": {
    "kubelet-device-plugins": {
      "nvidia": {
        "device-id-strategy": "index",
        "device-list-strategy": "volume-mounts",
        "device-partitioning-strategy": "mig",
        "device-sharing-strategy": "time-slicing",
        "mig": {
          "profile": {
            "a100.40gb": "7",
            "g100.80gb": "1g.5gb",
            "h100.80gb": "4"
          }
        },
        "pass-device-specs": true,
        "time-slicing": {
          "replicas": 4
        }
      }
    }
  }
}
[fedora@ip-172-31-48-208 kubernetes]$ kubectl describe node
Capacity:
  cpu:                    192
  ephemeral-storage:      418124376Ki
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 2097096404Ki
  nvidia.com/gpu:         0
  nvidia.com/gpu.shared:  128
  pods:                   100
Allocatable:
  cpu:                    191450m
  ephemeral-storage:      384269682460
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 2095606484Ki
  nvidia.com/gpu:         0
  nvidia.com/gpu.shared:  128
  pods:                   100
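
The shared-device count lines up with the parameters in the section heading: 8 GPUs x 4 MIG partitions per GPU x 4 time-slicing replicas = 128 nvidia.com/gpu.shared devices, while nvidia.com/gpu drops to 0 because the shared resource is advertised under the renamed nvidia.com/gpu.shared name.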

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@piyush-jena changed the title from "migration: add migrations for NVIDIA MIG" to "Add support for NVIDIA MIG" on Feb 27, 2025

@bcressey (Contributor) left a comment

LGTM if testing comes back green.

@arnaldo2792 (Contributor) commented:

Is there a test to make sure that once timeSlicing is enabled, the Device Plugin advertises the correct device name based on the time slice?

@piyush-jena marked this pull request as ready for review on February 27, 2025 06:42