This repository was archived by the owner on Sep 19, 2022. It is now read-only.

Multi-gpu in a single pod #362

Open
@wallarug

Hi Team,

I am trying to run a single Kubernetes pod that uses multiple GPUs. I can't find any resources on how to do this: everything I find assumes one GPU per pod, which is not what I want. I want to be able to spin up, for example, two pods with four GPUs each (8 GPUs total), or other combinations.
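For reference, here is a minimal sketch of the rank layout a two-pod, four-GPU-per-pod (8 GPU) job would need, assuming the usual PyTorch distributed convention of one process per GPU: each pod acts as a node, each GPU in it gets a local rank, and global rank = node rank × GPUs per node + local rank. `WorkerSlot` and `layout` are hypothetical names, not part of any library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSlot:
    node_rank: int    # which pod
    local_rank: int   # which GPU inside that pod
    global_rank: int  # rank across the whole job


def layout(nnodes: int, gpus_per_node: int) -> list[WorkerSlot]:
    """One process per GPU: global_rank = node_rank * gpus_per_node + local_rank."""
    return [
        WorkerSlot(node, gpu, node * gpus_per_node + gpu)
        for node in range(nnodes)
        for gpu in range(gpus_per_node)
    ]


if __name__ == "__main__":
    slots = layout(nnodes=2, gpus_per_node=4)
    print("world_size =", len(slots))
    for slot in slots:
        print(slot)
```

Under this convention the world size is simply the number of pods times the GPUs per pod, so a 2x4 job and an 8x1 job both have eight ranks; only the node/local split differs.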

This appears to have been asked before in #219 and #331, but neither has a definitive answer.

The YAML file I have based my testing on is from this tutorial: https://towardsdatascience.com/pytorch-distributed-on-kubernetes-71ed8b50a7ee

I changed the Worker section to request two GPUs in a single pod:

 Worker:
      replicas: 1
      restartPolicy: OnFailure
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          volumes:
            - name: pv-k8s-storage
              persistentVolumeClaim:
                claimName: pvc-k8s-storage
          containers:
            - name: pytorch
              command: ["/bin/sh"]
              args: ["-c", "/usr/bin/python3 -m pip install --upgrade pip; pip install tensorboardX pandas scikit-learn; python3 ranzrc.py --epochs 5 --ba$
              image: pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime
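              resources:
                # GPUs are an extended resource: the limit is what counts, and
                # if requests is also set it must equal the limit.
                requests:
                  nvidia.com/gpu: 2
                limits:
                  nvidia.com/gpu: 2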
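A quick way to confirm that both requested GPUs actually reach the container is to run a small check inside the pod. This sketch assumes the NVIDIA device plugin's default behaviour of passing the allocated devices to the container through `NVIDIA_VISIBLE_DEVICES`, and that PyTorch is importable (it is in the `pytorch/pytorch` image above). `gpu_report` is a hypothetical helper name.

```python
from __future__ import annotations

import os


def gpu_report() -> dict:
    """Collect what this container can see about its GPUs."""
    report = {
        # Typically set by the NVIDIA device plugin / container runtime (may be absent).
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # Further restricts which GPUs CUDA programs enumerate (usually unset).
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    try:
        import torch  # available in the pytorch/pytorch image

        count = torch.cuda.device_count()
        report["torch.cuda.device_count"] = count
        report["devices"] = [torch.cuda.get_device_name(i) for i in range(count)]
    except ImportError:
        report["torch.cuda.device_count"] = None
        report["devices"] = []
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `torch.cuda.device_count` reports 2 inside the pod, the Kubernetes side has most likely allocated both GPUs, and the remaining question is how the training script uses them.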

I am seeing the same behaviour as in #219: when I start the pod, the test code only uses one GPU, even though I told it to use two.
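If both GPUs are visible but training still runs on one, one possible reason is that the script starts a single process that only ever uses one device. One option is to start one process per GPU inside the pod. The sketch below shows only that fan-out, using the standard library; `launch_per_gpu` is a hypothetical helper, and a DistributedDataParallel script would additionally need its `RANK` and `WORLD_SIZE` adjusted per process, which this does not do. PyTorch's own launcher, `python -m torch.distributed.run --nproc_per_node=N`, handles that per-process bookkeeping and is likely the better choice in practice.

```python
from __future__ import annotations

import os
import subprocess
import sys


def launch_per_gpu(cmd: list[str], num_gpus: int) -> list[int]:
    """Start one copy of `cmd` per GPU and return each child's exit code.

    Child i gets LOCAL_RANK=i and CUDA_VISIBLE_DEVICES=i, so inside that
    child its assigned GPU shows up as cuda:0.
    """
    procs = []
    for i in range(num_gpus):
        env = dict(os.environ, LOCAL_RANK=str(i), CUDA_VISIBLE_DEVICES=str(i))
        procs.append(subprocess.Popen(cmd, env=env))
    # Wait for every child, in launch order.
    return [p.wait() for p in procs]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python3 launch.py <num_gpus> <command> [args...]
    codes = launch_per_gpu(sys.argv[2:], int(sys.argv[1]))
    sys.exit(0 if all(code == 0 for code in codes) else 1)
```

For example, `python3 launch.py 2 python3 ranzrc.py ...` would start two copies of the script, one pinned to each GPU.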

Any assistance or pointing in the right direction on this would be great. Thanks!
