GPU sharing on cuda compute capability >=7.5#231
guptaNswati wants to merge 1 commit into kubernetes-sigs:main
Signed-off-by: Swati Gupta <swatig@nvidia.com>
58f6bfa to 86de1cb
Thanks @guptaNswati. I will need to check how this differs from #58.
if deviceType.Gpu != nil {
	cudaCCv := "v" + strings.TrimPrefix(deviceType.Gpu.cudaComputeCapability, "v")
	gpuUUID := deviceType.Gpu.UUID
	if semver.Compare(semver.Canonical(cudaCCv), semver.Canonical("v7.5")) >= 0 {
@guptaNswati where does the v7.5 threshold come from? In #58 we check for >= v7.0 and for MPS specifically, v3.5 is mentioned.
I took it from our device-plugin code, which checks whether the device is Volta: https://github.com/NVIDIA/k8s-device-plugin/blob/main/cmd/mps-control-daemon/mps/device.go#L51
// allow only devices with CUDA compute capability >= 7.5, as time-slicing and MPS do not work on older architectures
shareableAllocatableDevices := make(AllocatableDevices)
for device, deviceType := range allocatableDevices {
	if deviceType.Gpu != nil {
Does this mean that we don't timeslice MIG devices?
In general, does it make sense to factor these checks into a function where we can better test the various combinations of options?
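One possible shape for such a helper, as a minimal sketch only: the names `parseComputeCapability` and `supportsSharing` are hypothetical, and the comparison is done with the standard library rather than the `semver` package used in the diff, so that it runs on its own. The minimum capability is passed in as a parameter because the right threshold is still under discussion in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseComputeCapability splits a string such as "7.5" or "v7.5" into (7, 5).
// Hypothetical helper, not part of the PR.
func parseComputeCapability(cc string) (major, minor int, err error) {
	parts := strings.Split(strings.TrimPrefix(cc, "v"), ".")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("invalid compute capability %q", cc)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("invalid major version in %q: %w", cc, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("invalid minor version in %q: %w", cc, err)
	}
	if major < 0 || minor < 0 {
		return 0, 0, fmt.Errorf("negative component in %q", cc)
	}
	return major, minor, nil
}

// supportsSharing reports whether cc is at least minCC.
// Hypothetical helper; minCC is a parameter so tests can cover several thresholds.
func supportsSharing(cc, minCC string) (bool, error) {
	major, minor, err := parseComputeCapability(cc)
	if err != nil {
		return false, err
	}
	minMajor, minMinor, err := parseComputeCapability(minCC)
	if err != nil {
		return false, err
	}
	if major != minMajor {
		return major > minMajor, nil
	}
	return minor >= minMinor, nil
}

func main() {
	// Exercise a few combinations, including a malformed value.
	for _, cc := range []string{"5.2", "7.0", "v7.5", "8.6", "bogus"} {
		ok, err := supportsSharing(cc, "7.5")
		fmt.Printf("%-6s shareable=%-5t err=%v\n", cc, ok, err)
	}
}
```

Because a malformed capability string surfaces as an error instead of a `false`, a table-driven test can tell "too old" apart from "could not be parsed".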
These changes are also needed in the unprepare function.
}
mpsControlDaemon := s.mpsManager.NewMpsControlDaemon(string(claim.UID), allocatableDevices)

mpsControlDaemon := s.mpsManager.NewMpsControlDaemon(string(claim.UID), shareableAllocatableDevices)
Should we distinguish between timeslicing-sharable and MPS-sharable devices?
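If the two strategies were kept apart, a per-strategy threshold table is one way it could look. This is a sketch under stated assumptions: `computeCapability`, `sharingStrategy`, `minCapability` and `shareableDevices` are hypothetical names, and the 7.0 and 7.5 values are placeholders taken from numbers mentioned in this thread, not confirmed hardware requirements.

```go
package main

import (
	"fmt"
	"sort"
)

// computeCapability is a hypothetical parsed form of a CUDA compute capability.
type computeCapability struct{ Major, Minor int }

// atLeast reports whether c is greater than or equal to threshold.
func (c computeCapability) atLeast(threshold computeCapability) bool {
	if c.Major != threshold.Major {
		return c.Major > threshold.Major
	}
	return c.Minor >= threshold.Minor
}

// sharingStrategy is a hypothetical name for the two sharing modes discussed here.
type sharingStrategy string

const (
	timeSlicing sharingStrategy = "TimeSlicing"
	mps         sharingStrategy = "MPS"
)

// minCapability holds PLACEHOLDER thresholds (assumptions, not verified values).
var minCapability = map[sharingStrategy]computeCapability{
	timeSlicing: {7, 0},
	mps:         {7, 5},
}

// shareableDevices returns the sorted names of devices that meet the threshold
// for the given strategy, and an error for a strategy it does not know.
func shareableDevices(devices map[string]computeCapability, s sharingStrategy) ([]string, error) {
	threshold, ok := minCapability[s]
	if !ok {
		return nil, fmt.Errorf("unknown sharing strategy %q", s)
	}
	var names []string
	for name, cc := range devices {
		if cc.atLeast(threshold) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	devices := map[string]computeCapability{
		"gpu-0": {5, 2},
		"gpu-1": {7, 0},
		"gpu-2": {8, 6},
	}
	for _, s := range []sharingStrategy{timeSlicing, mps} {
		names, err := shareableDevices(devices, s)
		fmt.Println(s, names, err)
	}
}
```

Keeping the thresholds in one table means a change to either value touches a single line, and each strategy can be tested against the same device set.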
I don't think we should silently ignore requests to do time-slicing. The way I'd like to see this take form is to
Ack. Need to rewrite this.
Let's close this for now; but we can (and should!) certainly pick up the ideas in here again if desired.
@jgehrcke Hi, can I pick this one?
This adds a check that allows GPU sharing only on devices with a CUDA compute capability of 7.5 or higher. On older devices it skips both timeslicing and MPS. References these two issues and the related MR:
#41
https://github.com/NVIDIA/cloud-native-team/issues/97
https://github.com/NVIDIA/cloud-native-team/issues/96
Tested on GeForce 980 and Titan.