Unclear reason for validating GPU count == number of NVLink partitions for instance #123

Regarding https://github.com/NVIDIA/bare-metal-manager-rest/blob/e65b165d91deb3546cbde65361448096caeef2c1/api/pkg/api/model/instance.go#L328: why does the validation function require the number of NVLink partitions to equal the number of GPUs present in the machine capabilities?

What if I want to add only 2 out of 4 GPUs to a partition for an instance? For example, I create an instance and it becomes ready. Then I create an NVLink logical partition via the API and update the instance, assigning 2 GPUs to this partition. The API then returns this exact error, because the remaining 2 GPUs are not present in the instance's NVLink interfaces. As far as I know, NVLink partitioning supports creating partitions with only 1 GPU, so it should work the same way for a partition with 2 GPUs.
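To make the question concrete, here is a minimal Python sketch (not the repository's Go code) contrasting the rule as described above, one NVLink interface per GPU in the machine capabilities, with a hypothetical relaxed rule that accepts any subset of GPUs. The `NVLinkInterface` shape, its field names, and the assumption that `deviceInstance` is a zero-based GPU index are illustrative only, not taken from the actual model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NVLinkInterface:
    # Hypothetical request entry: field names are assumptions, not the real model.
    partition_id: str      # NVLink logical partition the GPU should join
    device_instance: int   # assumed zero-based index of the GPU on the machine


def validate_strict(ifaces: List[NVLinkInterface], gpu_count: int) -> Optional[str]:
    """Rule as described in the issue: one interface per GPU in the machine caps."""
    if len(ifaces) != gpu_count:
        return f"expected {gpu_count} NVLink interfaces, got {len(ifaces)}"
    return None


def validate_subset(ifaces: List[NVLinkInterface], gpu_count: int) -> Optional[str]:
    """Hypothetical relaxed rule: any subset of GPUs, each listed at most once."""
    seen = set()
    for iface in ifaces:
        if not 0 <= iface.device_instance < gpu_count:
            return f"deviceInstance {iface.device_instance} out of range"
        if iface.device_instance in seen:
            return f"deviceInstance {iface.device_instance} listed twice"
        seen.add(iface.device_instance)
    return None


# 2 of 4 GPUs assigned to one partition, as in the scenario above.
two_of_four = [NVLinkInterface("partition-a", 0), NVLinkInterface("partition-a", 1)]
print(validate_strict(two_of_four, 4))  # expected 4 NVLink interfaces, got 2
print(validate_subset(two_of_four, 4))  # None
```

Under the strict rule, the 2-of-4 request is rejected purely on count, while the relaxed rule still catches malformed input (out-of-range or duplicated GPUs) without forcing every GPU into a partition.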

One more small question:
I see that the NVLink interface model supports `gpuGuid`:
https://github.com/NVIDIA/bare-metal-manager-rest/blob/e65b165d91deb3546cbde65361448096caeef2c1/api/pkg/api/model/nvlinkinterface.go#L73
However, the API docs list only `deviceInstance` in the NVLink interface definition for the update-instance request: https://nvidia.github.io/bare-metal-manager-rest/#tag/Instance/operation/update-instance. Can we use `gpuGuid` instead of `deviceInstance`? If not, how can we get the `deviceInstance` value for GPUs from the bare-metal manager API?
