Unclear reason for validating that GPU count equals the number of NVLink partitions for an instance #123

@prazumovsky

Description

Regarding this validation error:

"nvLinkInterfaces": errors.New("number of NVLink Interfaces must match the number of GPU indexes"),
- Why does the validation function require the number of NVLink partitions to equal the number of GPUs present in the machine capabilities?

What if I want to add only 2 of the 4 GPUs to a partition for an instance? For example, I create an instance and it becomes ready. Then I create an NVLink logical partition via the API and update the instance to assign 2 GPUs to this partition. The API then returns this exact error, because the remaining 2 GPUs are not present in the instance's NVLink interfaces. As far as I know, NVLink partitioning supports creating partitions with only 1 GPU, so the same should be possible for a partition with 2 GPUs.
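To make the question concrete, here is a minimal Go sketch contrasting the check as described above with a relaxed alternative that accepts any subset of GPUs. The type `nvLinkInterface`, its fields, the partition ID, and both validation functions are hypothetical simplifications, not the bare-metal-manager-rest implementation; only the error text is taken from the issue.

```go
package main

import (
	"errors"
	"fmt"
)

// nvLinkInterface is a hypothetical, simplified request entry: one GPU
// (identified by its device index) assigned to one NVLink partition.
type nvLinkInterface struct {
	DeviceInstance int
	PartitionID    string
}

// errCountMismatch reuses the error text quoted in the issue.
var errCountMismatch = errors.New("number of NVLink Interfaces must match the number of GPU indexes")

// validateStrict models the check as described in the issue: the request
// must contain exactly one interface per GPU in the machine capabilities.
func validateStrict(ifaces []nvLinkInterface, gpuCount int) error {
	if len(ifaces) != gpuCount {
		return errCountMismatch
	}
	return nil
}

// validateSubset is one possible relaxed rule (an assumption, not the
// project's code): any subset of GPUs is allowed, as long as every index
// is in range and no GPU is listed twice.
func validateSubset(ifaces []nvLinkInterface, gpuCount int) error {
	seen := make(map[int]bool, len(ifaces))
	for _, iface := range ifaces {
		if iface.DeviceInstance < 0 || iface.DeviceInstance >= gpuCount {
			return fmt.Errorf("deviceInstance %d out of range [0, %d)", iface.DeviceInstance, gpuCount)
		}
		if seen[iface.DeviceInstance] {
			return fmt.Errorf("deviceInstance %d listed more than once", iface.DeviceInstance)
		}
		seen[iface.DeviceInstance] = true
	}
	return nil
}

func main() {
	// The scenario from the issue: 2 of 4 GPUs assigned to one partition.
	req := []nvLinkInterface{
		{DeviceInstance: 0, PartitionID: "partition-a"},
		{DeviceInstance: 1, PartitionID: "partition-a"},
	}
	fmt.Println("strict:", validateStrict(req, 4)) // rejected with the quoted error
	fmt.Println("subset:", validateSubset(req, 4)) // prints <nil>
}
```

Under the strict rule the 2-of-4 request fails with the quoted error; under the subset rule it passes while still catching out-of-range and duplicate indexes.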

One more small question: I see that the NVLink interface supports gpuGuid:

GpuGUID *string `json:"gpuGuid"`

However, the API docs list only deviceInstance in the NVLink interface definition for updating an instance: https://nvidia.github.io/bare-metal-manager-rest/#tag/Instance/operation/update-instance. Can we use gpuGuid instead of deviceInstance? If not, how can we obtain the deviceInstance value for GPUs through the bare-metal manager API?
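For reference, the quoted field has no `omitempty` option, so with Go's `encoding/json` an unset `GpuGUID` is still serialized as `"gpuGuid":null`. The sketch below shows what a request body built from such a struct would look like with either identifier set. The struct name, the `DeviceInstance` field and its integer type, and the GUID string are hypothetical; whether the API actually accepts `gpuGuid` in place of `deviceInstance` is exactly the open question.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nvLinkInterfaceSketch is a hypothetical request struct. Only the GpuGUID
// field and its tag are copied from the issue.
type nvLinkInterfaceSketch struct {
	// DeviceInstance is assumed to be an integer GPU index (hypothetical).
	DeviceInstance *int `json:"deviceInstance,omitempty"`
	// GpuGUID has no omitempty, so nil is encoded as null.
	GpuGUID *string `json:"gpuGuid"`
}

// encode marshals the struct and returns the JSON text.
func encode(v nvLinkInterfaceSketch) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	idx := 1
	guid := "GPU-0000" // placeholder, not a real GPU GUID
	fmt.Println(encode(nvLinkInterfaceSketch{DeviceInstance: &idx})) // {"deviceInstance":1,"gpuGuid":null}
	fmt.Println(encode(nvLinkInterfaceSketch{GpuGUID: &guid}))       // {"gpuGuid":"GPU-0000"}
}
```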

Metadata

Labels: feature

Status: In progress

Milestone: No milestone
