ComputeDomain: do not require a priori node count (numNodes) for creation

An important design philosophy that we want to adhere to is for a ComputeDomain to follow the workload. Having to inject an _a priori_ known node count (with the [`numNodes` parameter](https://github.com/NVIDIA/k8s-dra-driver-gpu/blob/bf444c4ffc5cc094034e7d1ceef623cb7c2da9cc/api/nvidia.com/resource/v1beta1/computedomain.go#L57)) contradicts this philosophy.

As far as I remember, we currently require this input parameter to "know" how many IMEX daemons to start (and wait for). Can we instead determine this number _after_ the workload has been scheduled (but _before_ we release it to actually run)?
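To make the question more concrete, here is a minimal sketch of how the node count could be derived once scheduling has happened. It is not how the driver works today. It assumes the pods are available as plain dicts shaped like Kubernetes Pod objects (only the real `spec.nodeName` field is read); the function name `derive_node_count` and the sample pods are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def derive_node_count(pods: Iterable[Mapping]) -> Optional[int]:
    """Return the number of distinct nodes the given pods were bound to.

    Returns None while at least one pod has no spec.nodeName yet, i.e.
    while scheduling has not finished and the count is still unknown.
    """
    nodes = set()
    for pod in pods:
        node_name = pod.get("spec", {}).get("nodeName")
        if not node_name:
            # Pod is not bound to a node yet: count cannot be determined.
            return None
        nodes.add(node_name)
    return len(nodes)


# Hypothetical workload: three pods, two of them on the same node.
pods = [
    {"metadata": {"name": "rank-0"}, "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "rank-1"}, "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "rank-2"}, "spec": {"nodeName": "node-b"}},
]
print(derive_node_count(pods))  # 2
```

In this sketch the result would play the role that `numNodes` plays today: the number of IMEX daemons to start and wait for before the workload is released.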

If we can remove this input parameter, then the cognitive load of adding a ComputeDomain "around" an existing workload spec becomes much lighter, I'd argue (or hope, at least).

In k8s, users rarely define their workload in terms of "number of nodes this gets deployed on". Sometimes that number is simply unknown, and then it has to be made known before a ComputeDomain can be used to wrap the workload.

If the number of nodes the workload lands on is dynamic but predictable, then users can overcome the challenge with a bit of manual work, as I have done here, for example: https://github.com/jgehrcke/jpsnips-nv/blob/bb7de571e4569143341b506cb917f3b3e28e9b42/nickelpie/one-pod-per-node/npie-job.yaml#L7

```yaml
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: nickelpie-test-compute-domain
spec:
  numNodes: ${NICKELPIE_N_RANKS}
...
```
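For illustration, the "bit of manual work" amounts to filling in the `${NICKELPIE_N_RANKS}` placeholder before the manifest is applied. A minimal sketch using Python's `string.Template`, which happens to use the same `${...}` placeholder syntax; the helper name `render_manifest` is hypothetical, and the template contains only the fields shown above (the elided rest of the manifest is left out):

```python
from string import Template

# Only the ComputeDomain fields shown in the snippet above; the rest of
# the original manifest is elided there and is not reconstructed here.
MANIFEST_TEMPLATE = Template("""\
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: nickelpie-test-compute-domain
spec:
  numNodes: ${NICKELPIE_N_RANKS}
""")


def render_manifest(n_ranks: int) -> str:
    """Fill in the node count that has to be known up front."""
    if n_ranks < 1:
        raise ValueError("n_ranks must be a positive integer")
    return MANIFEST_TEMPLATE.substitute(NICKELPIE_N_RANKS=n_ranks)


print(render_manifest(4))
```

This works only because the rank count maps one-to-one onto nodes in that setup (one pod per node); it is exactly the step that would go away if `numNodes` were no longer required.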

This strongly relates to #348 and #349.
