An important design philosophy that we want to adhere to is for a ComputeDomain to follow the workload. Having to inject an a priori known node count (with the `numNodes` parameter) contradicts this philosophy.
As far as I remember, we currently require this input parameter to "know" how many IMEX daemons to start (and wait for). Can we instead determine this after the workload has been scheduled (but before we release it to actually run)?
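To make the question concrete, here is a minimal sketch of what "determine this after scheduling" could mean: count the distinct nodes once every pod has a node assigned. The function name `count_nodes` and the `(pod name, node name)` input shape are hypothetical and only for illustration; this is not how the driver works today.

```python
from typing import Iterable, Optional, Tuple


def count_nodes(assignments: Iterable[Tuple[str, Optional[str]]]) -> Optional[int]:
    """Return the number of distinct nodes the workload landed on.

    `assignments` is a hypothetical list of (pod name, node name) pairs,
    where the node name is None for a pod that is not scheduled yet.
    Returns None while any pod is still unscheduled, because the node
    count is not final at that point.
    """
    nodes = set()
    for _pod_name, node_name in assignments:
        if node_name is None:
            return None  # scheduling not finished; keep the workload gated
        nodes.add(node_name)
    return len(nodes)
```

With a count derived this way, the number of IMEX daemons to wait for would follow the workload's placement instead of being supplied up front.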
If we can remove this input parameter then the cognitive load for adding a ComputeDomain "around" an existing workload spec becomes much lighter, I'd argue (or hope, at least).
In k8s, users do not often define their workload in terms of "number of nodes this gets deployed on". Sometimes this number is unknown, and then one first has to make the node count known before a ComputeDomain can wrap the workload.
If the number of nodes the workload lands on is dynamic but predictable, then users can overcome the challenge with a bit of manual work, as I have done here, for example: https://github.com/jgehrcke/jpsnips-nv/blob/bb7de571e4569143341b506cb917f3b3e28e9b42/nickelpie/one-pod-per-node/npie-job.yaml#L7
```yaml
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: nickelpie-test-compute-domain
spec:
  numNodes: ${NICKELPIE_N_RANKS}
...
```
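For illustration, the "manual work" amounts to rendering the placeholder before the manifest is applied. The sketch below does this with Python's `string.Template`; that choice is an assumption made here, and the linked repository may well use a different mechanism. The helper name `render_manifest` is hypothetical, and only the fields shown above are reproduced (the rest of the spec is elided in the original).

```python
from string import Template

# Only the fields visible in the snippet above; the remainder of the spec is
# elided in the original and intentionally not guessed here.
MANIFEST_TEMPLATE = """\
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: nickelpie-test-compute-domain
spec:
  numNodes: ${NICKELPIE_N_RANKS}
"""


def render_manifest(n_ranks: int) -> str:
    """Fill in the node count that the user had to work out by hand."""
    if n_ranks < 1:
        raise ValueError("n_ranks must be a positive integer")
    return Template(MANIFEST_TEMPLATE).substitute(NICKELPIE_N_RANKS=n_ranks)


if __name__ == "__main__":
    print(render_manifest(4))
```

This extra rendering step is exactly the cognitive load the proposal would like to remove.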
This strongly relates to #348 and #349.