Context: #816, #723
Quoting a high-level idea from #732:
I think there are many different dimensions in solution space that we can explore. [...] And/or server-side applies: https://kubernetes.io/docs/reference/using-api/server-side-apply/ (I think that has a lot of potential)
Today, I started looking into how exactly we could leverage server-side apply (SSA) to achieve conflict-free patches from many writers against the nodes list in a ComputeDomain (CD) object -- without having to update the CD schema (because we also need a solution for 25.8.x, and there I think we really want to try hard to get away without changing the CRD).
Generally, SSA is explicitly designed so that controllers can avoid read-before-write and also avoid specifying resourceVersion. But I initially thought that to leverage SSA properly we would also have to update the CRD, to make it more "SSA-compatible".
After looking at the details, I got my hopes up that we may actually be able to leverage SSA without having to change the current CD CRD schema.
One key insight is: SSA allows different owners to own different items in a list when the list type is set to map. Turns out, we already do this:
https://github.com/NVIDIA/k8s-dra-driver-gpu/blob/0cefba8118b94195ecb0f15f2e6251b1206eebd3/deployments/helm/nvidia-dra-driver-gpu/crds/resource.nvidia.com_computedomains.yaml#L146C1-L147C23
Specifically, the status.nodes list is already configured as x-kubernetes-list-type: map. That precisely means that different managers can manage different entries of that list.
We use x-kubernetes-list-map-keys: name, which means that SSA merges list items by node name -- so each writer can really own its own node-specific item in the list.
What do we gain? Conflict-free updates, basically. That would be huge.
What does this cost?
- Each owner/manager adds a little overhead (at the least, ownership is recorded in the object's managedFields metadata -- not sure yet how impactful that is)
- The unknowns -- could any distributed-systems bugs creep in with this strategy? What if a manager/owner is lost? ... (I think we're good, but this may be important to think through anyway)
It seems like we can decide between:
- have as many managers as unique clique IDs (and allow for smaller-scale conflict resolution as before)
- have as many managers as nodes (and prevent any conflicts)
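To make the two options concrete, the difference is just the granularity of the fieldManager identity each writer presents. A minimal sketch (the `cd-` naming scheme here is entirely made up for illustration):

```go
package main

import "fmt"

// One manager per node: every writer presents a unique identity and owns
// exactly its own list item -- conflicts cannot occur at all.
func managerPerNode(nodeName string) string {
	return "cd-" + nodeName
}

// One manager per clique: all writers of a clique share one identity, so
// conflicts remain possible, but only within that clique (as before).
func managerPerClique(cliqueID string) string {
	return "cd-" + cliqueID
}

func main() {
	fmt.Println(managerPerNode("node-1"))
	fmt.Println(managerPerClique("clique-1"))
}
```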
I haven't tried this out yet -- but it seems like code changes can actually be pretty simple. The gist of it:
1) create node-specific patch
A patch would really just specify an individual nodes list item, and it does not need to specify a resource version. So a patch would contain only the new payload -- something like:
```json
{
  "status": {
    "nodes": [{
      "name": "node-1",
      "cliqueID": "clique-1",
      "ipAddress": "1.1.1.1",
      "index": 0,
      "status": "Ready"
    }]
  }
}
```
The magic can happen through "name": "node-1" -- only one owner may exist for a list item with this k/v pair, and that owner may update this item at will.
Notably, to create that patch we probably do not need to re-fetch the most recent version of the object.
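A minimal sketch of building such a payload with just the standard library (the struct and field names simply mirror the JSON above; the driver's real Go types will differ):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Stand-ins for the items in status.nodes; field names mirror the JSON
// payload above, not the driver's actual types.
type nodeItem struct {
	Name      string `json:"name"`
	CliqueID  string `json:"cliqueID"`
	IPAddress string `json:"ipAddress"`
	Index     int    `json:"index"`
	Status    string `json:"status"`
}

type status struct {
	Nodes []nodeItem `json:"nodes"`
}

type patch struct {
	Status status `json:"status"`
}

// buildNodePatch returns the payload for a single node. Note what is
// absent: no resourceVersion, and no other nodes' items -- with SSA this
// writer only declares the fields it wants to own. (A real apply payload
// would additionally have to carry the object's apiVersion and kind.)
func buildNodePatch(n nodeItem) ([]byte, error) {
	return json.Marshal(patch{Status: status{Nodes: []nodeItem{n}}})
}

func main() {
	p, err := buildNodePatch(nodeItem{
		Name: "node-1", CliqueID: "clique-1",
		IPAddress: "1.1.1.1", Index: 0, Status: "Ready",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(p))
}
```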
2) PATCH object with specific fieldManager
Let's see about the specific code, but it should be a small change: we'd use Patch() instead of Update().
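For orientation, on the wire SSA is just a PATCH with content type application/apply-patch+yaml and a fieldManager query parameter; with client-go's dynamic client the equivalent is roughly Patch(ctx, name, types.ApplyPatchType, payload, metav1.PatchOptions{FieldManager: mgr}, "status"). A stdlib-only sketch of the raw request (the API group/version, namespace, and names below are assumptions for illustration):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/url"
)

// newApplyRequest builds the raw HTTP request that an SSA write to the
// status subresource amounts to. Group/version in the path are assumed.
func newApplyRequest(apiServer, namespace, name, fieldManager string, payload []byte) (*http.Request, error) {
	u := fmt.Sprintf(
		"%s/apis/resource.nvidia.com/v1beta1/namespaces/%s/computedomains/%s/status",
		apiServer, namespace, name,
	)
	req, err := http.NewRequest(http.MethodPatch, u, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	// This content type is what selects server-side apply on the API server.
	req.Header.Set("Content-Type", "application/apply-patch+yaml")
	q := url.Values{}
	q.Set("fieldManager", fieldManager) // identifies this writer/owner
	req.URL.RawQuery = q.Encode()
	return req, nil
}

func main() {
	req, err := newApplyRequest("https://example.invalid:6443", "default",
		"cd-1", "cd-node-1", []byte(`{"status":{"nodes":[]}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(req.Header.Get("Content-Type"))
}
```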
Resources: