
Use SSA for conflict-free status.nodes list updates #821

@jgehrcke

Description


Context: #816, #723

Quoting a high-level idea from #732:

I think there are many different dimensions in solution space that we can explore. [...] And/or server-side applies: https://kubernetes.io/docs/reference/using-api/server-side-apply/ (I think that has a lot of potential)

Today, I started looking into how exactly we could leverage server-side apply (SSA) to achieve conflict-free patches from many writers against the nodes list in a ComputeDomain object -- without having to update the CD schema (because we also need a solution for 25.8.x, and here I think we really want to try hard to get away without changing the CRD).

Generally, SSA is explicitly designed so that controllers can avoid read-before-write and also avoid specifying resourceVersion. But for leveraging SSA properly, I thought we would probably also have to update the CRD to make it more "SSA-compatible".

After looking at the details, I got my hopes up that we may actually be able to leverage SSA without having to change the current CD CRD schema.

One key insight is: SSA allows different owners to own different items in a list when the list type is set to map. Turns out, we already do this:

https://github.com/NVIDIA/k8s-dra-driver-gpu/blob/0cefba8118b94195ecb0f15f2e6251b1206eebd3/deployments/helm/nvidia-dra-driver-gpu/crds/resource.nvidia.com_computedomains.yaml#L146C1-L147C23

Specifically, the status.nodes list is already configured as x-kubernetes-list-type: map. That precisely means that different managers can manage different entries of that list.

We use x-kubernetes-list-map-keys: name, which means that SSA can merge by node name -- so each writer can really own its own node-specific item in the list.

What do we gain? Conflict-free updates, basically. That would be huge.
What does this cost?

  • Each owner/manager creates a little bit of overhead (at the very least, field ownership is recorded in the object's managedFields metadata -- not sure yet how impactful that is)
  • The unknowns -- could any distributed-system bugs creep in with this strategy? What if a manager/owner is lost? ... (I think we're good, but this may be important to think through anyway)

It seems like we can decide between:

  • have as many managers as unique clique IDs (and allow for smaller-scale conflict resolution as before)
  • have as many managers as nodes (and prevent any conflicts)

I haven't tried this out yet -- but it seems like code changes can actually be pretty simple. The gist of it:

1) create node-specific patch

A patch would really just specify an individual nodes list item, and it doesn't need to specify a resource version. So, a patch would only contain the new payload -- something like

{
  "status": {
    "nodes": [{
      "name": "node-1",
      "cliqueID": "clique-1", 
      "ipAddress": "1.1.1.1",
      "index": 0,
      "status": "Ready"
    }]
  }
}

The magic can happen through "name": "node-1" -- only one owner may exist for a list item with this k/v pair, and that owner may update this item at will.

Notably, to create that patch we probably do not need to re-fetch the most recent version of the object.
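To make this concrete, here is a rough, untested Go sketch of what building such a patch could look like (the package and function names are hypothetical, and the apiVersion is an assumption that would need to match the served version of the installed CRD):

package cdstatus

import "encoding/json"

// buildNodeStatusPatch constructs the SSA apply configuration for a single
// status.nodes entry. The payload carries apiVersion and kind, but no
// resourceVersion -- no read-before-write required.
func buildNodeStatusPatch(cdName, nodeName, cliqueID, ip string, index int) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"apiVersion": "resource.nvidia.com/v1beta1", // assumption: adjust to the CRD's served version
		"kind":       "ComputeDomain",
		"metadata": map[string]interface{}{
			"name": cdName,
		},
		"status": map[string]interface{}{
			"nodes": []interface{}{
				map[string]interface{}{
					"name":      nodeName, // the list-map key: SSA merges entries by this field
					"cliqueID":  cliqueID,
					"ipAddress": ip,
					"index":     index,
					"status":    "Ready",
				},
			},
		},
	})
}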

2) PATCH object with specific fieldManager

Let's see about the specific code, but it should be a small change: we'd use Patch() instead of Update().
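Continuing the sketch from step 1, a rough, untested outline of what that apply call could look like with the dynamic client (the GVR version, the per-node fieldManager name, and the assumption that the CRD enables the status subresource are placeholders to be adapted):

package cdstatus

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

var computeDomainGVR = schema.GroupVersionResource{
	Group:    "resource.nvidia.com",
	Version:  "v1beta1", // assumption: adjust to the CRD's served version
	Resource: "computedomains",
}

// applyNodeStatus sends the apply patch from step 1 against the status
// subresource. No prior GET and no resourceVersion: ownership is resolved via
// the fieldManager and the name list-map key.
func applyNodeStatus(ctx context.Context, cfg *rest.Config, namespace, cdName, nodeName string, patch []byte) error {
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		return err
	}
	force := true // conventional for SSA controllers; with per-node entries, genuine conflicts should not occur
	_, err = client.Resource(computeDomainGVR).Namespace(namespace).Patch(
		ctx,
		cdName,
		types.ApplyPatchType,
		patch,
		metav1.PatchOptions{
			FieldManager: "compute-domain-node-writer-" + nodeName, // hypothetical per-node manager name
			Force:        &force,
		},
		"status", // assumes the CRD enables the status subresource
	)
	return err
}

With a per-node fieldManager like this, each writer only ever owns its own status.nodes entry, so concurrent writers should no longer trip over resourceVersion conflicts.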

Resources:
