@jgehrcke
Collaborator

@jgehrcke jgehrcke commented Oct 7, 2025

A side quest from #646 (comment).

I wanted to think this through, and not throw the patch away.

@copy-pr-bot

copy-pr-bot bot commented Oct 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jgehrcke jgehrcke self-assigned this Oct 7, 2025
@jgehrcke jgehrcke moved this from Backlog to In Progress in Planning Board: k8s-dra-driver-gpu Oct 7, 2025
@jgehrcke jgehrcke added this to the v25.8.1 milestone Oct 7, 2025

  // Try to find an existing entry for the current k8s node
- for _, node := range newCD.Status.Nodes {
+ for _, node := range cd.Status.Nodes {
Collaborator

This needs to be done on the newCD, otherwise we run the risk of this object changing out from under us (it's a pointer).

Collaborator Author

You're saying that

nodeInfo = node.DeepCopy()

right below won't do the trick because we're just copying a pointer?

Will review.

Collaborator

@klueska klueska Oct 8, 2025

No, I'm saying we can't loop over cd.Status.Nodes, since it might change while the loop is running: cd is a pointer to an object in the informer cache.

In this particular case it might be fine since there are no pointers in cd.Status.Nodes itself, but in general we would need to (at a minimum) iterate over a deep copy of cd.Status.Nodes to be sure nothing embedded in the object we are looking at gets changed behind our backs.
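
To illustrate the point, here is a minimal sketch of iterating over a deep copy rather than over the object handed out by the informer cache. The types below are simplified, hypothetical stand-ins (not the driver's real API types), the DeepCopy methods are hand-written in the shape that generated deepcopy code usually has, and findNode is an illustrative helper name.

```go
package main

import "fmt"

// ComputeDomainNode is a hypothetical, simplified stand-in for the real type.
type ComputeDomainNode struct {
	Name      string
	IPAddress string
}

// DeepCopy returns an independent copy of the node (nil-safe).
func (n *ComputeDomainNode) DeepCopy() *ComputeDomainNode {
	if n == nil {
		return nil
	}
	out := *n
	return &out
}

// ComputeDomainStatus and ComputeDomain are equally simplified assumptions.
type ComputeDomainStatus struct {
	Nodes []*ComputeDomainNode
}

type ComputeDomain struct {
	Status ComputeDomainStatus
}

// DeepCopy copies the object including every node it points to.
func (cd *ComputeDomain) DeepCopy() *ComputeDomain {
	if cd == nil {
		return nil
	}
	out := &ComputeDomain{}
	if cd.Status.Nodes != nil {
		out.Status.Nodes = make([]*ComputeDomainNode, len(cd.Status.Nodes))
		for i, n := range cd.Status.Nodes {
			out.Status.Nodes[i] = n.DeepCopy()
		}
	}
	return out
}

// findNode searches a private copy of cd, so neither the loop nor the
// returned node shares memory with the (cache-owned) input object.
func findNode(cd *ComputeDomain, nodeName string) *ComputeDomainNode {
	newCD := cd.DeepCopy()
	for _, node := range newCD.Status.Nodes {
		if node.Name == nodeName {
			return node
		}
	}
	return nil
}

func main() {
	cached := &ComputeDomain{Status: ComputeDomainStatus{
		Nodes: []*ComputeDomainNode{{Name: "node-a", IPAddress: "10.0.0.1"}},
	}}
	found := findNode(cached, "node-a")
	found.IPAddress = "10.0.0.2" // mutates only the private copy
	fmt.Println(cached.Status.Nodes[0].IPAddress)
}
```

Running this prints 10.0.0.1: the write to the returned node does not reach the object that stands in for the cache entry. Note that the copy only isolates what happens after it was taken; it does not by itself make taking the copy safe against a concurrent writer.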

Collaborator Author

@jgehrcke jgehrcke Oct 8, 2025

Some conclusions:

  • Of course, because we pass the CD in by pointer, a concurrent execution unit could hypothetically mutate the object underneath us (we know/assume this is not the case right now).
  • DeepCopy() doesn't generally protect against concurrent mutation (the mutation might happen while DeepCopy() is executing, and only locking would guard against that).
  • The informer itself is not mutating the CD object underneath us (other callbacks may, however, if they run concurrently; not sure if they ever do).
  • The performance of DeepCopy() alone is not a concern; it may only become significant if we call this function at a pathologically high rate (in which case we should look into changing that).
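
The second conclusion can be sketched as follows: a copy is only guaranteed to be consistent if the writer and the copier agree on a lock. Everything here is hypothetical and for illustration only (the node type, the guardedNodes wrapper, and the helper names); it is not how the driver or the informer cache is structured.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical, simplified per-node record.
type node struct {
	Name  string
	Epoch int
}

// guardedNodes pairs the shared data with the lock that protects it.
type guardedNodes struct {
	mu    sync.RWMutex
	nodes []*node
}

// bumpAll writes the same epoch to every node while holding the write lock.
func (g *guardedNodes) bumpAll(epoch int) {
	g.mu.Lock()
	defer g.mu.Unlock()
	for _, n := range g.nodes {
		n.Epoch = epoch
	}
}

// snapshot copies all nodes by value while holding the read lock, so a
// writer cannot change the data halfway through the copy.
func (g *guardedNodes) snapshot() []node {
	g.mu.RLock()
	defer g.mu.RUnlock()
	out := make([]node, len(g.nodes))
	for i, n := range g.nodes {
		out[i] = *n
	}
	return out
}

// consistent reports whether every node in a snapshot has the same epoch.
func consistent(snap []node) bool {
	for _, n := range snap {
		if n.Epoch != snap[0].Epoch {
			return false
		}
	}
	return true
}

// tornSnapshots runs a writer goroutine against a snapshotting reader and
// counts snapshots that mix epochs from different writes.
func tornSnapshots(rounds int) int {
	g := &guardedNodes{nodes: []*node{{Name: "a"}, {Name: "b"}, {Name: "c"}}}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 1; i <= rounds; i++ {
			g.bumpAll(i)
		}
	}()
	torn := 0
	for i := 0; i < rounds; i++ {
		if !consistent(g.snapshot()) {
			torn++
		}
	}
	wg.Wait()
	return torn
}

func main() {
	fmt.Println("torn snapshots:", tornSnapshots(10000))
}
```

With the lock in place this reports zero torn snapshots. Removing the locking from both methods would turn the copy into a data race, which is the situation the bullet describes: the copy itself is just a read that can overlap with a write.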

Let's come back to this later. And when we do: we probably want to adjust all similar places in the same way, for symmetry. Maybe even using common code.

@klueska klueska added the usability issue/pr related to UX label Oct 8, 2025
@klueska klueska modified the milestones: v25.8.1, v25.8.0 Oct 8, 2025
@jgehrcke jgehrcke removed this from the v25.8.0 milestone Oct 8, 2025
@klueska klueska added this to the unscheduled milestone Oct 8, 2025
@klueska klueska marked this pull request as draft October 8, 2025 10:08