capacity: fix duplicate topology (attempt #2) #1450
k8s-ci-robot merged 1 commit into kubernetes-csi:master
Conversation
```go
func TestHasSynced(t *testing.T) {
	synctest.Test(t, func(t *testing.T) {
		client := fakeclientset.NewSimpleClientset()
		informerFactory := informers.NewSharedInformerFactory(client, 1*time.Hour)
```
For why not setting resync period to 0, see kubernetes/kubernetes#133500
I don't see the connection to kubernetes/kubernetes#133500. Can you explain here why a resync period of one hour is useful?
I'm guessing it's because without kubernetes/kubernetes#133500, a zero resync period would cause reads from neverExitWatch, which isn't valid in a synctest. But it would be nice to explicitly state that.
Now we have Kubernetes v0.35 included in the go.mod, so we can actually set the resync period to 0. No special comment is needed anymore.
```go
go func() {
	<-ctx.Done()
	nt.queue.ShutDown()
}()
```
synctest checks for leaked goroutines. So I have to clean it up.
We need Go 1.25 to use the synctest package :(
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
@huww98: shall we finish this PR? See:
Yes, I was waiting for the Go 1.25 update for synctest. Now we have it, let me rebase and resolve the comments.
When the controller starts, two sync() calls will run simultaneously, one from HasSynced() and another from processNextWorkItem(). Each will produce an instance for the same topology segment and pass it to the callbacks. This results in duplicated entries in the capacities map, so either:
- two CSIStorageCapacity objects get created for the same topology, or
- the same CSIStorageCapacity object gets assigned to two keys in the capacities map. When one of them is updated, the other one holds an outdated object, and all subsequent updates fail with a conflict.
Force-pushed from 1b66f2e to 5f11dc4
```go
	// wait for sync.
	factoryForNamespace.Start(ctx.Done())
}
if topologyInformer != nil {
```
Why is it important that topologyInformer gets started here vs. where it was started before?
If it's important, then let's add a comment explaining why. If it's not important, then let's not change it.
All other informers are started here, so it is natural to start this one here too.
I think we should only start topologyInformer after factoryForNamespace.Start, to avoid cache.WaitForCacheSync waiting on not-yet-started informers, which wastes a little CPU while waiting for leader election.
There is already a comment just before:

// Starting is enough, the capacityController and topologyInformer will
// wait for sync.

Do you think this is enough?
I think both approaches would be fine and in general I prefer to avoid drive-by enhancements that aren't related to what a PR is primarily trying to do (in this case "fix duplicate topology"). It causes churn and makes reviews harder.
But we can keep it.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: huww98, pohly
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Please see also #1435
Does this PR introduce a user-facing change?:
/cc @pohly