fix(informer): use ReadWriteLock in CacheImpl to prevent index read inconsistency by Desel72 · Pull Request #7558 · fabric8io/kubernetes-client

Desel72 · 2026-03-12T15:01:43Z

Description

Adds a disabled concurrency test that reproduces the race condition described in #7265.

CacheImpl.updateIndices() performs a two-step operation when updating an object's index entry: first removing the old entry, then adding the new one. While write methods (put(), remove()) are synchronized, read methods (byIndex(), indexKeys()) are not, allowing concurrent readers to observe the intermediate state where an item has been removed but not yet re-added.

This test verifies that index reads never observe partially-updated state during concurrent writes. It is @Disabled until a follow-up PR provides the fix.
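The intermediate state described above can be illustrated with a minimal, single-threaded sketch. It is not the real CacheImpl: the class `TwoStepIndexSketch`, its methods, and the `betweenSteps` hook are hypothetical names introduced only for illustration. The hook stands in for a reader thread that happens to run between the remove and the add.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for an index; not the real CacheImpl.
class TwoStepIndexSketch {
  // index value -> keys of the objects currently filed under that value
  private final Map<String, Set<String>> index = new ConcurrentHashMap<>();

  // Writers are serialized, mirroring the synchronized write methods described above.
  synchronized void update(String key, String oldValue, String newValue, Runnable betweenSteps) {
    if (oldValue != null) {
      Set<String> old = index.get(oldValue);
      if (old != null) {
        old.remove(key); // step 1: remove the old entry
      }
    }
    betweenSteps.run(); // stands in for a reader scheduled at exactly this point
    index.computeIfAbsent(newValue, v -> ConcurrentHashMap.newKeySet()).add(key); // step 2: add the new entry
  }

  // Reads take no lock, so they can run between step 1 and step 2.
  boolean contains(String indexValue, String key) {
    Set<String> keys = index.get(indexValue);
    return keys != null && keys.contains(key);
  }

  // Returns {visible during the update, visible after the update}.
  static boolean[] demo() {
    TwoStepIndexSketch idx = new TwoStepIndexSketch();
    idx.update("pod-1", null, "app=web", () -> {});
    boolean[] seen = new boolean[2];
    // Re-index the same object under the same value, as an ordinary update would.
    idx.update("pod-1", "app=web", "app=web", () -> seen[0] = idx.contains("app=web", "pod-1"));
    seen[1] = idx.contains("app=web", "pod-1");
    return seen;
  }

  public static void main(String[] args) {
    boolean[] seen = demo();
    System.out.println("visible mid-update: " + seen[0] + ", visible after update: " + seen[1]);
  }
}
```

In this sketch the object never leaves the index from the writer's point of view, yet a read placed between the two steps does not find it.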

Changes

CacheImplConcurrencyTest.java: New concurrent test that reproduces the race condition (4 writer + 8 reader threads)

Type of change

Bug fix (non-breaking change which fixes an issue)
Feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Chore (non-breaking change which doesn't affect codebase; test, version modification, documentation, etc.)

Checklist

Code contributed by me aligns with current project license: Apache 2.0
I added a CHANGELOG entry regarding this change
I have implemented unit tests to cover my changes
I have added/updated the javadocs and other documentation accordingly
No new bugs, code smells, etc. in SonarCloud report
I tested my code in Kubernetes
I tested my code in OpenShift

Desel72 · 2026-03-12T16:28:42Z

@manusa Could you review this PR? I've added the test.

Desel72 · 2026-03-13T13:52:00Z

Hi @manusa Could you help me so that I can test? I can't find what the issue is. Thanks

Desel72 · 2026-03-14T13:19:05Z

Hi @manusa, could you review this PR please?

Desel72 · 2026-03-15T13:37:30Z

Hi @manusa I've updated. Could you please review this?

Desel72 · 2026-03-17T01:21:06Z

Hi @manusa @ash-thakur-rh @shawkins
Could you please review this PR?

ash-thakur-rh · 2026-03-17T05:48:50Z

kubernetes-model-generator/openapi/generator/go.mod

 	knative.dev/eventing-couchdb v0.28.0
 	knative.dev/eventing-github v0.46.3
-	knative.dev/eventing-gitlab v0.46.3
+	knative.dev/eventing-gitlab v0.48.0


@Desel72 Please revert the changes to the go.mod file. These changes are out of scope for this fix.

ash-thakur-rh · 2026-03-17T05:49:51Z

pom.xml

    <jackson.bundle.version.annotations>2.21</jackson.bundle.version.annotations>
    <jetty.version>11.0.26</jetty.version>
-    <maven-core.version>3.9.13</maven-core.version>
+    <maven-core.version>3.9.14</maven-core.version>


Same here: please revert the dependency updates. These are also out of scope for this fix. Is there a specific reason you made these changes?

ash-thakur-rh · 2026-03-17T06:24:11Z

...ient/src/test/java/io/fabric8/kubernetes/client/informers/impl/cache/ProcessorStoreTest.java


    Mockito.doAnswer(invocation -> {
-      assertTrue(Thread.holdsLock(podCache.getLockObject()));
+      assertTrue(((java.util.concurrent.locks.ReentrantReadWriteLock) podCache.getLock()).isWriteLockedByCurrentThread());


Minor request: use a specific import instead of the fully qualified name (FQN).
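As a rough sketch of what that request amounts to, the cast can rely on an import rather than spelling out the package. `WriteLockCheckSketch` and its `LOCK` field are hypothetical stand-ins for the test and for whatever `podCache.getLock()` returns in the diff above; only `ReentrantReadWriteLock.isWriteLockedByCurrentThread()` is a real JDK call.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in; LOCK plays the role of the lock exposed by the cache.
class WriteLockCheckSketch {
  static final ReadWriteLock LOCK = new ReentrantReadWriteLock();

  static boolean writeLockHeldByCurrentThread() {
    // With the import in place, the cast no longer needs the fully qualified name.
    return ((ReentrantReadWriteLock) LOCK).isWriteLockedByCurrentThread();
  }

  // Returns {held before locking, held while locked}.
  static boolean[] demo() {
    boolean[] held = new boolean[2];
    held[0] = writeLockHeldByCurrentThread();
    LOCK.writeLock().lock();
    try {
      held[1] = writeLockHeldByCurrentThread();
    } finally {
      LOCK.writeLock().unlock();
    }
    return held;
  }
}
```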

@ash-thakur-rh Thanks for your feedback. I will check soon.

Desel72 · 2026-03-17T23:02:53Z

Hi @ash-thakur-rh @manusa, I've addressed the comments. Can you please review this?

ash-thakur-rh

Looks good overall! Just one change request in the test. Also, please add an entry for the fix to the CHANGELOG file.

ash-thakur-rh · 2026-03-18T14:20:19Z

...rc/test/java/io/fabric8/kubernetes/client/informers/impl/cache/CacheImplConcurrencyTest.java

+  private static final String LABEL_INDEX = "label-index";
+
+  @Test
+  void byIndexShouldNeverMissObjectDuringConcurrentUpdates() throws InterruptedException {


These two tests are nearly identical; they could be a single parameterized test or share a helper method.

Hi @ash-thakur-rh, thanks for your feedback. I've made the change. Can you please review this?

- Replaced two nearly identical test methods with a single parameterized test using @ValueSource and a shared helper method
- Added CHANGELOG entry for issue fabric8io#7265

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ash-thakur-rh

LGTM! Thanks!

Desel72 · 2026-03-19T11:27:25Z

Thanks @ash-thakur-rh

Desel72 · 2026-03-19T12:47:47Z

@manusa @shawkins Could you please review this?

shawkins · 2026-03-19T17:01:00Z

The changes look good if we're ok with introducing more locking.

It was an intentional choice to initially keep the implementation as lock free as possible. If we want to remain as lock free as possible, we should use something like #7575 - the read of an index will still not be fully consistent but there won't be the problem described in #7265

The test is a slippery slope: the number of iterations here is small, but we'll generally want to avoid trying to test concurrency this way in unit tests.

Desel72 · 2026-03-20T02:02:56Z

> The changes look good if we're ok with introducing more locking.
>
> It was an intentional choice to initially keep the implementation as lock free as possible. If we want to remain as lock free as possible, we should use something like #7575 - the read of an index will still not be fully consistent but there won't be the problem described in #7265
>
> The test is a slippery slope: the number of iterations here is small, but we'll generally want to avoid trying to test concurrency this way in unit tests.

@shawkins Do you mean this should be updated? If so, can you please let me know what I should do? I won't give up until it is solved.

shawkins · 2026-03-20T12:48:10Z

> @shawkins Do you mean this should be updated? If so, can you please let me know what I should do? I won't give up until it is solved.

We just need to have consensus on what direction to go. @manusa @csviri @metacosm @ash-thakur-rh do you want to keep the minimally locking behavior via the draft shown in #7575?

Or go with fully consistent index reads via a read/write lock?

I vaguely remember some user complaints about the old fully locking behavior, and I don't know if they would have been satisfied with a read/write lock (obviously that would depend on how frequent the events are).
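For readers weighing the second option, here is a minimal sketch of what fully consistent index reads via a read/write lock could look like. It is an illustration under assumptions, not the code from this PR: `ReadWriteLockedIndexSketch`, `reindex`, and the string-keyed maps are hypothetical, and the real index functions may be far more expensive than a map lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: one lock guards both steps of an index update and every read.
class ReadWriteLockedIndexSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Set<String>> index = new HashMap<>();

  // Remove-then-add happens entirely under the write lock, so readers never see the gap.
  void reindex(String key, String oldValue, String newValue) {
    lock.writeLock().lock();
    try {
      if (oldValue != null) {
        Set<String> old = index.get(oldValue);
        if (old != null) {
          old.remove(key);
          if (old.isEmpty()) {
            index.remove(oldValue);
          }
        }
      }
      index.computeIfAbsent(newValue, v -> new HashSet<>()).add(key);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Readers share the read lock with each other but wait for any in-progress write.
  Set<String> byIndex(String value) {
    lock.readLock().lock();
    try {
      Set<String> keys = index.get(value);
      // Return a copy so callers never iterate a set that a writer may mutate later.
      return keys == null ? Collections.emptySet() : new HashSet<>(keys);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

The trade-off raised in the thread is visible here: every read pays for lock acquisition, and readers block for the full duration of each write.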

csviri · 2026-03-20T13:04:31Z

Not sure how feasible it is to measure the performance degradation. But the lack of consistency in those indexes made them quite hard to reason about; resources occasionally missing from an index pretty much made it unusable for JOSDK internal purposes, so we ended up having our own index.

Desel72 · 2026-03-20T13:20:05Z

@shawkins @csviri @manusa
I think this PR can be merged. Thanks for your effort.

shawkins · 2026-03-20T17:24:47Z

> Not sure how feasible it is to measure the performance degradation.

Adapting the test included here a little (JDK 25, 8,000,000 iterations, 8 readers, 1 writer, because that matches our usage, writing indefinitely with 2 ms between writes):

This PR: ~6 to 8.5 seconds per test

Under this scenario a read/write lock seems to perform worse than full synchronization. That probably relates to the cost of the indexing function and how many values it returns; this example is simple, so the index function cost is minimal. Since we have no control over what users may be doing with those functions, if we want full consistency it's best to stick with a read/write lock.

#7575: ~0.4 to 1.1 seconds per test

> But the lack of consistency in those indexes made them quite hard to reason about; resources occasionally missing from an index pretty much made it unusable for JOSDK internal purposes, so we ended up having our own index.

Sorry I didn't pay attention to this before.

To double check, are you good to just address the ephemeral removal, or do you want fully consistent reads from the indexes?

Edit: to elaborate on the current state of #7575:

- There can be a delta (depending on the cost of indexing functions) between when items that need to be newly added to indexes are seen in the indexes and when the item is put into the ItemStore. This was the intent of the original changes in #5973 (Reduce CacheImpl lock contention), but it was not implemented correctly.
- If an item has already been indexed and its resource version does not match the current cache item, we double-check before returning that it still belongs in the index. This prevents stale reads with respect to the current state of the cache.

@csviri I believe this level of consistency is good enough for your usage - events aren't emitted until after the CacheImpl put call completes, at which point the indexes are up-to-date. If that is correct, then we should proceed with #7575.

If not, then this PR is good.

Desel72 · 2026-03-23T12:12:45Z

Hi @csviri I think this PR can be merged. Could you please review this?

csviri · 2026-03-24T09:04:48Z

@shawkins @Desel72 sorry, I've had a few busy days; I will take a look tomorrow.

Desel72 · 2026-03-25T15:33:42Z

Hi @csviri, how are you doing? Are you busy now?

Desel72 · 2026-03-27T01:09:12Z

Hi @csviri how are you? Are you busy now?

csviri

LGTM, but @shawkins @manusa should make the final decision regarding which version to proceed with.

Desel72 · 2026-03-30T15:59:04Z

Hi @shawkins @manusa @ash-thakur-rh I think this can be merged. Please check this. thanks @csviri

shawkins · 2026-03-30T16:30:12Z

@Desel72 see #7575 (comment). I believe that @csviri is okay with the concurrency described there, so I will refine that PR to close #7265.

Desel72 · 2026-03-30T16:38:41Z

@shawkins Do you mean my PR will be closed?

shawkins · 2026-03-31T01:46:42Z

> @shawkins Do you mean my PR will be closed?

How about I squash the changes from the other PR, then you cherry-pick that into this one and add whatever other tests seem appropriate?

Desel72 · 2026-03-31T02:17:51Z

Thanks @shawkins. I totally agree with you. I think this PR should be merged perfectly. I appreciate this.

manusa · 2026-03-31T05:24:13Z

Thanks for working on this @Desel72, the concurrency test you've put together is really valuable — it clearly reproduces the race condition from #7265.

However, after reviewing both this PR and the alternative approach in #7575, and considering the performance implications (@shawkins' benchmarks show a 6-10x regression with ReadWriteLock), I think we should go with the approach in #7575.

That said, the CacheImplConcurrencyTest you wrote is exactly what we need to validate the fix. Could you reduce this PR to just the reproducer test (removing the ReadWriteLock changes)? The test should be disabled (e.g., @Disabled("https://github.com/fabric8io/kubernetes-client/issues/7265")) since it will fail until #7575 lands the actual fix.

To summarize:

- Keep CacheImplConcurrencyTest.java with @Disabled referencing #7265 (Temporal resource miss on Informer index)
- Revert changes to CacheImpl.java, ProcessorStore.java, and ProcessorStoreTest.java
- Remove the CHANGELOG entry (the fix will come from #7575, "fix: updating indexes in a more consistent manner")

This way your contribution is preserved and provides the foundation for validating the actual fix. Thanks for your patience and persistence on this!

Desel72 · 2026-03-31T12:39:51Z

Hi @manusa I got it. Thanks

Desel72 · 2026-04-02T15:05:17Z

Hi @manusa @shawkins, how are you? I've made the changes. Is this the right approach? Please check and let me know. Thanks for your feedback.

manusa

Thanks for splitting this out @Desel72, the reproducer test looks good overall.

One thing to address: doneLatch.await(30, TimeUnit.SECONDS) and executor.awaitTermination(5, TimeUnit.SECONDS) return values are not checked. If threads hang or deadlock, the test silently passes because missDetected defaults to false. This matters especially when the follow-up fix enables the test — a deadlock would go unnoticed.

Please assert completion, e.g.:

assertThat(doneLatch.await(30, TimeUnit.SECONDS))
    .as("All threads should complete within timeout")
    .isTrue();

Same for awaitTermination.
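The same idea can be shown without AssertJ in a small, self-contained sketch. `AwaitCheckSketch` and `runWorkers` are hypothetical names, not code from the test in this PR; the point is only that both `CountDownLatch.await(long, TimeUnit)` and `ExecutorService.awaitTermination(long, TimeUnit)` return `false` on timeout, and that ignoring those results lets a hung worker go unnoticed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: reports whether all workers really finished in time.
class AwaitCheckSketch {
  static boolean runWorkers(int workers, Runnable task, long timeoutMillis) {
    CountDownLatch doneLatch = new CountDownLatch(workers);
    ExecutorService executor = Executors.newFixedThreadPool(workers);
    for (int i = 0; i < workers; i++) {
      executor.execute(() -> {
        try {
          task.run();
        } finally {
          doneLatch.countDown();
        }
      });
    }
    try {
      // false here means at least one worker did not finish before the timeout.
      boolean completed = doneLatch.await(timeoutMillis, TimeUnit.MILLISECONDS);
      executor.shutdownNow(); // interrupt anything still running
      // false here means the pool did not terminate even after being interrupted.
      boolean terminated = executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
      return completed && terminated;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      executor.shutdownNow();
      return false;
    }
  }
}
```

A test built on such a helper would assert that the returned value is `true`, so a deadlock fails the test instead of leaving a flag like `missDetected` at its default.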

Desel72 · 2026-04-02T15:24:58Z

Is this okay?

manusa

LGTM, thx!

Desel72 · 2026-04-02T21:21:04Z

Perfect! @manusa, are there any other issues? If so, please let me know what else I should do. I really want to contribute.

Desel72 requested review from ash-thakur-rh, manusa and shawkins as code owners March 12, 2026 15:01

ash-thakur-rh requested changes Mar 17, 2026

Desel72 and others added 2 commits March 17, 2026 23:57

fix(informer): use ReadWriteLock in CacheImpl to prevent index read i…

8df2378

…nconsistency

fix: use specific import for ReentrantReadWriteLock in ProcessorStore…

86bfc2b

…Test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Desel72 force-pushed the fix/issue-#7265 branch from 884a5dd to 86bfc2b Compare March 17, 2026 23:01

ash-thakur-rh requested changes Mar 18, 2026

Desel72 and others added 2 commits March 18, 2026 16:32

Merge branch 'main' into fix/issue-#7265

27cd866

ash-thakur-rh approved these changes Mar 19, 2026

csviri approved these changes Mar 27, 2026

manusa mentioned this pull request Mar 31, 2026

fix: updating indexes in a more consistent manner #7575

Open

11 tasks

fix: reduce PR to concurrency reproducer test with @Disabled

e38f732

Revert ReadWriteLock changes to CacheImpl, ProcessorStore, and ProcessorStoreTest. Remove CHANGELOG entry. Keep CacheImplConcurrencyTest as a disabled reproducer for fabric8io#7265, to be enabled once fabric8io#7575 lands.

Merge branch 'main' into fix/issue-#7265

0d381b7

manusa requested changes Apr 2, 2026

fix: assert thread completion in concurrency reproducer test

ac3b10c

Check return values of doneLatch.await() and executor.awaitTermination() so that a deadlock or hung thread fails the test instead of silently passing.

manusa approved these changes Apr 2, 2026

manusa added this to the 7.7.0 milestone Apr 2, 2026 — with automated-tasks

manusa merged commit 19d49dc into fabric8io:main Apr 2, 2026
18 of 19 checks passed
