fix(informer): use ReadWriteLock in CacheImpl to prevent index read inconsistency#7558
fix(informer): use ReadWriteLock in CacheImpl to prevent index read inconsistency#7558manusa merged 7 commits intofabric8io:mainfrom
Conversation
|
@manusa Could you review this PR. I've added the test. |
|
Hi @manusa Could you help me so that I can test? I can't find what the issue is. Thanks |
|
HI @manusa Could you review this PR please? |
|
Hi @manusa I've updated. Could you please review this? |
|
Hi @manusa @ash-thakur-rh @shawkins |
| knative.dev/eventing-couchdb v0.28.0 | ||
| knative.dev/eventing-github v0.46.3 | ||
| knative.dev/eventing-gitlab v0.46.3 | ||
| knative.dev/eventing-gitlab v0.48.0 |
There was a problem hiding this comment.
@Desel72 Please revert the changes from go mod file. These changes are out of scope for this fix.
| <jackson.bundle.version.annotations>2.21</jackson.bundle.version.annotations> | ||
| <jetty.version>11.0.26</jetty.version> | ||
| <maven-core.version>3.9.13</maven-core.version> | ||
| <maven-core.version>3.9.14</maven-core.version> |
There was a problem hiding this comment.
Same here too, revert the changes for dependency updates. These are also out of scope for this fix. Any specific reason for which you have done these changes?
|
|
||
| Mockito.doAnswer(invocation -> { | ||
| assertTrue(Thread.holdsLock(podCache.getLockObject())); | ||
| assertTrue(((java.util.concurrent.locks.ReentrantReadWriteLock) podCache.getLock()).isWriteLockedByCurrentThread()); |
There was a problem hiding this comment.
minor request: use specific import instead of FQN import.
There was a problem hiding this comment.
@ash-thakur-rh Thanks for your feedback. I will check soon.
…Test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
884a5dd to
86bfc2b
Compare
|
Hi @ash-thakur-rh @manusa I've solved. Can you please review this? |
| private static final String LABEL_INDEX = "label-index"; | ||
|
|
||
| @Test | ||
| void byIndexShouldNeverMissObjectDuringConcurrentUpdates() throws InterruptedException { |
There was a problem hiding this comment.
these two tests are nearly identical, the tests can be parameterized tests or can share a helper method.
There was a problem hiding this comment.
Hi @ash-thakur-rh Thanks for your feedback. I've done. Can you please review this?
- Replaced two nearly-identical test methods with a single parameterized test using @valuesource and a shared helper method - Added CHANGELOG entry for issue fabric8io#7265 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks @ash-thakur-rh |
|
The changes look good if we're ok with introducing more locking. It was an intentional choice to initially keep the implementation as lock free as possible. If we want to remain as lock free as possible, we should use something like #7575 - the read of an index will still not be fully consistent but there won't be the problem described in #7265 The test is a slippery slope the number of iterations here are small, but we'll generally want to avoid trying to test concurrency this way in unit tests. |
@shawkins Do you mean this should be updated? If then, can you please let me know what I should do? I will never give up and solve it. |
We just need to have concensus on what direction to go. @manusa @csviri @metacosm @ash-thakur-rh do you want to keep the minimally locking behavior via the draft shown #7575 Or go with fully consistent index reads via a read/write lock? I vaguely remember some user complaints the old fully locking behavior, and don't know if that would have been satisfied with a read/write lock (obviously that should depend on how frequent the events are). |
|
Not sure how feasible is to measure the performance degradation. But without having there consistency in those index made it quite hard to reason about, so basically rarely missing resources from index pretty much made it unusable for josdk internal purposes; therefore we ended up having our own index. |
Adapting the test included here a little (jdk 25, 8000000 iterations, 8 readers, 1 writer - because that matches our usage - writing indefinitely with 2 ms between writes): this pr - ~ 6 - 8.5 seconds per test Under this scenario a readwrite lock seems to perform worse than full synchronization - that probably relates to the cost of the indexing function and how many values it returns - this example is simple so the index function cost is minimal. Since we have no control over what users may be doing with those functions, if we want full consistency it's best to stick with a read/write lock. #7575 - ~ .4 - 1.1 seconds per test
Sorry I didn't pay attention to this before. To double check, are you good to just address the ephemeral removal, or do you want fully consistent reads from the indexes? edit: to elaborate the current state of #7575:
@csviri I believe this level of consistency is good enough for your usage - events aren't emitted until after the CacheImpl put call completes, at which point the indexes are up-to-date. If that is correct, then we should proceed with #7575. If not, then this PR is good. |
|
Hi @csviri I think this PR can be merged. Could you please review this? |
|
Hi @csviri, how are you doing? Are you busy now? |
|
Hi @csviri how are you? Are you busy now? |
|
Hi @shawkins @manusa @ash-thakur-rh I think this can be merged. Please check this. thanks @csviri |
|
@Desel72 see #7575 (comment) I believe that @csviri is okay with the concurrency described there, so I will refine that PR to close #7265 |
|
@shawkins Do you mean my PR will be closed? |
How about I squash the changes from other pr, then you cherry-pick that into this one and add whatever other tests seem appropriate? |
|
Thanks @shawkins. I totally agree with you. I think this PR should be merged perfectly. I appreciate this. |
|
Thanks for working on this @Desel72, the concurrency test you've put together is really valuable — it clearly reproduces the race condition from #7265. However, after reviewing both this PR and the alternative approach in #7575, and considering the performance implications (@shawkins' benchmarks show a 6-10x regression with That said, the To summarize:
This way your contribution is preserved and provides the foundation for validating the actual fix. Thanks for your patience and persistence on this! |
|
Hi @manusa I got it. Thanks |
Revert ReadWriteLock changes to CacheImpl, ProcessorStore, and ProcessorStoreTest. Remove CHANGELOG entry. Keep CacheImplConcurrencyTest as a disabled reproducer for fabric8io#7265, to be enabled once fabric8io#7575 lands.
manusa
left a comment
There was a problem hiding this comment.
Thanks for splitting this out @Desel72, the reproducer test looks good overall.
One thing to address: doneLatch.await(30, TimeUnit.SECONDS) and executor.awaitTermination(5, TimeUnit.SECONDS) return values are not checked. If threads hang or deadlock, the test silently passes because missDetected defaults to false. This matters especially when the follow-up fix enables the test — a deadlock would go unnoticed.
Please assert completion, e.g.:
assertThat(doneLatch.await(30, TimeUnit.SECONDS))
.as("All threads should complete within timeout")
.isTrue();Same for awaitTermination.
Check return values of doneLatch.await() and executor.awaitTermination() so that a deadlock or hung thread fails the test instead of silently passing.
|
Is this okay? |
|
Perfect!!! @manusa Is there any other issue more? If then, please let me know what should I do more? I really want to contribute. |
Description
Adds a disabled concurrency test that reproduces the race condition described in #7265.
CacheImpl.updateIndices()performs a two-step operation when updating an object's index entry: first removing the old entry, then adding the new one. While write methods (put(),remove()) aresynchronized, read methods (byIndex(),indexKeys()) are not, allowing concurrent readers to observe the intermediate state where an item has been removed but not yet re-added.This test verifies that index reads never observe partially-updated state during concurrent writes. It is
@Disableduntil a follow-up PR provides the fix.Changes
CacheImplConcurrencyTest.java: New concurrent test that reproduces the race condition (4 writer + 8 reader threads)Type of change
test, version modification, documentation, etc.)
Checklist