Signed-off-by: Jeremy Parr-Pearson <jeremy.parr-pearson@improving.com>
    // Verify node count
    assertThat(clusterInfo.getKnownNodes()).isEqualTo(EXPECTED_TOTAL_NODES);
    // Wait for all cluster nodes to be available
I don't like retrying in the tests; we need to understand why this happens.
I'm not sure of the underlying reason, but getKnownNodes() sometimes reports a higher count than expected (5 instead of 4). The node count appears to be temporarily inconsistent while the cluster is initializing; a short sleep or retry eventually settles on the expected count of 4.
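A minimal sketch of the "retry until the count settles" approach described above. It uses a hypothetical `pollUntilEquals` helper and a simulated node count in place of the real `getKnownNodes()` call, so none of these names come from the test suite. The wait is bounded by a deadline, so a count that never settles still fails instead of hanging.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical helper, not code from this PR: re-reads a value until it
// equals the expected one or a deadline passes.
public final class SettleCheck {

    private SettleCheck() {}

    // Returns the last value read: `expected` if it settled in time,
    // otherwise whatever was observed when the deadline passed.
    static int pollUntilEquals(IntSupplier source, int expected,
                               Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        int last = source.getAsInt();
        // Overflow-safe comparison against the nanoTime deadline.
        while (last != expected && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return last;
            }
            last = source.getAsInt();
        }
        return last;
    }

    public static void main(String[] args) {
        // Simulated getKnownNodes(): reports 5 for the first two reads, then 4.
        AtomicInteger reads = new AtomicInteger();
        IntSupplier knownNodes = () -> reads.getAndIncrement() < 2 ? 5 : 4;

        int settled = pollUntilEquals(knownNodes, 4, Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println("settled at " + settled + " after " + reads.get() + " reads");
    }
}
```

A bounded poll like this turns a transient over-count into a pass and a count that never settles into a failure that reports the last observed value. It does not explain why the extra node appears, which is the reviewer's concern.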
    assertThat(c1Collector.poll(5, TimeUnit.SECONDS)).isNotNull();
    assertThat(c2Collector.poll(100, TimeUnit.MILLISECONDS)).isNull();
    // Wait for active subscription to receive message
    await().atMost(Duration.ofSeconds(5))
Let's not use .untilAsserted in the test unless we understand the root cause.
It takes time for the message to propagate and for the disposed subscriber to stop listening, which is why the test originally had a 0.5 s sleep. That is sometimes not enough, and the disposed subscriber still receives the message when it should not. A longer sleep or a retry gives the subscription cleanup time to complete before the message is sent.
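The cleanup race described above can be modeled deterministically. The sketch below is a hypothetical in-memory `ToyBroker`, not the listener container's API: `dispose()` removes the subscriber on a background thread and returns a `CompletableFuture` that completes once removal has actually happened. Publishing only after that future completes guarantees the disposed subscriber's inbox stays empty, which is the kind of completion signal that could replace a fixed sleep, assuming the real API exposes one.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory pub/sub model whose unsubscribe completes
// asynchronously. Not the Spring Data / Valkey listener container API.
public final class ToyBroker {

    private final Map<String, BlockingQueue<String>> subscribers = new ConcurrentHashMap<>();
    private final ExecutorService cleanup = Executors.newSingleThreadExecutor();

    BlockingQueue<String> subscribe(String id) {
        BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        subscribers.put(id, inbox);
        return inbox;
    }

    // Removal runs on another thread; the future completes only once
    // the subscriber is really gone.
    CompletableFuture<Void> dispose(String id) {
        return CompletableFuture.runAsync(() -> subscribers.remove(id), cleanup);
    }

    // Delivers to whoever is subscribed at the moment of publishing.
    void publish(String message) {
        subscribers.values().forEach(inbox -> inbox.offer(message));
    }

    void shutdown() {
        cleanup.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        ToyBroker broker = new ToyBroker();
        BlockingQueue<String> c1 = broker.subscribe("c1");
        BlockingQueue<String> c2 = broker.subscribe("c2");

        // Wait on the cleanup signal itself instead of sleeping a fixed time.
        broker.dispose("c2").join();
        broker.publish("hello");

        System.out.println("c1 got: " + c1.poll(5, TimeUnit.SECONDS));
        System.out.println("c2 got: " + c2.poll(100, TimeUnit.MILLISECONDS));
        broker.shutdown();
    }
}
```

If `publish` were called before `join()` returned, whether `c2` sees the message would depend on thread scheduling, which mirrors the intermittent failure seen in CI.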
    }
    }
    jedis.close();
Why is this code problematic? Shouldn't the jedis object be explicitly closed?
This failure is less frequent. It seems to be caused by the connection pool sometimes closing the connection prematurely, so the test fails when jedis.close() is called explicitly on an already closed connection. The try-with-resources version closes the connection without throwing an error if it was already closed.
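For reference on the close() question, the relevant Java semantics: try-with-resources always calls `close()`, including when the body throws. When both the body and `close()` throw, the body's exception propagates and the `close()` exception is attached to it through `getSuppressed()`, so a failing close cannot mask the original error; when only `close()` throws, that exception still propagates. The sketch uses a hypothetical `FlakyResource` rather than Jedis, so it does not show how `Jedis.close()` itself handles an already closed connection.

```java
// Hypothetical resource used only to show try-with-resources semantics;
// it stands in for a pooled connection, not for Jedis itself.
public final class CloseSemantics {

    static final class FlakyResource implements AutoCloseable {
        private final boolean failOnClose;
        boolean closed;

        FlakyResource(boolean failOnClose) {
            this.failOnClose = failOnClose;
        }

        @Override
        public void close() {
            closed = true;
            if (failOnClose) {
                throw new IllegalStateException("close failed");
            }
        }
    }

    // Body fails and close also fails: which exception does the caller see?
    static RuntimeException bodyAndCloseFail() {
        try (FlakyResource r = new FlakyResource(true)) {
            throw new IllegalStateException("body failed");
        } catch (RuntimeException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        RuntimeException e = bodyAndCloseFail();
        System.out.println("primary: " + e.getMessage());
        for (Throwable s : e.getSuppressed()) {
            System.out.println("suppressed: " + s.getMessage());
        }

        // close() is called automatically when the block exits normally.
        FlakyResource ok = new FlakyResource(false);
        try (FlakyResource r = ok) {
            // no-op body
        }
        System.out.println("closed automatically: " + ok.closed);
    }
}
```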
Most of the failing tests seem to be timing issues that only occur in our CI. I replaced untilAsserted with a sleep (or just increased the existing sleep) to keep things simple.
Branch updated from dfb1eab to 0354deb.
Branch updated from 0354deb to 7486d83.
Summary
Fixes the following intermittently failing (flaky) tests:
- ValkeyGlideClusterConnectionCommandsIntegrationTests.testClusterGetClusterInfo:278 (expects 4, but gets 5)
- ReactiveValkeyMessageListenerContainerIntegrationTests.multipleListenShouldTrackSubscriptions (expects null, but gets a message)
- DefaultHyperLogLogOperationsIntegrationTests.sizeShouldCountValuesCorrectly:96 (expects 3, but gets 2)
- ClusterSlotHashUtilsTests.localCalculationShouldMatchServers:60 » JedisConnection Unexpected end of stream. (connection is stale or already closed)

Note that there is another change in #23 to fix the Maven cache. This may help with network-related flaky failures such as:

- Could not transfer artifact org.jetbrains.kotlin:kotlin-compiler:jar:1.9.25 from/to central
- gzip: stdin: not in gzip format (when getting the engine binary)

Closes #26.
Testing
Tests passed locally and in CI a couple of times. Because these tests failed only intermittently, they will need to be observed over time.