@hank95179
Contributor

Description

This PR fixes the flaky test "test pubsub numsub", tracked in #4924.

Like other PubSub tests in cluster mode, PUBSUB NUMSUB relies on subscription metadata propagating across nodes. The test failed intermittently because it queried the subscriber count from a different client immediately after subscribing. Because of network latency or gossip delays, the queried node often reported 0 subscribers instead of the expected count.

Changes

  • Replaced the immediate assertion with a polling loop.
  • The test now calls pubsubNumSub repeatedly until the returned subscriber counts match the expected values or the loop times out, which gives the subscription state time to propagate.
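
A minimal sketch of the kind of polling loop described above. It is illustrative only and is not copied from the diff: the helper name waitForNumSub, the NumSub type, and the timeout and interval defaults are all assumptions, and the real return shape of pubsubNumSub may differ from the channel-to-count map assumed here.

```typescript
// Hypothetical helper: names, types, and defaults are assumptions for
// illustration and are not taken from the PR's actual changes.
type NumSub = Record<string, number>;

const sleep = (ms: number): Promise<void> =>
    new Promise((resolve) => setTimeout(resolve, ms));

/** True when every expected channel reports exactly the expected count. */
function countsMatch(actual: NumSub, expected: NumSub): boolean {
    return Object.entries(expected).every(
        ([channel, count]) => actual[channel] === count,
    );
}

/**
 * Calls fetchCounts until the counts match `expected`, sleeping
 * intervalMs between attempts. Throws once timeoutMs has elapsed,
 * including the last observed counts in the error message.
 */
async function waitForNumSub(
    fetchCounts: () => Promise<NumSub>,
    expected: NumSub,
    timeoutMs = 5000,
    intervalMs = 100,
): Promise<NumSub> {
    const deadline = Date.now() + timeoutMs;
    let last: NumSub = await fetchCounts();

    while (!countsMatch(last, expected)) {
        if (Date.now() >= deadline) {
            throw new Error(
                `Timed out: expected ${JSON.stringify(expected)}, ` +
                    `last received ${JSON.stringify(last)}`,
            );
        }
        await sleep(intervalMs);
        last = await fetchCounts();
    }

    return last;
}
```

In a test, fetchCounts would wrap the client call, and the assertion would run on the value the helper returns. Reporting the last observed counts on timeout keeps a genuine failure as readable as the original Expected/Received message.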

Verification

  • Reproduction: I used a script to flood the cluster bus with noise, forcing propagation delays. This consistently reproduced the Expected: x, Received: 0 failure.
  • Fix Verification: With the retry logic in place, the test passes consistently under the same stressed environment.

Related Issue

Fixes #4924

@hank95179 requested a review from a team as a code owner on November 21, 2025 05:57
@hank95179 force-pushed the fix/node-flaky-pubsub-numsub-4924 branch from 8ed0968 to 04124e6 on November 21, 2025 07:32
@hank95179 force-pushed the fix/node-flaky-pubsub-numsub-4924 branch from 04124e6 to 31766ce on December 17, 2025 00:38
@hank95179 force-pushed the fix/node-flaky-pubsub-numsub-4924 branch from 31766ce to c38a289 on December 17, 2025 00:42
Development

Successfully merging this pull request may close these issues.

[Node][Flaky Test] PubSub › test pubsub numsub_true
