Fix flaky test TestClusterJoinAndReconnect/TestTLSConnection again (#3278) #4635
Repeating my commit message for convenience:
Two commits have already been merged to address the flakiness of this test.
However, I can still reproduce the issue using:
An easy way to increase the failure rate is to increase CPU load, e.g.,
On my machine the combination of these commands fails every time.
The underlying reason for the failure is that the test only waits for `p2` to be ready, but this does not reflect whether `p` has updated its memberlist. We can ensure that `p` has updated its memberlist by waiting for `NotifyJoin` to be called. The test is now slightly slower, 0.8 seconds on my machine.

Side-note

I am new to the project and feedback is very appreciated. It was hard to find something that avoids spin looping and does not change the API of `Peer`. Also, the `WaitReady` and `Settle` calls are redundant now, since we are actually waiting for `NotifyJoin`. But leaving them in does not hurt either.

Fixes #3287