Fix ping/pong flow after reconnection #3
Conversation
- Uncommented the failing test in tests/reconnection_test.zig
- Fixed reader/flusher threads not being restarted after reconnection
- Both threads are now properly restarted when reconnection succeeds
- All tests now pass, including the reconnection test

Co-authored-by: Lukáš Lalinský <[email protected]>
Claude finished @lalinsky's task (View job)

🔍 PR Review Complete

Review Summary: This PR successfully fixes the ping/pong flow after reconnection by addressing the root cause: missing thread restarts during reconnection.

✅ Code Quality and Best Practices

The fix follows good practices:
🔧 Technical Analysis

Problem solved:
Solution implemented:
🐛 Bug Analysis

No new bugs introduced:

Edge case consideration:

⚡ Performance Considerations
🔒 Security Assessment
✅ Test Coverage

Excellent test validation:
📋 Minor Observations
🎯 Recommendation

✅ APPROVE - This is a solid fix that:

The fix is minimal, targeted, and restores the ping/pong functionality that's critical for detecting connection health in NATS.
```zig
// Restart reader thread for the new connection
self.reader_thread = std.Thread.spawn(.{}, readerLoop, .{self}) catch |err| {
    log.err("Failed to restart reader thread: {}", .{err});
    self.triggerReconnect(err);
```
@claude Can we safely do this? Wouldn't it trigger infinite recursion?
Claude finished @lalinsky's task (View job)
Analyzing Infinite Recursion Concern
Good question! Let me analyze the code flow to determine if calling self.triggerReconnect(err) could cause infinite recursion.
Todo List:

- Read src/connection.zig to understand the reconnection flow
- Examine the `triggerReconnect` method implementation
- Trace through the call stack to identify potential recursion
- Provide technical analysis and recommendation
Analysis of Call Stack:
Looking at line 1085, the concern is about this call:
```zig
self.reader_thread = std.Thread.spawn(.{}, readerLoop, .{self}) catch |err| {
    log.err("Failed to restart reader thread: {}", .{err});
    self.triggerReconnect(err); // ← Line 1085
    continue; // Try next server
};
```

Potential recursion path:
1. doReconnect() (line 983) spawns the reader thread
2. The reader thread spawn fails → calls `triggerReconnect(err)` (line 1085)
3. triggerReconnect() (line 932) spawns a new reconnect thread
4. The new doReconnect() tries to spawn the reader thread again
5. If the spawn keeps failing → infinite recursion
Key Safety Mechanisms:

✅ SAFE - No infinite recursion is possible, because of:

1. The `in_reconnect` guard (lines 939-942):

   ```zig
   if (!self.options.reconnect.allow_reconnect or
       self.status != .connected or
       self.in_reconnect > 0) // ← Prevents multiple reconnection threads
   {
       return; // Already reconnecting
   }
   ```

2. State protection: when `triggerReconnect()` is called from line 1085:
   - we are already inside the `doReconnect()` function
   - `self.in_reconnect` is already > 0 (set at line 980)
   - the second `triggerReconnect()` call returns immediately at line 941

3. Thread cleanup: the existing reconnection thread continues with `continue` and tries the next server.
Flow Analysis:
doReconnect() starts → in_reconnect = 1
├─ Try to spawn reader_thread
├─ FAIL → triggerReconnect(err)
│ └─ Checks in_reconnect > 0 → RETURNS IMMEDIATELY ✅
└─ continue → try next server
Conclusion:
The code is safe: infinite recursion is prevented by the `in_reconnect` counter. When `triggerReconnect()` is called from within `doReconnect()`, it detects that a reconnection is already in progress and returns immediately.
The current approach correctly handles thread spawn failures by:
- Logging the error
- Attempting to trigger reconnection (which safely no-ops)
- Continuing to try the next server in the pool
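The guard pattern analyzed above can be sketched in Zig. This is a minimal, illustrative sketch with assumed field and function names, not the actual src/connection.zig implementation:

```zig
const std = @import("std");

// Sketch of the in_reconnect guard; field names are assumptions.
const Conn = struct {
    mutex: std.Thread.Mutex = .{},
    in_reconnect: u32 = 0,

    fn triggerReconnect(self: *Conn) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        if (self.in_reconnect > 0) return; // already reconnecting: no-op
        self.in_reconnect += 1;
        // ...spawn the reconnect thread that runs doReconnect()...
    }

    fn doReconnect(self: *Conn) void {
        // in_reconnect is already > 0 here, so any nested
        // triggerReconnect() call returns immediately instead of
        // spawning a second reconnect thread: no recursion.
        defer {
            self.mutex.lock();
            defer self.mutex.unlock();
            self.in_reconnect -= 1;
        }
        // ...iterate servers; on thread-spawn failure, call
        // triggerReconnect() (a safe no-op here) and continue
        // with the next server...
    }
};
```

The counter, rather than a boolean, also tolerates the brief window where the old reconnect thread is winding down while a new one starts.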
Fixes #2
The ping/pong flow was stopping after reconnection because the reader and flusher threads were not being restarted. This caused PONG responses from the server to never be processed, leading to flush timeouts.
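The shape of the fix can be sketched as follows (illustrative only: `restartThreads`, `readerLoop`, and `flusherLoop` are assumed names, and the real logic lives inside the reconnect loop in src/connection.zig):

```zig
// Sketch: after a successful reconnect, both loops must be respawned,
// otherwise inbound PONGs are never read and flush() times out.
fn restartThreads(self: *Connection) void {
    // Reader thread: processes inbound protocol messages, including
    // the PONG replies that flush() waits on.
    self.reader_thread = std.Thread.spawn(.{}, readerLoop, .{self}) catch |err| {
        log.err("Failed to restart reader thread: {}", .{err});
        self.triggerReconnect(err); // safe no-op while already reconnecting
        return;
    };
    // Flusher thread: writes buffered outbound data, including PINGs.
    self.flusher_thread = std.Thread.spawn(.{}, flusherLoop, .{self}) catch |err| {
        log.err("Failed to restart flusher thread: {}", .{err});
        self.triggerReconnect(err);
        return;
    };
}
```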
Changes
Generated with Claude Code