Skip to content

Conversation

@lalinsky
Copy link
Owner

@lalinsky lalinsky commented Sep 8, 2025

Summary

This PR implements PING/PONG keep-alive functionality for the NATS Zig client, following the same pattern as the C client.

Fixes #88

Key Features

  • Configurable ping intervals: ping_interval_ms (default: 2 minutes, 0 = disabled)
  • Stale connection detection: max_pings_out (default: 2 unanswered PINGs)
  • Lock-free implementation: Uses atomic operations for performance
  • Automatic reconnection: Triggers existing reconnection logic on stale connections

Implementation Details

  • Added ping_timer and atomic pings_out counter to Connection struct
  • Implemented checkAndSendPing() helper that sends PINGs at configured intervals
  • Updated readerIteration() to check for needed PINGs after each read
  • Reset pings_out counter on any PONG (flush or keep-alive proves connection alive)
  • Added StaleConnection error for when too many PINGs go unanswered

Design Decisions

  • Separate counters: Keep-alive PINGs (pings_out) vs flush PINGs (outgoing_pings/incoming_pongs)
  • Any PONG resets counter: Follows C client behavior - any server response proves connection health
  • Thread-safe: Integrates cleanly with existing threading model using atomic operations
  • Timer-based: Uses std.time.Timer.lap() for efficient interval tracking

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 8, 2025

Warning

Rate limit exceeded

@lalinsky has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 52 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 0764f6d and c697a02.

📒 Files selected for processing (1)
  • src/connection.zig (7 hunks)

Walkthrough

Adds timer-driven keep-alive PING/PONG to Connection: configurable ping interval and max outstanding pings, atomic outstanding-pings counter, StaleConnection error when limit exceeded, handshake now emits CONNECT+PING, and PONG resets the outstanding-pings counter. (≤50 words)

Changes

Cohort / File(s) Summary of edits
Keep-alive timers, counters, and handshake updates
src/connection.zig
Added PING/PONG keep-alive: timer trigger, pings_out atomic counter, StaleConnection error. Extended ConnectionOptions with ping_interval_ms and max_pings_out. Added ping_timer, checkAndSendPing(), processPing(), PONG handling, and handshake path to send CONNECT+PING.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Assessment against linked issues

Objective Addressed Explanation
Automatically send PINGs per ADR-40 (#88)

Possibly related PRs

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch push-yzspxkspllsp

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/connection.zig (3)

185-187: Clarify/adjust max_pings_out=0 semantics.

As written, a value of 0 will trigger StaleConnection on the very first keep-alive PING. If the intent is “unlimited” (common pattern), gate the stale check only when max_pings_out > 0 and document it.

Would you like 0 to mean “unlimited outstanding keep-alive PINGs”? If yes, the diff in my checkAndSendPing comment below implements that.


230-233: Fields look right; reset the timer on (re)connect to avoid immediate PING.

Without a reset, a long-lived Connection instance can fire a keep-alive PING immediately during handshake. Suggest lapping the timer after establishing a socket.

Apply after the existing ping/pong counter reset:

@@
-        // Reset ping/pong counters for fresh connection
-        self.outgoing_pings = 0;
-        self.incoming_pongs = 0;
+        // Reset ping/pong counters for fresh connection
+        self.outgoing_pings = 0;
+        self.incoming_pongs = 0;
+        // Reset keep-alive timer so we don't fire immediately on a fresh connection
+        _ = self.ping_timer.lap();

1302-1304: Lap the timer on any PONG to avoid back-to-back PINGs after long stalls.

Prevents sending another keep-alive immediately after we just heard from the server.

-        // Reset keep-alive ping counter - ANY PONG proves connection is alive
-        self.pings_out.store(0, .monotonic);
+        // Reset keep-alive ping counter - ANY PONG proves connection is alive
+        self.pings_out.store(0, .monotonic);
+        _ = self.ping_timer.lap(); // restart the keep-alive interval
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cfa53ab and 225d0fc.

📒 Files selected for processing (1)
  • src/connection.zig (7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.zig

📄 CodeRabbit inference engine (CLAUDE.md)

Format code using zig fmt before commit

Files:

  • src/connection.zig
🔇 Additional comments (2)
src/connection.zig (2)

116-117: Adding StaleConnection to ConnectionError — good call.

Fits the reconnection flow and keeps the failure mode explicit.


268-269: Timer init: fine.

Starting the timer here is OK given the lap-on-(re)connect suggestion above.

- Add ping_interval_ms (default 2min) and max_pings_out (default 2) config options
- Add ping_timer and atomic pings_out counter to Connection struct  
- Implement checkAndSendPing() helper that sends PINGs at intervals
- Update readerIteration() to check for needed PINGs after each read
- Reset pings_out counter to 0 on any PONG (proves connection alive)
- Add StaleConnection error for when too many PINGs go unanswered
- Follows C client pattern with separate counters for keep-alive vs flush PINGs
- Lock-free implementation using atomic operations for performance
@lalinsky lalinsky merged commit 24ffb3d into main Sep 8, 2025
2 checks passed
@lalinsky lalinsky deleted the push-yzspxkspllsp branch September 8, 2025 05:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automatically send PINGs

1 participant