@lalinsky (Owner) commented Sep 9, 2025

Summary

Implements subscription drain support: a subscription can be shut down gracefully, processing the messages it has already received while accepting no new ones.

Key Features

  • Event-driven completion using ResetEvent for efficient waiting without polling
  • Automatic detection when the last message is processed
  • Message blocking - prevents new messages once draining starts
  • Universal support - works with both sync and async subscriptions
  • Thread-safe implementation using existing atomic counters
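The event-driven completion in the list above comes down to one pattern. Every processed message decrements an atomic counter, and the decrement that takes the count from 1 to 0 sets a std.Thread.ResetEvent that a waiter is blocked on, so nobody has to poll. The Zig sketch below is illustrative only: CompletionTracker, messageProcessed and processBatch are hypothetical names, not nats.zig code, and it assumes the std.atomic.Value and std.Thread.ResetEvent APIs of recent Zig standard libraries.

```zig
const std = @import("std");

// Hypothetical minimal tracker (not the nats.zig Subscription type): a pending
// counter plus a ResetEvent that the thread finishing the last message sets.
const CompletionTracker = struct {
    pending: std.atomic.Value(u32) = std.atomic.Value(u32).init(0),
    done: std.Thread.ResetEvent = .{},

    /// Called once per processed message, from any thread.
    fn messageProcessed(self: *CompletionTracker) void {
        // fetchSub returns the value before the subtraction, so 1 means this
        // call brought the counter to zero and must wake the waiter.
        if (self.pending.fetchSub(1, .acq_rel) == 1) self.done.set();
    }
};

fn processBatch(tracker: *CompletionTracker, count: u32) void {
    var i: u32 = 0;
    while (i < count) : (i += 1) tracker.messageProcessed();
}

test "waiter wakes when the last message is processed" {
    var tracker = CompletionTracker{};
    tracker.pending.store(300, .release);

    // Three worker threads each process 100 of the 300 pending messages.
    var threads: [3]std.Thread = undefined;
    for (&threads) |*t| {
        t.* = try std.Thread.spawn(.{}, processBatch, .{ &tracker, @as(u32, 100) });
    }

    // Blocks on the event rather than polling the counter.
    tracker.done.wait();
    for (threads) |t| t.join();

    try std.testing.expectEqual(@as(u32, 0), tracker.pending.load(.acquire));
}
```

Whichever worker happens to process the final message is the one that signals; no extra thread exists just to watch the counter.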

API

  • sub.drain() - Start the draining process
  • sub.isDraining() / sub.isDrainComplete() - Status checks
  • sub.waitForDrainCompletion(timeout_ms) - Block until completion with timeout
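The contract of the API calls above can be modelled with a small mock. In this sketch, waitForDrainCompletion returns error.NotDraining before drain() is called, treats timeout_ms == 0 as an indefinite wait, and otherwise returns error.Timeout if the drain does not finish in time, which is the behaviour described in the CodeRabbit review further down. MockSubscription is hypothetical, not the nats.zig type: it never contacts a server and omits the UNSUB that the real drain() sends.

```zig
const std = @import("std");

// Hypothetical stand-in that models only the drain API; it is not the
// nats.zig Subscription type and does not talk to a server.
const MockSubscription = struct {
    pending_msgs: std.atomic.Value(u32) = std.atomic.Value(u32).init(0),
    draining: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),
    drain_complete: std.Thread.ResetEvent = .{},

    pub fn drain(self: *MockSubscription) void {
        // A real client would also send UNSUB here so the server stops delivery.
        self.draining.store(true, .seq_cst);
        // Nothing pending: the drain is complete immediately.
        if (self.pending_msgs.load(.seq_cst) == 0) self.drain_complete.set();
    }

    pub fn isDraining(self: *const MockSubscription) bool {
        return self.draining.load(.acquire);
    }

    pub fn isDrainComplete(self: *const MockSubscription) bool {
        return self.drain_complete.isSet();
    }

    /// timeout_ms == 0 waits indefinitely; otherwise waits up to timeout_ms.
    pub fn waitForDrainCompletion(self: *MockSubscription, timeout_ms: u64) error{ NotDraining, Timeout }!void {
        if (!self.isDraining()) return error.NotDraining;
        if (timeout_ms == 0) {
            self.drain_complete.wait();
            return;
        }
        // ResetEvent.timedWait takes nanoseconds and returns error.Timeout.
        try self.drain_complete.timedWait(timeout_ms * std.time.ns_per_ms);
    }
};

test "drain API contract" {
    var sub = MockSubscription{};
    // Waiting before drain() has been called is an error.
    try std.testing.expectError(error.NotDraining, sub.waitForDrainCompletion(10));

    // One message still pending: the drain starts but cannot finish.
    sub.pending_msgs.store(1, .seq_cst);
    sub.drain();
    try std.testing.expect(sub.isDraining());
    try std.testing.expect(!sub.isDrainComplete());
    try std.testing.expectError(error.Timeout, sub.waitForDrainCompletion(10));
}
```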

Implementation Details

  • Leverages existing pending_msgs/pending_bytes counters from the base branch
  • Helper functions incrementPending/decrementPending are standalone module-level functions rather than Subscription methods (they are pub, since connection.zig and dispatcher.zig call them)
  • ResetEvent is automatically set when pending message count reaches zero
  • Connection automatically rejects new messages for draining subscriptions
  • Clean separation: users only see public drain methods, internal details are hidden
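The helpers described above can be sketched as two module-level functions that keep pending_msgs and pending_bytes in step and set the ResetEvent on the decrement that empties a draining subscription. This is a minimal sketch under assumptions: the Subscription struct here holds only the fields the helpers touch, the u64 counter types are a guess, and the seq_cst orderings are a conservative choice for this sketch rather than what nats.zig necessarily uses.

```zig
const std = @import("std");

// Hypothetical subscription fields: only what the helpers touch.
const Subscription = struct {
    pending_msgs: std.atomic.Value(u64) = std.atomic.Value(u64).init(0),
    pending_bytes: std.atomic.Value(u64) = std.atomic.Value(u64).init(0),
    draining: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),
    drain_complete: std.Thread.ResetEvent = .{},
};

/// Called once per message when it is accepted for delivery.
pub fn incrementPending(sub: *Subscription, data_len: usize) void {
    _ = sub.pending_msgs.fetchAdd(1, .seq_cst);
    _ = sub.pending_bytes.fetchAdd(data_len, .seq_cst);
}

/// Called once per message when it is consumed, discarded, or rolled back.
pub fn decrementPending(sub: *Subscription, data_len: usize) void {
    _ = sub.pending_bytes.fetchSub(data_len, .seq_cst);
    // fetchSub returns the count before the subtraction.
    const previous_msgs = sub.pending_msgs.fetchSub(1, .seq_cst);
    // seq_cst pairs with a drain() that stores draining=true and then checks
    // pending_msgs: if the two race, at least one side sees the other's
    // write, so the event is set at least once (set() is idempotent).
    if (previous_msgs == 1 and sub.draining.load(.seq_cst)) {
        sub.drain_complete.set();
    }
}

test "helpers track msgs and bytes and signal drain completion" {
    var sub = Subscription{};
    incrementPending(&sub, 5);
    incrementPending(&sub, 7);
    try std.testing.expectEqual(@as(u64, 2), sub.pending_msgs.load(.seq_cst));
    try std.testing.expectEqual(@as(u64, 12), sub.pending_bytes.load(.seq_cst));

    sub.draining.store(true, .seq_cst);
    decrementPending(&sub, 5);
    try std.testing.expect(!sub.drain_complete.isSet());
    decrementPending(&sub, 7);
    try std.testing.expect(sub.drain_complete.isSet());
    try std.testing.expectEqual(@as(u64, 0), sub.pending_bytes.load(.seq_cst));
}
```

Because every path that removes a message goes through decrementPending, the drain signal cannot be skipped by a path that forgets to check the draining flag.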

Test Coverage

Comprehensive test suite with 7 test cases covering:

  • ✅ Immediate completion (empty subscription)
  • ✅ Pending message processing (sync subscriptions)
  • ✅ Async callback handling with processing delays
  • ✅ Message blocking verification (new messages dropped)
  • ✅ Timeout scenarios
  • ✅ Error cases (not draining, etc.)

Architecture Benefits

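  • No extra threads: Completion is signaled by the thread that processes the last pending message, so no background thread is needed
  • Event-driven: Drain completion is signaled as soon as the pending count reaches zero, with no polling
  • Clean API: No implementation details exposed to users
  • Reuses infrastructure: Builds directly on the existing pending message tracking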

Based on the pending-messages-tracking branch, which provides the foundational atomic counters for tracking message state through the entire pipeline.

coderabbitai bot (Contributor) commented Sep 9, 2025

Walkthrough

Adds subscription draining state and APIs, centralizes pending-messages accounting in the subscription module, updates connection and dispatcher to use the centralized helpers and to drop messages for draining subscriptions, and introduces tests covering drain behaviors.

Changes

Cohort / File(s) Summary
Subscription drain + pending helpers
src/subscription.zig
Adds drain state (draining, drain_complete), public APIs drain, isDraining, isDrainComplete, waitForDrainCompletion, and public helpers incrementPending / decrementPending. Signals drain completion when last pending message is processed; nextMsg uses decrementPending.
Connection message handling
src/connection.zig
Drops incoming messages for subscriptions that are draining. Replaces direct pending_msgs/pending_bytes updates with subscription_mod.incrementPending on enqueue and subscription_mod.decrementPending on error/rollback paths.
Dispatcher pending decrement refactor
src/dispatcher.zig
Imports subscription.zig as subscription_mod and replaces direct atomic subtractions with subscription_mod.decrementPending when discarding pending DispatchMessages and after handler completion.
Tests: add drain coverage
tests/all_tests.zig, tests/drain_test.zig
Adds drain_test.zig and registers it in all_tests.zig. Tests cover sync/async drain semantics, message dropping during drain, completion signaling, timeouts, and error when not draining.
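The connection.zig row above implies a specific ordering on the receive path: drop the message if the subscription is draining, count it once on receipt, and undo the count on an enqueue error path. The single-threaded Zig sketch below makes that ordering concrete. Sub, Queue, Outcome and handleIncoming are hypothetical names; the real code uses atomic counters through the subscription_mod helpers, which this sketch replaces with plain integers.

```zig
const std = @import("std");

// Single-threaded simplification of a receive path. All names are
// hypothetical; real code would use atomics and the pending helpers.
const Sub = struct {
    draining: bool = false,
    pending_msgs: u32 = 0,
};

const Queue = struct {
    len: usize = 0,
    capacity: usize,

    fn push(self: *Queue) error{QueueFull}!void {
        if (self.len == self.capacity) return error.QueueFull;
        self.len += 1;
    }
};

const Outcome = enum { dropped_draining, queued, dropped_queue_full };

fn handleIncoming(sub: *Sub, queue: *Queue) Outcome {
    // 1. Drop before touching counters, so a draining sub's count only falls.
    if (sub.draining) return .dropped_draining;
    // 2. Count the message once, on receipt (incrementPending).
    sub.pending_msgs += 1;
    // 3. Undo the count if the message never reaches the queue (decrementPending).
    queue.push() catch {
        sub.pending_msgs -= 1;
        return .dropped_queue_full;
    };
    return .queued;
}

test "receive path keeps pending count accurate" {
    var sub = Sub{};
    var queue = Queue{ .capacity = 1 };
    try std.testing.expectEqual(Outcome.queued, handleIncoming(&sub, &queue));
    try std.testing.expectEqual(Outcome.dropped_queue_full, handleIncoming(&sub, &queue));
    try std.testing.expectEqual(@as(u32, 1), sub.pending_msgs);

    sub.draining = true;
    try std.testing.expectEqual(Outcome.dropped_draining, handleIncoming(&sub, &queue));
    try std.testing.expectEqual(@as(u32, 1), sub.pending_msgs);
}
```

Checking the draining flag before the increment means that, once a drain starts, the pending count can only move toward zero, and the rollback on enqueue failure keeps it from getting stuck above zero.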

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)
  • Title Check (✅ Passed): The title succinctly conveys the primary purpose of the changeset (adding subscription drain functionality) without unnecessary detail or noise, making the main intent immediately clear to reviewers.
  • Description Check (✅ Passed): The description directly outlines the new drain support features, public APIs, implementation approach, and test coverage, all of which align closely with the changes introduced in the pull request.
  • Docstring Coverage (✅ Passed): No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6405c3 and 388cd9b.

📒 Files selected for processing (1)
  • tests/drain_test.zig (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/drain_test.zig


lalinsky (Owner, Author) commented Sep 9, 2025

@coderabbitai review

coderabbitai bot (Contributor) commented Sep 9, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Base automatically changed from pending-messages-tracking to main September 9, 2025 20:26
Add comprehensive subscription drain support that allows graceful
subscription shutdown while processing remaining messages.

Key features:
- Event-driven completion using ResetEvent for efficient waiting
- Automatic detection when last message is processed
- Blocks new messages once draining starts
- Works with both sync and async subscriptions
- Thread-safe implementation using atomic counters

API:
- sub.drain() - Start draining process
- sub.isDraining() / sub.isDrainComplete() - Status checks
- sub.waitForDrainCompletion(timeout_ms) - Block until complete

Implementation details:
- Leverages existing pending_msgs/pending_bytes counters
- Helper functions incrementPending/decrementPending are module-level functions, not Subscription methods
- ResetEvent is automatically set when pending count reaches zero
- Connection rejects new messages for draining subscriptions

Comprehensive test coverage includes immediate completion, pending
message processing, async callback handling, message blocking,
timeout scenarios, and error cases.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/connection.zig (1)

1121-1124: Nit: fix log format specifier.

Use an explicit integer formatter for sid.

Apply:

-                    log.err("Async subscription {} has no assigned dispatcher", .{message.sid});
+                    log.err("Async subscription {d} has no assigned dispatcher", .{message.sid});
tests/drain_test.zig (3)

43-45: Avoid sleep-based flakiness; wait until the messages are counted as pending.

Replace fixed sleep with a bounded wait on pending count.

-    // Give messages time to arrive
-    std.time.sleep(10 * std.time.ns_per_ms);
+    // Wait (up to 1s) for both messages to be counted as pending
+    var waited: u64 = 0;
+    while (sub.pending_msgs.load(.acquire) < 2 and waited < 1000) : (waited += 5) {
+        std.time.sleep(5 * std.time.ns_per_ms);
+    }

46-48: Tests rely on internal pending counters; consider public-API-only assertions.

The PR summary says internals are hidden, yet tests use sub.pending_msgs directly. If that field is not part of the public surface, these tests will be brittle.

Options:

  • Keep using pending counters but mark them public/stable in Subscription for tests.
  • Or switch to API-based checks: consume with nextMsg(), assert isDraining()/isDrainComplete(), and optionally count messages processed to infer pending=0.

Also applies to: 61-65, 71-75, 169-171, 137-138


160-179: Optional: assert that post-drain publishes are truly undispatchable.

After consuming the original message, try fetching another with a short timeout to prove drops.

     // Consume the original message
     var msg = try sub.nextMsg(1000);
     defer msg.deinit();
     try std.testing.expect(std.mem.eql(u8, msg.data, "before drain"));

     // Should be complete
     try std.testing.expect(sub.isDrainComplete());
+    // And no extra messages should be retrievable
+    const maybe = sub.nextMsg(50);
+    try std.testing.expectError(error.Timeout, maybe);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d353e0 and a33b5ec.

📒 Files selected for processing (5)
  • src/connection.zig (3 hunks)
  • src/dispatcher.zig (3 hunks)
  • src/subscription.zig (4 hunks)
  • tests/all_tests.zig (1 hunks)
  • tests/drain_test.zig (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.zig

📄 CodeRabbit inference engine (CLAUDE.md)

Format code using zig fmt before commit

Files:

  • tests/all_tests.zig
  • tests/drain_test.zig
  • src/dispatcher.zig
  • src/connection.zig
  • src/subscription.zig
🧠 Learnings (4)
📚 Learning: 2025-08-29T10:22:22.247Z
Learnt from: CR
PR: lalinsky/nats.zig#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-29T10:22:22.247Z
Learning: Run `zig build test` to confirm your changes are working

Applied to files:

  • tests/all_tests.zig
📚 Learning: 2025-08-25T14:51:33.570Z
Learnt from: lalinsky
PR: lalinsky/nats.zig#31
File: tests/jetstream_pull_test.zig:21-23
Timestamp: 2025-08-25T14:51:33.570Z
Learning: The nats.zig project uses a "tests:beforeEach" test in tests/all_tests.zig that runs before each test to clean up all JetStream streams and consumers, ensuring test isolation. This eliminates the need for unique test resource names as each test starts with a clean JetStream state.

Applied to files:

  • tests/all_tests.zig
  • tests/drain_test.zig
📚 Learning: 2025-09-09T19:51:02.156Z
Learnt from: lalinsky
PR: lalinsky/nats.zig#108
File: src/dispatcher.zig:60-65
Timestamp: 2025-09-09T19:51:02.156Z
Learning: In the NATS Zig codebase dispatcher.zig, when decrementing pending_bytes (atomic u64) with message_data_len (usize) using fetchSub, explicit casting is not required as Zig handles the implicit conversion automatically and the code builds successfully.

Applied to files:

  • src/dispatcher.zig
  • src/connection.zig
📚 Learning: 2025-08-25T02:49:59.119Z
Learnt from: lalinsky
PR: lalinsky/nats.zig#28
File: src/response_manager.zig:89-105
Timestamp: 2025-08-25T02:49:59.119Z
Learning: In this NATS Zig codebase, subscriptions are reference counted, so calling deinit() on a subscription from multiple locations (like both Connection and ResponseManager) is safe and doesn't cause double-free issues.

Applied to files:

  • src/dispatcher.zig
  • src/connection.zig
🔇 Additional comments (15)
src/subscription.zig (5)

61-63: LGTM! Clean drain state management.

The drain state fields are well designed: an atomic bool for the draining flag and a ResetEvent for efficient completion waiting without polling.


125-139: Good implementation of the drain method.

The drain logic correctly handles both immediate completion (when no pending messages) and deferred completion. The UNSUB is sent immediately to stop new messages from arriving.


149-164: Solid timeout implementation with proper error handling.

The implementation properly handles indefinite wait (timeout_ms == 0) and converts milliseconds to nanoseconds for the timed wait. Good error mapping for NotDraining and Timeout cases.


174-174: Correct integration with pending counter management.

Using the centralized decrementPending helper maintains consistency with the drain completion logic.


218-232: Thread-safe pending counter management with proper drain completion signaling.

The implementation correctly uses atomic operations with appropriate memory ordering. The drain completion check (remaining_msgs == 1) is correct: fetchSub returns the value before the subtraction, so the check fires on the decrement that takes the count from 1 to 0. The logging provides good visibility into drain operations.

src/dispatcher.zig (3)

17-18: Clean module aliasing for the subscription module.

Using a module alias keeps the code readable while providing access to the new helper functions.


64-64: Consistent use of centralized pending counter management.

Good to see the dispatcher using the same decrementPending helper for dropped messages during cleanup.


147-147: Proper decrement after handler completion.

The pending counter is correctly decremented after the handler completes, whether it succeeds or fails. This ensures accurate tracking for drain completion.

tests/all_tests.zig (1)

11-11: Test suite properly updated.

Adding the drain tests to the test suite ensures the new functionality gets tested along with everything else.

src/connection.zig (4)

1098-1103: Correct place to drop messages for draining subs.

Early return before touching pending counters avoids churn and races. Looks good.


1105-1105: Centralized pending increment is fine.

Calling incrementPending once at receipt keeps semantics consistent across sync/async. No issues.


1114-1117: Good: undo pending on enqueue failure.

This keeps the drain-completion signal accurate.


1134-1142: Good: decrement on sync enqueue errors.

All error paths restore counters; drain completion won’t get stuck.

tests/drain_test.zig (2)

21-26: LGTM: immediate drain completes fast.

Asserts match the API.


205-207: LGTM: NotDraining error is validated.

Matches API contract.

… state manipulation

- Replace direct sub.draining.store() with proper test scenario
- Publish message, start drain without consuming, then expect timeout
- Addresses CodeRabbit review feedback about bypassing invariants

Co-authored-by: Lukáš Lalinský <[email protected]>
Replace sleep-based timing with bounded wait on pending count to eliminate test flakiness. Add assertion to verify dropped messages are truly undispatchable after drain completion.
@lalinsky lalinsky merged commit 1a56e05 into main Sep 9, 2025
2 checks passed
@lalinsky lalinsky deleted the subscription-drain branch September 9, 2025 20:58

Labels

None yet

Projects

None yet


1 participant