Implement subscription drain functionality #109

lalinsky · 2025-09-09T20:06:10Z

Summary

Implements comprehensive subscription drain support that allows graceful subscription shutdown while processing remaining messages.

Key Features

Event-driven completion using ResetEvent for efficient waiting without polling
Automatic detection when the last message is processed
Message blocking - prevents new messages once draining starts
Universal support - works with both sync and async subscriptions
Thread-safe implementation using existing atomic counters

API

sub.drain() - Start the draining process
sub.isDraining() / sub.isDrainComplete() - Status checks
sub.waitForDrainCompletion(timeout_ms) - Block until completion with timeout

Implementation Details

Leverages existing pending_msgs/pending_bytes counters from the base branch
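Helper functions incrementPending/decrementPending are standalone module-level functions rather than Subscription methods; connection.zig and dispatcher.zig call them through the subscription module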
ResetEvent is automatically set when pending message count reaches zero
Connection automatically rejects new messages for draining subscriptions
Clean separation: users only see public drain methods, internal details are hidden
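
The mechanism in the list above can be sketched in a single file. This is an illustration under stated assumptions, not the code in src/subscription.zig: `DrainSketch` is a hypothetical stand-in for the real Subscription type, the helpers are written as methods for brevity, the UNSUB step is omitted, and `.seq_cst` ordering is an assumption chosen so that a concurrent drain() and decrementPending() cannot both miss each other.

```zig
const std = @import("std");

/// Hypothetical, simplified stand-in for the PR's Subscription type.
const DrainSketch = struct {
    pending_msgs: std.atomic.Value(u64) = std.atomic.Value(u64).init(0),
    draining: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),
    drain_complete: std.Thread.ResetEvent = .{},

    /// Called when a message is accepted for delivery.
    fn incrementPending(self: *DrainSketch) void {
        _ = self.pending_msgs.fetchAdd(1, .seq_cst);
    }

    /// Called after a message is consumed, handled, or rolled back.
    fn decrementPending(self: *DrainSketch) void {
        // fetchSub returns the value before the subtraction,
        // so 1 means this call took the counter to zero.
        const previous = self.pending_msgs.fetchSub(1, .seq_cst);
        if (previous == 1 and self.draining.load(.seq_cst)) {
            self.drain_complete.set();
        }
    }

    fn drain(self: *DrainSketch) void {
        self.draining.store(true, .seq_cst);
        // Nothing pending: the drain is complete immediately.
        if (self.pending_msgs.load(.seq_cst) == 0) self.drain_complete.set();
    }

    fn isDrainComplete(self: *DrainSketch) bool {
        return self.draining.load(.seq_cst) and self.drain_complete.isSet();
    }

    /// A timeout of 0 is treated as "wait indefinitely" in this sketch.
    fn waitForDrainCompletion(self: *DrainSketch, timeout_ms: u64) error{ NotDraining, Timeout }!void {
        if (!self.draining.load(.seq_cst)) return error.NotDraining;
        if (timeout_ms == 0) {
            self.drain_complete.wait();
            return;
        }
        try self.drain_complete.timedWait(timeout_ms * std.time.ns_per_ms);
    }
};

test "drain completes when the last pending message is processed" {
    var sub = DrainSketch{};
    sub.incrementPending();
    sub.drain();
    try std.testing.expect(!sub.isDrainComplete());

    sub.decrementPending();
    try sub.waitForDrainCompletion(10);
    try std.testing.expect(sub.isDrainComplete());
}
```

Because ResetEvent.set() is idempotent, it is harmless if both drain() and decrementPending() end up signaling completion.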

Test Coverage

Comprehensive test suite with 7 test cases covering:

✅ Immediate completion (empty subscription)
✅ Pending message processing (sync subscriptions)
✅ Async callback handling with processing delays
✅ Message blocking verification (new messages dropped)
✅ Timeout scenarios
✅ Error cases (not draining, etc.)

Architecture Benefits

Zero overhead: No background threads needed
Event-driven: Drain completion detected exactly when it happens
Clean API: No implementation details exposed to users
Reuses infrastructure: Builds directly on the existing pending message tracking

Based on the pending-messages-tracking branch which provides the foundational atomic counters for tracking message state through the entire pipeline.

coderabbitai · 2025-09-09T20:06:16Z

Walkthrough

Adds subscription draining state and APIs, centralizes pending-messages accounting in the subscription module, updates connection and dispatcher to use the centralized helpers and to drop messages for draining subscriptions, and introduces tests covering drain behaviors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Subscription drain + pending helpers `src/subscription.zig` | Adds drain state (`draining`, `drain_complete`), public APIs `drain`, `isDraining`, `isDrainComplete`, `waitForDrainCompletion`, and public helpers `incrementPending` / `decrementPending`. Signals drain completion when last pending message is processed; `nextMsg` uses `decrementPending`. |
| Connection message handling `src/connection.zig` | Drops incoming messages for subscriptions that are draining. Replaces direct `pending_msgs`/`pending_bytes` updates with `subscription_mod.incrementPending` on enqueue and `subscription_mod.decrementPending` on error/rollback paths. |
| Dispatcher pending decrement refactor `src/dispatcher.zig` | Imports `subscription.zig` as `subscription_mod` and replaces direct atomic subtractions with `subscription_mod.decrementPending` when discarding pending DispatchMessages and after handler completion. |
| Tests: add drain coverage `tests/all_tests.zig`, `tests/drain_test.zig` | Adds `drain_test.zig` and registers it in `all_tests.zig`. Tests cover sync/async drain semantics, message dropping during drain, completion signaling, timeouts, and error when not draining. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Change nextMsg() to return error.Timeout instead of null #83 — Touches nextMsg and subscription internals; relevant because nextMsg now delegates pending updates and drain semantics.
Add pending message and byte counters to subscriptions #108 — Modifies per-subscription pending_msgs/pending_bytes accounting; this change centralizes that logic and builds on the same area.
Add thread-safe Queue(T) for message handling #12 — Changes message enqueue/dispatch paths; related because connection/dispatcher enqueue and drop behavior were adjusted here.

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title succinctly conveys the primary purpose of the changeset—adding subscription drain functionality—without unnecessary detail or noise, making the main intent immediately clear to reviewers. |
| Description Check | ✅ Passed | The description directly outlines the new drain support features, public APIs, implementation approach, and test coverage, all of which align closely with the changes introduced in the pull request. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6405c3 and 388cd9b.

📒 Files selected for processing (1)

tests/drain_test.zig (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/drain_test.zig

lalinsky · 2025-09-09T20:07:08Z

@coderabbitai review

coderabbitai · 2025-09-09T20:07:14Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Add comprehensive subscription drain support that allows graceful subscription shutdown while processing remaining messages.

Key features:
- Event-driven completion using ResetEvent for efficient waiting
- Automatic detection when last message is processed
- Blocks new messages once draining starts
- Works with both sync and async subscriptions
- Thread-safe implementation using atomic counters

API:
- sub.drain() - Start draining process
- sub.isDraining() / sub.isDrainComplete() - Status checks
- sub.waitForDrainCompletion(timeout_ms) - Block until complete

Implementation details:
- Leverages existing pending_msgs/pending_bytes counters
- Helper functions incrementPending/decrementPending are module-private
- ResetEvent is automatically set when pending count reaches zero
- Connection rejects new messages for draining subscriptions

Comprehensive test coverage includes immediate completion, pending message processing, async callback handling, message blocking, timeout scenarios, and error cases.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

src/connection.zig (1)
1121-1124: Nit: fix log format specifier.

Use an explicit integer formatter for sid.

Apply:
-                    log.err("Async subscription {} has no assigned dispatcher", .{message.sid});
+                    log.err("Async subscription {d} has no assigned dispatcher", .{message.sid});
tests/drain_test.zig (3)
43-45: Avoid sleep-based flakiness; wait until pending arrives.

Replace fixed sleep with a bounded wait on pending count.
-    // Give messages time to arrive
-    std.time.sleep(10 * std.time.ns_per_ms);
+    // Wait (up to 1s) for both messages to be counted as pending
+    var waited: u64 = 0;
+    while (sub.pending_msgs.load(.acquire) < 2 and waited < 1000) : (waited += 5) {
+        std.time.sleep(5 * std.time.ns_per_ms);
+    }
46-48: Tests rely on internal pending counters; consider public-API-only assertions.

The PR summary says internals are hidden, yet tests use sub.pending_msgs directly. If that field is not part of the public surface, these tests will be brittle.

Options:

Keep using pending counters but mark them public/stable in Subscription for tests.

Or switch to API-based checks: consume with nextMsg(), assert isDraining()/isDrainComplete(), and optionally count messages processed to infer pending=0.

Also applies to: 61-65, 71-75, 169-171, 137-138

160-179: Optional: assert that post-drain publishes are truly undispatchable.

After consuming the original message, try fetching another with a short timeout to prove drops.
     // Consume the original message
     var msg = try sub.nextMsg(1000);
     defer msg.deinit();
     try std.testing.expect(std.mem.eql(u8, msg.data, "before drain"));

     // Should be complete
     try std.testing.expect(sub.isDrainComplete());
+    // And no extra messages should be retrievable
+    const maybe = sub.nextMsg(50);
+    try std.testing.expectError(error.Timeout, maybe);

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d353e0 and a33b5ec.

📒 Files selected for processing (5)

src/connection.zig (3 hunks)
src/dispatcher.zig (3 hunks)
src/subscription.zig (4 hunks)
tests/all_tests.zig (1 hunks)
tests/drain_test.zig (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.zig

📄 CodeRabbit inference engine (CLAUDE.md)

Format code using zig fmt before commit

Files:

tests/all_tests.zig
tests/drain_test.zig
src/dispatcher.zig
src/connection.zig
src/subscription.zig

🧠 Learnings (4)

📚 Learning: 2025-08-29T10:22:22.247Z

Learnt from: CR
PR: lalinsky/nats.zig#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-29T10:22:22.247Z
Learning: Run `zig build test` to confirm your changes are working

Applied to files:

tests/all_tests.zig

📚 Learning: 2025-08-25T14:51:33.570Z

Learnt from: lalinsky
PR: lalinsky/nats.zig#31
File: tests/jetstream_pull_test.zig:21-23
Timestamp: 2025-08-25T14:51:33.570Z
Learning: The nats.zig project uses a "tests:beforeEach" test in tests/all_tests.zig that runs before each test to clean up all JetStream streams and consumers, ensuring test isolation. This eliminates the need for unique test resource names as each test starts with a clean JetStream state.

Applied to files:

tests/all_tests.zig
tests/drain_test.zig

📚 Learning: 2025-09-09T19:51:02.156Z

Learnt from: lalinsky
PR: lalinsky/nats.zig#108
File: src/dispatcher.zig:60-65
Timestamp: 2025-09-09T19:51:02.156Z
Learning: In the NATS Zig codebase dispatcher.zig, when decrementing pending_bytes (atomic u64) with message_data_len (usize) using fetchSub, explicit casting is not required as Zig handles the implicit conversion automatically and the code builds successfully.

Applied to files:

src/dispatcher.zig
src/connection.zig

📚 Learning: 2025-08-25T02:49:59.119Z

Learnt from: lalinsky
PR: lalinsky/nats.zig#28
File: src/response_manager.zig:89-105
Timestamp: 2025-08-25T02:49:59.119Z
Learning: In this NATS Zig codebase, subscriptions are reference counted, so calling deinit() on a subscription from multiple locations (like both Connection and ResponseManager) is safe and doesn't cause double-free issues.

Applied to files:

src/dispatcher.zig
src/connection.zig

🔇 Additional comments (15)

src/subscription.zig (5)

61-63: LGTM! Clean drain state management.

The drain state fields are well-designed - using atomic bool for the draining flag and ResetEvent for efficient completion waiting without polling.

125-139: Good implementation of the drain method.

The drain logic correctly handles both immediate completion (when no pending messages) and deferred completion. The UNSUB is sent immediately to stop new messages from arriving.

149-164: Solid timeout implementation with proper error handling.

The implementation properly handles indefinite wait (timeout_ms == 0) and converts milliseconds to nanoseconds for the timed wait. Good error mapping for NotDraining and Timeout cases.

174-174: Correct integration with pending counter management.

Using the centralized decrementPending helper maintains consistency with the drain completion logic.

218-232: Thread-safe pending counter management with proper drain completion signaling.

The implementation correctly uses atomic operations with appropriate memory ordering. The drain completion check (remaining_msgs == 1) is correct - it triggers when decrementing from 1 to 0. The logging provides good visibility into drain operations.
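
To see why comparing against 1 is the right check, recall that an atomic fetchSub returns the value held before the subtraction. The sketch below is a standalone illustration, not code from the PR: the globals `pending` and `done` and the `worker` function are hypothetical names, and `.seq_cst` is an assumed ordering. A worker thread decrements a counter and signals a ResetEvent exactly when it observes a previous value of 1.

```zig
const std = @import("std");

// Hypothetical globals standing in for a subscription's counter and event.
var pending = std.atomic.Value(u64).init(3);
var done: std.Thread.ResetEvent = .{};

/// Simulates a consumer processing the remaining messages one by one.
fn worker() void {
    while (true) {
        // fetchSub returns the value before subtracting, so a result
        // of 1 means the counter is now 0 and this was the last message.
        const previous = pending.fetchSub(1, .seq_cst);
        if (previous == 1) {
            done.set();
            return;
        }
    }
}

test "event fires on the 1 -> 0 transition" {
    const thread = try std.Thread.spawn(.{}, worker, .{});
    done.wait(); // blocks without polling until the worker signals
    thread.join();
    try std.testing.expectEqual(@as(u64, 0), pending.load(.seq_cst));
}
```

Checking the returned value instead of reloading the counter means only one caller can observe the transition, even when several threads decrement concurrently.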

src/dispatcher.zig (3)

17-18: Clean module aliasing for the subscription module.

Using a module alias keeps the code readable while providing access to the new helper functions.

64-64: Consistent use of centralized pending counter management.

Good to see the dispatcher using the same decrementPending helper for dropped messages during cleanup.

147-147: Proper decrement after handler completion.

The pending counter is correctly decremented after the handler completes, whether it succeeds or fails. This ensures accurate tracking for drain completion.

tests/all_tests.zig (1)

11-11: Test suite properly updated.

Adding the drain tests to the test suite ensures the new functionality gets tested along with everything else.

src/connection.zig (4)

1098-1103: Correct place to drop messages for draining subs.

Early return before touching pending counters avoids churn and races. Looks good.

1105-1105: Centralized pending increment is fine.

Calling incrementPending once at receipt keeps semantics consistent across sync/async. No issues.

1114-1117: Good: undo pending on enqueue failure.

This keeps the drain-completion signal accurate.

1134-1142: Good: decrement on sync enqueue errors.

All error paths restore counters; drain completion won’t get stuck.
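
The ordering these comments describe (drop while draining, then increment, then roll back if the enqueue fails) can be illustrated as follows. This is a deliberately single-threaded sketch with plain integers instead of atomics; `SubSketch`, `enqueue`, and `deliver` are hypothetical names and do not reflect the actual code in src/connection.zig.

```zig
const std = @import("std");

/// Hypothetical, single-threaded stand-in for a subscription with a bounded queue.
const SubSketch = struct {
    draining: bool = false,
    pending_msgs: u64 = 0,
    queue_len: usize = 0,
    queue_cap: usize = 1,

    fn enqueue(self: *SubSketch) error{QueueFull}!void {
        if (self.queue_len == self.queue_cap) return error.QueueFull;
        self.queue_len += 1;
    }
};

/// Returns true if the message was accepted, false if it was dropped.
fn deliver(sub: *SubSketch) bool {
    // Drop before touching any counter, so a draining
    // subscription's pending count can only go down.
    if (sub.draining) return false;

    sub.pending_msgs += 1;
    sub.enqueue() catch {
        // Roll back so a failed enqueue cannot leave the
        // drain-completion signal waiting on a phantom message.
        sub.pending_msgs -= 1;
        return false;
    };
    return true;
}

test "drop while draining, roll back on enqueue failure" {
    var sub = SubSketch{};
    try std.testing.expect(deliver(&sub)); // accepted
    try std.testing.expect(!deliver(&sub)); // queue full, rolled back
    try std.testing.expectEqual(@as(u64, 1), sub.pending_msgs);

    sub.draining = true;
    sub.queue_cap = 8;
    try std.testing.expect(!deliver(&sub)); // dropped, counter untouched
    try std.testing.expectEqual(@as(u64, 1), sub.pending_msgs);
}
```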

tests/drain_test.zig (2)

21-26: LGTM: immediate drain completes fast.

Asserts match the API.

205-207: LGTM: NotDraining error is validated.

Matches API contract.

tests/drain_test.zig

… state manipulation

- Replace direct sub.draining.store() with proper test scenario
- Publish message, start drain without consuming, then expect timeout
- Addresses CodeRabbit review feedback about bypassing invariants

Co-authored-by: Lukáš Lalinský

Base automatically changed from pending-messages-tracking to main September 9, 2025 20:26

lalinsky added 2 commits September 9, 2025 22:27

Format code with zig fmt

a33b5ec

lalinsky force-pushed the subscription-drain branch from 3e2f7e4 to a33b5ec Compare September 9, 2025 20:27

coderabbitai bot requested changes Sep 9, 2025

coderabbitai bot approved these changes Sep 9, 2025

Improve drain test reliability and robustness

388cd9b

Replace sleep-based timing with bounded wait on pending count to eliminate test flakiness. Add assertion to verify dropped messages are truly undispatchable after drain completion.

lalinsky merged commit 1a56e05 into main Sep 9, 2025
2 checks passed

lalinsky deleted the subscription-drain branch September 9, 2025 20:58

This was referenced Sep 10, 2025

Fix subscription reference counting with proper ownership separation #111

Merged

Implement connection-level draining functionality #112

Merged
