Skip to content

Conversation

@lalinsky
Copy link
Owner

Problem

The subscription reference counting had a fundamental flaw:

  • Subscriptions started with only 1 reference shared between connection and user
  • unsubscribe() would immediately destroy the subscription even if user code still held a pointer
  • This could cause use-after-free bugs if user code tried to access the subscription after unsubscribing

Solution

Implemented proper two-reference ownership model:

  1. Connection reference: For hashmap storage in the connection
  2. User reference: For the returned pointer to user code

Key Changes

  • subscription.zig:

    • create() now calls sub.retain() to add user reference (starts with 2 refs total)
    • deinit() now unsubscribes from server AND releases user reference
  • connection.zig:

    • unsubscribe() only releases connection reference (with clarifying comment)
  • response_manager.zig:

    • Fixed cleanup to call sub.deinit() instead of just setting to null
  • jetstream.zig:

    • Fixed PullSubscription cleanup to call sub.deinit()

Benefits

  • Thread-safe: Atomic reference counting prevents race conditions
  • Memory-safe: No use-after-free bugs, guaranteed proper cleanup
  • Clean separation: Unsubscribing vs destroying are now separate concepts
  • Standard pattern: Follows same design as C++ std::shared_ptr

Testing

  • All 115 tests pass ✅
  • Zero memory leaks detected ✅
  • Verified with multiple test runs including request-reply and JetStream functionality

This fix ensures users can safely hold subscription pointers after unsubscribing, while maintaining proper cleanup when all references are released.

Problem:
- Subscriptions had only 1 reference shared between connection and user
- unsubscribe() would destroy subscription immediately, causing potential
  use-after-free if user code still held a pointer to it

Solution:
- Subscriptions now start with 2 references:
  * Connection reference (for hashmap storage)
  * User reference (for returned pointer)
- subscription.deinit() unsubscribes AND releases user reference
- connection.unsubscribe() only releases connection reference
- Subscription is destroyed only when all references are released

Changes:
- subscription.zig: Add extra retain() in create(), update deinit()
- connection.zig: Add comment clarifying reference release in unsubscribe()
- response_manager.zig: Fix cleanup to call sub.deinit() instead of null
- jetstream.zig: Fix PullSubscription cleanup to call sub.deinit()

This follows standard reference counting patterns like std::shared_ptr
and ensures thread-safe, memory-safe subscription lifecycle management.

All tests pass with zero memory leaks.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Adjusts subscription lifecycle and cleanup: adds retain/release handling in Subscription, switches certain callers to call sub.deinit() directly, and explicitly deinitializes a response mux subscription during shutdown. Also adds a comment in Connection.unsubscribe without behavior change.

Changes

Cohort / File(s) Summary
Subscription lifecycle (retain/release and deinit)
src/subscription.zig
create(): adds sub.retain() to hold a user reference; deinit(): unsubscribes then calls self.release(), with doc comment clarifying ownership/references.
Response manager explicit teardown
src/response_manager.zig
deinit(): if resp_mux exists, call sub.deinit() and null it; rest of shutdown (closed flag, broadcast, pending cleanup) unchanged.
JetStream pull subscription cleanup
src/jetstream.zig
PullSubscription.deinit: replace nc.unsubscribe(inbox_subscription) with inbox_subscription.deinit(); other cleanup (consumer_info, inbox_prefix, destroy) unchanged.
Connection comment-only change
src/connection.zig
Add a comment before sub.release() in unsubscribe(); no behavioral changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly captures the central change—fixing subscription reference counting by introducing distinct connection and user ownership—which directly reflects the core modifications in subscription.zig and related components without extraneous detail.
Description Check ✅ Passed The description clearly outlines the problem with a single shared reference, explains the two-reference ownership solution, and lists the specific file-level changes alongside testing outcomes, all of which align closely with the actual code modifications.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be22829 and 423d14a.

📒 Files selected for processing (4)
  • src/connection.zig (1 hunks)
  • src/jetstream.zig (1 hunks)
  • src/response_manager.zig (1 hunks)
  • src/subscription.zig (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: lalinsky
PR: lalinsky/nats.zig#28
File: src/response_manager.zig:89-105
Timestamp: 2025-08-25T02:49:59.119Z
Learning: In this NATS Zig codebase, subscriptions are reference counted, so calling deinit() on a subscription from multiple locations (like both Connection and ResponseManager) is safe and doesn't cause double-free issues.
📚 Learning: 2025-08-25T02:49:59.119Z
Learnt from: lalinsky
PR: lalinsky/nats.zig#28
File: src/response_manager.zig:89-105
Timestamp: 2025-08-25T02:49:59.119Z
Learning: In this NATS Zig codebase, subscriptions are reference counted, so calling deinit() on a subscription from multiple locations (like both Connection and ResponseManager) is safe and doesn't cause double-free issues.

Applied to files:

  • src/response_manager.zig
  • src/connection.zig
  • src/subscription.zig
  • src/jetstream.zig
🔇 Additional comments (5)
src/connection.zig (1)

779-781: Comment matches new ownership semantics.

Releasing only the connection’s reference here is correct; destruction is gated by the user ref (via deinit()).

src/response_manager.zig (1)

83-87: Good: deinit() unsubscribes under lock and clears resp_mux.

Holding pending_mutex while calling sub.deinit() avoids races with response delivery and aligns with the two-ref model.

src/subscription.zig (2)

85-91: Correct: establish two references at creation.

The extra retain() sets up connection + user ownership cleanly.


95-101: deinit() semantics are right for the split-ownership model.

Unsubscribing then releasing the user ref ensures finalization only after the connection ref is gone.

src/jetstream.zig (1)

416-416: JetStream PullSubscription cleanup uses subscription.deinit(): good.

This now releases both UNSUB and the user ref; matches the model.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-subscription-reference-counting

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lalinsky lalinsky merged commit 0b36954 into main Sep 10, 2025
3 checks passed
@lalinsky lalinsky deleted the fix-subscription-reference-counting branch September 10, 2025 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant