-
Notifications
You must be signed in to change notification settings - Fork 1
Add thread-safe Queue(T) for message handling #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Implement generic Queue(T) with ArrayListUnmanaged backend - Thread-safe with mutex and condition variable - Support for blocking/non-blocking pop operations with timeout - Automatic memory compaction for efficiency - Replace LinearFifo usage in subscriptions with new Queue - Use std.time.Timer for accurate elapsed time tracking - Close queue during cleanup to prevent new messages - Comprehensive test coverage including multithreaded scenarios
WalkthroughReplaces per-subscription LinearFifo and explicit mutexes with a new generic, thread-safe Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Conn as Connection.processMsg
participant Sub as Subscription
participant Q as Queue(*Message)
participant Consumer as Subscription.nextMsg
Conn->>Q: push(message)
note right of Q #e8f4ff: (lock) append/rotate/grow, signal waiters
alt QueueClosed
Q-->>Conn: QueueClosed (drop message)
else PushError
Q-->>Conn: error (log, drop message)
else Success
Q-->>Conn: ok
end
Consumer->>Q: pop(timeout_ms)
alt item available
Q-->>Consumer: *Message
else timeout or closed
Q-->>Consumer: null
end
rect rgba(230,240,255,0.45)
Sub->>Q: close()
Q-->>Consumer: wake waiters (closed)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✨ Finishing Touches🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/connection.zig (1)
847-853: Handle queue push errors locally to avoid parser teardown and to prevent message leaksPushing to the queue can fail (Allocator.Error or error.QueueClosed). Letting this bubble up via
tryturns into a parser error inreaderLoop, which triggers a reconnect and leaksmessage(no deinit on the error path). Handle errors in-place: drop the message with a log and return success fromprocessMsg.Apply this diff:
if (s.handler) |handler| { // Execute callback without holding locks handler.call(message); } else { // Sync subscription - queue message - try s.messages.push(message); + s.messages.push(message) catch |err| { + switch (err) { + error.QueueClosed => { + // Subscription is closing/closed; drop gracefully. + log.debug("Queue closed for sid {d}; dropping message", .{ msg_arg.sid }); + message.deinit(); + return; + }, + else => { + // Allocation or unexpected push failure; drop but do not tear down the connection. + log.err("Failed to enqueue message for sid {d}: {}", .{ msg_arg.sid, err }); + message.deinit(); + return; + }, + } + }; }
🧹 Nitpick comments (7)
src/connection.zig (1)
844-851: Confirm async handler owns message lifetime (no deinit here)For the async path,
handler.call(message)doesn't deinitmessage. Ensure handlers are responsible for destroyingMessageinstances (and that this contract is documented), otherwise this will leak.src/subscription.zig (1)
79-85: Close-then-drain is safe; verify intended semantics for pending itemsClosing the queue wakes waiting consumers and prevents new pushes, then you drain and deinit pending messages. This means callers blocked in
nextMsg()will getnull(even if items remain) after close. If that's the intended unsubscribe semantics (drop remaining), this is correct. If you want pending items to be retrievable post-close, we’d need to adjustQueue.pop.src/queue.zig (5)
78-117: Clarify pop semantics on close (pending items vs early-null)
pop()returnsnullwhenclosed == trueeven if items are still queued. That’s a valid design, but callers must know they may lose pending items after close. If you instead want “no new pushes, but still drainable,” keep the loop running until the queue is empty, then returnnullon close.If you want “drain after close,” you could adjust the loop condition:
- while (!self.closed) { + while (true) { if (self.head < self.items.items.len) { const item = self.items.items[self.head]; self.head += 1; // compaction... return item; } - const elapsed_ns = timer.read(); - if (elapsed_ns >= timeout_ns) { - return null; - } - const remaining_ns = timeout_ns - elapsed_ns; - self.condition.timedWait(&self.mutex, remaining_ns) catch {}; + if (self.closed) return null; + const elapsed_ns = timer.read(); + if (elapsed_ns >= timeout_ns) return null; + const remaining_ns = timeout_ns - elapsed_ns; + self.condition.timedWait(&self.mutex, remaining_ns) catch {}; }
119-129: len/isEmpty: minor simplification and consistencyGiven internal invariants,
self.items.items.len >= self.headshould always hold. You can simplifylen()and makeisEmpty()calllen()to avoid divergence (micro-style).- pub fn len(self: *Self) usize { + pub fn len(self: *Self) usize { self.mutex.lock(); defer self.mutex.unlock(); - return if (self.items.items.len >= self.head) self.items.items.len - self.head else 0; + return self.items.items.len - self.head; } pub fn isEmpty(self: *Self) bool { - self.mutex.lock(); - defer self.mutex.unlock(); - return self.head >= self.items.items.len; + return self.len() == 0; }
141-152: Expose a concrete error set on API for discoverabilityYou defined
pub const QueueError = error{ QueueClosed } || Allocator.Error, butpushreturns an inferred error set. Consider annotating signatures withQueueErrorso callsites can import and match it without guessing.- pub fn push(self: *Self, item: T) !void { + pub fn push(self: *Self, item: T) QueueError!void { ... }
201-221: Reduce flakiness in timeout testAsserting
elapsed_ms >= 45on a 50ms wait can be flaky under load. Consider widening tolerance or asserting upper bounds instead (e.g., 30–200ms) to accommodate scheduler variability.
243-261: Add a regression test for “push when head > 0 at capacity”The rotate bug wouldn’t be caught by current tests. Add a test that fills to capacity, pops a prefix, then pushes again to trigger compaction-before-growth and verify no duplicates.
Proposed test to append to this file:
test "Queue push compacts when head > 0 at capacity (no duplication)" { const testing = std.testing; var gpa = std.heap.GeneralPurposeAllocator(.{}){}; defer _ = gpa.deinit(); const allocator = gpa.allocator(); var q = Queue(i32).init(allocator); defer q.deinit(); // Fill to initial capacity (8) for (0..8) |i| try q.push(@intCast(i)); // Pop a few to advance head try testing.expectEqual(@as(?i32, 0), q.tryPop()); try testing.expectEqual(@as(?i32, 1), q.tryPop()); try testing.expectEqual(@as(?i32, 2), q.tryPop()); try testing.expectEqual(@as(usize, 5), q.len()); // Push enough to require space; should compact, not rotate duplicates for (8..12) |i| try q.push(@intCast(i)); // We should see the remaining original items [3..7], then [8..11] for (3..12) |i| try testing.expectEqual(@as(?i32, @intCast(i)), q.tryPop()); try testing.expect(q.isEmpty()); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
src/connection.zig(1 hunks)src/queue.zig(1 hunks)src/subscription.zig(6 hunks)
🔇 Additional comments (3)
src/subscription.zig (2)
5-5: *Switch to Queue(Message) looks goodReplacing LinearFifo with
Queue(*Message)simplifies synchronization and localizes concurrency concerns. Init sites are consistent for sync and async. LGTM.Also applies to: 27-27, 41-41, 53-53
93-95: nextMsg: simple, race-free wrapperDelegating to
messages.pop(timeout_ms)is concise and removes manual polling. The return type?*Messagealigns withQueue.pop. LGTM.src/queue.zig (1)
66-74: Compaction logic on pop/tryPop is correct and uses copy-forwards with shrinkThe compaction threshold and copy/shrink pattern are sound and avoid quadratic behavior. Once the push-side rotate bug is fixed, the structure is consistent across push and pop paths.
Also applies to: 95-103
|
@claude see the review comments, are they relevant? what's the impact and what fixes would then need? |
|
Claude finished @lalinsky's task —— View job Analysis of CodeRabbit Review CommentsTodo List
SummaryThe CodeRabbit review identified 1 critical bug and several important issues. Here's my assessment: 🚨 CRITICAL: Data Corruption Bug in queue.zig:34-47Issue: The Impact:
Root Cause: When rotating consumed elements back into the active slice, the array length isn't adjusted, making previously consumed items visible again. Fix Priority: IMMEDIATE - This breaks core functionality.
|
|
@claude fix the issue |
|
Claude finished @lalinsky's task —— View job Fixed Critical Issues ✅I've successfully fixed the critical issues identified in the CodeRabbit review: Todo List
Changes Made🚨 Critical Fix: Data Corruption Bug
🧪 Regression Test Added
Test ResultsAll unit tests pass (32/32), including the new regression test. The build completes successfully with no compilation errors. Commit: The queue implementation is now safe for production use. The core threading and synchronization logic was already sound - these fixes address the critical data integrity and error handling issues. |
- Replace buggy std.mem.rotate logic with proper compaction in Queue.push() to prevent message duplication when head > 0 at capacity - Add proper error handling for queue push failures in connection.zig to prevent connection teardown and memory leaks - Add regression test for the compaction scenario Co-authored-by: Lukáš Lalinský <[email protected]>
Keep graceful handling for QueueClosed but let allocation and other unexpected push errors bubble up to trigger connection recovery as intended. Co-authored-by: Lukáš Lalinský <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/connection.zig (1)
852-867: Non-QueueClosed push errors now bubble up to tear down connection — LGTMThis matches the prior request to return on non-QueueClosed errors. Line 864 returning
errensures the error propagates throughprocessMsg→parser.parse→readerLoopcatch, which triggerstriggerReconnect(err). Memory ownership is handled correctly:message.deinit()on both error paths, anddefer s.release()guarantees the subscription ref is released. Good fix.To double-check that this indeed results in a reconnect, verify that parser propagates callback errors and that
readerLoopcatches them:#!/bin/bash # 1) Confirm parser.parse returns an error union (so it can propagate callback errors) rg -n -C2 -g 'src/**/*.zig' -P 'fn\s+parse\([^)]*\)\s*!' # 2) Confirm parse errors are caught in readerLoop and lead to triggerReconnect rg -n -C4 'parser\.parse\(' src/connection.zig # 3) Optional: confirm the Queue push error set matches the handled cases rg -n -C2 -g 'src/**/*.zig' -P 'pub\s+fn\s+push\([^)]*\)\s*!'
🧹 Nitpick comments (1)
src/connection.zig (1)
845-846: Avoid logging full payload in hot path to reduce log volumeSuggest logging size (and maybe a prefix) instead of the whole payload to keep debug logs useful under load.
- log.debug("Delivering message to subscription {d}: {s}", .{ msg_arg.sid, message.data }); + log.debug("Delivering message to subscription {d}: data_len={d}", .{ msg_arg.sid, message.data.len }); + // If you still want a peek, cap it: + // const preview_len = @min(message.data.len, 128); + // log.debug("Payload preview (first {} bytes): {s}", .{ preview_len, message.data[0..preview_len] });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/connection.zig(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
Extract duplicated compaction code from push(), tryPop(), and pop() methods into a private compact() helper. This improves code maintainability and reduces duplication. Co-authored-by: Lukáš Lalinský <[email protected]>
Summary
Queue(T)withArrayListUnmanagedbackend and thread-safe operationsstd.fifo.LinearFifousage in subscriptions (removed in Zig 0.15)std.time.Timerfor accurate elapsed time trackingKey Features
tryPop()(non-blocking),pop(timeout_ms)(blocking with timeout)TChanges
src/queue.zig- Thread-safe queue implementation with comprehensive testssrc/subscription.zig- UseQueue(*Message)instead ofLinearFifosrc/connection.zig- Use queue'spush()method for message deliveryTest Plan
The queue is ready for use in Connection.messages and provides a robust, idiomatic Zig solution for thread-safe message queuing.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests