changes after review by marcomq · Pull Request #2 · marcomq/mq-bridge

marcomq · 2025-12-16T11:32:58Z

Summary by CodeRabbit

New Features
- Middleware: deduplication, metrics and DLQ with configurable retries.
- Batch send/receive with per-item partial-failure handling and bulk commit semantics.
- Endpoint options: persistence toggle, delayed-ack, and dual-mode NATS (Core vs JetStream).
- Consolidated performance runner that aggregates direct test results and prints summaries.
Bug Fixes
- Improved shutdown and error propagation; credentials removed from logs.
Refactor
- Payloads use an efficient bytes representation.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-16T11:33:08Z

Caution

Review failed

The pull request is closed.

Walkthrough

Adds batch send/receive APIs across traits and endpoints, converts payloads to bytes::Bytes, introduces a middleware module (Deduplication, Metrics, DLQ) with orchestration, refactors ack/persistence flags for NATS/AMQP/Kafka, and changes route shutdown/error flow to per-iteration internal channels.

Changes

Cohort / File(s)	Summary
Manifest / Features `Cargo.toml`	Added `rand`, `bytes` (with serde), dev-dep `uuid`; made `metrics` optional; expanded `full` feature set to include `metrics` and `dedup`; adjusted feature wiring; removed dotenv.
Core traits / API `src/traits.rs`	Introduced `BatchCommitFunc` (renamed), `into_commit_func`/`into_batch_commit_func`; added `receive_batch` and `send_batch` helpers; single-item APIs route through batch helpers; removed `ConsumerError`/`ChannelClosed`.
Canonical Message `src/canonical_message.rs`	`CanonicalMessage.payload` changed from `Vec<u8>` to `bytes::Bytes`; constructors updated accordingly.
Middleware module & orchestration `src/lib.rs`, `src/middleware/mod.rs`, `src/middleware/*`	New `middleware` module; `apply_middlewares_to_consumer` / `apply_middlewares_to_publisher` wrap boxed consumers/publishers with configured middlewares (Deduplication, Metrics, DLQ) using feature gating.
Deduplication middleware `src/middleware/deduplication.rs`	New `DeduplicationConsumer` using sled DB for message-id TTL dedupe, optional background cleanup, adapts to batch receive via wrapper.
Metrics middleware `src/middleware/metrics.rs`	New `MetricsPublisher` and `MetricsConsumer` capturing durations and counters for single and batch operations.
DLQ middleware `src/middleware/dlq.rs`	New `DlqPublisher` wrapping primary + DLQ publishers with configurable exponential-backoff retries for single and batch failures.
Models & Endpoint representation `src/models.rs`	`middlewares` changed from `Middlewares` -> `Vec<Middleware>`; custom `Deserialize` for `Endpoint`; `DeadLetterQueueMiddleware` now holds an `Endpoint` and `dlq_retry_attempts`; added default helpers and new config flags (`delayed_ack`, `no_jetstream`, `no_persistence`, etc.).
Endpoint factories & wiring `src/endpoints/mod.rs`	`create_*_from_route` now accept `&Endpoint`; added private `create_base_consumer`/`create_base_publisher` returning boxed trait objects; middleware applied to base objects; public wrappers retained.
Memory endpoint `src/endpoints/memory.rs`, `tests/...`	Switched to batch APIs: `send_batch` and `receive_batch`; internal buffering reworked; tests updated for batch semantics.
NATS endpoint `src/endpoints/nats.rs`	Introduced `NatsClient` and `NatsSubscription` enums to multiplex Core vs JetStream; publisher/consumer refactored for both modes; added `delayed_ack` and batch support; header normalization helpers added.
AMQP endpoint `src/endpoints/amqp.rs`	Added `no_persistence` and `delayed_ack` fields; queue durability follows `no_persistence`; metadata keys normalized to `ShortString`; added `send_batch` and `receive_batch`; sanitized connection logging.
Kafka endpoint `src/endpoints/kafka.rs`	`await_ack` → `delayed_ack`; payload handling adjusted to slices; added `send_batch` and `receive_batch`; consumer/publisher cleanup adjusted.
File endpoint `src/endpoints/file.rs`	Added `send_batch` and `receive_batch`; writes use BufWriter and `Bytes`; reads use `read_until(b'\n')`; batch commit semantics supported.
MongoDB endpoint `src/endpoints/mongodb.rs`	Added `send_batch` and `receive_batch`; payload conversion adapted for BSON; new helpers for claim/ack and batch delete on commit.
HTTP / MQTT / Static endpoints `src/endpoints/http.rs`, `src/endpoints/mqtt.rs`, `src/endpoints/static_endpoint.rs`	Added `send_batch` and `receive_batch` wrappers (single→batch conversions); MQTT eventloop handle renamed and event loop aborted on Drop.
Route / Shutdown / Error flow `src/route.rs`	Introduced per-iteration internal shutdown channels; inner tasks receive per-call shutdown; switched to `send_batch` in workers; partial-batch failures handled (DLQ/retry or escalate) and errors signaled to teardown.
Tests & helpers `tests/...`, `tests/integration/*`, `tests/integration/common.rs`	Tests migrated to batch APIs; performance tests consolidated into `run_direct_perf_test` returning `PerformanceResult`; added helpers (format_pretty, batch-based measure/read/write) and aggregated result reporting.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Supervisor as Route Supervisor
    participant Inner as Inner Route Task
    participant Worker as Worker
    participant Publisher as Publisher (wrapped)
    participant DLQ as DLQ Publisher

    Supervisor->>Inner: spawn inner task (per-iteration shutdown chan)
    Inner->>Worker: dispatch batch for processing
    Worker->>Publisher: send_batch(messages)
    alt primary all succeed
        Publisher-->>Worker: Ok((maybe_responses, []))
        Worker-->>Inner: commit successes
    else primary partial failures
        Publisher-->>Worker: Ok((maybe_responses, failed_msgs))
        Worker->>DLQ: send_batch(failed_msgs) with retries
        alt DLQ succeeds
            DLQ-->>Worker: Ok
            Worker-->>Inner: commit successes, report partial failure
        else DLQ exhausted
            DLQ-->>Worker: Err
            Worker-->>Inner: escalate error -> teardown/restart
        end
    else primary total failure
        Publisher-->>Worker: Err
        Worker-->>Inner: escalate error -> teardown/restart
    end
    Supervisor->>Inner: external shutdown
    Inner->>Worker: forward internal shutdown, await completion
    Inner-->>Supervisor: inner finished

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pay special attention to:
- src/traits.rs for batch API defaults and helper correctness.
- src/middleware/mod.rs for middleware ordering, ownership and conditional compilation.
- src/middleware/dlq.rs for retry/backoff correctness and error composition in batch paths.
- src/middleware/deduplication.rs for sled usage, TTL pruning safety, and background task behavior.
- src/canonical_message.rs and endpoints that handle payloads (mongodb, file, kafka, nats, amqp) for Bytes conversions and zero-copy correctness.
- src/endpoints/nats.rs and src/endpoints/amqp.rs for dual-path behavior, ack/persistence flags, and sanitized logging.

Possibly related PRs

changes after review #2 — Appears to introduce the same batch API, middleware wiring, and endpoint refactors; strong code-level overlap.
Initial #1 — Initial large refactor touching traits, endpoints, canonical_message and middleware; probably the root/previous iteration of these changes.

Poem

🐇 I hopped through crates and features bright,
I wrapped your pubs with metrics' light,
I hid credentials out of sight,
I pruned dupes by moonlit night,
I nudged batches onward — middleware delight! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 47.62% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Title check	❓ Inconclusive	The title 'changes after review' is vague and generic, using non-descriptive language that fails to convey the specific or main changes in this substantial pull request.	Replace with a more specific title that captures the primary change, such as 'Implement batch message processing with middleware support' or 'Add batch API and middleware layer for message processing'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77c7ce5 and 00619c9.

📒 Files selected for processing (1)

src/endpoints/nats.rs (5 hunks)

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/route.rs (1)
199-201: Missing break when work channel is closed causes potential message loss.

If work_tx.send() fails (workers are gone), the loop continues receiving messages that cannot be processed, leading to message loss. The code should break or return an error.
 if work_tx.send((messages, commit)).await.is_err() {
     warn!("Work channel closed, cannot process more messages concurrently. Shutting down.");
+    break;
 }

🧹 Nitpick comments (1)

src/route.rs (1)
83-89: Potentially inverted logic for internal shutdown channel handling.

When shutdown_rx is None, the code uses internal_shutdown_rx. However, a freshly created channel is not closed, so shutdown_rx.is_closed() returns false, causing _shutdown_tx to be None. This drops internal_shutdown_tx immediately, closing the channel.

Currently this doesn't cause issues since run() always passes Some(internal_shutdown_rx), but if run_until_err is called directly with None, the internal channel would be closed immediately.

Consider simplifying or fixing the logic:
 let (internal_shutdown_tx, internal_shutdown_rx) = bounded(1);
-let shutdown_rx = shutdown_rx.unwrap_or(internal_shutdown_rx);
-let _shutdown_tx = if shutdown_rx.is_closed() {
-    Some(internal_shutdown_tx)
-} else {
-    None
-};
+let (shutdown_rx, _shutdown_tx) = match shutdown_rx {
+    Some(rx) => (rx, None),
+    None => (internal_shutdown_rx, Some(internal_shutdown_tx)),
+};

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6357336 and dcf3314.

📒 Files selected for processing (6)

Cargo.toml (1 hunks)
src/endpoints/amqp.rs (1 hunks)
src/endpoints/nats.rs (2 hunks)
src/models.rs (1 hunks)
src/route.rs (5 hunks)
tests/integration/mqtt.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/route.rs (2)

src/traits.rs (1)

send (52-52)

src/endpoints/memory.rs (1)

send (119-130)

🔇 Additional comments (8)

src/endpoints/amqp.rs (1)

153-154: Good security improvement - credentials no longer logged.

This change correctly avoids logging conn_uri after credentials have been embedded (lines 142-149). The original config.url logged at line 139 is safe as it's the base URL before credential injection.

tests/integration/mqtt.rs (1)

24-24: LGTM - YAML syntax correction.

The capacity placeholder {out_capacity} is correctly formatted and properly substituted via .replace() at lines 30-32 and 42-44.

src/models.rs (1)

7-7: LGTM - unused import removed.

Removing the unused Deref import improves code hygiene with no behavioral change.

src/endpoints/nats.rs (2)

118-118: Consistent with publisher changes.

The consumer's stream configuration mirrors the publisher's subject configuration at line 43, ensuring both sides agree on the stream's subject filters.

41-43: Good fix to ensure explicit subject capture.

Adding the explicit subject alongside the wildcard pattern ensures messages are captured even when the subject doesn't match the {stream_name}.> pattern. Multiple subjects can be specified and edited over time in NATS stream configurations, and both publisher and consumer implementations are updated consistently. If the subject already matches the wildcard, having both patterns in the subjects array is benign—NATS JetStream handles multiple subject patterns per stream without issue.

Cargo.toml (1)

83-83: LGTM - dev-dependency addition for test UUID generation.

The uuid crate version 1.19.0 with v4 feature is valid, and this dev-dependency addition ensures uuid is available during tests regardless of feature flags. Version consistency should be confirmed against the runtime dependency.

src/route.rs (2)

125-128: LGTM!

Error propagation for publisher failures is correct. This ensures the outer loop's reconnect logic is triggered appropriately.

32-50: Well-designed per-iteration shutdown channel.

The pattern of creating a fresh internal shutdown channel per iteration effectively prevents races between the outer loop and inner task consuming the same shutdown signal. The explicit forwarding and await ensures graceful termination.

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/endpoints/mod.rs (1)
46-62: Redundant logic in NATS stream resolution.

Line 52 contains redundant logic: cfg.stream.as_deref().or(cfg.stream.as_deref()) checks the same field twice. The error message at line 55 mentions default_stream, suggesting the original intent may have been to check an alternative field.

If there's supposed to be a default_stream field, apply this diff:
-            let stream_name = cfg
-                .stream
-                .as_deref()
-                .or(cfg.stream.as_deref())
-                .ok_or_else(|| {
+            let stream_name = cfg.stream.as_deref().ok_or_else(|| {
Otherwise, simply remove the redundant .or() clause.

♻️ Duplicate comments (1)

src/route.rs (1)
217-226: End-of-stream handling remains inconsistent between sequential and concurrent runners.

This issue was flagged in a previous review. The return semantics still differ:

run_sequentially: returns Ok(false) on end-of-stream (line 113) → triggers graceful shutdown

run_concurrently: returns Ok(!shutdown_rx.is_empty()) (line 225) → always Ok(true) after break

When the consumer ends (line 204-206 break), shutdown_rx remains empty, so Ok(true) is returned, which falls through to _ => {} in the outer match (line 67) and causes reconnection instead of graceful shutdown.

Apply a fix similar to run_sequentially:
         // --- Graceful Shutdown ---
         drop(work_tx);
         for handle in worker_handles {
             let _ = handle.await;
         }
-        // Return true if we should continue (i.e., we were stopped by the running flag), false otherwise.
-        Ok(!shutdown_rx.is_empty())
+        // Return false for graceful end-of-stream exit, true only if shutdown was signaled.
+        Ok(shutdown_rx.is_empty())
Note: Also verify the outer match at line 67 handles Ok(true) correctly if it should trigger reconnection.

🧹 Nitpick comments (9)

tests/memory_test.rs (1)
10-11: Minor doc comment inconsistency.

The doc comment says "commas as thousand separators" but the implementation uses underscores (line 22). Consider updating the comment to match.
-/// Formats a number with commas as thousand separators.
+/// Formats a number with underscores as thousand separators.
src/models.rs (1)
120-122: Remove empty where clause.

The empty where clause is syntactically valid but serves no purpose and may confuse readers.
-fn deserialize_middlewares_from_value(value: serde_json::Value) -> Result<Vec<Middleware>, String>
-where
-{
+fn deserialize_middlewares_from_value(value: serde_json::Value) -> Result<Vec<Middleware>, String> {
src/traits.rs (1)
62-69: Error details are discarded when messages fail.

The implementation captures failed messages but discards the actual error. This makes debugging difficult - callers won't know why a message failed, only that it did.

Consider preserving error information:
-    ) -> anyhow::Result<(Option<Vec<CanonicalMessage>>, Vec<CanonicalMessage>)> {
+    ) -> anyhow::Result<(Option<Vec<CanonicalMessage>>, Vec<(CanonicalMessage, anyhow::Error)>)> {
         let mut responses = Vec::new();
-        let mut failed_messages = Vec::new();
+        let mut failed_messages: Vec<(CanonicalMessage, anyhow::Error)> = Vec::new();
         for msg in messages {
             match self.send(msg.clone()).await {
                 Ok(Some(resp)) => responses.push(resp),
                 Ok(None) => {}
-                Err(_) => {
-                    failed_messages.push(msg);
+                Err(e) => {
+                    failed_messages.push((msg, e));
                 }
             }
         }
This is a design consideration - if error details aren't needed for DLQ/retry logic, the current approach is acceptable.
src/middleware/dlq.rs (1)
101-131: Unnecessary clone of entire message batch on line 105.

The messages.clone() on line 105 clones the entire batch upfront, even when all messages succeed and no DLQ handling is needed. This is wasteful for the common success path.

Consider deferring the clone until it's actually needed:
     async fn send_bulk(
         &self,
         messages: Vec<CanonicalMessage>,
     ) -> anyhow::Result<(Option<Vec<CanonicalMessage>>, Vec<CanonicalMessage>)> {
-        match self.inner.send_bulk(messages.clone()).await {
+        match self.inner.send_bulk(messages).await {
             Ok((responses, failed)) if failed.is_empty() => Ok((responses, failed)),
             Ok((_responses, failed)) => {
                 // ... DLQ handling
             }
             Err(e) => Err(e),
         }
     }
Since you only need the failed messages for DLQ (which are already returned in the failed vector), the original messages clone appears unnecessary.
src/middleware/metrics.rs (2)
42-45: Repeated string clones for metric labels on every operation.

Each metric call clones route_name and endpoint_direction strings. For high-throughput scenarios, consider using static labels or caching the metric handles.

One approach is to pre-create labeled metrics in the constructor:
pub struct MetricsPublisher {
    inner: Box<dyn MessagePublisher>,
    messages_counter: metrics::Counter,
    duration_histogram: metrics::Histogram,
}

impl MetricsPublisher {
    pub fn new(...) -> Self {
        let messages_counter = metrics::counter!(
            "queue_messages_processed_total",
            "route" => route_name.to_string(),
            "endpoint" => endpoint_direction.to_string()
        );
        // ...
    }
}
This is a minor optimization and can be deferred.

68-71: Consider adding failure metrics for observability.

The comment acknowledges that error counting could be added. For production observability, tracking failure rates alongside success rates is valuable for alerting and debugging.

Would you like me to help add failure counter metrics for both single and bulk operations?
src/route.rs (1)
83-84: Clarify the intent of the unused shutdown sender.

The _internal_shutdown_tx is created but never used when shutdown_rx is None. This works correctly (the route runs until error/end-of-stream), but a brief comment would clarify intent.
     let (_internal_shutdown_tx, internal_shutdown_rx) = bounded(1);
+    // When no external shutdown_rx is provided, we create a dummy channel.
+    // The sender is never used, so shutdown can only occur via error/end-of-stream.
     let shutdown_rx = shutdown_rx.unwrap_or(internal_shutdown_rx);
src/endpoints/mod.rs (2)
102-107: Remove or condense verbose implementation comments.

The multi-line comment block reads like in-progress implementation notes rather than production documentation. Since the implementation is complete and follows the same pattern as create_consumer_from_route, these comments can be safely removed.

Apply this diff:
-    // This function was partially refactored. It should create a base publisher
-    // and then apply middlewares, similar to `create_consumer_from_route`.
-    // However, since the middleware application logic already exists in `middleware/mod.rs`,
-    // we can simply call it after creating the base publisher.
-    // For now, let's fix the immediate compilation errors by creating the base publisher
-    // and wrapping it correctly. The middleware logic can be added back cleanly.
     let publisher = create_base_publisher(route_name, &endpoint.endpoint_type).await?;
116-161: Simplify redundant type casts and unusual error handling pattern.

Two optional improvements:

Redundant explicit type casts (lines 120, 125, 130, 135, 137, 144, 146, 147, 151): The as Box<dyn MessagePublisher> casts are unnecessary since the function return type already specifies Box<dyn MessagePublisher> and trait object coercion happens automatically.

Unusual ?; pattern (lines 159-160): Binding the match result to a variable, then applying ?, then wrapping in Ok() can be simplified by directly returning the match expression.

Apply this diff to simplify:
-    let publisher = match endpoint_type {
+    match endpoint_type {
         #[cfg(feature = "kafka")]
         EndpointType::Kafka(cfg) => {
             let topic = cfg.topic.as_deref().unwrap_or(route_name);
-            Ok(Box::new(kafka::KafkaPublisher::new(&cfg.config, topic).await?) as Box<dyn MessagePublisher>)
+            Ok(Box::new(kafka::KafkaPublisher::new(&cfg.config, topic).await?))
         }
         #[cfg(feature = "nats")]
         EndpointType::Nats(cfg) => {
             let subject = cfg.subject.as_deref().unwrap_or(route_name);
-            Ok(Box::new(nats::NatsPublisher::new(&cfg.config, subject, cfg.stream.as_deref()).await?) as Box<dyn MessagePublisher>)
+            Ok(Box::new(nats::NatsPublisher::new(&cfg.config, subject, cfg.stream.as_deref()).await?))
         }
         // ... similar changes for remaining arms ...
         #[allow(unreachable_patterns)]
         _ => Err(anyhow!(
             "[route:{}] Unsupported publisher endpoint type",
             route_name
         )),
-    }
-    ?;
-    Ok(publisher)
+    }
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcf3314 and fd78502.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (12)

Cargo.toml (3 hunks)
src/endpoints/memory.rs (1 hunks)
src/endpoints/mod.rs (2 hunks)
src/lib.rs (1 hunks)
src/middleware/deduplication.rs (1 hunks)
src/middleware/dlq.rs (1 hunks)
src/middleware/metrics.rs (1 hunks)
src/middleware/mod.rs (1 hunks)
src/models.rs (11 hunks)
src/route.rs (7 hunks)
src/traits.rs (1 hunks)
tests/memory_test.rs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

Cargo.toml

🧰 Additional context used

🧬 Code graph analysis (7)

src/middleware/dlq.rs (2)

src/endpoints/mod.rs (1)

create_publisher_from_route (98-110)

src/traits.rs (2)

send (52-52)

send_bulk (56-77)

src/middleware/mod.rs (4)

src/middleware/dlq.rs (1)

new (23-37)

src/middleware/metrics.rs (2)

new (21-32)

new (88-99)

src/models.rs (1)

new (103-108)

src/middleware/deduplication.rs (1)

new (23-38)

src/traits.rs (1)

src/canonical_message.rs (1)

new (18-24)

src/middleware/deduplication.rs (3)

src/middleware/dlq.rs (2)

new (23-37)

as_any (133-135)

src/middleware/metrics.rs (4)

new (21-32)

new (88-99)

as_any (76-78)

as_any (133-135)

src/models.rs (1)

new (103-108)

src/route.rs (1)

src/endpoints/mod.rs (2)

create_publisher_from_route (98-110)

create_consumer_from_route (28-34)

src/models.rs (1)

src/middleware/dlq.rs (1)

new (23-37)

src/endpoints/mod.rs (2)

src/middleware/mod.rs (2)

apply_middlewares_to_consumer (27-51)

apply_middlewares_to_publisher (57-80)

src/models.rs (1)

new (103-108)

🔇 Additional comments (17)

tests/memory_test.rs (1)

43-96: Test changes look good.

The increased message count and improved output formatting with format_pretty enhance the test's utility for performance benchmarking.

src/middleware/deduplication.rs (2)

64-67: Probabilistic pruning may cause unbounded DB growth under certain conditions.

If message throughput is very low (or stops), the ~2% pruning chance may never trigger, allowing stale entries to accumulate indefinitely. Also, iterating the entire DB on each prune could become expensive as the DB grows.

Consider a time-based approach (e.g., prune every N seconds) or limiting iteration scope. This is acceptable for now if the expected workload is consistent.

44-58: Duplicate detection and commit flow looks correct.

The receive loop properly handles messages without message_id (pass-through), detects duplicates via DB lookup, commits duplicates as no-ops, and stores new message timestamps.

src/models.rs (3)

52-100: Custom Endpoint deserializer is well-structured.

The approach of buffering into serde_json::Map to separate middlewares from the flattened EndpointType is a reasonable solution for handling the complex deserialization requirement. The visitor pattern is correctly implemented.

167-174: New Middleware enum provides clean extensibility.

The enum-based middleware configuration with #[serde(rename_all = "lowercase")] aligns well with YAML/env-var configuration patterns.

488-549: Unsafe block for setting environment variables.

The unsafe { std::env::set_var(...) } blocks are required in Rust 2024 edition but can cause issues in multi-threaded test execution. Consider using a crate like temp-env or ensuring tests with env var manipulation run serially with #[serial].

src/lib.rs (1)

11-11: LGTM!

The new middleware module is correctly exposed in the crate's public API.

src/traits.rs (2)

63-63: Cloning messages before send adds overhead.

msg.clone() is required to retain the message for the failed_messages collection, but this adds allocation overhead for every message in the batch. If batches are large and failures are rare, consider a more optimized approach (e.g., storing indices of failed messages and extracting them post-iteration).

56-77: Breaking trait change verification complete — all implementations and callers properly updated.

The send_bulk return type change to (Option<Vec<CanonicalMessage>>, Vec<CanonicalMessage>) has been fully implemented. All 4 implementations (in traits.rs, metrics.rs, dlq.rs, and memory.rs) use the correct return type, and all callers properly handle the tuple unpacking with pattern matching. No outdated code remains.

src/endpoints/memory.rs (1)

132-146: LGTM! Return type change aligns with the updated trait contract.

The updated send_bulk signature correctly returns (Option<Vec<CanonicalMessage>>, Vec<CanonicalMessage>) to distinguish between responses and failed messages. The atomic nature of memory channel sends is well-documented, and returning (None, Vec::new()) on success is semantically correct.
src/middleware/dlq.rs (1)
22-37: Verify DLQ endpoint configuration to prevent infinite recursion.

The Box::pin breaks the compile-time recursive type, but if the DLQ endpoint itself is configured with a DLQ middleware, this could cause infinite recursion at runtime during construction.

Consider adding a guard or documenting that DLQ endpoints must not have DLQ middleware configured:
// Validate that the DLQ endpoint doesn't have DLQ middleware to prevent infinite recursion
if config.endpoint.middlewares.iter().any(|m| matches!(m, Middleware::Dlq(_))) {
    return Err(anyhow::anyhow!("DLQ endpoint cannot have DLQ middleware configured"));
}
src/middleware/mod.rs (1)

27-51: LGTM! Consumer middleware application is well-structured.

The reverse iteration correctly makes the first middleware in config the outermost layer. The feature-gated middleware handling and fallback error for unsupported middlewares is appropriate.

src/middleware/metrics.rs (1)

102-136: LGTM! Consumer metrics implementation is consistent with publisher.

The MetricsConsumer correctly mirrors the publisher's metrics pattern, with proper handling of bulk operations and average duration calculation.

src/route.rs (2)

32-51: Good improvement: Per-iteration internal shutdown channel prevents race conditions.

The introduction of a per-iteration internal shutdown channel properly decouples the outer shutdown signal from the inner task, preventing races where both could try to consume the same signal. The pattern of signaling internal shutdown and awaiting task completion ensures graceful termination.

118-127: Correct handling of partial bulk send failures.

The sequential runner correctly:

Commits successful messages via commit(response) even on partial failure

Returns an error for failed messages to trigger reconnection

This prevents reprocessing of already-committed messages while ensuring failed ones get another chance.

src/endpoints/mod.rs (2)

21-22: LGTM!

The new imports correctly support the refactored function signatures that now accept Endpoint and apply middlewares.

28-34: LGTM!

The refactored function correctly separates base consumer creation from middleware application, following the decorator pattern.

coderabbitai · 2025-12-16T16:00:07Z

+/// Wraps a `MessagePublisher` with the middlewares specified in the endpoint configuration.
+///
+/// Middlewares are applied in the order of the configuration list.
+/// This means the first middleware in the config is the outermost layer, executed first.
+pub async fn apply_middlewares_to_publisher(
+    mut publisher: Box<dyn MessagePublisher>,
+    endpoint: &Endpoint,
+    route_name: &str,
+) -> Result<Arc<dyn MessagePublisher>> {
+    for middleware in &endpoint.middlewares {
+        publisher = match middleware {
+            Middleware::Dlq(cfg) => Box::new(DlqPublisher::new(publisher, cfg, route_name).await?),
+            #[cfg(feature = "metrics")]
+            Middleware::Metrics(cfg) => {
+                Box::new(MetricsPublisher::new(publisher, cfg, route_name, "output"))
+            }
+            // This middleware is consumer-only
+            #[cfg(feature = "dedup")]
+            Middleware::Deduplication(_) => publisher,
+            #[allow(unreachable_patterns)]
+            _ => return Err(anyhow::anyhow!(
+                "[middleware:{}] Unsupported publisher middleware",
+                route_name
+            )),
+        };
+    }
+    Ok(publisher.into())


⚠️ Potential issue | 🟡 Minor

Inconsistent middleware application order between consumers and publishers.

The doc comments state "first middleware in the config is the outermost layer" for both functions, but:

apply_middlewares_to_consumer iterates .rev() (line 32)

apply_middlewares_to_publisher iterates forward (line 62)

With forward iteration, the last middleware in the config becomes the outermost layer for publishers, contradicting the documentation.

Either update the iteration order for consistency:

- for middleware in &endpoint.middlewares { + for middleware in endpoint.middlewares.iter().rev() {

Or correct the doc comment to reflect actual behavior:

/// Wraps a `MessagePublisher` with the middlewares specified in the endpoint configuration. /// -/// Middlewares are applied in the order of the configuration list. -/// This means the first middleware in the config is the outermost layer, executed first. +/// Middlewares are applied in the order of the configuration list. +/// This means the last middleware in the config is the outermost layer, executed first.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/// Wraps a `MessagePublisher` with the middlewares specified in the endpoint configuration.

///

/// Middlewares are applied in the order of the configuration list.

/// This means the first middleware in the config is the outermost layer, executed first.

pub async fn apply_middlewares_to_publisher(

mut publisher: Box<dyn MessagePublisher>,

endpoint: &Endpoint,

route_name: &str,

) -> Result<Arc<dyn MessagePublisher>> {

for middleware in &endpoint.middlewares {

publisher = match middleware {

Middleware::Dlq(cfg) => Box::new(DlqPublisher::new(publisher, cfg, route_name).await?),

#[cfg(feature = "metrics")]

Middleware::Metrics(cfg) => {

Box::new(MetricsPublisher::new(publisher, cfg, route_name, "output"))

}

// This middleware is consumer-only

#[cfg(feature = "dedup")]

Middleware::Deduplication(_) => publisher,

#[allow(unreachable_patterns)]

_ => return Err(anyhow::anyhow!(

"[middleware:{}] Unsupported publisher middleware",

route_name

)),

};

}

Ok(publisher.into())

/// Wraps a `MessagePublisher` with the middlewares specified in the endpoint configuration.

///

/// Middlewares are applied in the order of the configuration list.

/// This means the first middleware in the config is the outermost layer, executed first.

pub async fn apply_middlewares_to_publisher(

mut publisher: Box<dyn MessagePublisher>,

endpoint: &Endpoint,

route_name: &str,

) -> Result<Arc<dyn MessagePublisher>> {

for middleware in endpoint.middlewares.iter().rev() {

publisher = match middleware {

Middleware::Dlq(cfg) => Box::new(DlqPublisher::new(publisher, cfg, route_name).await?),

#[cfg(feature = "metrics")]

Middleware::Metrics(cfg) => {

Box::new(MetricsPublisher::new(publisher, cfg, route_name, "output"))

}

// This middleware is consumer-only

#[cfg(feature = "dedup")]

Middleware::Deduplication(_) => publisher,

#[allow(unreachable_patterns)]

_ => return Err(anyhow::anyhow!(

"[middleware:{}] Unsupported publisher middleware",

route_name

)),

};

}

Ok(publisher.into())

🤖 Prompt for AI Agents

In src/middleware/mod.rs around lines 53 to 79 the publisher middleware loop iterates forward causing the last configured middleware to become the outermost layer which contradicts the doc comment and the consumer behavior; change the loop to iterate in reverse (like apply_middlewares_to_consumer) so the first middleware in the config becomes the outermost layer for publishers as well, keeping middleware application order consistent between consumers and publishers and update only the iterator (no doc change needed).

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (2)

src/route.rs (1)
198-223: Inconsistent return semantics between sequential and concurrent paths persists.

The return value semantics differ:

Sequential: Returns Ok(false) on end-of-stream (line 113) → outer loop breaks (graceful shutdown)

Concurrent: Returns Ok(shutdown_rx.is_empty()) (line 223) → on end-of-stream, is_empty() is true, so returns Ok(true) → matches _ => {} and loops/reconnects

This means end-of-stream in concurrent mode causes reconnection attempts instead of graceful shutdown.

The fix should align the semantics:
-        Ok(shutdown_rx.is_empty())
+        // Return false for graceful exit (end-of-stream), true only if shutdown was requested externally
+        Ok(!shutdown_rx.is_empty())
Or track the exit reason explicitly as suggested in the past review.
src/middleware/dlq.rs (1)

123-137: Bulk DLQ sends still lack retry logic (previously flagged).

The send_bulk method directly calls self.dlq_publisher.send_bulk(failed.clone()) without the retry/backoff logic used by send_to_dlq_with_retry in the single-message path. This creates inconsistent resilience behavior.

Consider implementing send_bulk_to_dlq_with_retry that applies similar retry logic to bulk operations, or document why bulk sends intentionally skip retries.

🧹 Nitpick comments (4)

src/models.rs (1)
124-126: Remove empty where clause.

The where clause on lines 125-126 is empty and serves no purpose. This appears to be a remnant from an earlier iteration.
-fn deserialize_middlewares_from_value(value: serde_json::Value) -> Result<Vec<Middleware>, String>
-where
-{
+fn deserialize_middlewares_from_value(value: serde_json::Value) -> Result<Vec<Middleware>, String> {
src/middleware/metrics.rs (1)
42-45: Consider reducing string allocations in hot path.

The route_name.clone() and endpoint_direction.clone() calls allocate new strings on every metric emission. While this works, it adds allocation overhead in the hot path.

Consider using Arc<str> or caching the counter/histogram handles during construction to avoid repeated allocations:
// In constructor
let counter = metrics::counter!("queue_messages_processed_total", 
    "route" => route_name.to_string(), 
    "endpoint" => endpoint_direction.to_string());
// Store as field and reuse
This is a minor optimization and can be deferred.
src/middleware/dlq.rs (1)
89-109: Return behavior after DLQ success is intentional but worth documenting.

When DLQ send succeeds, the method returns the original primary error (line 100). This is correct behavior to signal the route failure and trigger reconnection, while ensuring no data loss since the message is safely in the DLQ.

Consider adding a brief comment explaining this intentional behavior:
                 match self.send_to_dlq_with_retry(message, &error_msg).await {
                     Ok(()) => {
-                        // Message successfully sent to DLQ; return original error to signal route failure
+                        // Message safely in DLQ. Return original error to trigger route reconnect,
+                        // which may resolve transient primary destination issues.
                         Err(e)
                     }
src/endpoints/mod.rs (1)
112-176: Consider removing redundant type annotations for consistency.

The explicit as Box<dyn MessagePublisher> casts throughout this function may be redundant, as the return type signature should provide sufficient type inference. The corresponding create_base_consumer function (lines 36-91) doesn't use these explicit casts, suggesting they might not be necessary here either.

For consistency with create_base_consumer, consider removing the explicit type annotations. For example:
-            Ok(
-                Box::new(kafka::KafkaPublisher::new(&cfg.config, topic).await?)
-                    as Box<dyn MessagePublisher>,
-            )
+            Ok(Box::new(kafka::KafkaPublisher::new(&cfg.config, topic).await?))
Apply similar changes to the other endpoint types (NATS, AMQP, MQTT, File, HTTP, Static, Memory, MongoDB).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd78502 and 108a126.

📒 Files selected for processing (11)

src/endpoints/amqp.rs (2 hunks)
src/endpoints/memory.rs (2 hunks)
src/endpoints/mod.rs (3 hunks)
src/lib.rs (1 hunks)
src/middleware/deduplication.rs (1 hunks)
src/middleware/dlq.rs (1 hunks)
src/middleware/metrics.rs (1 hunks)
src/middleware/mod.rs (1 hunks)
src/models.rs (12 hunks)
src/route.rs (8 hunks)
tests/memory_test.rs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

src/lib.rs
tests/memory_test.rs
src/middleware/mod.rs

🧰 Additional context used

🧬 Code graph analysis (6)

src/middleware/deduplication.rs (2)

src/middleware/dlq.rs (2)

new (23-41)

as_any (143-145)

src/middleware/metrics.rs (4)

new (21-32)

new (88-99)

as_any (76-78)

as_any (136-138)

src/endpoints/mod.rs (2)

src/middleware/mod.rs (2)

apply_middlewares_to_consumer (27-53)

apply_middlewares_to_publisher (59-84)

src/models.rs (1)

new (107-112)

src/models.rs (2)

src/middleware/dlq.rs (1)

new (23-41)

src/endpoints/nats.rs (2)

new (23-56)

new (103-155)

src/middleware/dlq.rs (3)

src/endpoints/mod.rs (1)

create_publisher_from_route (94-106)

src/middleware/deduplication.rs (2)

new (23-38)

as_any (119-121)

src/traits.rs (4)

send (52-52)

send_bulk (56-77)

as_any (47-47)

as_any (82-82)

src/route.rs (2)

src/traits.rs (1)

send (52-52)

src/endpoints/mod.rs (2)

create_publisher_from_route (94-106)

create_consumer_from_route (28-34)

src/endpoints/memory.rs (2)

src/models.rs (1)

new (107-112)

src/canonical_message.rs (1)

new (18-24)

🔇 Additional comments (17)

src/models.rs (3)

52-104: Custom deserialize implementation looks correct.

The approach of buffering into a serde_json::Map to separate the middlewares field from the flattened EndpointType is a valid pattern for handling complex deserialization scenarios. The implementation correctly handles:

Missing middlewares (defaults to empty Vec)

Error propagation via map_err

196-203: DLQ middleware structure supports recursive endpoint configuration.

The change from EndpointType to Endpoint allows the DLQ endpoint to have its own middlewares, enabling flexible configuration. The default of 3 retry attempts is reasonable.

417-477: Test assertions are thorough but could use pattern matching simplification.

The test logic is correct and comprehensive. Minor observation: the boolean flags (has_dedup, has_metrics, has_dlq) approach works but could be simplified by using iterator methods or pattern matching. This is optional.

src/endpoints/amqp.rs (1)

152-154: Good security improvement: avoiding credential leakage in logs.

The log statement now avoids logging the connection URL which may contain embedded credentials. This is a sensible security hardening.

src/middleware/deduplication.rs (3)

61-62: Consider handling duration_since error more gracefully in the hot path.

Line 61 uses ? to propagate the error from duration_since(UNIX_EPOCH). While this error is extremely rare (only occurs if system time is before 1970), it will fail the entire receive call. The background cleanup task (lines 69-75) handles this more gracefully by returning early. Consider whether failing message receipt is the desired behavior for this edge case.

This is likely acceptable in practice since a system clock before 1970 indicates a severely misconfigured system, but verify this behavior aligns with your operational expectations.

64-112: Background cleanup task properly addresses previous review concerns.

The implementation now:

Handles duration_since failure gracefully (lines 69-75)

Validates timestamp byte length before conversion (lines 82-85)

Uses saturating_sub to prevent underflow (line 77)

Logs errors instead of panicking

The ~2% probabilistic cleanup is a reasonable approach to avoid running cleanup on every message.

44-59: Deduplication loop logic is correct.

The implementation correctly:

Passes through messages without message_id (can't deduplicate)

Commits duplicate messages before skipping (proper acknowledgment)

Uses a loop to automatically fetch the next message after skipping duplicates

src/endpoints/memory.rs (1)

132-147: send_bulk implementation is correct for atomic memory channel.

The updated return type correctly indicates that on success, there are no responses and no failed messages. The added comments clarify the atomic semantics.

src/route.rs (3)

32-51: Internal per-iteration shutdown channel is a good pattern.

Creating a new internal shutdown channel per iteration correctly avoids races where both the outer loop and inner task compete to consume the same external shutdown signal. The pattern ensures clean handoff and graceful task completion.

118-127: Sequential batch handling correctly processes partial success.

The implementation properly:

Commits successful messages before returning an error

Triggers reconnect on any failed messages

Maintains atomicity of the commit operation

158-179: Concurrent worker error handling is robust.

Workers correctly:

Commit successful messages on partial failure

Propagate errors to the main task via the error channel

Handle both partial failures and total failures consistently

src/middleware/metrics.rs (2)

49-74: Bulk metrics correctly handle partial failures.

The implementation:

Correctly calculates successful_count as total_count - failed.len()

Guards against division by zero with if successful_count > 0

Records average duration per successful message

The comment about adding failure metrics (line 66) is a good note for future enhancement.

102-139: Consumer metrics implementation is consistent and correct.

The MetricsConsumer follows the same patterns as MetricsPublisher:

Records metrics only on success

Guards against division by zero for bulk operations

Provides as_any for trait object compatibility

src/middleware/dlq.rs (2)

43-84: DLQ retry logic is well implemented.

The send_to_dlq_with_retry method provides:

Exponential backoff starting at 100ms, capped at 5s

Comprehensive logging at each stage (info on success, warn on retry, error on final failure)

Combined error context preserving both primary and DLQ failure details

22-41: Constructor correctly handles recursive async type with Box::pin.

The comment on lines 32-33 clearly explains why Box::pin is necessary to break the recursive async type definition chain. This is the correct approach for this pattern.

src/endpoints/mod.rs (2)

21-22: LGTM!

The new imports align with the refactored delegation flow, supporting the separation of base factory creation and middleware application.

27-34: LGTM!

The refactored delegation flow cleanly separates base consumer creation from middleware application, aligning with the new architecture.

coderabbitai · 2025-12-16T16:55:08Z

+            let stream_name = cfg.stream.as_deref().ok_or_else(|| {
+                anyhow!(
+                    "[route:{}] NATS consumer must specify a 'stream' or have a 'default_stream'",
+                    route_name
+                )
+            })?;


⚠️ Potential issue | 🟡 Minor

Error message mentions unsupported 'default_stream'.

The error message suggests users can have a default_stream, but the code only checks cfg.stream without any fallback logic. This could mislead users into thinking there's a default_stream configuration option.

Update the error message to reflect the actual requirement:

- anyhow!( - "[route:{}] NATS consumer must specify a 'stream' or have a 'default_stream'", - route_name - ) + anyhow!( + "[route:{}] NATS consumer must specify a 'stream'", + route_name + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let stream_name = cfg.stream.as_deref().ok_or_else(|| {

anyhow!(

"[route:{}] NATS consumer must specify a 'stream' or have a 'default_stream'",

route_name

)

})?;

let stream_name = cfg.stream.as_deref().ok_or_else(|| {

anyhow!(

"[route:{}] NATS consumer must specify a 'stream'",

route_name

)

})?;

🤖 Prompt for AI Agents

In src/endpoints/mod.rs around lines 49 to 54, the error text incorrectly mentions a 'default_stream' fallback even though the code only checks cfg.stream; update the error message to accurately state that a NATS consumer must specify a 'stream' (remove any reference to 'default_stream') so it reflects the actual requirement, or if a fallback was intended, implement the fallback logic before this check and keep the original message.

coderabbitai · 2025-12-16T16:55:08Z

+    // This function was partially refactored. It should create a base publisher
+    // and then apply middlewares, similar to `create_consumer_from_route`.
+    // However, since the middleware application logic already exists in `middleware/mod.rs`,
+    // we can simply call it after creating the base publisher.
+    // For now, let's fix the immediate compilation errors by creating the base publisher
+    // and wrapping it correctly. The middleware logic can be added back cleanly.


⚠️ Potential issue | 🟡 Minor

Remove stale comment block.

This comment block appears to be leftover from the refactoring process. The implementation is complete and follows the same clean pattern as create_consumer_from_route, so the explanatory comment is no longer needed.

Apply this diff to remove the stale comment:

- // This function was partially refactored. It should create a base publisher - // and then apply middlewares, similar to `create_consumer_from_route`. - // However, since the middleware application logic already exists in `middleware/mod.rs`, - // we can simply call it after creating the base publisher. - // For now, let's fix the immediate compilation errors by creating the base publisher - // and wrapping it correctly. The middleware logic can be added back cleanly. let publisher = create_base_publisher(route_name, &endpoint.endpoint_type).await?;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// This function was partially refactored. It should create a base publisher

// and then apply middlewares, similar to `create_consumer_from_route`.

// However, since the middleware application logic already exists in `middleware/mod.rs`,

// we can simply call it after creating the base publisher.

// For now, let's fix the immediate compilation errors by creating the base publisher

// and wrapping it correctly. The middleware logic can be added back cleanly.

let publisher = create_base_publisher(route_name, &endpoint.endpoint_type).await?;

🤖 Prompt for AI Agents

In src/endpoints/mod.rs around lines 98 to 103, remove the stale multi-line comment block left from refactoring; the implementation is complete and mirrors create_consumer_from_route, so delete the explanatory comment entirely to avoid misleading future readers and keep the file concise.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/middleware/dlq.rs (1)

117-164: Bulk retry logic successfully addresses the previous review concern.

The inline retry loop with exponential backoff (lines 124-163) now provides consistent resilience for bulk DLQ sends, matching the single-message path. The intelligent handling of partial DLQ failures (only retrying the failed subset at line 142) optimizes retry efficiency.

Optional: Consider extracting retry logic to reduce duplication.

The retry pattern in lines 124-163 closely mirrors send_to_dlq_with_retry (lines 43-84). Extracting a send_bulk_to_dlq_with_retry helper method would reduce duplication and improve maintainability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 108a126 and 058328b.

📒 Files selected for processing (1)

src/middleware/dlq.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/middleware/dlq.rs (3)

src/endpoints/mod.rs (1)

create_publisher_from_route (94-106)

src/middleware/deduplication.rs (2)

new (23-38)

as_any (119-121)

src/traits.rs (4)

send (52-52)

send_bulk (56-77)

as_any (47-47)

as_any (82-82)

🔇 Additional comments (3)

src/middleware/dlq.rs (3)

22-41: Constructor correctly handles recursive async with Box::pin.

The use of Box::pin to break the recursive async type cycle is appropriate, and the inline comment clearly explains the recursion chain. The initialization logic and logging are sound.

43-84: Retry logic with exponential backoff is well-implemented.

The exponential backoff strategy (100ms to 5s cap) and comprehensive error logging with combined context provide robust resilience for DLQ sends. The logic correctly exhausts retries before propagating failure.

89-109: Single-message send correctly integrates DLQ retry on primary failure.

The error-handling flow appropriately delegates to send_to_dlq_with_retry and preserves the original error when DLQ succeeds, ensuring callers are aware of the primary route failure.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/integration_test.rs (1)

6-75: Clarify intent: why all regular pipeline tests are commented out.

Lines 6-75 contain commented-out test functions for AMQP, Kafka, MQTT, and NATS pipelines. This removes all regular (non-performance) integration tests while only performance_direct variants remain under #[ignore].

The file header (lines 1-2) indicates these #[ignore] tests are designed to run explicitly via cargo test -- --ignored, but this approach only preserves performance-specific tests. The removal of regular pipeline tests (e.g., test_amqp_pipeline, test_kafka_pipeline) eliminates basic integration coverage for core messaging functionality.

If this change is intentional—such as migrating to a different testing strategy or CI/CD workflow—clarify the new approach. If temporary work-in-progress, consider keeping at least the regular pipeline tests active to maintain coverage during development.

🧹 Nitpick comments (3)

tests/integration/common.rs (1)

387-411: Consider handling negative numbers for future reuse.

The current implementation doesn't correctly handle negative numbers—the minus sign would be misplaced after the string reversal. While all current use cases involve positive metrics (message counts, durations, rates), consider adding negative number support to make this utility more robust for future reuse.

Additionally, note that fractional parts are truncated rather than rounded (Line 407). This is acceptable for test output, but consider whether rounding would be more appropriate for your use case.
src/middleware/dlq.rs (2)
45-84: Consider adding jitter to backoff for better distributed retry behavior.

The exponential backoff works correctly but lacks jitter. If multiple instances fail simultaneously, they may retry in lockstep, creating thundering herd effects. Adding a small random jitter (e.g., ±20%) would distribute retry attempts more evenly.

Example with jitter:
                     tokio::time::sleep(Duration::from_millis(backoff_ms)).await;
-                    backoff_ms = (backoff_ms * 2).min(5000); // Cap backoff at 5s
+                    let jitter = (backoff_ms as f64 * 0.2 * rand::random::<f64>()) as u64;
+                    backoff_ms = ((backoff_ms * 2) + jitter).min(5000); // Cap backoff at 5s with jitter
117-164: Extract common DLQ retry logic to reduce duplication.

The retry logic in this partial-failure path (lines 124-163) is nearly identical to the complete-failure path (lines 172-210). The only differences are the initial messages_to_retry value and log messages. Consider extracting this into a helper method to improve maintainability.

Example:
async fn retry_dlq_bulk_send(
    &self,
    messages: Vec<CanonicalMessage>,
    context: &str,
) -> anyhow::Result<Vec<CanonicalMessage>> {
    let mut attempt = 0;
    let mut backoff_ms = 100u64;
    let mut messages_to_retry = messages;

    loop {
        attempt += 1;
        match self.dlq_publisher.send_bulk(messages_to_retry.clone()).await {
            Ok((_, dlq_failed)) if dlq_failed.is_empty() => {
                info!("Batch of {} messages successfully sent to DLQ on attempt {}. {}", messages.len(), attempt, context);
                return Ok(vec![]);
            }
            // ... rest of retry logic
        }
    }
}
Then call it from both paths.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 058328b and 82b6465.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (10)

Cargo.toml (3 hunks)
src/canonical_message.rs (2 hunks)
src/endpoints/file.rs (2 hunks)
src/endpoints/kafka.rs (2 hunks)
src/endpoints/mongodb.rs (2 hunks)
src/endpoints/nats.rs (1 hunks)
src/middleware/dlq.rs (1 hunks)
tests/integration/common.rs (3 hunks)
tests/integration_test.rs (1 hunks)
tests/memory_test.rs (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

src/endpoints/file.rs (1)

src/canonical_message.rs (1)

new (19-25)

src/middleware/dlq.rs (3)

src/endpoints/mod.rs (1)

create_publisher_from_route (94-106)

src/traits.rs (4)

send (52-52)

send_bulk (56-77)

as_any (47-47)

as_any (82-82)

tests/integration/common.rs (1)

as_any (323-325)

tests/memory_test.rs (1)

tests/integration/common.rs (1)

format_pretty (387-412)

tests/integration_test.rs (5)

tests/integration/amqp.rs (1)

test_amqp_performance_direct (50-83)

tests/integration/kafka.rs (1)

test_kafka_performance_direct (50-89)

tests/integration/mqtt.rs (1)

test_mqtt_performance_direct (51-107)

tests/integration/nats.rs (1)

test_nats_performance_direct (50-90)

tests/integration/mongodb.rs (1)

test_mongodb_performance_direct (50-98)

🔇 Additional comments (18)

tests/memory_test.rs (3)

4-4: LGTM!

The import and logging statement improve test observability without affecting functionality.

Also applies to: 17-17

18-18: Verify timeout adequacy for increased workload.

The test workload has increased significantly (6.67x from 1.5M to 10M messages). Ensure that the 20-second timeout on Line 47 remains adequate for the increased load, especially in slower CI environments.

59-62: LGTM!

The formatting changes improve readability by adding thousand separators and providing consistent formatting across all performance metrics.

tests/integration/common.rs (3)

9-9: LGTM!

The import is necessary for the generic type constraint in the format_pretty function.

305-306: LGTM!

The formatting improvements make performance metrics more readable by adding thousand separators to numeric values.

380-381: LGTM!

Consistent with the write performance formatting, improving readability of performance metrics.

tests/integration_test.rs (1)

105-116: LGTM: Consistent #[ignore] usage for performance tests.

The addition of #[ignore] attributes to the MongoDB and memory performance tests is consistent with the gating applied to other performance tests in this file. This ensures all performance tests require explicit execution.

Cargo.toml (2)

17-18: LGTM!

The addition of rand and bytes (with serde feature for serialization support) aligns with the payload type change to Bytes and broader middleware changes.

27-27: LGTM!

The dep:metrics syntax correctly prevents an implicit metrics feature from being created, and the feature flag mappings are properly structured.

Also applies to: 65-69

src/endpoints/kafka.rs (1)

137-137: LGTM!

The explicit slice syntax &message.payload[..] is necessary to obtain a &[u8] from Bytes for the rdkafka FutureRecord::payload() method.

src/endpoints/nats.rs (1)

39-40: LGTM!

The simplified comment accurately describes the subject filter behavior.

src/canonical_message.rs (1)

6-6: LGTM!

The migration to Bytes is a good optimization. It provides efficient reference-counted byte storage with zero-copy slicing, which is beneficial when payloads are passed through multiple middleware layers. The constructor maintains backward compatibility by accepting Vec<u8>.

Also applies to: 13-13, 22-22

src/endpoints/mongodb.rs (2)

43-43: LGTM!

The .into() call correctly converts the Vec<u8> from BSON Binary to Bytes for the CanonicalMessage payload field.

93-93: LGTM!

Using .to_vec() to convert Bytes back to Vec<u8> is the correct approach since BSON Binary requires owned bytes.

src/endpoints/file.rs (2)

53-56: LGTM: Correct handling of immutable Bytes.

The separate writes for payload and newline are appropriate given that Bytes is immutable. The BufWriter will efficiently handle the two consecutive writes.

98-110: LGTM: Correct newline handling in receive logic.

The read-and-trim approach correctly handles newline-delimited messages. The read_until includes the delimiter, and the subsequent pop removes it as expected.

src/middleware/dlq.rs (2)

23-41: LGTM: Correct handling of recursive async construction.

The use of Box::pin to break the recursive type definition is appropriate, and the comment clearly explains the cycle.

89-109: LGTM: Correct error propagation semantics.

The method correctly returns the primary send error even when DLQ succeeds (line 100), which appropriately signals route failure to the caller while ensuring the message is preserved in the DLQ.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (7)

src/models.rs (4)
121-153: Empty where clause can be removed.

Lines 124-126 have an empty where clause which is unnecessary and can be cleaned up.
-fn deserialize_middlewares_from_value(value: serde_json::Value) -> Result<Vec<Middleware>, String>
-where
-{
+fn deserialize_middlewares_from_value(value: serde_json::Value) -> Result<Vec<Middleware>, String> {
297-300: Missing newline between function and next section comment.

Minor formatting: add a blank line after default_amqp_persistent() before the MongoDB section comment for consistency with other sections.
 fn default_amqp_persistent() -> bool {
     true
 }
+
 // --- MongoDB Specific Configuration ---
500-530: Test uses unsafe block for set_var which is now required in Rust 2024 edition.

The use of unsafe { std::env::set_var(...) } is correct for Rust editions where set_var is unsafe. However, the test modifies global environment state without cleanup, which can cause flaky tests when run in parallel.

Consider using a test helper that restores environment variables after the test, or use a dedicated environment testing crate like temp-env to scope the variables.

551-561: Test assertions use else branches that could be simplified.

The pattern of checking if let followed by else { panic! } can be simplified using assert!(matches!(...)) or extracting with unwrap() after the type check.
-        if let EndpointType::Kafka(k) = &config.get("kafka_to_nats").unwrap().input.endpoint_type {
-            assert_eq!(k.topic, Some("input-topic".to_string()));
-            assert!(k.config.tls.required);
-        } else {
-            panic!("Expected Kafka endpoint");
-        }
+        let EndpointType::Kafka(k) = &config.get("kafka_to_nats").unwrap().input.endpoint_type else {
+            panic!("Expected Kafka endpoint");
+        };
+        assert_eq!(k.topic, Some("input-topic".to_string()));
+        assert!(k.config.tls.required);
src/endpoints/amqp.rs (1)
125-132: Inconsistent formatting in queue_declare call.

The closing parenthesis and .await? placement is unusual compared to the publisher's queue declaration. Consider aligning the formatting.
         channel
-            .queue_declare(
-                queue,
-                QueueDeclareOptions {
-                    durable: config.persistent,
-                    ..Default::default()
-                },
-                FieldTable::default())
+            .queue_declare(
+                queue,
+                QueueDeclareOptions {
+                    durable: config.persistent,
+                    ..Default::default()
+                },
+                FieldTable::default(),
+            )
             .await?;
src/endpoints/nats.rs (2)
221-229: Core NATS receive creates canonical message after commit closure.

In the Core path, create_canonical_message is called after creating the commit closure (line 228), which is fine since message is not moved into the closure. However, the ordering differs from the JetStream path (line 211 before closure). Consider aligning the order for consistency.
             NatsSubscription::Core(sub) => {
                 let message = futures::StreamExt::next(sub)
                     .await
                     .ok_or_else(|| anyhow!("NATS Core subscription ended"))?;
 
+                let msg = Self::create_canonical_message(&message);
                 // Core NATS has no ack, so the commit is a no-op.
                 let commit: CommitFunc = Box::new(move |_| Box::pin(async {}));
-                let msg = Self::create_canonical_message(&message);
                 (msg, commit)
             }
199-201: Trailing whitespace on line 200.

Minor formatting issue: there appears to be trailing whitespace on line 200 after the closing brace.
-        canonical_message
-    }        
-}
+        canonical_message
+    }
+}

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82b6465 and 2942e59.

📒 Files selected for processing (3)

src/endpoints/amqp.rs (9 hunks)
src/endpoints/nats.rs (6 hunks)
src/models.rs (14 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/endpoints/nats.rs (2)

src/canonical_message.rs (1)

new (19-25)

src/traits.rs (1)

receive (30-30)

🔇 Additional comments (11)

src/models.rs (2)

196-203: Recursive Endpoint in DeadLetterQueueMiddleware may allow infinite nesting.

The DeadLetterQueueMiddleware contains an Endpoint, which itself can contain middlewares including another DLQ. This creates potential for unbounded recursion during deserialization. While Rust's stack will eventually overflow on deeply nested configs, consider whether you want to allow nested DLQ middlewares or validate against it.

Is nested DLQ middleware (DLQ inside a DLQ) an intended use case, or should the DLQ endpoint's middlewares be ignored/disallowed?

52-104: Custom Endpoint deserializer works correctly for standard configs but with known limitations.

The implementation correctly separates middlewares from the rest of the map using an intermediate serde_json::Value. This approach is sound for typical configuration formats where map keys are strings. However, be aware of two documented limitations:

Error context: Error messages lose positional information and become generic when deserializing from an intermediate Value, making debugging harder for invalid inputs.

YAML limitations: If YAML configs use non-string map keys, conversion to serde_json::Value will fail, though this is unlikely in practice for endpoint configurations.

For the middleware and endpoint type data structures in this codebase, this pattern is appropriate and safe.

src/endpoints/amqp.rs (4)

41-47: Queue durability is now tied to message persistence flag.

The queue is declared with durable: config.persistent. This is reasonable, but note that durable queues and persistent messages are distinct concepts in AMQP:

A durable queue survives broker restarts

A persistent message (delivery_mode: 2) survives broker restarts if in a durable queue

The current implementation couples them together, which is a sensible default. Just ensure this is the intended behavior.

72-77: Persistent message handling looks correct.

Setting delivery_mode(2) for persistent messages is the correct AMQP approach. Messages will survive broker restarts when combined with the durable queue declaration.

170-171: Good security improvement: credentials no longer logged.

Removing the URL from connection attempt logs prevents accidental credential leakage. This is a good security practice.

236-248: Improved error handling for message acknowledgment.

Replacing implicit panic (from ? or unwrap) with explicit error logging is more robust for production. The commit function now gracefully handles ack failures without crashing.

However, consider whether a failed ack should propagate the error back to the caller, as silently logging may cause the message to be redelivered unexpectedly.

Is silent ack failure the intended behavior? In some scenarios, failed acks should be surfaced to trigger retry logic or DLQ handling at a higher level.

src/endpoints/nats.rs (5)

16-26: Well-designed enum abstraction for Core vs JetStream modes.

The NatsClient enum cleanly separates the two operational modes. The comment on line 24 clarifies that await_ack only applies to JetStream, which is helpful.

50-51: Warning for missing stream in JetStream mode is appropriate.

This provides useful operational guidance without failing hard, allowing the user to proceed if the stream already exists.

55-57: Good UX: warning when await_ack is ignored in Core mode.

This prevents silent misconfiguration by informing users that their await_ack: true setting has no effect in Core mode.

176-177: Core NATS queue subscription uses stream_name for queue group.

The queue group is derived from stream_name, which is semantically a JetStream concept. In Core mode, this parameter is repurposed as a namespace identifier for the queue group. This works but the naming may be confusing.

Consider whether stream_name should be renamed to a more generic term (e.g., namespace or group_prefix) to avoid confusion when used in Core mode, or document that stream in the config doubles as the queue group name for Core NATS.

214-216: Ack failure uses unwrap_or_else with error logging, but error is swallowed.

Similar to the AMQP consumer, the JetStream ack failure is logged but not propagated. This means message redelivery behavior depends on the broker's ack timeout rather than explicit error handling.

Confirm this is intentional. If the ack fails, the message will eventually be redelivered by NATS (based on max_deliver and ack wait settings), but the application won't know the ack failed until it sees the duplicate.

coderabbitai · 2025-12-16T19:48:56Z

+                })
+                .await?;
+
+            tokio::time::sleep(Duration::from_millis(100)).await;


⚠️ Potential issue | 🟡 Minor

Unexplained 100ms sleep after consumer creation.

This hardcoded sleep appears to be a workaround for a timing issue but lacks documentation. Arbitrary sleeps can mask race conditions and may not be sufficient under load or slow networks.

Please document why this sleep is necessary. If it's working around a NATS client initialization race, consider using a more robust synchronization mechanism or at least add a comment explaining the workaround.

- tokio::time::sleep(Duration::from_millis(100)).await; + // Brief delay to allow consumer to be fully registered with the server. + // See: https://github.com/nats-io/nats.rs/issues/XYZ (if applicable) + tokio::time::sleep(Duration::from_millis(100)).await;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

tokio::time::sleep(Duration::from_millis(100)).await;

// Brief delay to allow consumer to be fully registered with the server.

// See: https://github.com/nats-io/nats.rs/issues/XYZ (if applicable)

tokio::time::sleep(Duration::from_millis(100)).await;

🤖 Prompt for AI Agents

In src/endpoints/nats.rs around line 167, there is an unexplained hardcoded tokio::time::sleep(Duration::from_millis(100)).await; after consumer creation; replace this fragile sleep with a robust readiness check (for example await a subscription/consumer confirmation, call the NATS client's flush/ping API, or use a synchronization primitive such as tokio::sync::Notify/oneshot that the consumer signals when ready) and add a short comment explaining why the wait is necessary and which readiness mechanism is used; if you must keep a fallback sleep, document it clearly with the reason and a TODO to remove it when replaced by a proper readiness/synchronization approach.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

tests/integration/common.rs (1)

389-390: Fix documentation to match implementation.

The doc comment states "commas" but the implementation uses underscores as thousand separators. This was already flagged in a previous review.

src/endpoints/nats.rs (1)

167-167: Unexplained 100ms sleep after consumer creation.

This sleep lacks documentation explaining why it's necessary. If it's a workaround for a NATS client timing issue, add a comment. Otherwise, consider removing it or using a more robust synchronization mechanism.

🧹 Nitpick comments (6)

tests/integration/common.rs (1)
391-416: Consider handling negative numbers.

The current implementation would produce malformed output for negative numbers (e.g., -1000 becomes -_1_000 with underscore after the minus sign). While this is test utility code and the current use cases only format positive values, a small fix would make the function more robust.
 pub fn format_pretty<N: Display>(num: N) -> String {
     let s = num.to_string();
+    let (sign, s) = if s.starts_with('-') {
+        ("-", &s[1..])
+    } else {
+        ("", s.as_str())
+    };
     let mut parts = s.splitn(2, '.');
     let integer_part = parts.next().unwrap_or("");
     let fractional_part = parts.next();
     // ... rest unchanged ...
     match fractional_part {
         Some(frac) => {
             let truncated_frac = if frac.len() > 2 { &frac[..2] } else { frac };
-            format!("{}.{}", formatted_integer, truncated_frac)
+            format!("{}{}.{}", sign, formatted_integer, truncated_frac)
         }
-        None => formatted_integer,
+        None => format!("{}{}", sign, formatted_integer),
     }
 }
src/endpoints/nats.rs (2)
128-128: Move import to the top of the file.

The use std::any::Any; statement is placed after the struct definition. For consistency and readability, move it with the other imports at the top of the file.
 use tracing::{info, warn};
+use std::any::Any;
 
 enum NatsClient {
And remove line 128.

143-149: Consider extracting stream creation into a helper.

This stream creation logic is duplicated from NatsPublisher::new() (lines 42-48). Consider extracting it into a shared helper function to reduce duplication and ensure consistent stream configuration.
async fn ensure_stream_exists(jetstream: &jetstream::Context, stream_name: &str) -> anyhow::Result<()> {
    jetstream
        .get_or_create_stream(stream::Config {
            name: stream_name.to_string(),
            subjects: vec![format!("{}.>", stream_name)],
            ..Default::default()
        })
        .await?;
    Ok(())
}
src/endpoints/amqp.rs (1)

41-44: Consider the operational impact of queue durability changes.

Declaring queues as durable based on config.persistent is correct for persistent messaging. However, be aware that AMQP will reject attempts to redeclare an existing queue with different durability settings. This means:

Changing persistent configuration for an existing queue will cause connection failures

The queue must be manually deleted before changing durability settings

Multiple services accessing the same queue must use identical durability configuration

This same consideration applies to the consumer queue declaration at lines 125-132.
src/middleware/dlq.rs (2)
65-66: Consider adding jitter to exponential backoff.

The retry logic uses exponential backoff without jitter, which can cause thundering herd problems when multiple DLQ retries align. Adding jitter helps distribute retry attempts.

Apply this diff to add jitter:
                     tokio::time::sleep(Duration::from_millis(backoff_ms)).await;
-                    backoff_ms = (backoff_ms * 2).min(5000); // Cap backoff at 5s
+                    let jitter = rand::random::<u64>() % (backoff_ms / 4); // 0-25% jitter
+                    backoff_ms = ((backoff_ms * 2) + jitter).min(5000); // Cap backoff at 5s
Note: This requires adding rand to dependencies. Apply the same pattern to lines 152, 160, 203, and 211.

124-171: Optional: Extract duplicated retry logic to reduce repetition.

The retry logic for bulk DLQ sends (lines 124-171, 180-222) is very similar to send_to_dlq_with_retry (lines 43-84). Extracting a generic retry_with_backoff helper would improve maintainability and reduce the risk of inconsistencies.

Consider creating a helper that accepts a closure for the operation to retry:
async fn retry_bulk_dlq(
    &self,
    messages: Vec<CanonicalMessage>,
    context: &str,
) -> anyhow::Result<Vec<CanonicalMessage>> {
    // Unified retry logic for bulk operations
    // Returns remaining failed messages or error
}
This refactor can be deferred if the current duplication is acceptable for your codebase.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee84573 and ea9d194.

📒 Files selected for processing (6)

src/endpoints/amqp.rs (9 hunks)
src/endpoints/nats.rs (6 hunks)
src/middleware/dlq.rs (1 hunks)
tests/integration/common.rs (3 hunks)
tests/integration_test.rs (1 hunks)
tests/memory_test.rs (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/integration_test.rs

🧰 Additional context used

🧬 Code graph analysis (3)

src/endpoints/amqp.rs (1)

src/traits.rs (1)

send (52-52)

tests/memory_test.rs (1)

tests/integration/common.rs (1)

format_pretty (391-416)

src/endpoints/nats.rs (2)

src/canonical_message.rs (1)

new (19-25)

src/traits.rs (1)

receive (30-30)

🔇 Additional comments (20)

tests/integration/common.rs (3)

9-9: LGTM!

The Display trait import is correctly added to support the generic bound in the new format_pretty function.

305-309: LGTM!

Good improvement to use the new formatting utility for consistent, readable output. Using {:.2?} for Duration is appropriate as it provides a human-readable representation.

382-386: LGTM!

Consistent application of format_pretty for read performance output, matching the write performance formatting style.

tests/memory_test.rs (3)

4-5: LGTM!

Correct import of the shared formatting utility.

16-17: Verify timeout adequacy for increased message count.

The message count increased from 1.5M to 10M (~6.7x), but the timeout on line 46 remains at 20 seconds. Ensure this is still sufficient for the increased workload, especially in CI environments that may be slower than local machines.

58-62: LGTM!

Good use of the shared format_pretty utility for consistent, human-readable output. Displaying duration in milliseconds is appropriate for this test scale.

src/endpoints/nats.rs (6)

16-19: Clean abstraction for dual-mode NATS support.

The NatsClient enum cleanly encapsulates the two operational modes, making the mode-specific logic explicit and easy to follow.

37-59: Well-structured initialization with appropriate warnings.

The dual-mode initialization is clean, with helpful warnings for potential misconfigurations. The idempotent get_or_create_stream is the right choice.

One observation: the subject filter format!("{}.>", stream_name) (line 45) assumes all subjects will be prefixed with the stream name. Ensure this matches your routing conventions.

68-79: LGTM!

The clone is indeed cheap for both async_nats::Client and jetstream::Context as they use Arc internally.

95-110: Correct dual-mode publish semantics.

The JetStream path properly handles the two-phase publish (send + optional ack await), while Core mode correctly uses fire-and-forget semantics.

186-202: LGTM!

Clean conversion logic. The choice to take the first header value when multiple exist is a reasonable simplification with appropriate documentation.

244-356: No changes in this segment.

This build_nats_options function appears to be existing code provided for context. No review needed for unchanged code.

src/endpoints/amqp.rs (6)

23-24: LGTM! New configuration fields properly added.

The addition of persistent and skip_ack fields enables the new AMQP modes mentioned in the PR objectives. These fields are appropriately initialized from config (lines 53-54) and propagated in with_routing_key (lines 63-64).

72-77: LGTM! Correct implementation of persistent message delivery.

Setting delivery mode to 2 for persistent messages is the correct AMQP protocol implementation. This ensures messages are written to disk and survive broker restarts when combined with durable queues.

102-105: LGTM! skip_ack provides intentional throughput vs. reliability tradeoff.

The skip_ack logic correctly bypasses waiting for broker acknowledgment when enabled. This improves message throughput at the cost of at-least-once delivery guarantees. Messages may be lost if the broker fails before processing when skip_ack=true. This is an appropriate design choice for high-throughput scenarios where occasional message loss is acceptable.

171-172: LGTM! Improved logging sanitization prevents credential exposure.

Removing the URL from connection attempt logs prevents credentials from being logged during retry attempts. This is a good security improvement.

239-250: LGTM! Improved error handling and observability.

Replacing panic behavior with error logging is a significant resilience improvement. When acknowledgment fails, the broker will redeliver the message (preserving at-least-once semantics), while the error log provides operational visibility. The debug-level success logging adds useful observability without excessive log volume.

83-83: ShortString conversion is correct for lapin 3.7.

The explicit conversion to ShortString::from(key) properly satisfies the FieldTable::insert API requirement. This pattern is standard for AMQP field tables in lapin 3.7.x.

src/middleware/dlq.rs (2)

89-109: LGTM: Single-message send logic is correct.

The implementation properly attempts DLQ delivery on primary failure, preserves error context, and returns appropriate errors to signal route failure.

34-38: No action required. The code is correct as written. create_publisher_from_route returns Result<Arc<dyn MessagePublisher>>, and .await? unwraps the Result to yield Arc<dyn MessagePublisher>, which matches the field type on line 18. The Box::pin() wraps the future itself, not the result type—there is no type mismatch here.

Likely an incorrect or invalid review comment.

coderabbitai · 2025-12-16T20:05:16Z

+                let commit: CommitFunc = Box::new(move |_response| {
+                    Box::pin(async move {
+                        message.ack().await.unwrap_or_else(|e| {
+                            tracing::error!("Failed to ACK NATS message: {:?}", e)
+                        });
+                    }) as BoxFuture<'static, ()>
+                });


⚠️ Potential issue | 🟡 Minor

Ack failures are silently swallowed.

The unwrap_or_else logs the error but doesn't propagate it. If acknowledgment fails, the message may be redelivered, but the caller has no way to know the ack failed. Consider whether the CommitFunc signature should allow returning a Result to surface these errors.

If propagation isn't feasible due to the trait signature, consider at minimum adding a metric or more prominent warning:

- message.ack().await.unwrap_or_else(|e| { - tracing::error!("Failed to ACK NATS message: {:?}", e) - }); + if let Err(e) = message.ack().await { + tracing::error!(error = ?e, "Failed to ACK NATS message - message may be redelivered"); + }

🤖 Prompt for AI Agents

In src/endpoints/nats.rs around lines 214 to 220 the CommitFunc currently swallows ACK failures by calling message.ack().await.unwrap_or_else(...) which only logs the error; update the implementation so ACK errors are surfaced: preferably change the CommitFunc signature to return a Result wrapped in the BoxFuture (e.g. BoxFuture<'static, Result<(), E>>), update callers to handle the Result and propagate errors, and replace unwrap_or_else with returning Err(e) on failure; if changing the signature is not feasible, at minimum increment a dedicated metric/counter and emit a higher-severity log (tracing::warn!/error!) and return a future that records the failure so operators can alert on it.

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/traits.rs (1)

64-75: Fix indentation and remove unused helper method, or document its intended use case.

Line 64 has inconsistent indentation (6 spaces instead of 4). More importantly, receive_bulk_helper is unused throughout the codebase—all MessageConsumer implementations implement receive_bulk() directly or rely on the default receive() implementation. This helper creates dead code and potential for circular dependency issues if called without proper implementation of the required receive_bulk() method. Either remove it or add clear documentation explaining when and why implementors should use it.

♻️ Duplicate comments (4)

tests/integration/common.rs (1)

421-423: Fix documentation to match implementation.

The comment mentions "commas" but the implementation uses underscores as thousand separators.
src/route.rs (1)
224-224: End-of-stream return value is still inconsistent with sequential runner.

The return value Ok(shutdown_rx.is_empty()) doesn't correctly distinguish between shutdown and end-of-stream:

Scenario Sequential Concurrent (shutdown_rx.is_empty())

Shutdown signal Ok(true) (line 106) Ok(true) (channel empty after recv)

End-of-stream Ok(false) (line 131) Ok(true) (channel was never used)

Both concurrent paths return Ok(true), but the outer loop at line 55 only triggers graceful shutdown on Ok(false). This means end-of-stream in concurrent mode will cause the route to continue/reconnect instead of shutting down gracefully.

Apply this diff to track the exit reason explicitly:
     async fn run_concurrently(
         &self,
         name: &str,
         shutdown_rx: async_channel::Receiver<()>,
     ) -> anyhow::Result<bool> {
         let publisher = create_publisher_from_route(name, &self.output).await?;
         let mut consumer = create_consumer_from_route(name, &self.input).await?;
         let (err_tx, err_rx) = bounded(1);
         const BATCH_SIZE: usize = 128;
         let work_capacity = self.concurrency.saturating_mul(BATCH_SIZE);
         let (work_tx, work_rx) =
             bounded::<(Vec<crate::CanonicalMessage>, BulkCommitFunc)>(work_capacity);
+        let mut graceful_exit = false;

         // --- Worker Pool ---
         // ... (worker code unchanged)

         loop {
             select! {
                 biased;

                 Ok(err) = err_rx.recv() => {
                     error!("A worker reported a critical error. Shutting down route.");
                     return Err(err);
                 }

                 _ = shutdown_rx.recv() => {
                     info!("Shutdown signal received in concurrent runner for route '{}'.", name);
                     break;
                 }

                 res = consumer.receive_bulk(BATCH_SIZE) => {
                     let (messages, commit) = match res {
                         Ok(val) => val,
                         Err(e) => {
                             warn!("Consumer returned an error, likely end-of-stream: {}. Shutting down route.", e);
+                            graceful_exit = true;
                             break;
                         }
                     };
                     // ... rest unchanged
                 }
             }
         }

         // --- Graceful Shutdown ---
         drop(work_tx);
         for handle in worker_handles {
             let _ = handle.await;
         }
-        Ok(shutdown_rx.is_empty())
+        Ok(!graceful_exit)
     }
src/endpoints/nats.rs (2)

175-175: Unexplained 100ms sleep remains unaddressed.

This hardcoded sleep was flagged in a previous review. Consider adding a comment explaining why this delay is necessary (e.g., waiting for consumer registration with the server) or replacing it with a more robust readiness check.

222-228: Ack failures still silently swallowed.

This was flagged in a previous review. The unwrap_or_else logs the error but doesn't propagate it, meaning the caller has no visibility into ack failures. Consider whether this should at least emit a warning-level log indicating potential message redelivery.

🧹 Nitpick comments (1)

src/middleware/deduplication.rs (1)
61-62: Consider propagating time retrieval error instead of using ?.

The ? operator on line 61 will propagate the error to the caller, preventing message processing if SystemTime::now().duration_since(UNIX_EPOCH) fails. While this is extremely unlikely, it differs from the cleanup task's approach (which just logs and returns).

For consistency with the cleanup task's resilience, consider handling this more gracefully:
-            let now = SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs();
+            let now = SystemTime::now()
+                .duration_since(UNIX_EPOCH)
+                .map(|d| d.as_secs())
+                .unwrap_or_else(|e| {
+                    warn!("Failed to get system time: {}. Using 0 for dedup timestamp.", e);
+                    0
+                });

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea9d194 and 7175143.

📒 Files selected for processing (14)

src/endpoints/amqp.rs (11 hunks)
src/endpoints/file.rs (5 hunks)
src/endpoints/http.rs (4 hunks)
src/endpoints/kafka.rs (9 hunks)
src/endpoints/memory.rs (6 hunks)
src/endpoints/mongodb.rs (5 hunks)
src/endpoints/mqtt.rs (3 hunks)
src/endpoints/nats.rs (6 hunks)
src/endpoints/static_endpoint.rs (3 hunks)
src/middleware/deduplication.rs (1 hunks)
src/route.rs (7 hunks)
src/traits.rs (2 hunks)
tests/integration/common.rs (7 hunks)
tests/memory_test.rs (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (12)

src/endpoints/static_endpoint.rs (1)

src/traits.rs (3)

into_bulk_commit_func (33-41)

send_bulk (83-86)

send_bulk_helper (109-136)

src/endpoints/amqp.rs (2)

src/traits.rs (5)

send (88-99)

send_bulk (83-86)

send_bulk_helper (109-136)

receive_bulk (48-51)

into_bulk_commit_func (33-41)

src/middleware/deduplication.rs (1)

receive_bulk (119-125)

src/endpoints/file.rs (2)

src/traits.rs (3)

send_bulk (83-86)

send_bulk_helper (109-136)

into_bulk_commit_func (33-41)

src/canonical_message.rs (1)

new (19-25)

src/endpoints/nats.rs (2)

src/traits.rs (1)

into_bulk_commit_func (33-41)

src/canonical_message.rs (1)

new (19-25)

src/endpoints/mqtt.rs (2)

src/traits.rs (4)

into_bulk_commit_func (33-41)

send_bulk (83-86)

send_bulk_helper (109-136)

receive_bulk (48-51)

src/middleware/deduplication.rs (1)

receive_bulk (119-125)

src/route.rs (3)

src/traits.rs (2)

send (88-99)

send_bulk (83-86)

src/middleware/metrics.rs (2)

send (37-48)

send_bulk (49-74)

src/endpoints/mod.rs (2)

create_publisher_from_route (94-106)

create_consumer_from_route (28-34)

src/endpoints/http.rs (8)

src/endpoints/file.rs (2)

receive_bulk (105-123)

send_bulk (61-67)

src/endpoints/kafka.rs (2)

receive_bulk (334-340)

send_bulk (178-184)

src/endpoints/memory.rs (2)

receive_bulk (168-195)

send_bulk (119-134)

src/endpoints/mongodb.rs (2)

receive_bulk (213-219)

send_bulk (103-109)

src/endpoints/mqtt.rs (2)

receive_bulk (132-138)

send_bulk (59-65)

src/endpoints/nats.rs (2)

receive_bulk (246-252)

send_bulk (115-121)

src/endpoints/static_endpoint.rs (2)

receive_bulk (78-84)

send_bulk (36-42)

src/traits.rs (4)

receive_bulk (48-51)

into_bulk_commit_func (33-41)

send_bulk (83-86)

send_bulk_helper (109-136)

src/endpoints/memory.rs (5)

src/endpoints/http.rs (3)

new (37-90)

new (182-199)

receive_bulk (95-111)

src/endpoints/mqtt.rs (1)

receive_bulk (132-138)

src/middleware/deduplication.rs (1)

receive_bulk (119-125)

src/traits.rs (1)

receive_bulk (48-51)

tests/integration/common.rs (1)

receive_bulk (338-345)

src/endpoints/kafka.rs (2)

src/traits.rs (4)

into_bulk_commit_func (33-41)

send_bulk (83-86)

send_bulk_helper (109-136)

receive_bulk (48-51)

src/middleware/deduplication.rs (1)

receive_bulk (119-125)

src/middleware/deduplication.rs (1)

src/traits.rs (3)

into_bulk_commit_func (33-41)

receive (54-62)

receive_bulk (48-51)

tests/memory_test.rs (3)

tests/integration/common.rs (1)

format_pretty (423-448)

src/route.rs (1)

run_until_err (78-90)

src/endpoints/memory.rs (2)

fill_messages (47-54)

close (57-59)

src/endpoints/mongodb.rs (3)

src/traits.rs (4)

into_bulk_commit_func (33-41)

send_bulk (83-86)

send_bulk_helper (109-136)

receive_bulk (48-51)

src/endpoints/mqtt.rs (2)

send_bulk (59-65)

receive_bulk (132-138)

src/middleware/deduplication.rs (1)

receive_bulk (119-125)

🔇 Additional comments (45)

src/endpoints/static_endpoint.rs (2)

36-42: LGTM!

The send_bulk implementation correctly delegates to send_bulk_helper with a callback that reuses the existing send method. This follows the established pattern across other endpoints.

78-84: LGTM!

The receive_bulk implementation correctly delegates to receive and wraps the single-message commit function using into_bulk_commit_func. This is consistent with the pattern used in mqtt.rs and mongodb.rs.

src/endpoints/mqtt.rs (2)

59-65: LGTM!

The send_bulk implementation correctly uses send_bulk_helper to process messages sequentially through the existing send method. This maintains MQTT QoS guarantees while adding bulk API support.

132-138: LGTM!

The receive_bulk implementation follows the established pattern, delegating to receive and converting the commit function appropriately. This is consistent with the implementation in src/middleware/deduplication.rs.

src/endpoints/mongodb.rs (2)

103-109: LGTM!

The send_bulk implementation correctly uses send_bulk_helper to delegate to the existing send method. This maintains the MongoDB-specific logic (ObjectId generation, metadata handling) while adding bulk API support.

213-219: LGTM!

The receive_bulk implementation follows the established pattern across all endpoints, delegating to receive and wrapping the commit function with into_bulk_commit_func.

tests/memory_test.rs (2)

44-57: LGTM!

The concurrent execution pattern using tokio::join! is correct. The route runs until the input channel is closed and empty, while the sender fills messages and then closes the channel to signal completion. The expect call properly surfaces any pipeline errors as test failures.

17-21: Sensible conditional message counts for build modes.

Using 1M messages for debug builds and 10M for release builds is a pragmatic approach that balances test coverage with execution time during development.

tests/integration/common.rs (1)

338-345: LGTM!

The receive_bulk implementation for MockConsumer correctly mirrors the blocking behavior of receive, maintaining consistent test semantics.

src/route.rs (3)

32-51: LGTM! Per-iteration internal shutdown channel pattern.

The introduction of a per-iteration internal shutdown channel correctly decouples external shutdown signaling from the inner task, preventing race conditions where both the outer loop and inner task compete for the same shutdown signal.

54-68: Shutdown semantics are now consistent across runners.

The updated match logic correctly handles:

Ok(Ok(false)) → graceful shutdown (end-of-stream)

Ok(Err(e)) → error with reconnect delay

Err(e) → panic with reconnect delay

_ => {} → continue (only reached for Ok(Ok(true)) from shutdown signal)

This aligns with run_sequentially returning Ok(false) on end-of-stream (line 131), addressing the previous review concern about inconsistent behavior.

118-127: LGTM! Partial failure handling commits successful messages.

The pattern correctly commits successful messages before propagating the error for failed ones. Since failed messages aren't committed, they'll be redelivered by the consumer after reconnection.

src/endpoints/kafka.rs (7)

23-27: LGTM! Renamed await_ack to skip_ack.

The field rename provides clearer semantics - skip_ack: true means "skip waiting for acknowledgment" (fire-and-forget mode).

61-61: SASL configuration key fixed.

Now correctly uses "sasl.mechanism" (singular) to match librdkafka's expected configuration key.

137-137: Payload slice conversion aligns with bytes::Bytes type change.

Using &message.payload[..] correctly converts bytes::Bytes to a &[u8] slice for the Kafka record payload.

159-174: LGTM! Fire-and-forget vs awaited delivery logic.

The skip_ack flag correctly gates between:

!skip_ack: Awaits delivery confirmation for at-least-once semantics

skip_ack: Uses send_result for high-throughput fire-and-forget, with flush() in Drop ensuring delivery before shutdown

178-184: LGTM! Bulk send follows established pattern.

Correctly delegates to send_bulk_helper with per-message send callback, consistent with other endpoints (AMQP, HTTP, MongoDB, etc.).

244-244: Consumer SASL configuration also fixed.

Both publisher (line 61) and consumer (line 244) now consistently use "sasl.mechanism" (singular).

334-340: LGTM! Bulk receive follows established pattern.

Correctly wraps single receive() result in a vector and converts CommitFunc to BulkCommitFunc using into_bulk_commit_func, consistent with other endpoints.

src/middleware/deduplication.rs (4)

16-38: LGTM! DeduplicationConsumer initialization.

The struct correctly wraps an inner consumer with sled-backed deduplication state and configurable TTL. The constructor logs activation and opens the sled DB.

44-59: LGTM! Deduplication receive loop with proper skip handling.

Messages without message_id are correctly passed through (can't deduplicate), and duplicates are skipped by calling commit(None) before continuing the loop.

64-112: Robust error handling in cleanup task.

The background cleanup task now properly handles all error cases:

SystemTime::duration_since failure (lines 69-74)

Invalid timestamp length check (lines 82-85)

try_into conversion (lines 89-95)

DB removal errors (lines 100-103)

This addresses the concerns from the previous review about potential panics.

119-125: LGTM! Bulk receive follows established pattern.

Correctly delegates to single receive() and converts the commit function using into_bulk_commit_func.

src/endpoints/http.rs (2)

95-111: LGTM! HTTP consumer bulk receive implementation.

The implementation correctly:

Receives a single HTTP request from the channel

Creates a commit function that sends the response back to the HTTP client

Wraps the single message in a vector and converts to BulkCommitFunc

The trait's default receive() method will call this with max_messages=1.

256-263: LGTM! HTTP publisher bulk send.

Correctly delegates to send_bulk_helper with per-message send callback. The comment "not a real bulk, but fast enough" accurately reflects that HTTP requests are sent sequentially.

src/endpoints/amqp.rs (10)

23-24: LGTM! New persistent and skip_ack fields.

These fields enable:

persistent: Durable queues and delivery mode 2 for message persistence

skip_ack: Fire-and-forget publishing without waiting for broker confirmation

41-44: LGTM! Queue durability controlled by config.

Using durable: config.persistent ensures queue survives broker restarts when persistence is enabled.

72-77: LGTM! Delivery mode 2 for persistent messages.

Setting delivery_mode(2) marks messages as persistent, ensuring they're written to disk by the broker when persistent: true.

83-83: LGTM! Header key type conversion.

Using ShortString::from(key) correctly converts the string key to lapin's expected type for AMQP header field names.

102-105: LGTM! Publisher confirmation gated by skip_ack.

When skip_ack is false, the code awaits the broker's publisher confirmation for at-least-once delivery guarantees.

109-116: LGTM! Bulk send follows established pattern.

Correctly delegates to send_bulk_helper with per-message send callback, consistent with other endpoints.

134-141: LGTM! Consumer queue durability matches publisher.

Both publisher and consumer declare the queue with the same durability setting, ensuring consistency.

180-181: Good practice: Avoid logging credentials.

Removing the URL from the log message prevents accidental credential exposure when username/password are embedded in the connection string.

247-258: LGTM! Structured logging for message acknowledgment.

The improved error handling and structured logging with delivery_tag provides better observability for debugging acknowledgment issues.

226-262: LGTM! Bulk receive with proper commit handling.

The implementation correctly:

Receives a single delivery from the AMQP consumer stream

Extracts headers into message metadata

Creates a commit function that acknowledges the delivery

Wraps in vector and converts to BulkCommitFunc

src/endpoints/file.rs (2)

60-67: LGTM!

The send_bulk implementation correctly delegates to send_bulk_helper, which handles per-message sending and aggregates results consistently with the trait contract.

113-122: Good newline trimming logic.

The buffer handling correctly trims the trailing newline before constructing the message, and the commit function is properly wrapped using into_bulk_commit_func.

src/endpoints/memory.rs (2)

168-195: Buffer management logic is correct.

The reverse-split_off-reverse pattern correctly handles partial batch consumption:

Reversing makes oldest messages accessible at the end for efficient split_off

split_off(split_at) correctly extracts the last num_to_take elements (oldest)

Re-reversing restores original order before returning

This addresses the concern from the past review about split_off semantics.

119-133: LGTM!

The send_bulk implementation correctly sends the entire batch atomically and returns an empty failed_messages vector on success, which aligns with the atomic send semantics of the memory channel.

src/endpoints/nats.rs (4)

16-26: LGTM!

The NatsClient enum cleanly abstracts the Core vs JetStream client paths, and the skip_ack field is appropriately scoped to JetStream mode where acknowledgements are meaningful.

37-59: Good defensive warning for configuration edge cases.

The code appropriately warns when:

JetStream mode is used without a configured stream (line 50)

skip_ack is set in Core mode where acks aren't supported (lines 55-57)

This helps users identify misconfigurations early.

231-240: LGTM!

The Core NATS subscription correctly provides a no-op commit function since Core NATS doesn't support acknowledgements. The message creation is properly delegated to the shared helper.

246-252: LGTM!

The receive_bulk implementation correctly wraps the single-message receive using into_bulk_commit_func, maintaining consistency with other endpoint implementations (MQTT, HTTP).

src/traits.rs (2)

107-136: LGTM!

The send_bulk_helper function provides a clean abstraction for endpoints without native bulk sending. It correctly:

Clones messages before the callback (allowing failed messages to be collected)

Aggregates responses and failed messages

Returns None for responses when empty

88-99: LGTM!

The default send implementation correctly delegates to send_bulk and handles the response extraction appropriately. The error case for failed messages is properly surfaced.

coderabbitai · 2025-12-17T13:26:31Z

+                match consumer_clone.lock().await.receive_bulk(batch_size).await {
+                    Ok((msgs, commit)) => {
                        // Spawn the commit to a separate task to allow the worker
                        // to immediately start receiving the next message.
+                        final_count_clone.fetch_add(msgs.len(), std::sync::atomic::Ordering::Relaxed);
                        tokio::spawn(async move {
                            commit(None).await;


⚠️ Potential issue | 🟡 Minor

Consider acquiring multiple permits for bulk receives.

The current implementation acquires only one semaphore permit but may receive multiple messages via receive_bulk. This could cause the semaphore to undercount and allow more messages to be processed than num_messages.

Consider adjusting the permit logic to account for the actual number of messages received:

- if let Ok(permit) = semaphore_clone.try_acquire() { - permit.forget(); - } else { - break; - } - match consumer_clone.lock().await.receive_bulk(batch_size).await { Ok((msgs, commit)) => { + // Try to acquire permits for all messages received. + // If not enough permits remain, we've hit the limit. + if semaphore_clone.try_acquire_many(msgs.len() as u32).is_err() { + break; + } // Spawn the commit to a separate task to allow the worker // to immediately start receiving the next message. final_count_clone.fetch_add(msgs.len(), std::sync::atomic::Ordering::Relaxed);

Alternatively, since this is test code and the current bulk implementations only return one message at a time, you may choose to defer this fix.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In tests/integration/common.rs around lines 385 to 391, the code only acquires a single semaphore permit before calling receive_bulk but may process multiple messages, causing the semaphore to undercount; update the logic to acquire permits equal to msgs.len() (e.g., use try_acquire_many or loop-acquire the needed number of permits based on the number of messages returned) before spawning the commit task so the semaphore correctly reflects concurrent in-flight messages, or explicitly document/keep as-is if you accept the current assumption that bulk returns at most one message in tests.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/integration/amqp.rs (1)

13-26: Config field name mismatch: await_ack in YAML vs skip_ack in struct.

Lines 19 and 23 use await_ack: true, but AmqpConfig defines the field as skip_ack (not await_ack). This causes deserialization to silently ignore the field or fail. Additionally, the semantics are inverted: await_ack: true means "await acknowledgment," while skip_ack: true means "skip acknowledgment." Change the YAML to use skip_ack: false to await acknowledgments as intended.
src/endpoints/memory.rs (1)
222-222: Test at line 222 also uses undefined send method.

Similarly, line 222 calls publisher.send(msg.clone()) which would fail if send was removed from the public API.

Apply this diff to fix:
-        publisher.send(msg.clone()).await.unwrap();
+        publisher.send_bulk(vec![msg.clone()]).await.unwrap();
Also applies to: 226-226

♻️ Duplicate comments (9)

tests/integration/common.rs (3)

426-437: Semaphore permits may undercount with bulk receives.

This issue was flagged in a previous review: the code acquires only one semaphore permit (line 426) but receive_bulk may return multiple messages (line 432-433), causing the semaphore to undercount.

Since current bulk implementations only return one message at a time, this is acceptable for now, but consider adjusting the permit logic if true bulk receives are implemented later:

-               if let Ok(permit) = semaphore_clone.try_acquire() {
-                   permit.forget();
-               } else {
-                   break;
-               }
-
                 match consumer_clone.lock().await.receive_bulk(batch_size).await {
                     Ok((msgs, commit)) => {
+                        // Acquire permits for all messages received
+                        if semaphore_clone.try_acquire_many(msgs.len() as u32).is_err() {
+                            break;
+                        }
                         final_count_clone.fetch_add(msgs.len(), ...);

344-353: Potential infinite loop on persistent errors and unnecessary clone.

This issue was flagged in a previous review and remains unaddressed:

If send_bulk consistently returns Err, the loop never breaks, causing an infinite loop.
messages_to_send.clone() is called on every iteration, which is wasteful since send_bulk takes ownership.

Consider adding a retry limit and using std::mem::take to avoid cloning:

                 let mut messages_to_send = batch;
+                let mut retry_count = 0;
+                const MAX_RETRIES: usize = 5;
                 loop {
-                    match publisher_clone.send_bulk(messages_to_send.clone()).await {
+                    match publisher_clone.send_bulk(std::mem::take(&mut messages_to_send)).await {
                         Ok((_, failed)) if failed.is_empty() => break,
-                        Ok((_, failed)) => messages_to_send = failed,
-                        Err(e) => eprintln!("Error sending bulk messages: {}", e),
+                        Ok((_, failed)) => {
+                            messages_to_send = failed;
+                            retry_count = 0;
+                        }
+                        Err(e) => {
+                            eprintln!("Error sending bulk messages: {}", e);
+                            retry_count += 1;
+                            if retry_count >= MAX_RETRIES {
+                                eprintln!("Max retries reached, giving up on batch");
+                                break;
+                            }
+                        }
                     }
                     tokio::time::sleep(Duration::from_millis(10)).await;
                 }

470-472: Fix documentation to match implementation.

The comment mentions "commas" but the implementation uses underscores as thousand separators (line 482). This was flagged in a previous review.

-/// Formats a number with commas as thousand separators.
+/// Formats a number with underscores as thousand separators.
 /// Handles both integers and floating-point numbers.

src/endpoints/kafka.rs (1)

63-63: SASL configuration key corrected.

Both publisher (line 63) and consumer (line 248) now consistently use "sasl.mechanism" (singular), which is the correct librdkafka key.

Also applies to: 248-248

src/endpoints/memory.rs (1)

168-195: LGTM: receive_bulk with corrected buffer logic.

The buffer handling is now correct:

Buffer is reversed for efficient split_off from end

split_off(split_at) returns the last num_to_take elements

Result is reversed back to original order

This addresses the logic error flagged in the previous review.

src/endpoints/nats.rs (2)

179-179: Unexplained 100ms sleep remains.

This hardcoded sleep is still present without documentation explaining its necessity. As noted in the previous review, this may mask race conditions.

Per the previous review, consider replacing with a robust synchronization mechanism and documenting the reason. If the sleep must remain, add a comment:

+            // Brief delay to allow consumer to be fully registered with the server.
+            // TODO: Replace with proper readiness check
             tokio::time::sleep(Duration::from_millis(100)).await;

226-232: ACK failures still silently swallowed.

As noted in the previous review, ACK errors are logged but not propagated, preventing callers from detecting redelivery scenarios.

Per the previous review, consider enhancing error visibility:

-                    Box::pin(async move {
-                        message.ack().await.unwrap_or_else(|e| {
-                            tracing::error!("Failed to ACK NATS message: {:?}", e)
-                        });
-                    }) as BoxFuture<'static, ()>
+                    Box::pin(async move {
+                        if let Err(e) = message.ack().await {
+                            tracing::error!(error = ?e, "Failed to ACK NATS message - message may be redelivered");
+                        }
+                    }) as BoxFuture<'static, ()>

src/traits.rs (2)

30-33: Incomplete docstring remains.

As noted in the previous review, the docstring ends abruptly with "It does so by" without completing the explanation.

Apply the suggested fix from the previous review:
 /// Converts a `CommitFunc` into a `BulkCommitFunc` by wrapping it.
 /// This allows a function that commits a single message to be used where a
-/// function that commits a batch of messages is expected. It does so by
+/// function that commits a batch of messages is expected. It extracts the first
+/// element from the response vector (if any) and passes it to the original commit.
 pub fn into_bulk_commit_func(commit: CommitFunc) -> BulkCommitFunc {
45-47: Duplicate docstring text remains.

As noted in the previous review, these lines contain redundant documentation with "Receives a batch of messages" repeated.

Apply the suggested fix from the previous review:
-    /// Receives a batch of messages. Needs to be implemented.
-    /// In doubt, just implement a receive_bulk that returns 1 message as vec
-    /// Receives a batch of messages.
+    /// Receives a batch of messages.
+    /// 
+    /// If your backend doesn't support native batching, implement this to return
+    /// a single message wrapped in a vec.
     async fn receive_bulk(

🧹 Nitpick comments (5)

src/endpoints/kafka.rs (1)

338-345: receive_bulk ignores _max_messages parameter.

The implementation always returns exactly one message regardless of the max_messages parameter. While this is consistent with the single-message nature of Kafka's consumer stream, consider documenting this limitation or implementing actual batching for better throughput.
src/endpoints/nats.rs (1)
39-61: Mode selection logic is sound, but warning could be more actionable.

The branching logic correctly handles JetStream and Core modes. However, the warning at line 52 could guide users more effectively.

Consider enhancing the warning:
-                warn!("NATS publisher is in JetStream mode but no 'stream' is configured. Publishing may fail if stream does not exist.");
+                warn!("NATS publisher is in JetStream mode but no 'stream' is configured. Either provide a stream name or enable 'no_jetstream' in config.");
src/traits.rs (1)
90-99: Send implementation is correct, error message could be more informative.

The default send implementation appropriately delegates to send_bulk and handles failures. The error message at line 93 is generic but acceptable for this context.

Optionally consider enhancing the error message:
         if !failed_msgs.is_empty() {
-            Err(anyhow::anyhow!("Failed to send message"))
+            Err(anyhow::anyhow!("Failed to send message: send_bulk reported failure"))
         } else if let Some(result) = result_vec {
src/middleware/deduplication.rs (2)
64-113: Consider rate-limiting cleanup task spawning.

At high message throughput, the ~2% probability (line 64) could spawn many concurrent cleanup tasks, each iterating the entire DB. For example, at 1000 messages/second, this spawns ~20 cleanup tasks per second.

Consider one of these approaches:

Use a timestamp to rate-limit spawning (e.g., max once per minute)

Spawn a single long-lived periodic cleanup task on consumer initialization

Use a semaphore to limit concurrent cleanup tasks

Example rate-limiting approach:
// In struct:
// last_cleanup: Arc<Mutex<SystemTime>>

// In receive():
let should_cleanup = {
    let mut last = self.last_cleanup.lock().unwrap();
    let now = SystemTime::now();
    if now.duration_since(*last).unwrap_or_default().as_secs() > 60 {
        *last = now;
        true
    } else {
        false
    }
};

if should_cleanup {
    // spawn cleanup task
}
119-126: receive_bulk always returns a single message.

The implementation ignores max_messages and always returns exactly one message by calling receive() once. This limits performance for bulk-oriented consumers, though implementing true bulk deduplication with atomic checks would be significantly more complex.

If bulk performance is important, consider:

Fetching multiple messages from inner consumer

Batch-checking deduplication keys (though atomicity becomes harder)

Documenting this limitation in the struct's documentation

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7175143 and 9762d1c.

📒 Files selected for processing (19)

src/endpoints/amqp.rs (11 hunks)
src/endpoints/file.rs (5 hunks)
src/endpoints/http.rs (4 hunks)
src/endpoints/kafka.rs (9 hunks)
src/endpoints/memory.rs (6 hunks)
src/endpoints/mongodb.rs (5 hunks)
src/endpoints/mqtt.rs (3 hunks)
src/endpoints/nats.rs (6 hunks)
src/endpoints/static_endpoint.rs (3 hunks)
src/middleware/deduplication.rs (1 hunks)
src/traits.rs (2 hunks)
tests/integration/amqp.rs (2 hunks)
tests/integration/common.rs (10 hunks)
tests/integration/kafka.rs (2 hunks)
tests/integration/mongodb.rs (3 hunks)
tests/integration/mqtt.rs (3 hunks)
tests/integration/nats.rs (3 hunks)
tests/integration_test.rs (1 hunks)
tests/memory_test.rs (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/integration/nats.rs

🧰 Additional context used

🧬 Code graph analysis (11)

tests/integration/amqp.rs (1)

tests/integration/common.rs (7)

add_performance_result (37-39)

measure_read_performance (400-468)

measure_write_performance (297-371)

run_performance_pipeline_test (151-157)

run_pipeline_test (147-149)

new (49-53)

new (120-136)

src/endpoints/static_endpoint.rs (4)

src/traits.rs (4)

into_bulk_commit_func (33-41)

send_bulk (85-88)

send_bulk_helper (109-139)

receive_bulk (48-51)

src/endpoints/kafka.rs (2)

send_bulk (180-188)

receive_bulk (338-345)

src/middleware/deduplication.rs (1)

receive_bulk (119-126)

tests/integration/common.rs (1)

receive_bulk (385-393)

src/endpoints/mqtt.rs (1)

src/traits.rs (3)

into_bulk_commit_func (33-41)

send_bulk (85-88)

send_bulk_helper (109-139)

src/middleware/deduplication.rs (2)

src/traits.rs (1)

into_bulk_commit_func (33-41)

src/models.rs (1)

new (107-112)

src/endpoints/mongodb.rs (2)

src/traits.rs (4)

into_bulk_commit_func (33-41)

send_bulk (85-88)

send_bulk_helper (109-139)

receive_bulk (48-51)

src/middleware/deduplication.rs (1)

receive_bulk (119-126)

src/endpoints/nats.rs (1)

src/traits.rs (3)

into_bulk_commit_func (33-41)

send_bulk (85-88)

send_bulk_helper (109-139)

src/traits.rs (7)

src/endpoints/file.rs (5)

new (26-45)

new (89-101)

receive_bulk (107-129)

send_bulk (61-69)

send (51-58)

src/endpoints/http.rs (5)

new (37-90)

new (186-203)

receive_bulk (95-115)

send_bulk (261-269)

send (216-258)

src/endpoints/kafka.rs (4)

receive_bulk (338-345)

receive (291-336)

send_bulk (180-188)

send (138-178)

src/endpoints/static_endpoint.rs (4)

receive_bulk (80-87)

receive (71-78)

send_bulk (36-44)

send (30-34)

src/middleware/deduplication.rs (2)

receive_bulk (119-126)

receive (44-117)

src/middleware/metrics.rs (4)

receive_bulk (117-134)

receive (104-115)

send_bulk (49-74)

send (37-48)

src/middleware/dlq.rs (2)

send_bulk (111-225)

send (89-109)

tests/integration_test.rs (5)

tests/integration/amqp.rs (1)

test_amqp_performance_direct (52-92)

tests/integration/kafka.rs (1)

test_kafka_performance_direct (52-97)

tests/integration/mqtt.rs (1)

test_mqtt_performance_direct (53-108)

tests/integration/nats.rs (1)

test_nats_performance_direct (52-98)

tests/integration/mongodb.rs (1)

test_mongodb_performance_direct (52-106)

src/endpoints/amqp.rs (2)

src/traits.rs (5)

send (90-99)

send_bulk (85-88)

send_bulk_helper (109-139)

receive_bulk (48-51)

into_bulk_commit_func (33-41)

src/middleware/deduplication.rs (1)

receive_bulk (119-126)

tests/integration/common.rs (8)

src/endpoints/file.rs (3)

new (26-45)

new (89-101)

receive_bulk (107-129)

src/endpoints/http.rs (3)

new (37-90)

new (186-203)

receive_bulk (95-115)

src/endpoints/kafka.rs (3)

new (32-112)

new (217-286)

receive_bulk (338-345)

src/endpoints/memory.rs (4)

new (34-37)

new (102-108)

new (150-157)

receive_bulk (168-195)

src/route.rs (1)

bounded (147-147)

src/endpoints/static_endpoint.rs (1)

receive_bulk (80-87)

src/middleware/deduplication.rs (1)

receive_bulk (119-126)

src/traits.rs (1)

receive_bulk (48-51)

src/endpoints/http.rs (1)

src/traits.rs (4)

receive_bulk (48-51)

into_bulk_commit_func (33-41)

send_bulk (85-88)

send_bulk_helper (109-139)

🔇 Additional comments (72)

src/endpoints/amqp.rs (10)

2-2: LGTM on import updates.

The imports are correctly updated to include BulkCommitFunc and ShortString to support the new bulk operations and header key conversion.

Also applies to: 12-12

23-24: LGTM on new fields for publisher configuration.

Adding persistent and skip_ack fields allows proper control over message durability and acknowledgment behavior at the publisher level.

41-44: LGTM on queue durability and field initialization.

Queue durability is correctly tied to config.persistent, and the new fields are properly initialized from config values.

Also applies to: 53-54

63-64: LGTM on with_routing_key propagation.

The cloned publisher correctly propagates persistent and skip_ack to maintain consistent behavior.

72-88: LGTM on persistent delivery mode and metadata handling.

Delivery mode 2 is correctly applied for persistent messages, and metadata keys are properly converted to ShortString.

102-106: LGTM on conditional acknowledgment wait.

The skip_ack flag correctly gates whether to await broker confirmation, enabling fire-and-forget semantics when desired.

109-118: LGTM on send_bulk implementation.

The implementation correctly delegates to send_bulk_helper, which is consistent with other endpoints (HTTP, MQTT, static) in this PR.

136-143: LGTM on consumer queue durability alignment.

Consumer queue declaration correctly uses the same config.persistent setting for durability, ensuring consistency with the publisher.

182-183: Good security improvement: credentials are no longer logged.

Sanitizing the connection log to avoid exposing embedded credentials is a good security practice.

228-268: LGTM on receive_bulk implementation with proper commit conversion.

The implementation correctly:

Returns a single message wrapped in a Vec

Converts the single-message commit to BulkCommitFunc using into_bulk_commit_func

Adds structured logging for ack success/failure

This aligns with the pattern used in other endpoints (Kafka, deduplication middleware).

src/endpoints/static_endpoint.rs (3)

6-6: LGTM on import updates.

The imports correctly include into_bulk_commit_func and BulkCommitFunc to support the new bulk operations.

36-44: LGTM on send_bulk implementation.

The implementation correctly uses send_bulk_helper to process messages through the existing send path, consistent with other endpoints.

80-87: LGTM on receive_bulk implementation.

The implementation follows the established pattern: delegate to receive(), convert commit via into_bulk_commit_func, and return a single-element vector. This matches the pattern in src/middleware/deduplication.rs (lines 118-125) and src/endpoints/kafka.rs (lines 337-344).

tests/integration/mongodb.rs (3)

4-8: LGTM on import reorganization.

The imports are correctly reorganized to include PERF_TEST_MESSAGE_COUNT from the top-level common module and add_performance_result for aggregating test results.

76-82: LGTM on performance measurement variable capture.

Storing write_perf and read_perf in local variables enables proper aggregation via add_performance_result.

Also applies to: 91-97

98-103: LGTM on performance result aggregation.

The PerformanceResult is correctly constructed with test name and both performance metrics, consistent with other integration tests (AMQP, Kafka, MQTT, NATS).

tests/integration/amqp.rs (4)

2-8: LGTM on import updates.

The imports correctly include PERF_TEST_CONCURRENCY and add_performance_result for performance testing infrastructure.

58-59: LGTM on skip_ack usage in direct test.

The skip_ack: true setting correctly enables fire-and-forget mode for high-throughput performance testing.

71-72: LGTM on reduced sleep duration.

Reducing from 10s to 3s is reasonable for allowing the broker to process messages before reading.

85-89: LGTM on performance result aggregation.

The PerformanceResult is correctly constructed and follows the same pattern as other integration tests.

tests/integration/kafka.rs (4)

2-8: LGTM on import updates.

The imports correctly include PERF_TEST_CONCURRENCY and add_performance_result for performance testing infrastructure.

64-65: LGTM on skip_ack usage.

The skip_ack: true setting correctly enables fire-and-forget mode for high-throughput Kafka performance testing.

77-77: LGTM on reduced sleep duration.

Reducing from 5s to 3s is reasonable and aligns with the same change in other integration tests.

90-94: LGTM on performance result aggregation.

The PerformanceResult is correctly constructed with "Kafka Direct" as the test name and both performance metrics.

tests/integration/mqtt.rs (2)

2-8: LGTM!

The import reorganization correctly separates PERF_TEST_CONCURRENCY and adds the necessary add_performance_result and PerformanceResult imports for the updated performance test flow.

83-105: LGTM!

The performance test now properly:

Captures write and read performance metrics into local variables.

Disconnects the publisher after measurements complete.

Records consolidated results via add_performance_result.

This aligns with the broader test suite pattern for performance reporting.

tests/memory_test.rs (3)

17-21: Verify the debug/release message counts are intentional.

The current logic uses 1,000,000 messages in debug mode and 10,000,000 in release mode. While this may be intentional, typically debug builds are significantly slower due to lack of optimizations, so running 1M messages in debug could be very slow. Consider whether these values should be lower in debug (e.g., 100,000) or if this is the intended behavior for stress testing.

44-57: LGTM!

The refactored tokio::join! pattern is cleaner and properly handles concurrent execution of the route and message sending. The result checking via run_result.expect() ensures pipeline failures are properly surfaced.

64-78: LGTM!

The table-formatted output using format_pretty provides clear and consistent performance reporting aligned with other integration tests.

src/endpoints/mqtt.rs (3)

2-4: LGTM!

The imports correctly add the bulk operation support types needed for the new send_bulk and receive_bulk implementations.

61-69: LGTM!

The send_bulk implementation correctly uses the send_bulk_helper utility, following the same pattern as other endpoints (MongoDB, Kafka, etc.). This provides consistent bulk publish semantics across the codebase.

136-143: LGTM!

The receive_bulk implementation follows the established pattern: delegate to receive(), wrap the single message in a vector, and convert the commit function using into_bulk_commit_func. This is consistent with implementations in src/endpoints/kafka.rs, src/middleware/deduplication.rs, and other endpoints.

src/endpoints/mongodb.rs (5)

2-4: LGTM!

The imports correctly add the bulk operation support types consistent with other endpoint implementations.

43-46: LGTM!

The payload conversion using .into() correctly handles the transition to bytes::Bytes type for CanonicalMessage.payload, aligning with the broader PR changes.

93-96: LGTM!

Using .to_vec() correctly creates the byte vector needed for BSON Binary construction from the bytes::Bytes payload type.

105-113: LGTM!

The send_bulk implementation follows the established pattern using send_bulk_helper, consistent with other endpoints.

217-224: LGTM!

The receive_bulk implementation follows the standard pattern of delegating to receive() and converting to bulk format using into_bulk_commit_func.

tests/integration/common.rs (4)

24-39: LGTM!

The PerformanceResult struct and global collector provide a clean mechanism for aggregating performance test results across multiple tests. The thread-safe design using Lazy<Mutex<Vec<_>>> is appropriate for test code.

251-274: LGTM!

The Drop implementation for PerformanceSummaryPrinter provides a convenient way to print the consolidated performance summary at the end of test runs.

385-393: LGTM!

The MockConsumer::receive_bulk implementation correctly mirrors the receive behavior by blocking indefinitely, which is appropriate for tests that only exercise the publisher.

454-459: LGTM!

Good defensive handling for edge cases where duration might be zero, preventing division by zero errors.

src/endpoints/kafka.rs (5)

2-4: LGTM: Import updates for bulk support.

The imports correctly add BulkCommitFunc and into_bulk_commit_func to support the new bulk receive functionality.

28-28: LGTM: Field rename from await_ack to skip_ack.

The rename improves semantic clarity—skip_ack: true clearly indicates fire-and-forget mode. The field is correctly propagated in new() and with_topic().

Also applies to: 110-110, 118-118

139-139: LGTM: Payload slice handling for bytes::Bytes.

Using &message.payload[..] correctly converts the Bytes type to a slice for the Kafka record.

161-176: LGTM: Conditional acknowledgement logic.

The inverted condition !self.skip_ack correctly switches between awaited delivery (at-least-once) and fire-and-forget modes. The comments clearly explain each path.

180-188: LGTM: Bulk send implementation.

The send_bulk correctly delegates to send_bulk_helper, following the established pattern used by other endpoints.

src/endpoints/http.rs (3)

7-7: LGTM: Import updated for bulk support.

The import correctly replaces CommitFunc with BulkCommitFunc to align with the bulk API changes.

95-114: LGTM: receive_bulk implementation.

The implementation correctly wraps the single received message in a Vec and converts the commit function using into_bulk_commit_func. This maintains backward compatibility while supporting the bulk interface.

260-269: LGTM: send_bulk implementation with honest comment.

The comment "not a real bulk, but fast enough" correctly sets expectations. The implementation properly delegates to send_bulk_helper, which processes messages sequentially.

tests/integration_test.rs (1)

77-113: LGTM: Consolidated performance test launcher.

Good refactor consolidating individual performance tests into a single orchestrator:

#[ignore] correctly marks this as an explicitly-run performance test

PerformanceSummaryPrinter RAII pattern ensures summary output on completion

Feature gates allow selective execution based on enabled features

Sequential execution prevents resource contention during measurements

src/endpoints/file.rs (4)

53-56: LGTM: Separate payload and newline writes.

The comment correctly explains that Bytes is immutable, so writing payload and newline separately is the right approach. BufWriter efficiently batches these writes.

60-69: LGTM: send_bulk implementation.

Follows the established pattern using send_bulk_helper. The comment acknowledges this delegates to normal send, which is appropriate for file I/O.

107-128: LGTM: receive_bulk with improved binary handling.

Good changes:

read_until(b'\n') is more appropriate than read_line for potentially binary data

Newline trimming logic correctly handles the trailing delimiter

Proper conversion to BulkCommitFunc via into_bulk_commit_func

160-162: LGTM: Tests updated for bulk API.

Tests correctly exercise both send_bulk and receive_bulk paths, including EOF handling.

Also applies to: 177-178, 184-184

src/endpoints/memory.rs (3)

6-6: LGTM: Import updated for bulk support.

The import correctly adds BulkCommitFunc for the bulk consumer interface.

119-133: LGTM: send_bulk return type updated.

The return type correctly changed to (Option<Vec<CanonicalMessage>>, Vec<CanonicalMessage>). The implementation returns (None, Vec::new()) indicating atomic success with no responses and no failures.

246-250: Review comment is incorrect. The send method is not undefined—it's available through the MessagePublisher trait as a default implementation that wraps send_bulk. Line 250 calling publisher.send(msg3.clone()).await.unwrap() is valid and will compile successfully. The suggested diff is unnecessary.

Likely an incorrect or invalid review comment.

src/endpoints/nats.rs (6)

70-81: LGTM!

The with_subject method correctly handles both NatsClient variants and appropriately clones the underlying clients.

86-115: LGTM!

The send method correctly handles both NATS modes:

JetStream path respects skip_ack configuration

Core path publishes without acknowledgment (as expected for Core NATS)

Header conversion is appropriate

117-125: LGTM!

The send_bulk implementation correctly delegates to send_bulk_helper, following the standard pattern used by other endpoints.

198-214: LGTM!

The create_canonical_message helper appropriately converts NATS messages and handles header extraction consistently.

235-244: LGTM!

The Core NATS subscription path correctly uses a no-op commit function, as Core NATS does not support acknowledgments.

250-257: LGTM!

The receive_bulk implementation follows the standard pattern of delegating to receive() and using into_bulk_commit_func, consistent with other endpoints.

src/traits.rs (5)

20-28: LGTM!

The into_commit_func conversion correctly wraps a single message into a vector and delegates to the bulk commit function.

34-41: LGTM!

The implementation correctly extracts the first message from the response vector and delegates to the single-message commit function. The debug_assert appropriately validates the contract in debug builds.

54-64: LGTM!

The default receive implementation correctly delegates to receive_bulk(1) and converts the bulk commit function. The debug_assert appropriately validates the contract.

66-77: LGTM!

The receive_bulk_helper default implementation correctly wraps a single-message receive into the bulk interface.

107-139: LGTM!

The send_bulk_helper implementation correctly:

Iterates over messages and invokes the callback

Collects successful responses

Tracks failed messages

Returns the expected tuple format

This matches the documented behavior in the relevant code snippets.

src/middleware/deduplication.rs (4)

1-20: LGTM!

The imports and struct definition are well-organized. The use of Arc<Db> properly enables shared access for the background cleanup tasks.

22-39: LGTM!

The constructor properly initializes the deduplication middleware with appropriate logging and error handling.

128-130: LGTM!

Standard as_any() implementation for type downcasting support.

48-48: message_id type verification confirmed

The code correctly assumes message_id is Option<u64>. When pattern-matched with Some(id), the extracted id is u64, which has a to_be_bytes() method that returns [u8; 8]. The subsequent .to_vec() call is appropriate and type-safe.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/integration/mqtt.rs (1)
2-8: Minor import style inconsistency.

PERF_TEST_CONCURRENCY is imported using an absolute path (crate::integration::common), while other items from the same module use a relative path (super::common). Consider using a consistent style.
-use crate::integration::common::PERF_TEST_CONCURRENCY;
-
 use super::common::{
-    add_performance_result, measure_read_performance, measure_write_performance,
+    add_performance_result, measure_read_performance, measure_write_performance, PERF_TEST_CONCURRENCY,
     run_performance_pipeline_test, run_pipeline_test, run_test_with_docker, setup_logging,
     PERF_TEST_MESSAGE_COUNT,
 };

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dbbce6c and fb2d88a.

📒 Files selected for processing (2)

src/endpoints/mqtt.rs (6 hunks)
tests/integration/mqtt.rs (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/endpoints/mqtt.rs (3)

src/traits.rs (2)

into_bulk_commit_func (33-41)

receive_bulk (48-51)

src/endpoints/memory.rs (4)

new (34-37)

new (102-108)

new (150-157)

receive_bulk (168-195)

src/middleware/deduplication.rs (1)

receive_bulk (119-126)

🔇 Additional comments (6)

tests/integration/mqtt.rs (2)

83-104: LGTM!

The refactoring to capture write and read performance into dedicated variables and record them using add_performance_result is clean and improves consistency with other performance tests. The structured PerformanceResult provides clear attribution of metrics.

26-26: LGTM!

The YAML syntax with the {out_capacity} placeholder is correct and properly replaced at runtime.

src/endpoints/mqtt.rs (4)

2-4: LGTM!

The imports are correctly extended to support the new bulk send/receive APIs.

68-76: LGTM!

The send_bulk implementation cleanly delegates to the shared helper while reusing the existing per-message send logic.

143-150: LGTM!

The receive_bulk implementation correctly adapts the single-message receive path to the bulk interface using into_bulk_commit_func, consistent with the pattern in deduplication.rs. While the _max_messages parameter is ignored, this is acceptable given that rumqttc doesn't provide native batch receive semantics.

35-41: Remove with_topic or fix shared eventloop lifecycle handling.

The with_topic method is currently unused in the codebase but exposes an unsafe pattern: it creates a clone sharing the same eventloop_handle Arc, yet the Drop implementation calls abort() on it. Any actual use of with_topic would cause early drops to terminate the eventloop for all clones. Either remove this method if it's not needed, or redesign the Drop implementation to handle shared ownership safely (e.g., use a reference count or dedicated lifecycle manager).

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/integration/nats.rs (1)

13-26: CONFIG_YAML uses invalid field name await_ack instead of skip_ack.

The NatsConfig struct (src/models.rs:250) defines the field as skip_ack, not await_ack. The struct has #[serde(deny_unknown_fields)] and no backward-compatible alias, so CONFIG_YAML (lines 19, 23) will fail to deserialize with an "unknown field" error at runtime. Update both occurrences to use skip_ack: false to match the struct definition. The direct test at line 59 correctly uses skip_ack: false.

tests/integration/amqp.rs (1)

13-26: The CONFIG_YAML uses await_ack but AmqpConfig only accepts skip_ack—deserialization will fail.

The AmqpConfig struct in src/models.rs (lines 287–298) has #[serde(deny_unknown_fields)] and only defines a skip_ack field. Using await_ack in the YAML config (lines 19, 23) will cause a deserialization error since it's an unknown field. Update the YAML to use skip_ack: false (semantically equivalent to the intended await_ack: true).

Also applies to: 56–60 in tests/integration/nats.rs

♻️ Duplicate comments (8)

src/middleware/deduplication.rs (1)

54-61: Race condition in deduplication check-and-insert still present.

The previous review flagged that the separate get() and insert() calls create a race condition when multiple consumers share the same sled DB. Two consumers could simultaneously pass the get() check (both see None) and then both insert(), allowing duplicate processing. This concern appears unaddressed.

Apply atomic compare-and-swap to eliminate the race:

-            if self.db.get(&key)?.is_some() {
-                info!("Duplicate message detected and skipped");
-                commit(None).await;
-                continue;
-            }
-
             let now = SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs();
-            self.db.insert(&key, &now.to_be_bytes())?;
+            
+            // Atomically insert only if key doesn't exist
+            match self.db.compare_and_swap(
+                &key,
+                None as Option<&[u8]>,
+                Some(&now.to_be_bytes()[..])
+            )? {
+                Ok(_) => {
+                    // Successfully inserted - not a duplicate, proceed
+                }
+                Err(_) => {
+                    // Key already exists - duplicate detected
+                    info!("Duplicate message detected and skipped");
+                    commit(None).await;
+                    continue;
+                }
+            }

tests/integration/common.rs (2)

344-353: Potential infinite loop on persistent send_batch errors.

If send_batch consistently returns Err, the inner retry loop (lines 346-353) never breaks, causing an infinite loop. The code also clones messages_to_send on each iteration (line 347).

Apply this diff to limit retries and avoid unnecessary clones:

                 // Retry sending the batch if some messages fail.
                 let mut messages_to_send = batch;
+                let mut error_count = 0;
+                const MAX_RETRIES: usize = 5;
                 loop {
-                    match publisher_clone.send_batch(messages_to_send.clone()).await {
+                    match publisher_clone.send_batch(std::mem::take(&mut messages_to_send)).await {
                         Ok((_, failed)) if failed.is_empty() => break, // All sent successfully
-                        Ok((_, failed)) => messages_to_send = failed,  // Retry failed messages
-                        Err(e) => eprintln!("Error sending bulk messages: {}", e),
+                        Ok((_, failed)) => {
+                            messages_to_send = failed;
+                            error_count = 0; // Reset on partial success
+                        }
+                        Err(e) => {
+                            eprintln!("Error sending bulk messages: {}", e);
+                            error_count += 1;
+                            if error_count >= MAX_RETRIES {
+                                eprintln!("Max retries reached, giving up on batch");
+                                break;
+                            }
+                        }
                     }
                     tokio::time::sleep(Duration::from_millis(10)).await;
                 }

457-459: Fix documentation to match implementation.

The comment says "commas" but the implementation uses underscores as thousand separators.

Apply this diff:

-/// Formats a number with commas as thousand separators.
+/// Formats a number with underscores as thousand separators.
 /// Handles both integers and floating-point numbers.

src/endpoints/nats.rs (3)

25-26: Inconsistent naming in comment.

The comment references await_ack but the field is named skip_ack.

Apply this diff:
-    // await_ack is only used for JetStream
+    // skip_ack is only used for JetStream
     skip_ack: bool,
180-180: Unexplained 100ms sleep after consumer creation.

This hardcoded sleep appears to be a workaround for a timing issue but lacks documentation.

Please document why this sleep is necessary or replace it with a more robust synchronization mechanism:
+            // Brief delay to allow consumer to be fully registered with the server.
+            // TODO: Replace with proper readiness check (e.g., flush/ping API)
             tokio::time::sleep(Duration::from_millis(100)).await;
229-231: Ack failures are silently swallowed.

The code uses unwrap_or_else to log ack errors but doesn't propagate them (lines 229-231, 288-289). If acknowledgment fails, the message may be redelivered, but the caller has no way to know.

Consider at minimum using a more prominent warning:
-                        message.ack().await.unwrap_or_else(|e| {
-                            tracing::error!("Failed to ACK NATS message: {:?}", e)
-                        });
+                        if let Err(e) = message.ack().await {
+                            tracing::error!(error = ?e, "Failed to ACK NATS message - message may be redelivered");
+                        }
Similarly for lines 288-289.

Also applies to: 288-289

src/traits.rs (2)

30-33: Complete the truncated docstring.

Line 33's docstring ends mid-sentence with "It does so by" - the explanation is incomplete.

Apply this diff to complete the docstring:

 /// Converts a `CommitFunc` into a `BatchCommitFunc` by wrapping it.
 /// This allows a function that commits a single message to be used where a
-/// function that commits a batch of messages is expected. It does so by
+/// function that commits a batch of messages is expected. It extracts the first
+/// element from the response vector (if any) and passes it to the original commit.
 pub fn into_batch_commit_func(commit: CommitFunc) -> BatchCommitFunc {

45-47: Remove duplicate docstring lines.

Lines 45-47 contain redundant documentation - "Receives a batch of messages." appears twice.

Apply this diff to remove the redundancy:

-    /// Receives a batch of messages. Needs to be implemented.
-    /// In doubt, just implement a receive_batch that returns 1 message as vec
-    /// Receives a batch of messages.
+    /// Receives a batch of messages.
+    /// 
+    /// If your backend doesn't support native batching, implement this to return
+    /// a single message wrapped in a vec.
     async fn receive_batch(

🧹 Nitpick comments (12)

src/middleware/metrics.rs (2)
13-17: Consider using Arc<str> to avoid string cloning on every metric call.

The route_name and endpoint_direction fields are cloned on every send, send_batch, receive, and receive_batch call (lines 42-43, 62-63, 109-110, 127-128). This creates allocation overhead in the hot path.
+use std::sync::Arc;
+
 pub struct MetricsPublisher {
     inner: Box<dyn MessagePublisher>,
-    route_name: String,
-    endpoint_direction: String,
+    route_name: Arc<str>,
+    endpoint_direction: Arc<str>,
 }
Then use .clone() on Arc<str> which is cheap (just a refcount bump).

Also applies to: 80-84

57-70: Consider tracking failure metrics for observability.

The comments on lines 65-70 indicate that failure metrics are not currently recorded. For production observability, tracking failed message counts would be valuable for alerting and debugging.
         match &result {
             Ok((_, failed)) => {
                 let successful_count = total_count - failed.len();
                 if successful_count > 0 {
                     let avg_duration = duration.as_secs_f64() / successful_count as f64;
                     metrics::counter!("queue_messages_processed_total", "route" => self.route_name.clone(), "endpoint" => self.endpoint_direction.clone()).increment(successful_count as u64);
                     metrics::histogram!("queue_message_processing_duration_seconds", "route" => self.route_name.clone(), "endpoint" => self.endpoint_direction.clone()).record(avg_duration);
                 }
-                // We can add a new metric for failures here if desired
+                if !failed.is_empty() {
+                    metrics::counter!("queue_messages_failed_total", "route" => self.route_name.clone(), "endpoint" => self.endpoint_direction.clone()).increment(failed.len() as u64);
+                }
             }
             Err(_) => {
-                // On a total failure, we could increment an error counter for the whole batch
-                // For now, we just don't record success, which is implicitly correct.
+                metrics::counter!("queue_messages_failed_total", "route" => self.route_name.clone(), "endpoint" => self.endpoint_direction.clone()).increment(total_count as u64);
             }
         }
src/endpoints/mongodb.rs (1)
106-114: Consider native batch insert for better MongoDB performance.

The current implementation delegates each message to send() individually via send_batch_helper. MongoDB supports insert_many() which could be significantly more efficient for batch operations by reducing round trips.

This is optional since the current implementation is correct and consistent with other endpoints.
     async fn send_batch(
         &self,
         messages: Vec<CanonicalMessage>,
     ) -> anyhow::Result<(Option<Vec<CanonicalMessage>>, Vec<CanonicalMessage>)> {
-        crate::traits::send_batch_helper(self, messages, |publisher, message| {
-            Box::pin(publisher.send(message))
-        })
-        .await
+        // Build documents for all messages
+        let mut docs = Vec::with_capacity(messages.len());
+        let mut prepared_messages = Vec::with_capacity(messages.len());
+        
+        for message in messages {
+            // ... prepare each message similar to send() ...
+            // ... build doc and add to docs vector ...
+            // ... add prepared message to prepared_messages ...
+        }
+        
+        // Use insert_many for batch efficiency
+        self.collection.insert_many(&docs).await?;
+        Ok((Some(prepared_messages), Vec::new()))
     }
tests/integration/kafka.rs (2)
89-100: Consumer wrapped in Arc<Mutex> matches the pattern, but consider consistency with the publisher.

The consumer is wrapped in Arc<tokio::sync::Mutex> while the publisher uses just Arc. This is likely intentional since measure_read_performance requires &mut self access. However, the AMQP test (from relevant snippets) doesn't clone consumer_arc before passing it to measure_read_performance and then use it for disconnect.

Consider simplifying by not cloning consumer_arc since it's only used once after measurement:
         let read_perf = {
             let consumer = KafkaConsumer::new(&config, topic).unwrap();
             let consumer_arc = Arc::new(tokio::sync::Mutex::new(consumer));
             let read_perf = measure_read_performance(
                 "Kafka",
-                consumer_arc.clone(),
+                consumer_arc.clone(), // clone needed if measure takes ownership
                 PERF_TEST_MESSAGE_COUNT_DIRECT,
             )
             .await;
             consumer_arc.lock().await.disconnect();
             read_perf
         };
102-107: Final sleep after recording results may be unnecessary.

The 3-second sleep at line 107 occurs after all results are recorded and the test is essentially complete. Unless there's a specific reason (e.g., allowing Docker container cleanup), this sleep adds unnecessary test duration.
src/middleware/deduplication.rs (2)
63-66: Probabilistic cleanup may leave stale entries for extended periods.

The ~2% cleanup probability (rand::random::<u8>() < 5 where 5/256 ≈ 2%) is simple but unpredictable. With low message volume, cleanup may not run for extended periods, leaving expired entries.

Consider a time-based approach or tracking last cleanup time:
// Alternative: Track last cleanup time
let should_cleanup = {
    let elapsed = self.last_cleanup.elapsed();
    elapsed > Duration::from_secs(60) // Cleanup every ~60 seconds
};
Or document the expected cleanup frequency based on message throughput.

118-125: receive_batch ignores _max_messages parameter.

The receive_batch method delegates to single-message receive() and always returns exactly one message, ignoring the _max_messages parameter. This is consistent with other endpoints (MQTT, HTTP) but worth noting for API clarity.

Consider adding a doc comment explaining this behavior:
+    /// Note: Currently processes one message at a time for deduplication guarantees.
+    /// The `_max_messages` parameter is ignored.
     async fn receive_batch(
         &mut self,
         _max_messages: usize,
     ) -> anyhow::Result<(Vec<CanonicalMessage>, BatchCommitFunc)> {
src/endpoints/kafka.rs (1)

344-351: Consider implementing true batch receiving for Kafka.

The current implementation wraps single-message receive() via into_batch_commit_func, which doesn't leverage Kafka's native batching capabilities. Kafka consumers can efficiently fetch multiple messages in a single poll operation.

For better performance, consider using StreamConsumer::recv() in a loop or leveraging rdkafka's batching APIs to fetch multiple messages.
src/traits.rs (4)
54-64: Inconsistent assertion and error handling.

Line 56 has a debug_assert!(msg_vec.len() == 1), but lines 57-63 handle the case where the vector is empty. If an implementation returns 0 or >1 messages, the debug assertion fails but release builds continue with potentially incorrect behavior.

Consider either:

Removing the assertion if empty vectors are valid, or

Making it a hard requirement and returning a more specific error
     async fn receive(&mut self) -> anyhow::Result<(CanonicalMessage, CommitFunc)> {
         let (msg_vec, batch_commit) = self.receive_batch(1).await?;
-        debug_assert!(msg_vec.len() == 1);
-        if let Some(msg) = msg_vec.into_iter().next() {
+        if msg_vec.len() != 1 {
+            return Err(anyhow::anyhow!(
+                "receive_batch(1) returned {} messages, expected exactly 1",
+                msg_vec.len()
+            ));
+        }
+        if let Some(msg) = msg_vec.into_iter().next() {
             Ok((msg, into_commit_func(batch_commit)))
         } else {
-            Err(anyhow::anyhow!(
-                "Nothing received, receiver probably closed."
-            ))
+            unreachable!()
         }
     }
66-77: Document why _max_messages parameter is ignored.

The _max_messages parameter is prefixed with _ and unused in the implementation. Since this is a helper that delegates to receive() (which fetches only one message), add a doc comment explaining why the batch size is ignored.

Apply this diff:
+    /// Helper to adapt single-message `receive()` to the batch interface.
+    /// Note: `_max_messages` is ignored because this delegates to the single-message path.
     async fn receive_batch_helper(
90-99: Preserve error information in the send method.

Line 93 returns a generic error message when send_batch reports failed messages, losing any underlying error context from the batch operation. This makes debugging harder.

Apply this diff to preserve error details:
     async fn send(&self, message: CanonicalMessage) -> anyhow::Result<Option<CanonicalMessage>> {
         let (result_vec, failed_msgs) = self.send_batch(vec![message]).await?;
         if !failed_msgs.is_empty() {
-            Err(anyhow::anyhow!("Failed to send message"))
+            Err(anyhow::anyhow!(
+                "Failed to send message: {} failed out of 1",
+                failed_msgs.len()
+            ))
         } else if let Some(result) = result_vec {
             Ok(result.into_iter().next())
         } else {
107-139: Consider performance impact of cloning messages.

Line 123 clones each message before passing it to the callback, which could be expensive for large payloads (especially since CanonicalMessage.payload is now bytes::Bytes per the summary). The clone is necessary to store failed messages, but for high-throughput scenarios this overhead might be significant.

If performance becomes a concern, consider:

Documenting this behavior in the function's docstring

Exploring alternative error handling that avoids cloning (e.g., returning indices of failed messages)

Example documentation addition:
 /// A helper function to send messages in bulk by calling `send` for each one.
 /// This is useful for `MessagePublisher` implementations that don't have a native bulk sending mechanism.
+/// Note: Each message is cloned to preserve failed messages for error reporting.
 pub async fn send_batch_helper<P: MessagePublisher + ?Sized>(

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb2d88a and f93b669.

📒 Files selected for processing (21)

src/endpoints/amqp.rs (9 hunks)
src/endpoints/file.rs (5 hunks)
src/endpoints/http.rs (4 hunks)
src/endpoints/kafka.rs (9 hunks)
src/endpoints/memory.rs (6 hunks)
src/endpoints/mongodb.rs (5 hunks)
src/endpoints/mqtt.rs (6 hunks)
src/endpoints/nats.rs (5 hunks)
src/endpoints/static_endpoint.rs (3 hunks)
src/middleware/deduplication.rs (1 hunks)
src/middleware/dlq.rs (1 hunks)
src/middleware/metrics.rs (1 hunks)
src/route.rs (9 hunks)
src/traits.rs (1 hunks)
tests/integration/amqp.rs (2 hunks)
tests/integration/common.rs (8 hunks)
tests/integration/kafka.rs (2 hunks)
tests/integration/mongodb.rs (3 hunks)
tests/integration/mqtt.rs (3 hunks)
tests/integration/nats.rs (3 hunks)
tests/integration_test.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (14)

tests/integration/amqp.rs (1)

tests/integration/common.rs (1)

add_performance_result (37-39)

src/endpoints/kafka.rs (2)

src/traits.rs (4)

into_batch_commit_func (33-41)

send (90-99)

send_batch (85-88)

receive_batch (48-51)

src/middleware/deduplication.rs (1)

receive_batch (118-125)

src/endpoints/memory.rs (3)

src/endpoints/http.rs (2)

send_batch (260-268)

receive_batch (94-114)

src/traits.rs (2)

send_batch (85-88)

receive_batch (48-51)

tests/integration/common.rs (1)

receive_batch (385-393)

src/endpoints/mongodb.rs (2)

src/traits.rs (4)

into_batch_commit_func (33-41)

send_batch (85-88)

send_batch_helper (109-139)

receive_batch (48-51)

src/middleware/deduplication.rs (1)

receive_batch (118-125)

tests/integration/mqtt.rs (1)

tests/integration/common.rs (3)

add_performance_result (37-39)

measure_read_performance (400-455)

measure_write_performance (297-371)

src/middleware/deduplication.rs (1)

src/traits.rs (1)

into_batch_commit_func (33-41)

tests/integration/common.rs (3)

src/endpoints/memory.rs (2)

new (34-37)

receive_batch (168-195)

src/route.rs (1)

bounded (147-147)

src/traits.rs (1)

receive_batch (48-51)

src/endpoints/static_endpoint.rs (1)

src/traits.rs (3)

into_batch_commit_func (33-41)

send_batch (85-88)

send_batch_helper (109-139)

tests/integration_test.rs (5)

tests/integration/nats.rs (1)

test_nats_performance_direct (52-93)

tests/integration/amqp.rs (1)

test_amqp_performance_direct (52-87)

tests/integration/kafka.rs (1)

test_kafka_performance_direct (52-110)

tests/integration/mqtt.rs (1)

test_mqtt_performance_direct (53-102)

tests/integration/mongodb.rs (1)

test_mongodb_performance_direct (52-101)

src/middleware/metrics.rs (1)

src/traits.rs (6)

send (90-99)

send_batch (85-88)

as_any (78-78)

as_any (104-104)

receive (54-64)

receive_batch (48-51)

src/middleware/dlq.rs (4)

src/endpoints/mod.rs (1)

create_publisher_from_route (94-106)

src/middleware/deduplication.rs (1)

as_any (127-129)

src/endpoints/mongodb.rs (2)

send (69-104)

send_batch (106-114)

src/traits.rs (4)

send (90-99)

send_batch (85-88)

as_any (78-78)

as_any (104-104)

tests/integration/nats.rs (1)

tests/integration/common.rs (3)

add_performance_result (37-39)

measure_read_performance (400-455)

measure_write_performance (297-371)

tests/integration/mongodb.rs (1)

tests/integration/common.rs (3)

add_performance_result (37-39)

measure_read_performance (400-455)

measure_write_performance (297-371)

src/route.rs (4)

src/traits.rs (3)

send (90-99)

receive_batch (48-51)

send_batch (85-88)

src/endpoints/mod.rs (2)

create_publisher_from_route (94-106)

create_consumer_from_route (28-34)

src/middleware/deduplication.rs (1)

receive_batch (118-125)

tests/integration/common.rs (1)

receive_batch (385-393)

🔇 Additional comments (38)

src/middleware/metrics.rs (1)

34-47: LGTM!

The metrics instrumentation correctly measures duration before/after the inner call and records metrics only on success. The histogram metric for processing duration is useful for latency analysis.

Also applies to: 101-114

tests/integration/mongodb.rs (1)

76-98: LGTM!

The refactor to capture write_perf and read_perf in local variables and aggregate them via add_performance_result aligns with the patterns in other integration tests (MQTT, AMQP, NATS).

tests/integration/mqtt.rs (1)

83-99: LGTM!

The refactor to use local write_perf and read_perf variables with add_performance_result is consistent with the updated test infrastructure pattern across other integration tests.

tests/integration/nats.rs (1)

68-90: LGTM!

The refactor to store write_perf and read_perf in local variables and aggregate via add_performance_result follows the established pattern across the integration test suite.

tests/integration/amqp.rs (1)

63-84: LGTM!

The performance test refactoring follows the established pattern across the integration test suite. The 3-second sleep between write and read is consistent with other tests (MQTT, NATS).

src/endpoints/static_endpoint.rs (2)

36-44: LGTM!

The send_batch implementation correctly delegates to the send_batch_helper utility, which iterates over messages and calls the single-message send method. This is the expected pattern for endpoints that don't have native batch support.

80-87: LGTM!

The receive_batch implementation correctly delegates to receive() and wraps the single-message commit function using into_batch_commit_func. Ignoring _max_messages is appropriate for a static endpoint that always returns a single static message.

src/endpoints/mongodb.rs (3)

2-5: LGTM!

The import additions for into_batch_commit_func and BatchCommitFunc align with the batch-oriented API changes across the codebase.

218-225: LGTM!

The receive_batch implementation correctly delegates to receive() and wraps the commit function. Given MongoDB's change stream / polling approach for consuming messages, returning single messages at a time is appropriate.

44-48: LGTM!

The payload handling changes are appropriate:

Line 46: Using .into() for Bytes → Vec<u8> conversion when creating CanonicalMessage

Line 96: Using .to_vec() to convert Bytes payload to Vec<u8> for BSON serialization

Both changes align with the PR's switch to bytes::Bytes for CanonicalMessage.payload.

Also applies to: 94-98

tests/integration/kafka.rs (1)

68-82: Good use of scoped blocks for resource lifecycle management.

The scoped block pattern ensures the publisher is properly disconnected before proceeding to the consumer test. This aligns well with the patterns used in other endpoint tests (AMQP, NATS, MongoDB).

tests/integration_test.rs (1)

77-113: Good consolidation of direct performance tests.

The unified test_all_performance_direct test provides several benefits:

Sequential execution ensures consistent measurements without resource contention

PerformanceSummaryPrinter with RAII pattern ensures summary output even on panic

Feature-gated blocks allow running only enabled endpoint tests

The #[ignore] attribute appropriately marks this as an opt-in performance test.
src/endpoints/mqtt.rs (3)
49-57: Previous review concern addressed, but race window remains.

The strong_count check addresses the original concern about aborting while clones exist. However, as noted in the previous review, there's still a small race window between checking strong_count and the actual drop. This is acceptable given the complexity of alternatives, but worth documenting.

Consider adding a brief comment about the race window:
 impl Drop for MqttPublisher {
     fn drop(&mut self) {
         // When the publisher is dropped, abort its background eventloop task.
         // Only abort when this is the last reference to the eventloop handle.
+        // Note: There's a small race window here, but it's acceptable for this use case.
         if Arc::strong_count(&self.eventloop_handle) == 1 {
             self.eventloop_handle.abort();
         }
     }
 }
72-80: Batch implementations correctly delegate to helpers.

Both send_batch and receive_batch properly use the trait utilities (send_batch_helper and into_batch_commit_func) for consistent batch handling across endpoints.

Also applies to: 147-154

118-123: MqttConsumer Drop unconditionally aborts eventloop.

Unlike MqttPublisher, MqttConsumer's Drop doesn't check strong_count. This is intentional—consumers lack a with_topic clone mechanism and only exist as a single instance, so the eventloop can always be safely aborted on drop. Publishers use the strong_count check because with_topic creates multiple instances sharing the same eventloop.
src/middleware/deduplication.rs (1)

63-111: Cleanup task error handling is now robust.

The background cleanup task properly handles:

SystemTime errors (lines 68-74)

Invalid timestamp length (lines 81-84)

Conversion errors (lines 88-94)

Removal errors (lines 99-102)

Iteration errors (lines 105-108)

This addresses the previous review concern about unwrap() calls.

src/endpoints/http.rs (3)

94-114: receive_batch correctly wraps single-message path with batch interface.

The implementation properly:

Receives a single HTTP request as a message

Wraps the response-sending commit function with into_batch_commit_func

Returns a single-element vector

This pattern aligns with other endpoints (MQTT, deduplication middleware) that don't support true batching.

259-268: Honest documentation about non-bulk HTTP sending.

The comment "not a real bulk, but fast enough" appropriately sets expectations. The send_batch_helper will process messages sequentially, which is the correct behavior for HTTP where each request needs its own response.

6-6: CommitFunc removal is correct; BoxFuture remains properly imported and used.

CommitFunc is not referenced anywhere in this file, so removing it from the imports is safe. BoxFuture is correctly retained in the imports as it's explicitly used in the type cast at line 105 (as BoxFuture<'static, ()>). The trait bounds are satisfied—the commit closure properly returns a BoxFuture through an explicit type annotation rather than inference.

src/endpoints/amqp.rs (3)

73-90: LGTM! Persistent delivery mode correctly implemented.

The conditional delivery mode setting aligns with AMQP semantics: mode 2 makes messages durable when persistent is true. The metadata handling correctly converts keys to ShortString before insertion.

103-106: LGTM! skip_ack correctly gates publisher confirmations.

When skip_ack is false, the code awaits the broker's publisher confirmation, providing at-least-once delivery guarantees. When true, it skips the wait for higher throughput.

247-300: LGTM! Efficient bulk acknowledgment using AMQP's multiple flag.

The implementation correctly:

Waits for the first message (blocking)

Greedily consumes buffered messages up to max_messages

Uses BasicAckOptions { multiple: true } to acknowledge all messages up to last_delivery in a single operation

This is the recommended AMQP pattern for bulk acks.

src/endpoints/kafka.rs (2)

161-176: LGTM! skip_ack correctly controls delivery acknowledgment.

The logic correctly:

Awaits delivery report when skip_ack is false (at-least-once)

Uses fire-and-forget when skip_ack is true (higher throughput)

Properly handles both paths with appropriate error handling

139-139: The payload slice usage is correct. CanonicalMessage.payload is of type Bytes, which implements Deref<Target = [u8]>. The slice operation &message.payload[..] correctly produces &[u8], which is the accepted parameter type for FutureRecord::payload(). No changes needed.

src/route.rs (3)

32-50: LGTM! Internal shutdown channels prevent race conditions.

The per-iteration internal shutdown channel prevents both the outer loop and inner task from competing for the same external shutdown signal. The outer loop:

Creates a new internal channel per iteration

Spawns the inner task with the internal receiver

Can signal shutdown via internal_shutdown_tx when the external signal arrives

This decoupling is a solid pattern for managing nested async tasks.

118-127: LGTM! Partial batch failure handling commits successes before erroring.

When send_batch returns partial failures:

Line 123: Commits the successful messages via commit(response).await

Line 124: Returns an error with the failed count

This ensures successful messages are acknowledged before triggering a reconnect, preventing duplicate processing.

164-172: LGTM! Worker correctly propagates partial batch failures.

The worker:

Commits successful messages (line 165)

Creates a detailed error for the failed count (line 166)

Propagates the error to the main task via err_tx (line 169)

This aligns with the sequential path's handling and ensures route teardown on persistent failures.

src/middleware/dlq.rs (3)

32-35: LGTM! Box::pin breaks recursive async type definition.

The comment correctly explains that Box::pin is necessary here because:

create_publisher_from_route can apply middlewares

DLQ middleware calls create_publisher_from_route for the DLQ endpoint

This creates a potential recursive type definition

The Box::pin breaks the cycle by boxing the future.

124-171: LGTM! Bulk DLQ retry logic correctly implemented.

The partial failure path:

Retries only the failed messages (line 150)

Uses exponential backoff capped at 5s (lines 151-152)

Tracks retry attempts (line 130)

Returns successful responses with original failed messages (line 143)

Propagates errors after exhausting retries (lines 164, 168)

This addresses the past review concern about missing retry logic for bulk operations.

173-223: LGTM! Complete failure now sends all messages to DLQ.

When inner.send_batch returns Err (complete failure), the code:

Attempts to send all messages to the DLQ (line 187-190)

Uses retry logic with backoff (lines 181-221)

Returns original messages as failed after successful DLQ send (line 194)

This addresses the past critical review concern about bypassing DLQ on complete failures.

src/endpoints/file.rs (2)

53-57: LGTM! Correct handling of immutable Bytes payload.

Since message.payload is now bytes::Bytes (immutable), the code correctly:

Writes the payload as-is via write_all(&message.payload)

Writes the newline separately via write_all(b"\n")

The BufWriter efficiently handles these sequential writes.

113-123: LGTM! read_until with explicit newline trimming.

The change from read_line to read_until(b'\n', ...) with explicit trimming:

Makes the newline handling more explicit and clear

Correctly checks ends_with(b"\n") and pops the byte

Constructs the CanonicalMessage from the trimmed buffer

This is a more explicit approach that clearly shows the newline handling.

src/endpoints/memory.rs (2)

119-134: LGTM! Memory channel batch send is atomic.

The implementation correctly:

Sends the entire batch as a single atomic operation via sender.send(messages)

Returns (None, Vec::new()) to indicate no responses and all messages succeeded

This aligns with memory channel semantics where sends are all-or-nothing

173-191: LGTM! Efficient batch receive with split_off.

The buffer management correctly:

Waits for a new batch when empty and reverses it (lines 175-181)

Calculates split_at = buffer.len() - num_to_take (line 186)

Uses split_off(split_at) to extract the tail num_to_take messages (line 190)

Reverses back to original order (line 191)

Since the buffer is reversed, split_off from the tail gives us the oldest messages in FIFO order. This addresses the past review concern about incorrect split_off usage.

src/endpoints/nats.rs (3)

56-58: LGTM! Helpful warning for Core mode with skip_ack.

The warning correctly informs users that Core NATS doesn't support acknowledgements, so the skip_ack flag is ignored in this mode. This prevents confusion when the flag doesn't have the expected effect.

38-60: LGTM! Clear Core vs JetStream initialization.

The code correctly:

Selects Core or JetStream based on config.no_jetstream

Creates JetStream streams when in JetStream mode (lines 42-49)

Warns when no stream is configured in JetStream mode (line 51)

Logs the selected mode clearly (lines 55-56)

The multiplexing pattern is well-structured.

265-282: LGTM! Efficient greedy batch fetching for JetStream.

The batch receive logic:

Uses a 5-second timeout for the first message (line 266)

Greedily fetches additional messages via try_next() (line 275)

Stops when the buffer is empty or max_messages is reached

This pattern efficiently batches messages without blocking indefinitely.

src/traits.rs (1)

20-28: LGTM!

The conversion logic correctly wraps a single response into a vector and delegates to the batch commit function.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

tests/integration/common.rs (2)

344-354: Infinite loop risk on persistent errors.

If send_batch consistently returns Err, the inner loop never breaks. Additionally, messages_to_send.clone() on every iteration is wasteful.

🔎 Apply this diff to add retry limits and avoid unnecessary cloning:

                 // Retry sending the batch if some messages fail.
                 let mut messages_to_send = batch;
+                let mut retry_count = 0;
+                const MAX_RETRIES: usize = 5;
                 loop {
-                    match publisher_clone.send_batch(messages_to_send.clone()).await {
+                    match publisher_clone.send_batch(std::mem::take(&mut messages_to_send)).await {
                         Ok((_, failed)) if failed.is_empty() => break, // All sent successfully
-                        Ok((_, failed)) => messages_to_send = failed,  // Retry failed messages
-                        Err(e) => eprintln!("Error sending bulk messages: {}", e),
+                        Ok((_, failed)) => {
+                            messages_to_send = failed;
+                            retry_count = 0; // Reset on partial success
+                        }
+                        Err(e) => {
+                            eprintln!("Error sending bulk messages: {}", e);
+                            retry_count += 1;
+                            if retry_count >= MAX_RETRIES {
+                                eprintln!("Max retries reached, giving up on batch");
+                                break;
+                            }
+                        }
                     }
                     tokio::time::sleep(Duration::from_millis(10)).await; // Backoff before retry
                 }

458-460: Documentation mismatch: "commas" vs underscores.

The comment states "commas as thousand separators" but the implementation uses underscores on line 470.

🔎 Apply this diff:

-/// Formats a number with commas as thousand separators.
+/// Formats a number with underscores as thousand separators.
 /// Handles both integers and floating-point numbers.

🧹 Nitpick comments (3)

tests/integration/common.rs (1)

466-476: Negative numbers may format incorrectly.

The current logic treats the minus sign as part of the integer, which can result in misplaced underscores (e.g., -123456 could become -_123_456). Since this is test utility code and performance values are typically positive, this is a minor edge case.

🔎 Apply this diff to handle negative numbers correctly:

 pub fn format_pretty<N: Display>(num: N) -> String {
     let s = num.to_string();
+    let (sign, s) = if s.starts_with('-') {
+        ("-", &s[1..])
+    } else {
+        ("", s.as_str())
+    };
     let mut parts = s.splitn(2, '.');
     let integer_part = parts.next().unwrap_or("");
     let fractional_part = parts.next();
 
     // ... existing formatting logic ...
 
     match fractional_part {
         Some(frac) => {
             let truncated_frac = if frac.len() > 2 { &frac[..2] } else { frac };
-            format!("{}.{}", formatted_integer, truncated_frac)
+            format!("{}{}.{}", sign, formatted_integer, truncated_frac)
         }
-        None => formatted_integer,
+        None => format!("{}{}", sign, formatted_integer),
     }
 }

src/endpoints/kafka.rs (2)

271-289: Inefficient pattern creates Vec for single message.

The receive method creates a Vec and TopicPartitionList, processes one message through process_message, then calls unwrap() to extract it. While the unwrap() is technically safe (since process_message returns Err before reaching line 275 if it doesn't push), this pattern is unnecessarily complex and allocates a Vec for a single message.

🔎 Suggested refactor to avoid Vec allocation:

 async fn receive(&mut self) -> anyhow::Result<(CanonicalMessage, CommitFunc)> {
     let message = self.consumer.recv().await?;
-    let mut tpl = TopicPartitionList::new();
-    let mut messages = Vec::new();
-    process_message(message, &mut messages, &mut tpl)?;
-    let canonical_message = messages.pop().unwrap();
+    
+    let payload = message
+        .payload()
+        .ok_or_else(|| anyhow!("Kafka message has no payload"))?;
+    let mut canonical_message = CanonicalMessage::new(payload.to_vec());
+    if let Some(headers) = message.headers() {
+        if headers.count() > 0 {
+            let mut metadata = std::collections::HashMap::new();
+            for header in headers.iter() {
+                metadata.insert(
+                    header.key.to_string(),
+                    String::from_utf8_lossy(header.value.unwrap_or_default()).to_string(),
+                );
+            }
+            canonical_message.metadata = Some(metadata);
+        }
+    }
+    
+    let mut tpl = TopicPartitionList::new();
+    tpl.add_partition_offset(
+        message.topic(),
+        message.partition(),
+        Offset::Offset(message.offset() + 1),
+    ).map_err(|e| anyhow::anyhow!(e))?;

     // The commit function for Kafka needs to commit the offset of the processed message.
     // We can't move `self.consumer` into the closure, but we can commit by position.

348-378: Consider accepting &[u8] in CanonicalMessage::new to avoid allocation.

Line 357 calls CanonicalMessage::new(payload.to_vec()), which allocates a Vec then immediately converts it to Bytes. Instead, update CanonicalMessage::new to accept payload: &[u8] and use Bytes::copy_from_slice(payload) internally. This eliminates the intermediate Vec allocation while achieving the same result. Update the caller to pass payload directly.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f93b669 and d8745c1.

📒 Files selected for processing (2)

src/endpoints/kafka.rs (9 hunks)
tests/integration/common.rs (8 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/endpoints/kafka.rs (1)

src/traits.rs (3)

send (90-99)

send_batch (85-88)

send_batch_helper (109-139)

🔇 Additional comments (12)

tests/integration/common.rs (6)

1-22: LGTM!

The imports are well-organized and appropriate for the functionality being implemented.

24-42: LGTM!

The performance tracking infrastructure is well-designed for test utilities. Using Lazy<Mutex<Vec<...>>> is appropriate for collecting results across tests.

251-274: LGTM!

Good use of Drop for RAII-style summary printing. The table formatting is clean and handles the empty case appropriately.

373-398: LGTM!

The MockConsumer implementation appropriately blocks to prevent routes from exiting during publisher-only tests. The new receive_batch method correctly mirrors the existing receive behavior.

400-456: LGTM!

The batch-based read performance measurement is well-structured. The shadowing of num_messages on line 440 is intentional and correctly uses the actual received count for throughput calculation. The timeout handling and exit conditions are appropriate.

289-295: LGTM!

The performance test constants are reasonable, and the generate_message helper is straightforward.

src/endpoints/kafka.rs (6)

2-2: LGTM! Import additions support batch APIs.

The new imports (BatchCommitFunc, TryStreamExt, Arc) are correctly added to support the batch receive implementation and related functionality.

Also applies to: 6-6, 17-17

24-24: LGTM! Field rename improves clarity.

The rename from await_ack to skip_ack provides clearer semantics: skip_ack=true means fire-and-forget, skip_ack=false means await delivery confirmation. All usages are consistent with this new naming.

Also applies to: 106-106, 114-114

134-134: LGTM! Correct slice creation from Bytes.

The payload handling correctly creates a byte slice &message.payload[..] from the bytes::Bytes type, which is the appropriate way to pass payload data to FutureRecord.

156-172: LGTM! Acknowledgment logic is correct.

The conditional logic correctly implements the skip_ack semantics: when skip_ack is false, the code awaits delivery confirmation for at-least-once guarantees; when true, it uses fire-and-forget for higher throughput.

175-183: LGTM! Batch send properly delegates to helper.

The send_batch implementation correctly uses the send_batch_helper from traits.rs, which processes messages individually through the single-message send() method and collects successful and failed messages.

291-341: LGTM! Batch receive implementation is well-structured.

The receive_batch implementation correctly:

Blocks on the first message to ensure at least one is available

Greedily consumes additional buffered messages up to the limit using try_next()

Tracks offsets properly in last_offset_tpl

Includes defensive check for messages_len > 0 before committing (handles edge case where stream ends before first message)

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

tests/integration/common.rs (2)

428-437: Address infinite loop risk in retry logic.

The retry loop can spin indefinitely if send_batch consistently returns Err, as there's no retry limit or break condition. Additionally, messages_to_send.clone() on each iteration is wasteful.

Suggested fix

 // Retry sending the batch if some messages fail.
 let mut messages_to_send = batch;
+let mut retry_count = 0;
+const MAX_RETRIES: usize = 5;
 loop {
-    match publisher_clone.send_batch(messages_to_send.clone()).await {
+    match publisher_clone.send_batch(std::mem::take(&mut messages_to_send)).await {
         Ok((_, failed)) if failed.is_empty() => break, // All sent successfully
-        Ok((_, failed)) => messages_to_send = failed,  // Retry failed messages
-        Err(e) => eprintln!("Error sending bulk messages: {}", e),
+        Ok((_, failed)) => {
+            messages_to_send = failed;
+            retry_count = 0; // Reset on partial success
+        }
+        Err(e) => {
+            eprintln!("Error sending bulk messages: {}", e);
+            retry_count += 1;
+            if retry_count >= MAX_RETRIES {
+                eprintln!("Max retries reached, giving up on batch");
+                break;
+            }
+        }
     }
     tokio::time::sleep(Duration::from_millis(10)).await; // Backoff before retry
 }

542-544: Fix documentation to match implementation.

The comment mentions "commas" but the implementation uses underscores as thousand separators (line 554).

-/// Formats a number with commas as thousand separators.
+/// Formats a number with underscores as thousand separators.
 /// Handles both integers and floating-point numbers.

🧹 Nitpick comments (2)

src/endpoints/amqp.rs (2)
110-119: Consider true batch publishing for better performance.

The current implementation delegates to per-message send() via the helper. While correct, AMQP supports publishing multiple messages more efficiently within a single channel operation.

Note

For now, the helper pattern maintains consistency with other endpoints. A future optimization could batch multiple basic_publish calls before waiting for confirmations.

227-243: Consider handling additional AMQP header value types.

The helper only extracts LongString values from AMQP headers. Other types like integers, booleans, or timestamps are silently ignored, which may result in metadata loss.
Suggested enhancement

If metadata preservation is important, consider converting other AMQPValue variants to strings:
 for (key, value) in headers.inner().iter() {
-    if let lapin::types::AMQPValue::LongString(s) = value {
-        metadata.insert(key.to_string(), s.to_string());
-    }
+    let value_str = match value {
+        lapin::types::AMQPValue::LongString(s) => s.to_string(),
+        lapin::types::AMQPValue::ShortString(s) => s.to_string(),
+        lapin::types::AMQPValue::Boolean(b) => b.to_string(),
+        lapin::types::AMQPValue::LongInt(i) => i.to_string(),
+        // ... other types as needed
+        _ => continue,
+    };
+    metadata.insert(key.to_string(), value_str);
 }

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8745c1 and 2ed6caf.

📒 Files selected for processing (8)

src/endpoints/amqp.rs (9 hunks)
src/endpoints/kafka.rs (9 hunks)
tests/integration/amqp.rs (2 hunks)
tests/integration/common.rs (9 hunks)
tests/integration/kafka.rs (2 hunks)
tests/integration/mongodb.rs (2 hunks)
tests/integration/mqtt.rs (3 hunks)
tests/integration/nats.rs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/integration/mqtt.rs

🧰 Additional context used

🧬 Code graph analysis (6)

src/endpoints/amqp.rs (3)

src/middleware/dlq.rs (2)

send (89-109)

send_batch (111-225)

src/traits.rs (3)

send (90-99)

send_batch (85-88)

send_batch_helper (109-139)

src/endpoints/memory.rs (4)

send_batch (119-134)

new (34-37)

new (102-108)

new (150-157)

tests/integration/mongodb.rs (2)

tests/integration/common.rs (4)

add_performance_result (39-41)

run_direct_perf_test (305-371)

new (51-55)

new (122-138)

src/endpoints/mongodb.rs (2)

new (58-64)

new (129-170)

tests/integration/kafka.rs (2)

tests/integration/common.rs (4)

add_performance_result (39-41)

run_direct_perf_test (305-371)

new (51-55)

new (122-138)

src/endpoints/kafka.rs (2)

new (28-108)

new (201-258)

tests/integration/nats.rs (1)

tests/integration/common.rs (2)

add_performance_result (39-41)

run_direct_perf_test (305-371)

tests/integration/amqp.rs (1)

tests/integration/common.rs (2)

add_performance_result (39-41)

run_direct_perf_test (305-371)

src/endpoints/kafka.rs (1)

src/traits.rs (2)

send (90-99)

send_batch (85-88)

🔇 Additional comments (20)

src/endpoints/amqp.rs (4)

2-2: LGTM: Import additions align with batch-oriented refactor.

The new imports (BatchCommitFunc, TryStreamExt, ShortString) directly support the batch operations and AMQP header handling introduced in this file.

Also applies to: 6-6, 13-13

20-26: LGTM: Field separation clarifies publisher behavior.

Splitting persistence and acknowledgment concerns into separate persistent and skip_ack fields improves clarity over a single flag.

72-108: LGTM: Send logic correctly implements persistence and acknowledgment control.

The delivery mode is properly set for persistent messages, metadata headers are converted to ShortString keys as required by AMQP, and the acknowledgment wait is correctly gated by skip_ack.

247-301: LGTM: Batch receive implementation correctly uses AMQP multiple acknowledgment.

The implementation properly:

Waits for the first message (blocking)

Greedily consumes buffered messages up to max_messages (non-blocking)

Uses AMQP's multiple: true flag to acknowledge all messages up to the last delivery tag in a single operation

This is an efficient pattern for AMQP batch consumption.

src/endpoints/kafka.rs (5)

120-127: LGTM: Best-effort flush in Drop is appropriate.

The non-blocking 5-second flush attempt in Drop provides a reasonable cleanup mechanism. Errors are properly swallowed since Drop cannot propagate them.

132-172: LGTM: Send logic correctly implements fire-and-forget and at-least-once modes.

The payload slice and skip_ack branching properly distinguish between:

skip_ack: false: Await delivery report for at-least-once guarantees

skip_ack: true: Enqueue message without waiting for high throughput

174-182: LGTM: Batch helper leverages Kafka's internal batching.

While this delegates to per-message send(), Kafka's producer automatically batches based on linger.ms and other settings, so the helper pattern is reasonable.

263-269: LGTM: Consumer cleanup via unsubscribe is appropriate.

Calling unsubscribe() in Drop ensures the consumer cleanly leaves the group.

294-381: LGTM: Batch receive correctly tracks offsets and commits.

The implementation properly:

Waits for the first message, then greedily consumes buffered messages

Tracks the next offset to commit (message.offset() + 1) per Kafka semantics

Commits only when messages were received

The process_message helper correctly extracts payload, metadata, and updates the offset list.

tests/integration/mongodb.rs (1)

50-86: LGTM: Performance test refactoring improves consistency.

The migration to run_direct_perf_test with closure-based factories consolidates the performance testing pattern across endpoints.

tests/integration/nats.rs (1)

49-80: LGTM: NATS test follows consolidated performance pattern.

The refactoring aligns with other endpoints and properly configures skip_ack: false for acknowledgment-based performance testing.

tests/integration/amqp.rs (1)

49-75: LGTM: AMQP test refactoring enables fire-and-forget performance testing.

The use of skip_ack: true appropriately tests high-throughput fire-and-forget mode for AMQP.

tests/integration/kafka.rs (1)

49-81: LGTM: Kafka test appropriately tunes producer for throughput.

The custom producer options and skip_ack: true configuration properly test Kafka's fire-and-forget high-throughput mode.

tests/integration/common.rs (7)

24-44: LGTM: Performance result collection infrastructure is well-designed.

The global collector with thread-safe access and simple add function provides a clean way to aggregate results across integration tests.

253-281: LGTM: Summary printing via Drop ensures consolidated output.

Using Drop to print the performance table guarantees the summary appears after all tests complete, providing a clean consolidated view.

296-371: LGTM: Direct performance test runner provides excellent abstraction.

The function cleanly orchestrates single and batch performance tests with proper setup delays and resource cleanup, eliminating code duplication across endpoint tests.

469-477: LGTM: MockConsumer batch support maintains consistency.

The receive_batch implementation matches the existing receive() pattern of blocking forever for publisher-only tests.

484-540: LGTM: Batch read measurement correctly tracks message count.

The loop properly:

Calculates remaining messages to fetch

Calls receive_batch with appropriate limit

Updates the atomic counter for accurate throughput calculation

Handles timeouts and errors gracefully

571-626: LGTM: Single-message write performance measurement is correct.

The implementation properly sends individual messages with per-message retry logic and accurate throughput calculation.

628-661: LGTM: Single-message read measurement is straightforward and correct.

The loop properly calls receive() for individual messages, spawns commits asynchronously, and handles timeouts.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

src/endpoints/nats.rs (2)

180-180: Unexplained sleep still lacks documentation.

This hardcoded 100ms sleep remains undocumented. As flagged in previous reviews, arbitrary sleeps can mask race conditions and may not be sufficient under load.

227-233: ACK failures still silently swallowed.

As previously noted, unwrap_or_else logs the error but doesn't propagate it. The CommitFunc signature prevents returning Result, but consider incrementing a dedicated metric counter for ACK failures to enable operational alerting.

🧹 Nitpick comments (3)

src/endpoints/nats.rs (2)

25-26: Consider clarifying the flag semantics.

The comment now correctly references delayed_ack, but the flag's meaning could be clearer. When delayed_ack is false, the code waits for acknowledgment (line 102-104). Consider rephrasing to: "If false, wait for JetStream acknowledgment; if true, fire-and-forget."

116-126: Consider implementing true batching for better performance.

The current implementation calls send() sequentially for each message rather than leveraging NATS batch publish capabilities. For high-throughput scenarios, consider using NATS client batch APIs if available.

Also, the comment "not a real bulk, but fast enough" appears twice (lines 120 and 122).

tests/integration/common.rs (1)

614-622: Consider adding retry limit for consistency.

The inner retry loop lacks a maximum retry count. While the outer loop will eventually terminate when the channel closes, adding a MAX_RETRIES limit (similar to measure_write_performance) would provide more predictable behavior.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ed6caf and a117227.

📒 Files selected for processing (11)

src/endpoints/amqp.rs (9 hunks)
src/endpoints/kafka.rs (9 hunks)
src/endpoints/nats.rs (5 hunks)
src/middleware/deduplication.rs (1 hunks)
src/models.rs (15 hunks)
tests/integration/amqp.rs (2 hunks)
tests/integration/common.rs (9 hunks)
tests/integration/kafka.rs (2 hunks)
tests/integration/mongodb.rs (3 hunks)
tests/integration/mqtt.rs (3 hunks)
tests/integration/nats.rs (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (10)

src/middleware/deduplication.rs (2)

src/traits.rs (3)

into_batch_commit_func (33-41)

receive (54-64)

receive_batch (48-51)

src/models.rs (1)

new (107-112)

tests/integration/nats.rs (1)

tests/integration/common.rs (7)

add_performance_result (39-41)

run_direct_perf_test (305-372)

run_performance_pipeline_test (153-159)

run_pipeline_test (149-151)

run_test_with_docker (285-294)

new (51-55)

new (122-138)

tests/integration/mongodb.rs (2)

tests/integration/common.rs (4)

add_performance_result (39-41)

run_direct_perf_test (305-372)

new (51-55)

new (122-138)

src/endpoints/mongodb.rs (2)

new (58-64)

new (129-170)

src/endpoints/amqp.rs (1)

src/traits.rs (3)

send (90-99)

send_batch (85-88)

send_batch_helper (109-139)

tests/integration/mqtt.rs (2)

tests/integration/common.rs (4)

add_performance_result (39-41)

run_direct_perf_test (305-372)

new (51-55)

new (122-138)

src/endpoints/mqtt.rs (2)

new (23-35)

new (95-115)

src/endpoints/nats.rs (1)

src/traits.rs (2)

send_batch (85-88)

send_batch_helper (109-139)

tests/integration/amqp.rs (2)

tests/integration/common.rs (6)

add_performance_result (39-41)

run_direct_perf_test (305-372)

run_test_with_docker (285-294)

setup_logging (227-251)

new (51-55)

new (122-138)

src/endpoints/amqp.rs (2)

new (29-57)

new (131-165)

tests/integration/kafka.rs (1)

tests/integration/common.rs (2)

add_performance_result (39-41)

run_direct_perf_test (305-372)

src/endpoints/kafka.rs (1)

src/traits.rs (3)

send (90-99)

send_batch (85-88)

send_batch_helper (109-139)

src/models.rs (2)

src/endpoints/nats.rs (2)

new (30-67)

new (144-198)

src/middleware/dlq.rs (1)

new (23-41)

🔇 Additional comments (32)

tests/integration/mongodb.rs (1)

60-88: LGTM! Clean refactoring to unified performance test pattern.

The migration to run_direct_perf_test with lazy publisher/consumer creation via closures is consistent with the broader test refactoring. The collection drop before the test ensures a clean state.

tests/integration/mqtt.rs (1)

52-82: LGTM! Proper resource sizing and unique identifiers.

The test properly sizes the queue capacity (PERF_TEST_MESSAGE_COUNT * 2) to accommodate both batch and single message tests, and uses UUIDs for client IDs to prevent conflicts in concurrent test scenarios.

tests/integration/kafka.rs (1)

49-77: LGTM! Appropriate configuration for performance testing.

The use of delayed_ack: true aligns with the high-throughput goals of the performance test, and the closure-based publisher/consumer creation follows the established pattern.

src/middleware/deduplication.rs (2)

54-70: LGTM! Race condition properly addressed with atomic compare-and-swap.

The previous get-then-insert race condition has been correctly fixed using compare_and_swap, ensuring that only one consumer can claim a message ID even under concurrent access. The success/failure branching properly handles both new and duplicate messages.

76-121: LGTM! Robust error handling in cleanup task.

The background cleanup task now properly handles all error cases:

SystemTime errors are caught and logged (lines 77-82)

Invalid byte lengths are validated and logged (lines 90-93)

Conversion errors are handled explicitly (lines 97-103)

Database removal errors are logged (lines 108-111)

The probabilistic cleanup trigger (~2% chance) is reasonable for amortizing cleanup cost.

tests/integration/nats.rs (1)

49-80: LGTM! Proper configuration for NATS direct performance testing.

The test uses delayed_ack: false for reliable delivery guarantees, and the stream/subject naming (perf_nats_direct / perf_nats_direct.subject) is clear and consistent.

tests/integration/amqp.rs (1)

49-71: LGTM! Appropriate configuration trade-off for performance testing.

The direct performance test correctly uses delayed_ack: true to maximize throughput, while pipeline tests use delayed_ack: false for reliability. This configuration difference is intentional and appropriate for the different test scenarios.

src/endpoints/amqp.rs (3)

42-56: LGTM! Proper queue durability configuration.

The queue durability is correctly derived from !config.no_persistence, ensuring that persistent queues survive broker restarts while non-persistent queues prioritize performance.

73-107: LGTM! Correct persistent message handling.

The conditional delivery_mode = 2 for persistent messages is the correct AMQP standard. The metadata headers are properly converted to ShortString, and the delayed acknowledgment gating is appropriate for throughput optimization.

252-306: LGTM! Efficient batch consumption with proper acknowledgment.

The batch receive implementation:

Blocks for the first message (line 261)

Greedily consumes buffered messages (lines 269-282)

Optimizes bulk ack by setting multiple: true only when needed (line 290)

This design maximizes throughput while maintaining correctness.

src/endpoints/kafka.rs (3)

119-125: LGTM! Proper non-blocking cleanup on drop.

The Drop implementation performs a 5-second non-blocking flush, which is a reasonable best-effort cleanup. The comment correctly notes this is not a guarantee, directing users to explicit disconnect() for guaranteed delivery.

288-338: LGTM! Efficient batch consumption with greedy buffering.

The batch receive implementation:

Blocks for the first message (line 299)

Greedily consumes buffered messages (lines 310-322)

Properly tracks offsets for batch commit

The pattern maximizes throughput while maintaining at-least-once delivery semantics.

369-376: Offset + 1 is correct Kafka semantics.

The code correctly commits message.offset() + 1 to mark the next offset to read. When a consumer reads a message with its offset, it commits the next offset to indicate all previous records were processed; the committed offset refers to the next offset the consumer intends to read. For instance, if the message at offset 25 is read, offset 26 is then committed. The implementation correctly collects offsets during message processing and commits the batch using CommitMode::Async, which enqueues the commit request in a local work queue and returns immediately, appropriate for batch processing.

src/endpoints/nats.rs (4)

17-20: Clean internal multiplexing design.

The NatsClient enum provides a clear abstraction for handling both Core and JetStream modes.

38-60: Good defensive programming with clear warnings.

The conditional initialization properly handles both JetStream and Core modes with appropriate warnings for misconfigurations.

96-111: LGTM!

The pattern matching cleanly separates JetStream (with optional ack) from Core NATS (fire-and-forget) publishing logic.

284-293: Batch ACK error handling is consistent with single-message approach.

The batch commit logs ACK failures and continues with remaining messages. While errors aren't propagated (due to trait constraints), the approach is consistent with the single-message path.

tests/integration/common.rs (6)

24-41: LGTM!

The global performance result collector is well-designed for test infrastructure, with appropriate use of Lazy and Mutex.

253-281: Clean use of Drop trait for summary printing.

The consolidated performance table printed on drop provides a nice developer experience for viewing test results.

296-372: Well-designed generic test orchestration.

The function provides good abstraction for performance testing. Minor note: the sleep at line 363 is after dropping consumer/publisher, so it's likely a cleanup delay rather than "Allow consumer setup."

429-453: Previous retry loop issues have been addressed.

The code now includes MAX_RETRIES and uses std::mem::take to avoid unnecessary clones, resolving the concerns from previous reviews.

500-556: Robust batch read performance measurement.

The implementation correctly handles timeouts, empty batches, and calculates throughput based on actual received messages rather than requested count.

558-585: Documentation now matches implementation.

The comment correctly states "underscores" as thousand separators, resolving the previous documentation mismatch.

src/models.rs (9)

35-37: LGTM!

The default of 3 retry attempts for DLQ is reasonable.

52-104: Complex but necessary custom deserialization.

The custom Deserialize implementation correctly separates middlewares from the flattened EndpointType, enabling flexible YAML and environment variable formats.

121-153: Clever handling of environment variable format.

The function correctly handles both YAML sequences and environment variable maps with numeric keys (e.g., MIDDLEWARES__0__, MIDDLEWARES__1__), preserving order by sorting.

172-179: Well-designed Middleware enum.

Boxing the Dlq variant is appropriate given that DeadLetterQueueMiddleware contains a nested Endpoint, preventing excessive stack usage.

196-203: LGTM!

The DeadLetterQueueMiddleware structure is clean with appropriate defaults.

226-228: Consistent naming across endpoints.

The delayed_ack flag provides consistent semantics with NATS configuration.

257-262: Sensible defaults for NATS configuration.

Both flags default to the safer option: waiting for acknowledgments and using JetStream (persistent) mode.

294-297: LGTM!

AMQP configuration flags follow the same pattern as other endpoints with safe defaults.

392-562: Comprehensive test coverage for new config structure.

The tests correctly validate both YAML and environment variable deserialization for the new Middleware enum and nested DLQ endpoint structure.

coderabbitai · 2025-12-18T13:40:04Z

+            NatsSubscription::Core(sub) => {
+                let mut messages = Vec::new();
+                // Core NATS has no ack, so the commit is a no-op.
+                let commit_closure: Box<dyn FnOnce() -> BoxFuture<'static, ()> + Send> =
+                    Box::new(|| Box::pin(async {}));
+
+                if let Some(message) = sub.next().await {
+                    messages.push(Self::create_canonical_message(&message));
+                }
+                (messages, commit_closure)
+            }


⚠️ Potential issue | 🟠 Major

Core NATS batch receive ignores the max_messages parameter.

The Core NATS path only fetches a single message (line 303-305), ignoring the max_messages parameter entirely. This inconsistency with the JetStream path could significantly degrade performance when using Core NATS with batch-based processing.

🔎 Apply this diff to fetch up to `max_messages` in Core mode:

NatsSubscription::Core(sub) => { let mut messages = Vec::new(); // Core NATS has no ack, so the commit is a no-op. let commit_closure: Box<dyn FnOnce() -> BoxFuture<'static, ()> + Send> = Box::new(|| Box::pin(async {})); - if let Some(message) = sub.next().await { - messages.push(Self::create_canonical_message(&message)); + // Fetch up to max_messages + for _ in 0..max_messages { + if let Ok(Some(message)) = tokio::time::timeout( + Duration::from_millis(100), + sub.next() + ).await { + messages.push(Self::create_canonical_message(&message)); + } else { + break; // No more messages available + } } (messages, commit_closure) }

🤖 Prompt for AI Agents

In src/endpoints/nats.rs around lines 297 to 307, the Core NATS branch currently only pulls a single message and ignores the max_messages parameter; change it to loop awaiting sub.next() and push messages into the Vec until you have collected up to max_messages or the stream returns None, preserving the existing no-op commit_closure for Core NATS; ensure you use the function's max_messages argument for the loop limit and handle potential await timeouts/None appropriately so the behavior matches the JetStream batch path.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (4)

src/endpoints/nats.rs (4)

179-179: Unexplained 100ms sleep after consumer creation.

This hardcoded sleep was flagged in a previous review and remains unaddressed. It appears to be a workaround for a timing issue but lacks documentation.

226-232: Ack failures are silently swallowed.

This issue was flagged in a previous review and remains unaddressed. The unwrap_or_else logs the error but doesn't propagate it, potentially hiding delivery failures.

287-289: Batch ack failures are logged but not surfaced.

Similar to the single-message receive method, batch ack failures are logged but not propagated. This is consistent with the existing implementation but shares the same limitation of hiding delivery confirmation failures from callers.

302-304: Core NATS batch receive ignores the max_messages parameter.

This issue was flagged in a previous review and remains unaddressed. The Core NATS path only fetches a single message, completely ignoring the max_messages parameter and degrading batch processing performance.

🧹 Nitpick comments (3)

tests/integration_test.rs (1)

6-75: Consider removing commented-out code.

This large block of commented-out tests (~70 lines) clutters the file. Since version control preserves history, consider removing it entirely. If kept for reference purposes, add a brief comment explaining why it's retained.

src/endpoints/mongodb.rs (1)

220-224: Change stream path doesn't leverage batch processing.

When using change streams (replica sets/sharded clusters), receive_batch falls back to single-message receive, ignoring the max_messages parameter. This means batch processing efficiency is not available for production MongoDB deployments.

Consider implementing true batch receive for change streams in a future iteration, or document this limitation for users.

src/endpoints/nats.rs (1)

264-265: Consider configurable or shorter timeout for batch operations.

The 5-second timeout on the first message may introduce unnecessary latency in batch processing scenarios where low latency is important. Consider making this configurable or using a shorter default (e.g., 100-500ms) to improve responsiveness when fewer messages are available.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a117227 and 2bed90e.

📒 Files selected for processing (3)

src/endpoints/mongodb.rs (7 hunks)
src/endpoints/nats.rs (5 hunks)
tests/integration_test.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

src/endpoints/mongodb.rs (1)

src/traits.rs (3)

into_batch_commit_func (33-41)

send_batch (85-88)

send_batch_helper (109-139)

tests/integration_test.rs (5)

tests/integration/nats.rs (1)

test_nats_performance_direct (49-82)

tests/integration/amqp.rs (1)

test_amqp_performance_direct (49-73)

tests/integration/kafka.rs (1)

test_kafka_performance_direct (49-79)

tests/integration/mqtt.rs (1)

test_mqtt_performance_direct (49-84)

tests/integration/mongodb.rs (1)

test_mongodb_performance_direct (50-91)

src/endpoints/nats.rs (1)

src/traits.rs (2)

send_batch (85-88)

send_batch_helper (109-139)

🔇 Additional comments (17)

tests/integration_test.rs (1)

77-113: LGTM!

The consolidated performance test function is well-structured:

The #[ignore] attribute correctly marks this as an opt-in performance test

The RAII pattern with PerformanceSummaryPrinter ensures the summary prints on drop, even if a test panics

Sequential execution with feature-gated blocks is appropriate for accurate performance measurements

The underscore prefix on _summary_printer idiomatically indicates the variable exists solely for its drop behavior

src/endpoints/mongodb.rs (8)

43-43: LGTM: Efficient payload conversion.

The .into() conversion from Vec<u8> to bytes::Bytes is idiomatic and efficient.

93-93: LGTM: Correct conversion for BSON Binary.

The to_vec() method correctly converts bytes::Bytes to Vec<u8> as required by the BSON Binary constructor. While this creates a copy, it's necessary for MongoDB's binary type.

103-111: LGTM: Standard batch send implementation.

The batch send delegates to the single-message send method via send_batch_helper, which is consistent with other endpoints in the codebase.

250-259: LGTM: Correct available message filter logic.

The filter properly identifies unlocked messages using MongoDB's $or operator to check for missing, null, or expired locked_until values.

302-306: LGTM: Correct atomic claim with race condition prevention.

The update filter correctly extends the base availability filter, ensuring that documents are only claimed if they're still available. This prevents race conditions where another consumer claims documents between the find and update operations.

382-399: LGTM: Straightforward document retrieval.

The helper correctly retrieves full documents by ObjectId list using MongoDB's $in operator.

401-440: LGTM: Correct bulk commit implementation.

The helper properly converts BSON documents to canonical messages and creates an efficient bulk commit function that deletes all claimed documents in a single operation. The error handling is appropriate for fire-and-forget commit semantics.

331-331: LGTM: Good refactoring for code reuse.

Extracting the available message filter into a shared helper improves maintainability and ensures consistency between single and batch receive paths.

src/endpoints/nats.rs (8)

17-20: LGTM: Clean enum design for dual-mode support.

The NatsClient enum effectively multiplexes between Core and JetStream modes with clear variant naming.

22-27: LGTM: Struct fields support dual-mode operation.

The refactored NatsPublisher struct cleanly accommodates both Core and JetStream modes through the NatsClient enum.

69-80: LGTM: Efficient cloning with helpful comment.

The with_subject method correctly clones the inner client (which is cheap due to Arc-like internals) and creates a new publisher instance.

85-114: LGTM: Dual-mode send implementation is correct.

The send method properly handles both JetStream (with conditional ack waiting) and Core NATS (fire-and-forget) modes.

132-135: LGTM: Clean subscription enum design.

The NatsSubscription enum appropriately multiplexes between Core and JetStream subscription types, with boxing for the larger JetStream stream type.

198-214: LGTM: Clean message conversion helper.

The create_canonical_message helper correctly converts NATS messages to canonical format, with appropriate handling of multi-value headers.

116-125: The current implementation sending messages individually rather than in a true batch is appropriate for async_nats. The standard pattern in async_nats is to publish messages in a loop and call flush to ensure all messages are sent, which is what send_batch_helper does. While batch publishing capabilities exist in the jetstream-extra extension crate, this would require an additional dependency. The current approach is idiomatic for the async_nats library.

309-319: Response handling in batch commit is intentional and consistent with framework design.

The NATS endpoint correctly ignores the responses parameter because per-message JetStream acknowledgment is the primary commitment mechanism for this implementation. This approach aligns with Kafka, another message queue endpoint that similarly ignores responses. The framework supports optional response handling through conversion utilities (into_batch_commit_func, into_commit_func), allowing different endpoints to implement according to their protocol semantics. Middleware chains do not depend on response propagation through batch commit.

coderabbitai · 2025-12-18T14:12:33Z

+        while let Some(result) = cursor.next().await {
+            if let Ok(doc) = result {
+                if let Ok(id) = doc.get_object_id("_id") {
+                    ids_to_claim.push(id);
+                }
+            }
+        }


⚠️ Potential issue | 🟡 Minor

Silent error handling may hide data issues.

The cursor loop silently ignores documents that fail to parse or don't have a valid _id field. While this prevents the entire batch from failing due to one malformed document, it could hide data corruption or schema issues.

Consider logging a warning when documents are skipped.

🔎 Apply this diff to add warning logs:

let mut ids_to_claim = Vec::new(); while let Some(result) = cursor.next().await { - if let Ok(doc) = result { - if let Ok(id) = doc.get_object_id("_id") { - ids_to_claim.push(id); + match result { + Ok(doc) => match doc.get_object_id("_id") { + Ok(id) => ids_to_claim.push(id), + Err(e) => warn!(error = %e, "Skipping document with invalid _id"), + }, + Err(e) => warn!(error = %e, "Failed to read document from cursor"), - } } }

🤖 Prompt for AI Agents

In src/endpoints/mongodb.rs around lines 287 to 293, the cursor loop currently ignores errors and documents missing a valid "_id" without any trace; update the loop to emit warning logs when a document fails to deserialize (log the Err returned by result) and when get_object_id("_id") fails (log that the document is missing/invalid _id and include doc preview or identifier if available), keeping the existing behavior of continuing to the next document; use the project's standard logger (e.g., tracing::warn! or the module logger) and include enough context (collection name/batch id) and the error details so skipped documents can be investigated.

coderabbitai · 2025-12-18T14:12:33Z

+            if !config.delayed_ack {
+                warn!("'delayed_ack' is false but NATS is in Core mode, which does not support acknowledgements. The flag will be ignored.");
+            }


⚠️ Potential issue | 🟡 Minor

Fix misleading warning logic for delayed_ack in Core mode.

The current warning only triggers when delayed_ack is false, but Core NATS ignores the delayed_ack flag regardless of its value. The warning should indicate that the flag is always ignored in Core mode, not just when it's false.

🔎 Apply this diff to fix the warning logic:

- if !config.delayed_ack { - warn!("'delayed_ack' is false but NATS is in Core mode, which does not support acknowledgements. The flag will be ignored."); - } + if !config.delayed_ack { + warn!("NATS is in Core mode, which does not support acknowledgements. The 'delayed_ack' flag will be ignored."); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if !config.delayed_ack {

warn!("'delayed_ack' is false but NATS is in Core mode, which does not support acknowledgements. The flag will be ignored.");

}

// In Core mode, acknowledgements are not supported, so delayed_ack is always ignored

warn!("NATS is in Core mode, which does not support acknowledgements. The 'delayed_ack' flag will be ignored.");

🤖 Prompt for AI Agents

In src/endpoints/nats.rs around lines 56 to 58, the warning message currently only fires when config.delayed_ack is false, but Core NATS ignores the delayed_ack flag regardless of its value; update the condition to always emit the warning when running in Core mode (remove the delyed_ack boolean check) and change the warning text to state that delayed_ack is ignored in Core mode regardless of its configured value so the log is accurate.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (4)

src/endpoints/nats.rs (4)
56-58: Fix misleading warning logic for delayed_ack in Core mode.

Core NATS ignores the delayed_ack flag regardless of its value, but the warning only fires when delayed_ack is false. This is misleading—the flag is always ignored in Core mode.
🔎 Apply this diff to warn unconditionally in Core mode
-            if !config.delayed_ack {
-                warn!("'delayed_ack' is false but NATS is in Core mode, which does not support acknowledgements. The flag will be ignored.");
-            }
+            warn!("NATS is in Core mode, which does not support acknowledgements. The 'delayed_ack' flag will be ignored.");
179-179: Unexplained 100ms sleep after consumer creation.

This hardcoded sleep appears to be a workaround for a timing issue but lacks documentation. Arbitrary sleeps can mask race conditions and may not be sufficient under load or in slow network conditions.

Document why this sleep is necessary or replace it with a robust synchronization mechanism. If it's working around a NATS client initialization race, consider using the client's flush/ping API or a proper readiness check.
-            tokio::time::sleep(Duration::from_millis(100)).await;
+            // Brief delay to allow consumer to be fully registered with the server.
+            // TODO: Replace with proper readiness check (e.g., flush/ping or subscription confirmation)
+            tokio::time::sleep(Duration::from_millis(100)).await;
226-232: Ack failures are silently swallowed.

The unwrap_or_else logs the error but doesn't propagate it. If acknowledgment fails, the message may be redelivered, but the caller has no way to detect the ack failure. The CommitFunc signature currently doesn't allow returning errors.

At minimum, improve the logging to indicate redelivery risk:
-                        message.ack().await.unwrap_or_else(|e| {
-                            tracing::error!("Failed to ACK NATS message: {:?}", e)
-                        });
+                        if let Err(e) = message.ack().await {
+                            tracing::error!(error = ?e, "Failed to ACK NATS message - message may be redelivered");
+                        }
Consider whether CommitFunc should be updated to return Result to allow error propagation.

301-310: Core NATS batch receive ignores the max_messages parameter.

The Core NATS path only fetches a single message (lines 306-308), completely ignoring the max_messages parameter. This inconsistency with the JetStream path significantly degrades batch processing performance when using Core NATS.
🔎 Apply this diff to fetch up to `max_messages` in Core mode
             NatsSubscription::Core(sub) => {
                 let mut messages = Vec::new();
                 // Core NATS has no ack, so the commit is a no-op.
-                let commit_closure: BatchCommitFunc = Box::new(|_| Box::pin(async {}));
+                let commit_closure: BatchCommitFunc = Box::new(|_| Box::pin(async {}));

-                if let Some(message) = sub.next().await {
-                    messages.push(Self::create_canonical_message(&message));
+                // Fetch up to max_messages
+                for _ in 0..max_messages {
+                    match tokio::time::timeout(
+                        Duration::from_millis(100),
+                        sub.next()
+                    ).await {
+                        Ok(Some(message)) => {
+                            messages.push(Self::create_canonical_message(&message));
+                        }
+                        _ => break, // Timeout or no more messages
+                    }
                 }
                 Ok((messages, commit_closure))
             }

🧹 Nitpick comments (1)

src/endpoints/nats.rs (1)

285-290: Consider making batch ack concurrency configurable.

The hardcoded limit of 1000 concurrent acknowledgments is reasonable for most cases, but very large batches or resource-constrained environments might benefit from a configurable limit.

Consider adding a max_ack_concurrency field to NatsConfig with a default of 1000, allowing operators to tune this based on their NATS server capacity and network conditions.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bed90e and 59bfe8d.

📒 Files selected for processing (2)

src/endpoints/nats.rs (5 hunks)
tests/integration_test.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/endpoints/nats.rs (2)

src/endpoints/http.rs (1)

send_batch (260-268)

src/traits.rs (2)

send_batch (85-88)

send_batch_helper (109-139)

🔇 Additional comments (5)

tests/integration_test.rs (1)

77-113: LGTM: Clean aggregation of performance tests.

The implementation properly consolidates feature-specific performance tests into a single orchestrator. The use of #[ignore] ensures these tests run only when explicitly requested, and the RAII pattern with _summary_printer cleanly handles result reporting at test completion.

The sequential execution prevents measurement interference between tests, which is appropriate for performance benchmarking.

src/endpoints/nats.rs (4)

69-80: LGTM!

The with_subject implementation correctly clones the appropriate client variant. The comment about cheap cloning is helpful context.

85-114: LGTM!

The send implementation correctly handles both JetStream (with conditional ack) and Core modes. The conditional await of ack_future based on delayed_ack properly implements fire-and-forget vs. confirmed delivery semantics.

116-125: LGTM!

The send_batch implementation follows the standard pattern using send_batch_helper. The comment correctly notes this isn't a native bulk operation but processes messages individually, which is appropriate given NATS doesn't have a native batch publish API.

198-214: LGTM!

The create_canonical_message helper correctly converts NATS messages to the canonical format, properly handling optional headers and multi-value header entries.

coderabbitai · 2025-12-19T07:10:48Z

+                // Use a short timeout to make the batch fetch non-blocking if no messages are available.
+                let message_stream = stream.next().await;
+
+                // Process the first message if it exists
+                if let Some(Ok(first_message)) = message_stream {
+                    canonical_messages.push(Self::create_canonical_message(&first_message));
+                    jetstream_messages.push(first_message);
+
+                    // Greedily fetch the rest of the batch
+                    while canonical_messages.len() < max_messages {
+                        if let Ok(Some(message)) = stream.try_next().await {
+                            canonical_messages.push(Self::create_canonical_message(&message));
+                            jetstream_messages.push(message);
+                        } else {
+                            break; // No more messages in the buffer
+                        }
+                    }
+                }


⚠️ Potential issue | 🟡 Minor

JetStream batch receive may block indefinitely on first message.

The comment on line 263 states the batch fetch should be "non-blocking if no messages are available," but the first message fetch (line 264) uses stream.next().await which blocks indefinitely until a message arrives. Only subsequent messages use the non-blocking try_next().

Consider wrapping the first message fetch in a timeout to make the entire batch operation non-blocking:

- // Use a short timeout to make the batch fetch non-blocking if no messages are available. - let message_stream = stream.next().await; + // Use a short timeout to make the batch fetch non-blocking if no messages are available. + let message_stream = tokio::time::timeout( + Duration::from_millis(100), + stream.next() + ).await.ok().flatten(); // Process the first message if it exists - if let Some(Ok(first_message)) = message_stream { + if let Some(Ok(first_message)) = message_stream {

This aligns the behavior with the comment and prevents indefinite blocking when no messages are available.

🤖 Prompt for AI Agents

In src/endpoints/nats.rs around lines 263 to 280, the initial call to stream.next().await blocks indefinitely which contradicts the "non-blocking if no messages are available" intent; wrap the first fetch in a tokio::time::timeout with the same short duration used for non-blocking behavior, treat a timeout as "no message" (i.e., skip processing and return an empty batch), and only proceed to greedily call stream.try_next() when the timeout yields an Ok(Some(message)); add the necessary use/import for tokio::time::timeout and handle the Result/Elapsed error branch by not blocking and breaking out gracefully.

coderabbitai · 2025-12-19T07:10:48Z

+                            futures::stream::iter(jetstream_messages)
+                                .for_each_concurrent(
+                                    Some(1000),
+                                    |message| async move {
+                                        if let Err(e) = message.ack().await {
+                                            tracing::error!("Failed to ACK NATS message: {:?}", e);
+                                        }
+                                    })
+                                .await;


⚠️ Potential issue | 🟡 Minor

Batch ack failures are logged but not propagated.

Similar to the single-message receive, batch ack failures are logged but the BatchCommitFunc signature doesn't allow returning errors. Failed acks may result in message redelivery without the caller being aware.

Improve the error message to indicate redelivery risk:

|message| async move { if let Err(e) = message.ack().await { - tracing::error!("Failed to ACK NATS message: {:?}", e); + tracing::error!(error = ?e, "Failed to ACK NATS message in batch - message may be redelivered"); } })

Consider whether BatchCommitFunc should return Result to enable error propagation across the batch.

🤖 Prompt for AI Agents

In src/endpoints/nats.rs around lines 287 to 295, the batch ack loop currently logs ack failures but neither signals them to the caller nor mentions the risk of redelivery; update the error logging to include the error details and an explicit note that a failed ack may cause redelivery (e.g., "Failed to ACK NATS message: {e}; message may be redelivered"), and then modify the BatchCommitFunc signature and callers to return a Result (propagating any ack failures) so callers can handle or retry errors — change the async closure to collect/report errors and return Err on any failed ack, update function types to Result<(), ErrorType>, and adjust upstream call sites to handle the Result accordingly.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

src/endpoints/nats.rs (3)
116-125: Batch send follows the established pattern.

The implementation uses send_batch_helper consistently with other endpoints. The comment correctly notes this isn't a true bulk operation. For future optimization, consider NATS pipelining or concurrent publishes if throughput becomes a bottleneck.

167-171: Durable consumer name construction could be more robust.

The durable name uses subject.replace('.', '-') to sanitize the subject, but NATS subjects can contain other characters that might not be valid in consumer names (e.g., wildcards * and >). Consider using a more comprehensive sanitization function or validate the subject beforehand.
🔎 Suggested improvement for name sanitization
                 durable_name: Some(format!(
                     "mq-bridge-{}-{}",
                     stream_name,
-                    subject.replace('.', "-")
+                    subject.chars()
+                        .map(|c| if c.is_alphanumeric() || c == '-' { c } else { '-' })
+                        .collect::<String>()
                 )),
287-289: Consider lowering the concurrency limit for batch acks.

The concurrency limit of 1000 for concurrent acknowledgments seems quite high. While the comment acknowledges this is a balance, spawning up to 1000 concurrent tasks could:

Overwhelm the NATS server with simultaneous ack requests

Consume significant memory for task state

Create scheduler pressure on the Tokio runtime

Consider starting with a more conservative limit (e.g., 50-100) and making it configurable if needed.
🔎 Suggested adjustment
                         futures::stream::iter(jetstream_messages)
                             .for_each_concurrent(
-                                Some(1000),
+                                Some(100), // More conservative concurrency limit
                                 |message| async move {

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59bfe8d and 77c7ce5.

📒 Files selected for processing (1)

src/endpoints/nats.rs (5 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/endpoints/nats.rs (3)

src/endpoints/http.rs (4)

send_batch (260-268)

as_any (116-118)

as_any (270-272)

receive_batch (94-114)

src/endpoints/static_endpoint.rs (4)

send_batch (36-44)

as_any (50-52)

receive (71-78)

receive_batch (80-87)

src/traits.rs (4)

send_batch (85-88)

send_batch_helper (109-139)

receive (54-64)

receive_batch (48-51)

🔇 Additional comments (9)

src/endpoints/nats.rs (9)

17-20: Clean abstraction for JetStream/Core multiplexing.

The NatsClient enum provides a clear separation between JetStream and Core NATS modes. Both variants wrap Arc-based clients internally, making clones cheap as noted in the with_subject method.

38-60: JetStream/Core initialization logic is sound.

The conditional initialization correctly handles both modes:

JetStream mode ensures the stream exists (idempotent)

Core mode logs appropriately and warns about ignored flags

The warning at line 51 properly alerts when no stream is configured in JetStream mode.

69-80: with_subject correctly preserves publisher state.

The method properly clones the multiplexed client and preserves the delayed_ack configuration. The comment accurately notes that cloning is cheap due to Arc-based internals.

96-111: Send dispatch logic correctly handles both modes.

The implementation properly branches on NatsClient variant:

JetStream path respects the delayed_ack flag for synchronous vs. fire-and-forget semantics

Core path publishes without acknowledgment (as expected for Core NATS)

132-135: NatsSubscription enum is well-designed.

The enum cleanly separates Core and JetStream subscription types. Boxing the JetStream stream is appropriate for size optimization.

151-194: Consumer initialization handles both modes correctly.

The branching logic properly sets up either JetStream (with stream and consumer creation) or Core (with queue subscription) modes. The JetStream path correctly uses durable consumers for persistence.

198-214: Message canonicalization is correct.

The method properly converts NATS messages to CanonicalMessage format. Taking the first value for multi-value headers is a reasonable simplification given the target schema.

220-245: Receive implementation correctly dispatches to JetStream/Core paths.

The method properly handles both subscription types with appropriate commit semantics (JetStream ack vs. Core no-op). The message conversion uses the shared create_canonical_message helper.

259-280: JetStream batch fetch correctly implements greedy collection.

The batch logic properly:

Fetches the first message

Greedily collects additional messages up to max_messages using try_next()

Returns empty batch if no messages available

changes after review

dcf3314

coderabbitai Bot reviewed Dec 16, 2025

View reviewed changes

Comment thread src/route.rs

marcomq added 3 commits December 16, 2025 15:14

add middleware

3f5348a

fixes & add middlewares as features

0f805cb

make bulk_send more robust

fd78502

coderabbitai Bot reviewed Dec 16, 2025

View reviewed changes

marcomq added 2 commits December 16, 2025 17:47

stabilize errors & fmt

6056fa5

compile fix

108a126

coderabbitai Bot reviewed Dec 16, 2025

View reviewed changes

add dlq retry

058328b

coderabbitai Bot reviewed Dec 16, 2025

View reviewed changes

Comment thread src/middleware/dlq.rs Outdated

switch from vec<u8> to bytes and fix some tests

82b6465

coderabbitai Bot reviewed Dec 16, 2025

View reviewed changes

Comment thread src/endpoints/kafka.rs

Comment thread src/middleware/dlq.rs Outdated

Comment thread tests/integration_test.rs Outdated

Comment thread tests/integration/common.rs Outdated

disable jetstream optionally

2942e59

coderabbitai Bot reviewed Dec 16, 2025

View reviewed changes

marcomq added 3 commits December 16, 2025 20:52

renam await_ack to skip_ack

3691e5d

fix kafka sasl key

ee84573

fmt

ea9d194

coderabbitai Bot reviewed Dec 16, 2025

View reviewed changes

switch to prefer bulk

7175143

coderabbitai Bot reviewed Dec 17, 2025

View reviewed changes

fmt

9762d1c

coderabbitai Bot reviewed Dec 17, 2025

View reviewed changes

Comment thread src/endpoints/nats.rs Outdated

Comment thread src/middleware/deduplication.rs Outdated

marcomq added 2 commits December 17, 2025 15:12

clippy

dbbce6c

fix mqtt error messages

fb2d88a

coderabbitai Bot reviewed Dec 17, 2025

View reviewed changes

Comment thread src/endpoints/mqtt.rs

marcomq added 2 commits December 17, 2025 23:33

rewrote read direct test for bulk

e98be87

rename bulk to batch

f93b669

coderabbitai Bot reviewed Dec 17, 2025

View reviewed changes

add receive_batch for kafka

d8745c1

coderabbitai Bot reviewed Dec 18, 2025

View reviewed changes

Comment thread src/endpoints/kafka.rs Outdated

Comment thread src/endpoints/kafka.rs Outdated

add tests for separate single and batch read / write performance

2ed6caf

coderabbitai Bot reviewed Dec 18, 2025

View reviewed changes

address some coderabbit issues

a117227

coderabbitai Bot reviewed Dec 18, 2025

View reviewed changes

add batch read for mongodb

2bed90e

coderabbitai Bot reviewed Dec 18, 2025

View reviewed changes

try to optimize nats receive_batch (wip)

59bfe8d

coderabbitai Bot reviewed Dec 19, 2025

View reviewed changes

marcomq added 2 commits December 19, 2025 08:18

add comment

77c7ce5

change warning log

00619c9

marcomq merged commit e54acb9 into main Dec 19, 2025
2 checks passed

coderabbitai Bot reviewed Dec 19, 2025

View reviewed changes

coderabbitai Bot mentioned this pull request Dec 20, 2025

Add ComputeHandler as Middleware #3

Merged

Scenario	Sequential	Concurrent (`shutdown_rx.is_empty()`)
Shutdown signal	`Ok(true)` (line 106)	`Ok(true)` (channel empty after recv)
End-of-stream	`Ok(false)` (line 131)	`Ok(true)` (channel was never used)

Conversation

marcomq commented Dec 16, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

marcomq commented Dec 16, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Dec 16, 2025 •

edited

Loading