Skip to content

chore: Release polish — code quality, bug fixes, and dependency cleanup#136

Open
mcurrier2 wants to merge 1 commit into
mainfrom
chore/release-polish
Open

chore: Release polish — code quality, bug fixes, and dependency cleanup#136
mcurrier2 wants to merge 1 commit into
mainfrom
chore/release-polish

Conversation

@mcurrier2

Copy link
Copy Markdown
Collaborator

Summary

Comprehensive code quality review and polish pass across the entire codebase in preparation for initial release. Addresses accumulated technical debt, documentation drift, dead code, and several latent bugs discovered during systematic file-by-file review.

Closes #135

Changes

Bug Fixes (8)

  • Frame corruption in blocking UDSwritev partial-write fallback used incorrect offset calculation (written.saturating_sub(4)) that could corrupt message framing when fewer than 4 bytes were written
  • Duration parsing truncationparse_duration_micros silently truncated fractional values (e.g. "1.5ms" became "1ms") due to as u64 cast
  • SHM-direct OOB read — No validation of payload_len from shared memory before copy_nonoverlapping
  • SHM-direct timestamp accuracyreceive_blocking_timed captured a new timestamp instead of using already-captured msg.receive_time_ns
  • SHM blocking zero-length framesread_data_blocking did not reject data_len == 0
  • TCP server panicpanic! in multi-server accept loop when try_clone() fails
  • Hardcoded system metadataget_memory_gb() returned 16.0; rust_version was hardcoded

Dependency Cleanup (4)

  • Remove dead dependencies: statistics, crossbeam, rand
  • Remove unused nix "time" feature
  • Fix placeholder repository URL → actual redhat-performance/rusty-comms
  • Remove dead docs.rs documentation URL

Code Cleanup (5)

  • Remove dead current_timestamp_ns() from utils.rs
  • Consolidate identical Message::new/new_for_blocking into inline alias
  • Remove dead if true scaffolding in blocking round-trip loop
  • Move misplaced test into mod tests block
  • Replace println! with tracing::trace! in backpressure tests

Documentation Fixes (8)

  • Fix incorrect clock source, readiness signal, and CLI flag in README
  • Fix stale "no streaming support" comments
  • Fix "zero-copy serialization" and "async I/O throughout" claims in lib.rs
  • Fix "JIT compilation" warmup comment
  • Rewrite logging.rs module docs to match actual exports
  • Fix SHM-direct docs (repr attribute, max payload size)
  • Mark archived blocking plan as HISTORICAL/COMPLETE
  • Fix dashboard README example commands

Performance (1)

  • .with_context(|| format!(...)) for lazy allocation in main.rs

Validation

  • 476 tests passing (386 unit + 48 integration + 42 doc tests)
  • Zero clippy warnings (-D warnings)
  • cargo fmt clean
  • MSRV check passing (Rust 1.70)
  • Release build successful

Test plan

  • cargo test — all 476 tests pass
  • cargo clippy --all-targets -- -D warnings — zero warnings
  • cargo fmt -- --check — no formatting changes
  • cargo build --release — builds successfully
  • MSRV check (Rust 1.70) — passes
  • Pre-commit hooks pass (audit, clippy, fmt, tests, MSRV)

Stats

21 files changed, 241 insertions(+), 291 deletions(-) (net -50 lines)

Made with Cursor

Bug fixes:
- Fix frame corruption in blocking UDS writev partial-write handling
- Fix parse_duration_micros truncating fractional values (e.g. "1.5ms")
- Add payload_len bounds check in SHM-direct to prevent OOB reads
- Fix SHM-direct receive_blocking_timed using wrong timestamp
- Add zero-length frame rejection in shared_memory_blocking
- Replace panic with graceful error handling in TCP multi-server accept
- Fix hardcoded memory_gb (16.0) with actual /proc/meminfo read
- Fix hardcoded rust_version with env!(CARGO_PKG_RUST_VERSION)

Performance:
- Use .with_context(|| format!(...)) for lazy allocation in main.rs

Dependency cleanup:
- Remove dead dependencies: statistics, crossbeam, rand
- Remove unused nix "time" feature (code uses libc::clock_gettime directly)
- Fix placeholder repository URL to actual redhat-performance/rusty-comms
- Remove dead docs.rs documentation URL (crate not published)

Code cleanup:
- Remove dead current_timestamp_ns() function from utils.rs
- Consolidate identical Message::new/new_for_blocking into inline alias
- Remove dead `if true` scaffolding in benchmark_blocking round-trip loop
- Move misplaced test into mod tests block in ipc/mod.rs
- Replace println! with tracing::trace! in all backpressure tests

Documentation fixes:
- Fix incorrect clock source, readiness signal, and CLI flag in README
- Fix stale "no streaming support" comments (streaming is implemented)
- Fix "zero-copy serialization" and "async I/O throughout" claims in lib.rs
- Fix "JIT compilation" warmup comment (Rust is AOT compiled)
- Rewrite logging.rs module docs to match actual exports
- Fix SHM-direct docs (repr, max payload size)
- Mark archived blocking plan as HISTORICAL/COMPLETE
- Fix dashboard README example commands to use correct CLI flags

All 476 tests passing. Zero clippy warnings.

AI-assisted-by: Claude Opus 4.6
Co-authored-by: Cursor <cursoragent@cursor.com>
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added support for preserving fractional duration inputs in the CLI.
    • Improved blocking-mode output so streaming latency results can be shown during runs.
  • Bug Fixes

    • Made shared-memory and socket transports handle invalid or partial data more safely.
    • Prevented some connection setup failures from stopping multi-client server runs.
    • Improved memory and version info reporting in generated results.
  • Documentation

    • Updated README examples and benchmark guidance to match current behavior and output format.

Walkthrough

This release-polish PR fixes eight correctness bugs (UDS write corruption, duration truncation, SHM-direct OOB read, SHM timestamp accuracy, SHM-blocking zero-length frames, TCP server panics, and hardcoded system metadata), removes dead dependencies/code, replaces println! with tracing in tests, and corrects documentation across README, lib.rs, logging.rs, and archived/dashboard docs.

Changes

Correctness Fixes Across Transports and System Info

Layer / File(s) Summary
UDS blocking partial-write fix
src/ipc/unix_domain_socket_blocking.rs
Fixed frame-corruption fallback in send_blocking by correctly distinguishing partial writes within the length prefix vs. the payload, and erroring on zero bytes written.
Duration parsing fix
src/cli.rs
parse_duration_micros uses Duration::from_secs_f64 for us/ms units to preserve fractional values instead of truncating them.
SHM-direct OOB read and timestamp fix
src/ipc/shared_memory_direct.rs
RawSharedMessage layout changes to #[repr(C)]; receive_blocking validates payload_len against MAX_PAYLOAD_SIZE before copying; receive_blocking_timed reuses the captured receive_time_ns; docs updated for the 8KB payload cap.
SHM blocking zero-length frame fix
src/ipc/shared_memory_blocking.rs
read_data_blocking now rejects data_len == 0 alongside oversized frames with an updated error message.
TCP multi-server accept loop panic fix
src/ipc/tcp_socket.rs
Socket setup failures (into_std, try_clone, from_std) now log warnings and skip the connection instead of panicking the server.
System memory/version metadata fix
src/results.rs, src/results_blocking.rs
get_memory_gb() reads /proc/meminfo on Linux instead of returning a hardcoded value; SystemInfo::default() sources rust_version from env!(CARGO_PKG_RUST_VERSION).

Dependency and Code Cleanup

Layer / File(s) Summary
Manifest cleanup
Cargo.toml
Fixed repository URL, removed dead documentation field, moved nix to Linux-only target deps, removed unused crossbeam/rand/statistics.
Dead helper removal
src/utils.rs
Removed unused current_timestamp_ns() and its SystemTime/UNIX_EPOCH import.
Message constructor consolidation and test simplification
src/ipc/mod.rs
Message::new_for_blocking becomes an inline alias for Message::new; size-analysis and timestamp-offset tests simplified.
Dead scaffolding removal
src/benchmark_blocking.rs, src/main.rs
Removed unconditional if true wrapper and placeholder comment; updated blocking-mode docs; switched to lazy error context.
Tracing in backpressure tests
src/ipc/posix_message_queue.rs, src/ipc/shared_memory.rs, src/ipc/tcp_socket.rs, src/ipc/unix_domain_socket.rs
Replaced println! with tracing::trace! in backpressure-detection test branches.

Documentation Accuracy

Layer / File(s) Summary
README corrections
README.md
Fixed clock-source, CLI example, and readiness-signal descriptions; added section spacing.
Archived plan marked historical
docs/archive/blocking-mode-implementation-plan.md
Added a historical-document notice and changed status to COMPLETE (archived).
lib.rs and logging.rs doc fixes
src/lib.rs, src/logging.rs
Corrected serialization/execution-mode and warmup-iteration doc claims; rewrote logging module docs.
Dashboard README example fix
utils/dashboard/README.md
Updated example command and output file names to match current CLI flags and JSON output naming.

Estimated code review effort: 3 (Moderate) | ~30 minutes

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant SharedMemoryDirect as receive_blocking
    participant SharedMemory as SHM buffer

    Client->>SharedMemoryDirect: request message
    SharedMemoryDirect->>SharedMemory: read payload_len
    alt payload_len > MAX_PAYLOAD_SIZE
        SharedMemoryDirect->>SharedMemory: clear ready flag, signal condvar
        SharedMemoryDirect-->>Client: return error
    else valid length
        SharedMemoryDirect->>SharedMemory: copy_nonoverlapping payload
        SharedMemoryDirect-->>Client: return Message with receive_time_ns
    end
Loading

Related Issues: #135

Suggested labels: bug, dependencies, documentation, cleanup

Suggested reviewers: redhat-performance/rusty-comms maintainers familiar with IPC transports and results reporting

🐰 hops through the code, sweeping bugs from every burrow,
trims stale deps and dusty docs, tomorrow's release grows surer,
println gone, tracing sings, the metadata rings true —
a tidier warren, ready now, for release review.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately reflects the main release-polish focus on bug fixes and dependency cleanup.
Description check ✅ Passed The description is detailed and covers summary, changes, testing, and validation, even though it doesn't mirror the template headings exactly.
Linked Issues check ✅ Passed The PR addresses the listed bugs, dependency cleanup, code/docs cleanup, and logging consistency, matching the linked release-polish scope.
Out of Scope Changes check ✅ Passed The changes stay within release-polish scope and update bug fixes, cleanup, and documentation without unrelated feature work.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cli.rs (1)

838-857: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Non-finite input causes panic in Duration::from_secs_f64.

num_str.parse::<f64>() accepts "inf", "infinity", and "nan" (case-insensitive). These pass the num < 0.0 check (both comparisons are false), but Duration::from_secs_f64 panics on non-finite values. A CLI argument like "infs" or "nans" will crash the process.

🐛 Proposed fix
     // Validate that the number is non-negative
     if num < 0.0 {
         return Err("Duration cannot be negative".to_string());
     }
+
+    if !num.is_finite() {
+        return Err("Duration must be a finite number".to_string());
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cli.rs` around lines 838 - 857, The duration parsing logic in the
`src/cli.rs` duration parser can panic because `num_str.parse::<f64>()` allows
non-finite values that reach `Duration::from_secs_f64`. Update the parsing path
to explicitly reject non-finite numbers right after parsing `num` (before the
unit match), and return a normal error for `"inf"`, `"infinity"`, and `"nan"`
inputs so the `Duration::from_secs_f64` calls remain safe.
🧹 Nitpick comments (3)
src/ipc/mod.rs (1)

351-375: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Stale doc on Message::new about new_for_blocking timing.

Lines 351-355 still claim new_for_blocking() "captures the timestamp using a monotonic clock right before serialization," but the new implementation (Lines 366-374) is a pure alias to Message::new, so the timestamp is captured at construction time exactly like new(). The "right before serialization" precision actually comes from set_timestamp_now() called separately in blocking send paths (see unix_domain_socket_blocking.rs). This doc note is now misleading.

📝 Proposed fix
     /// ## Note for Blocking Mode
     ///
-    /// For accurate IPC latency measurement in blocking mode, use
-    /// `new_for_blocking()` instead which captures the timestamp using
-    /// a monotonic clock right before serialization.
+    /// `new_for_blocking()` is a semantic alias for this constructor used
+    /// in blocking transport code. For the most accurate IPC latency
+    /// measurement, call `set_timestamp_now()` right before serialization.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ipc/mod.rs` around lines 351 - 375, Update the stale documentation on
Message::new and Message::new_for_blocking so it no longer says the blocking
constructor captures time “right before serialization.” The current
new_for_blocking method is only a semantic alias to Message::new and captures
the timestamp at construction time, so revise the comments on both constructors
to reflect that behavior. If you want to mention serialization timing, point
readers to the blocking send path that calls set_timestamp_now() instead of
implying new_for_blocking does it itself.
Cargo.toml (1)

10-10: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Repository URL fixed here, but downstream references still stale.

The Containerfile (per provided context) still embeds https://github.com/your-org/ipc-benchmark in OCI image labels. Worth a follow-up to keep metadata consistent across the repo.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Cargo.toml` at line 10, The repository URL has been updated in Cargo.toml,
but the Containerfile still contains the old OCI metadata reference. Update the
image labels in the Containerfile to use the same repository URL as the package
metadata so the repo-wide references stay consistent. Use the Containerfile
label definitions and the existing repository setting in Cargo.toml as the
source of truth when making the change.
src/results_blocking.rs (1)

1119-1147: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Centralize the system metadata helpers.

The memory parser and MSRV helper duplicate ResultsManager’s logic in src/results.rs. A shared helper keeps async/blocking result metadata from drifting again.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/results_blocking.rs` around lines 1119 - 1147, The system metadata
helpers here duplicate the logic already used by ResultsManager in
src/results.rs, so they should be centralized to avoid drift. Move or reuse the
shared memory and MSRV helper logic from ResultsManager so both the blocking and
async paths call the same implementation, keeping get_memory_gb and
get_rust_version aligned with the existing metadata source.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/ipc/shared_memory_blocking.rs`:
- Around line 426-430: The read path in read_data_blocking() only checks
data_len against capacity, but it must also verify the full frame is currently
published before copying. Add the same incomplete-frame guard used in
read_data() in shared_memory_blocking.rs, right before the memcpy/copy loop, so
data_len + header size does not exceed the available published bytes and
read_pos is not advanced past write_pos on corrupted frames.

In `@src/results.rs`:
- Around line 1262-1265: The default system-info fallback in the non-Linux
branch now returns 0.0, but test_system_info_default still assumes memory_gb is
always positive. Update the test in test_system_info_default to match the new
contract from the system info logic in results::system_info (or its helper), so
it accepts 0.0 on non-Linux targets while preserving the existing assertion for
Linux.

---

Outside diff comments:
In `@src/cli.rs`:
- Around line 838-857: The duration parsing logic in the `src/cli.rs` duration
parser can panic because `num_str.parse::<f64>()` allows non-finite values that
reach `Duration::from_secs_f64`. Update the parsing path to explicitly reject
non-finite numbers right after parsing `num` (before the unit match), and return
a normal error for `"inf"`, `"infinity"`, and `"nan"` inputs so the
`Duration::from_secs_f64` calls remain safe.

---

Nitpick comments:
In `@Cargo.toml`:
- Line 10: The repository URL has been updated in Cargo.toml, but the
Containerfile still contains the old OCI metadata reference. Update the image
labels in the Containerfile to use the same repository URL as the package
metadata so the repo-wide references stay consistent. Use the Containerfile
label definitions and the existing repository setting in Cargo.toml as the
source of truth when making the change.

In `@src/ipc/mod.rs`:
- Around line 351-375: Update the stale documentation on Message::new and
Message::new_for_blocking so it no longer says the blocking constructor captures
time “right before serialization.” The current new_for_blocking method is only a
semantic alias to Message::new and captures the timestamp at construction time,
so revise the comments on both constructors to reflect that behavior. If you
want to mention serialization timing, point readers to the blocking send path
that calls set_timestamp_now() instead of implying new_for_blocking does it
itself.

In `@src/results_blocking.rs`:
- Around line 1119-1147: The system metadata helpers here duplicate the logic
already used by ResultsManager in src/results.rs, so they should be centralized
to avoid drift. Move or reuse the shared memory and MSRV helper logic from
ResultsManager so both the blocking and async paths call the same
implementation, keeping get_memory_gb and get_rust_version aligned with the
existing metadata source.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 8d7885ae-cbae-46a7-be7a-186e5388cdc8

📥 Commits

Reviewing files that changed from the base of the PR and between d586320 and 5285f2c.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • Cargo.toml
  • README.md
  • docs/archive/blocking-mode-implementation-plan.md
  • src/benchmark_blocking.rs
  • src/cli.rs
  • src/ipc/mod.rs
  • src/ipc/posix_message_queue.rs
  • src/ipc/shared_memory.rs
  • src/ipc/shared_memory_blocking.rs
  • src/ipc/shared_memory_direct.rs
  • src/ipc/tcp_socket.rs
  • src/ipc/unix_domain_socket.rs
  • src/ipc/unix_domain_socket_blocking.rs
  • src/lib.rs
  • src/logging.rs
  • src/main.rs
  • src/results.rs
  • src/results_blocking.rs
  • src/utils.rs
  • utils/dashboard/README.md
💤 Files with no reviewable changes (1)
  • src/utils.rs

Comment on lines +426 to +430
// Validate data length (reject zero or oversized frames)
if data_len == 0 || data_len > capacity {
libc::pthread_mutex_unlock(&self.mutex as *const _ as *mut _);
return Err(anyhow!(
"Invalid data length: {} exceeds capacity {}",
"Invalid data length: {} (capacity: {}, min: 1)",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate against published bytes, not just capacity.

This still trusts any corrupted data_len that is <= capacity. If data_len + 4 exceeds the bytes currently published in the ring, the copy at Lines 448-461 will read stale data and then advance read_pos past write_pos, permanently desynchronizing the buffer. read_data() already has the missing incomplete-frame check; read_data_blocking() needs the same guard before copying.

Patch
         if data_len == 0 || data_len > capacity {
             libc::pthread_mutex_unlock(&self.mutex as *const _ as *mut _);
             return Err(anyhow!(
                 "Invalid data length: {} (capacity: {}, min: 1)",
                 data_len,
                 capacity
             ));
         }
+        if self.available_read_data() < data_len + 4 {
+            libc::pthread_mutex_unlock(&self.mutex as *const _ as *mut _);
+            return Err(anyhow!(
+                "Incomplete message: length {} exceeds published bytes",
+                data_len
+            ));
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Validate data length (reject zero or oversized frames)
if data_len == 0 || data_len > capacity {
libc::pthread_mutex_unlock(&self.mutex as *const _ as *mut _);
return Err(anyhow!(
"Invalid data length: {} exceeds capacity {}",
"Invalid data length: {} (capacity: {}, min: 1)",
// Validate data length (reject zero or oversized frames)
if data_len == 0 || data_len > capacity {
libc::pthread_mutex_unlock(&self.mutex as *const _ as *mut _);
return Err(anyhow!(
"Invalid data length: {} (capacity: {}, min: 1)",
data_len,
capacity
));
}
if self.available_read_data() < data_len + 4 {
libc::pthread_mutex_unlock(&self.mutex as *const _ as *mut _);
return Err(anyhow!(
"Incomplete message: length {} exceeds published bytes",
data_len
));
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ipc/shared_memory_blocking.rs` around lines 426 - 430, The read path in
read_data_blocking() only checks data_len against capacity, but it must also
verify the full frame is currently published before copying. Add the same
incomplete-frame guard used in read_data() in shared_memory_blocking.rs, right
before the memcpy/copy loop, so data_len + header size does not exceed the
available published bytes and read_pos is not advanced past write_pos on
corrupted frames.

Comment thread src/results.rs
Comment on lines +1262 to +1265
#[cfg(not(target_os = "linux"))]
{
0.0
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Align the default-system-info test with the new fallback.

Line 1262 returns 0.0 on non-Linux targets, but test_system_info_default still asserts memory_gb > 0.0 at Line 1949. That will fail outside Linux. If the fallback is intended, relax the assertion to match the documented contract.

Proposed test fix
-        assert!(info.memory_gb > 0.0);
+        assert!(info.memory_gb.is_finite());
+        assert!(info.memory_gb >= 0.0);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/results.rs` around lines 1262 - 1265, The default system-info fallback in
the non-Linux branch now returns 0.0, but test_system_info_default still assumes
memory_gb is always positive. Update the test in test_system_info_default to
match the new contract from the system info logic in results::system_info (or
its helper), so it accepts 0.0 on non-Linux targets while preserving the
existing assertion for Linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Release polish: code quality, bug fixes, dependency cleanup, and documentation accuracy

1 participant