Skip to content

Fix/streaming timestamps#105

Merged
dustinblack merged 4 commits into
mainfrom
fix/streaming-timestamps
Apr 13, 2026
Merged

Fix/streaming timestamps#105
dustinblack merged 4 commits into
mainfrom
fix/streaming-timestamps

Conversation

@mcurrier2

@mcurrier2 mcurrier2 commented Mar 11, 2026

Copy link
Copy Markdown
Collaborator

Description

Brief description of changes

Type of Change

  • [ x] Bug fix
  • New feature
  • Breaking change
  • Documentation update

Testing

  • [ x] Tests pass locally
  • [ x] Added tests for new functionality
  • [ x] Updated documentation

Checklist

  • [ x] Code follows style guidelines
  • [ x] Self-review completed
  • Comments added for complex code
  • [ x] Documentation updated
  • No breaking changes (or marked as breaking)

PR: Fix Streaming Output Timestamps Across All Code Paths

Summary

Streaming output timestamps (timestamp_ns in JSON/CSV per-message
records) were inaccurate across multiple code paths. Instead of
recording when each message was actually sent, timestamps were captured
at record-creation time — either post-test during batch iteration or
at file-read time. This caused all timestamps within a test run to
cluster into the same second, making time-series analysis of streaming
data meaningless.

Because this tool's primary purpose is precision measurement, incorrect
timestamps in the streaming output undermine the validity of per-message
latency records for any downstream analysis or visualization.

Branch: fix/streaming-timestamps
Base: main
Files changed: 11 (496 insertions, 60 deletions)


Root Cause

Commit e0ff7fa (Oct 8, 2025, PR #82) refactored the benchmark runner
to collect latencies inside spawned async futures, then batch-create
MessageLatencyRecord objects after the futures completed. This moved
the SystemTime::now() call from inside the hot loop (where it
reflected each message's send time) to the post-test iteration (where
it reflected the moment the record was created — all within the same
second).

The initial fix on this branch (68afe07) changed the
MessageLatencyRecord API to accept send_timestamp_ns as a
parameter, but only the blocking round-trip path was actually capturing
timestamps at send time. Three other code path families remained broken.


Problems and Fixes

1. Async round-trip timestamps captured post-test

Problem: The client future returned Vec<Duration>. After the
future completed, the post-test loop called current_timestamp_ns()
at record-creation time. All records received the same timestamp.

Fix: Changed the future to capture current_timestamp_ns() before
each send() and return Vec<(Duration, u64)>. The post-test loop
uses the captured timestamp.

// Inside the client future, before each send:
let wall_ts = MessageLatencyRecord::current_timestamp_ns();
let send_time = Instant::now();
client_transport.send(&message).await?;
// ...
latencies.push((send_time.elapsed(), wall_ts));

// Post-test loop:
for (i, (latency, wall_ts)) in latencies.iter().enumerate() {
    let record = MessageLatencyRecord::new(
        i as u64, mechanism, msg_size,
        LatencyType::RoundTrip, *latency, *wall_ts,
    );
}

Applies to: Both duration-based and count-based async round-trip
loops in src/benchmark.rs.

2. Async combined timestamps captured post-test

Problem: Same pattern as round-trip — the combined test (one-way +
round-trip in a single run) collected latency durations inside the
spawned future and created records post-test with stale timestamps.

Fix: Changed one_way_latencies to Vec<(Duration, u64)> to
carry the wall-clock timestamp alongside each measurement. The
timestamp is captured before each send() inside the future.

Applies to: Both duration-based and count-based async combined
loops in src/benchmark.rs.

3. One-way timestamps from file-read time instead of send time

Problem: One-way tests use a server process that measures receive
latency and writes results to a temporary file. The original format
was one latency_ns value per line. The client read this file
post-test and called current_timestamp_ns() at file-read time —
all records received the same timestamp.

Fix (server side): Both async and blocking server loops now write
wall_send_ns,latency_ns per line. The wall-clock send time is
computed as SystemTime::now() - latency_ns, approximating when the
message entered the IPC channel.

// Server, after measuring latency:
let wall_now_ns = SystemTime::now()
    .duration_since(UNIX_EPOCH)
    .unwrap_or_default()
    .as_nanos() as u64;
let wall_send_ns = wall_now_ns.saturating_sub(latency_ns);
writeln!(file, "{},{}", wall_send_ns, latency_ns).ok();

Fix (client side): A new parse_latency_file_line() function
parses the two-field format. Both async and blocking file readers
use the parsed wall_send_ns as send_timestamp_ns.

Applies to: src/main.rs (both server loops), src/benchmark.rs
(async reader), src/benchmark_blocking.rs (blocking reader).


Known Limitations

Wall-clock / monotonic clock mixing in one-way path

For one-way tests, the server computes the send timestamp by
subtracting the measured monotonic latency from its current wall-clock
time. This mixes two clock domains:

  • latency_ns = monotonic_receive - monotonic_send
    (from message timestamp)
  • wall_send_ns = SystemTime::now() - latency_ns

If NTP adjusts the system clock between message send and receive, the
computed wall_send_ns will be slightly off. This is the best
approximation available without clock synchronization between the
client and server processes. The error is bounded by the magnitude
of any NTP adjustment during the test (typically microseconds).

Timestamp capture ordering

In the round-trip and combined futures, current_timestamp_ns() is
captured one instruction before Instant::now(). The wall-clock
timestamp therefore slightly predates the monotonic measurement start.
The gap is single-digit nanoseconds — orders of magnitude below the
IPC latencies being measured.


Tests Added

Unit tests for parse_latency_file_line (7 tests)

Test Coverage
test_parse_latency_file_line_valid Happy path: "170...,42000"
test_parse_latency_file_line_zeros Edge case: "0,0"
test_parse_latency_file_line_missing_comma Error: single value
test_parse_latency_file_line_empty Error: empty string
test_parse_latency_file_line_non_numeric_first Error: "abc,789"
test_parse_latency_file_line_non_numeric_second Error: "123,xyz"
test_parse_latency_file_line_extra_commas Error: "1,2,3"

Enhanced end-to-end streaming tests (2 tests)

Both test_one_way_streaming_captures_send_timestamp and
test_round_trip_streaming_captures_send_timestamp now validate:

  1. All timestamp_ns values fall within the test execution window
    (before_ns <= ts <= after_ns)
  2. Timestamps are not all identical (which was the original bug)

Pre-existing timestamp API tests (3 tests in results.rs)

  • test_new_uses_provided_send_timestamp
  • test_new_combined_uses_provided_send_timestamp
  • test_current_timestamp_ns_returns_recent_value

Documentation Added

README.md: Streaming Output Columns

Added a new "Streaming Output Columns" section documenting all six
per-message streaming columns with types, descriptions, and nullable
semantics:

Column Type Description
timestamp_ns u64 Wall-clock time message was sent
message_id u64 Zero-based message identifier
mechanism string IPC mechanism name
message_size u64 Payload size in bytes
one_way_latency_ns u64/null One-way latency
round_trip_latency_ns u64/null Round-trip latency

Includes a note on timestamp_ns accuracy for one-way tests
(wall-clock / monotonic clock mixing).


Validation

Unit and Integration Tests

$ cargo test
test result: ok. 265 passed; 0 failed; 1 ignored
(+ 42 doc tests, 27 integration tests — 334 total)

$ cargo clippy --all-targets -- -D warnings
# zero warnings

$ cargo fmt --check
# clean

$ scripts/msrv-check.sh
[MSRV] Rust 1.70 build/tests passed.

Functional Verification

3-second duration benchmarks across all mechanisms with streaming
output enabled. Timestamps validated to span the full test window:

Mechanism Mode First Timestamp Last Timestamp Delta (s) Messages
UDS Round-Trip 13:13:48.101628 13:13:51.101572 3.000 213,424
TCP Round-Trip 13:13:55.154037 13:13:58.153894 3.000 132,423
SHM One-Way 13:17:32.079542 13:17:35.079503 3.000 1,317,468
PMQ Round-Trip 13:14:15.407160 13:14:18.362355 2.955 42,687

All mechanisms show timestamps distributed across the full test
duration. PMQ delta is 45ms short due to backpressure from shallow
system queue depth (typically 10 messages).

Before/After: Timestamp Distribution

Before fix (main branch): All timestamp_ns values in streaming
output were within the same second, regardless of test duration. A
3-second test with 200,000+ messages would show all timestamps
clustering around a single epoch second.

After fix: Timestamps span the full test duration. Each message's
timestamp_ns reflects the approximate wall-clock time it was sent,
enabling meaningful time-series analysis of per-message latency data.


Files Changed

File Change
src/benchmark.rs Capture wall_ts inside async round-trip and combined futures before each send(); update one-way file reader to parse new wall_send_ns,latency_ns format; add parse_latency_file_line() with 7 unit tests; enhance 2 end-to-end streaming tests with timestamp validation
src/benchmark_blocking.rs Update blocking one-way file reader to parse new format using parse_latency_file_line()
src/main.rs Both async and blocking server loops write wall_send_ns,latency_ns per line (was latency_ns only)
README.md Add "Streaming Output Columns" section with column definitions and accuracy note
src/results.rs MessageLatencyRecord::new() and new_combined() accept send_timestamp_ns parameter; add current_timestamp_ns() helper; 3 API tests
src/ipc/tcp_socket.rs Refactor is_some()/unwrap() to if let (clippy fix)
src/ipc/unix_domain_socket.rs Same clippy fix
Cargo.toml Pin uuid to <1.21 for MSRV compatibility
.cargo/audit.toml Ignore known MSRV-pinned dependency advisories
CONFIG.md Document sequential test execution
utils/dashboard/README.md Note sequential test execution

Risk Assessment

  • Low risk. The core latency measurement logic (monotonic clock,
    get_monotonic_time_ns()) is untouched. Only the wall-clock
    metadata timestamp in streaming records is changed.
  • Backward compatible. No CLI changes. Streaming output JSON/CSV
    schema is unchanged (same columns, same types). The internal
    server-to-client latency file format changed from latency_ns to
    wall_send_ns,latency_ns, but this file is ephemeral (created and
    deleted within a single benchmark run) and never exposed to users.
  • All 334 tests pass with zero clippy warnings.
  • MSRV 1.70 verified via containerized pre-commit check.

@dustinblack dustinblack left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I closed #92 because fresh testing showed the problem was no longer reproducible. Issue #106 was then created the following day as the basis for this PR, but it contains no evidence of an actual problem — no reproduction steps, no sample output showing incorrect timestamps, and no before/after data.

I'd suggest closing this PR and #106 unless concrete evidence can be provided in #106 demonstrating that the specific code paths being fixed here actually produce incorrect timestamps on the current main branch — similar to what I provided in #92.

@mcurrier2

Copy link
Copy Markdown
Collaborator Author

These are valid issues/problems. I added to issue #106 showing the problems and before/after scenarios.

@dustinblack dustinblack left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the evidence to #106 — the before/after data clearly demonstrates the problem across round-trip, combined, and one-way code paths. The bugs are real and the fixes are technically sound. Withdrawing my earlier suggestion to close.

Technical assessment of the changes:

The fixes are correct. Moving SystemTime::now() capture from record-creation time to send time (via the send_timestamp_ns parameter on new()/new_combined()) is the right approach. The one-way server-side approximation (wall_now - latency) mixes clock domains but is clearly documented and is the best available without clock sync. The parse_latency_file_line() parser is clean, and the test coverage is thorough.

Before this can be approved, it needs:

  1. Rebase on main. This branch is based on e826b20, missing PRs #103 and #104. The clippy refactors and .cargo/audit.toml are already on main and will drop out. CI failures should resolve.

  2. Coordinate file format with PR #109. This PR changes the server latency file from latency_ns to wall_send_ns,latency_ns. PR #109's write_latency_buffer() writes the old single-field format. Whichever merges second needs to match. Since #109 is closer to approval, consider whether this PR should rebase on top of #109 or vice versa.

  3. Remove documentation that overlaps with PR #108. The "Streaming Output Columns" and "Test Execution Order" README sections appear in both PRs. Whichever PR owns those sections, the other should drop them to avoid merge conflicts. Since those sections are most relevant here, I'd suggest keeping them in this PR and removing them from #108.

mcurrier2 added a commit that referenced this pull request Mar 25, 2026
Remove sections that belong in the streaming-timestamps PR (#105),
not this buffer-sizing PR:

- "Streaming Output Columns" table and timestamp_ns accuracy note
- "Test Execution Order" section
- Streaming JSON/CSV description rewording
- Round-trip CLI example comment expansion

README diff now only contains buffer-sizing documentation changes:
auto-sizing table, error prevention updates, and example buffer
size correction.

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
ewchong pushed a commit that referenced this pull request Mar 31, 2026
…Hangs (#108)

* fix: Use fixed 64KB buffer for standalone SHM to enable streaming

Previously, standalone SHM mode calculated buffer size to fit ALL
messages (e.g., 6.4MB for 50k messages). This caused the writer to
dump everything instantly while the reader slowly drained using
pthread_cond_timedwait with 500us polling timeouts, leading to huge
accumulated latencies (~489ms for 64B).

The fix uses a fixed 64KB buffer (or 2x message size if larger),
matching container behavior. This enables proper streaming where the
writer blocks when the buffer is full.

Before: 489ms mean latency for 64B/50k messages
After:  1.95ms mean latency for 64B/10k messages (blocking)
        15.85ms mean latency for 64B/10k messages (async)

Also updates test_transport_config_buffer_size_logic to expect the
new fixed-buffer behavior for SHM while keeping the message-count
sizing for TCP/UDS.

Cherry-picked from container-to-container-ipc branch (3b49877).

AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor

* test/docs: Add buffer sizing and SHM backpressure tests, update documentation

- Add 9 new tests covering SHM buffer sizing, backpressure, and
  condvar timed-wait behavior:
  - test_shm_large_message_buffer_sizing: verifies 2x msg size path
    when messages exceed 32KB (async)
  - test_shm_duration_mode_uses_fixed_buffer: verifies SHM gets 64KB
    in duration mode, not 1GB TCP/UDS default (async)
  - test_blocking_transport_config_buffer_size_logic: full buffer
    sizing test for blocking mode (SHM, TCP, PMQ, duration)
  - test_blocking_shm_duration_mode_uses_fixed_buffer: SHM 64KB in
    blocking duration mode
  - test_blocking_shm_large_message_buffer_sizing: 2x path (blocking)
  - test_backpressure_with_small_buffer: exercises timed condvar wait
    with 1KB buffer and 20 messages
  - test_payload_integrity_under_backpressure: byte-level payload
    verification through backpressure-induced blocking writes
  - test_ring_buffer_wrap_around_under_backpressure: write_pos wraps
    the circular buffer multiple times under backpressure
  - test_shutdown_detected_during_blocked_write: server closes while
    client is blocked waiting for buffer space
- Update README.md Buffer Size Configuration with per-mechanism
  auto-sizing table and updated error prevention guidance
- Update CONFIG.md SHM defaults from 8192 to 64KB (auto) and add
  automatic buffer sizing explanation
- All tests passing, clippy clean

AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor

* test/docs: Add coverage tests and fix stale buffer sizing documentation

- Add test_user_buffer_size_overrides_shm_default (async + blocking):
  verifies user-provided --buffer-size overrides SHM's 64KB default
- Add test_shm_buffer_sizing_at_32kb_boundary (async + blocking):
  tests exact transition where 2*(msg_size+64) crosses 64KB threshold
- Add test_high_volume_condvar_stress: 100 messages through 512-byte
  buffer to stress pthread_cond_timedwait retry loop
- Update create_transport_config_internal doc comment in benchmark.rs
  to describe per-mechanism buffer sizing (SHM, PMQ, TCP/UDS)
- Update Adaptive Buffer Sizing doc in benchmark_blocking.rs to
  describe per-mechanism behavior instead of vague description
- Fix README.md example output: SharedMemory buffer size 10240000 ->
  65536 to reflect new fixed 64KB auto-sizing
- All 340 tests passing, zero clippy warnings

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor

* fix(msrv): Pin transitive dependencies to maintain Rust 1.70 compatibility

The CI MSRV job removes Cargo.lock and resolves fresh dependencies.
Several transitive dependencies recently bumped their MSRV above 1.70:

- uuid 1.21+ requires Rust 1.85 → pinned to <1.21
- tempfile 3.25+ pulls getrandom >=0.3,<0.5 which resolves to 0.4.x
  (edition 2024, unparseable by Rust 1.70's cargo) → pinned to <3.25
- zmij 1.0.20+ requires Rust 1.71 → pinned to =1.0.19
- quote 1.0.45+ requires Rust 1.71 → pinned to =1.0.44
- syn 2.0.115+ requires Rust 1.71 → pinned to =2.0.114
- unicode-ident 1.0.23+ requires Rust 1.71 → pinned to =1.0.22

Verified: MSRV builds and tests pass both with and without Cargo.lock
in a Rust 1.70 container. Local clippy, fmt, and tests all clean.

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor

* ci: trigger CI rebuild with updated MSRV dependency pins

No code changes. Forces new CI run to pick up dependency pins
from commit cd28295 (uuid <1.21, tempfile <3.25, zmij =1.0.19,
quote =1.0.44, syn =2.0.114, unicode-ident =1.0.22).

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor

* fix: Remove out-of-scope condvar/polling code per review

Remove container-IPC code that was out of scope for issue #107
(buffer sizing fix). This scopes the PR to items 4-5 only.

Removed:
- write_data_polling() and read_data_polling() fallback functions
- pthread_cond_timedwait (reverted to pthread_cond_wait)
- Broken-condvar detection (100-iteration/10ms heuristic)
- Mutex-lock-failure fallbacks to polling paths
- 30-second timeout counters (wait_count > 60000)
- test_high_volume_condvar_stress test

Restored from main (PR #104):
- write_data_blocking() signature with timestamp_offset parameter
  so latency measurement excludes backpressure wait time
- read_data_blocking() with clean pthread_cond_wait
- Proper pthread_cond_signal in both write and read paths

The two functional regressions cited in review are resolved:
1. Timestamp regression: write_data_polling() lacked timestamp_offset,
   but that function no longer exists. The only write path now refreshes
   the timestamp after the condvar wait.
2. Missing condvar signal: write_data_polling() never called
   pthread_cond_signal(&data_ready), but that function no longer exists.
   The only write path signals after every write.

All tests passing (42/42). No clippy warnings.

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor

* docs: Remove out-of-scope README sections per review

Remove sections that belong in the streaming-timestamps PR (#105),
not this buffer-sizing PR:

- "Streaming Output Columns" table and timestamp_ns accuracy note
- "Test Execution Order" section
- Streaming JSON/CSV description rewording
- Round-trip CLI example comment expansion

README diff now only contains buffer-sizing documentation changes:
auto-sizing table, error prevention updates, and example buffer
size correction.

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor

* fix: Unify message overhead constant across buffer sizing paths

Replace hardcoded 64 and 32 values with the MESSAGE_OVERHEAD
constant in both benchmark.rs and benchmark_blocking.rs:

- TCP/UDS msg-count sizing: was hardcoded 64, now MESSAGE_OVERHEAD
- SHM logging/validation: was hardcoded 32, now MESSAGE_OVERHEAD
- Add comment explaining what MESSAGE_OVERHEAD covers: 8 (id) +
  8 (timestamp) + 8 (bincode vec length) + 1 (message type) +
  4 (ring buffer length prefix) = 29 bytes, rounded to 64

Addresses review feedback about inconsistent overhead values.

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
…ages

Pass send timestamp to MessageLatencyRecord instead of capturing it when
the record is created. This makes the timestamp represent when the message
was sent, so gaps between timestamps now match the actual latency.

Changes:
- MessageLatencyRecord::new() and new_combined() now require send_timestamp_ns
  parameter instead of capturing SystemTime::now() internally
- Added MessageLatencyRecord::current_timestamp_ns() helper for call sites
  that need to capture the current wall-clock time
- Updated all streaming record creation sites in benchmark.rs and
  benchmark_blocking.rs to capture and pass send timestamps

Cherry-picked from container-to-container-ipc branch (6cb8f9b).
Host-container-specific changes excluded (host_container.rs not present).

AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor
…n documentation

- Add 3 new unit tests for MessageLatencyRecord timestamp handling:
  test_new_uses_provided_send_timestamp, test_new_combined_uses_provided_send_timestamp,
  test_current_timestamp_ns_returns_recent_value
- Add 2 end-to-end streaming tests in benchmark.rs:
  test_one_way_streaming_captures_send_timestamp,
  test_round_trip_streaming_captures_send_timestamp
- Document sequential one-way/round-trip test execution in README.md
  (new "Test Execution Order" section), CONFIG.md, dashboard README,
  and run() doc comments in benchmark.rs and benchmark_blocking.rs
- All 332 tests pass (331 passed, 1 ignored), zero clippy warnings

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
…aths

The previous commit (68afe07) added a send_timestamp_ns parameter to
MessageLatencyRecord but only the blocking round-trip path was actually
capturing timestamps at send time. The async round-trip, async combined,
and all one-way paths were still using current_timestamp_ns() at
record-creation time (post-test), causing all timestamps within a run
to cluster into the same second.

Changes:
- Async round-trip: capture wall-clock timestamp inside the spawned
  client future before each send(), return Vec<(Duration, u64)>
- Async combined: same pattern for one-way latency vector
- One-way (async + blocking): server now writes "wall_send_ns,latency_ns"
  per line (wall_send_ns = wall_clock_now - latency); client readers
  parse and use the server-computed send timestamp
- Add parse_latency_file_line() with 7 unit tests covering valid input,
  missing commas, empty lines, non-numeric values, and extra commas
- Enhance existing end-to-end streaming tests to validate timestamps
  fall within the test execution window and are not all identical
- Document streaming output column definitions in README.md including
  timestamp_ns semantics and accuracy note for one-way clock mixing

All 265+ unit tests pass, clippy clean, no scope creep.

AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor
@mcurrier2 mcurrier2 force-pushed the fix/streaming-timestamps branch from dfbf5c5 to c3c16a3 Compare April 7, 2026 16:07
@github-actions

github-actions Bot commented Apr 7, 2026

Copy link
Copy Markdown
⚠️ **ERROR:** Code formatting issues detected. Please run `cargo fmt --all` locally and commit the changes.

@mcurrier2

Copy link
Copy Markdown
Collaborator Author
  1. Rebase on main

Done. Rebased onto current origin/main which includes PRs #103, #104, #108, and #109. The three commits that were already upstream (clippy refactors, uuid pin, audit.toml) dropped out automatically during rebase. CI failures from those should be resolved. Branch is now 3 commits ahead of main.

  1. Coordinate file format with PR perf/server-latency-buffering : Buffer latencies in memory instead of per-message file I/O #109

Resolved during rebase. PR #109 introduced write_latency_buffer() which buffered latencies in memory as Vec and wrote single-field lines (latency_ns). During rebase I updated the buffer type to Vec<(u64, u64)> storing (wall_send_ns, latency_ns) pairs, and updated write_latency_buffer() to write the two-field "wall_send_ns,latency_ns" format that parse_latency_file_line() expects. The should_buffer_latency() helper is preserved for canary message filtering. All existing latency buffer tests were updated to use the two-field format, including the round-trip parse test which now uses parse_latency_file_line() directly to verify end-to-end compatibility. All 42 tests pass.

  1. Remove documentation that overlaps with PR Fix/buffer size overloading - Fix SHM Buffer Over-Sizing and Condvar Hangs #108

Already handled. The "Streaming Output Columns" and "Test Execution Order" README sections were removed from PR #108 per reviewer feedback before it was merged. Those sections now exist only in this PR, which is where they belong. Verified after rebase — no duplication in the diff against main.

@mcurrier2 mcurrier2 requested review from dustinblack, ewchong and sberg-rh and removed request for sberg-rh April 7, 2026 16:45
@sberg-rh

sberg-rh commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

A couple of minor observations — not blocking

These are both minor and probably fine as-is, but wanted to flag them in case they're worth a quick look.

1. Wall-clock timestamp captured slightly late on server side (src/main.rs:699-703, 904-908)

wall_now_ns is captured after the should_buffer_latency check rather than immediately after receive_time_ns. Since the send timestamp is back-computed as wall_now - latency, any delay between capturing receive_time_ns and wall_now_ns introduces a small drift. Would it be more precise to capture wall_now_ns right next to receive_time_ns, before the if branch?

Likely negligible for most benchmark scenarios, but curious if this was intentional or just ordering convenience.

2. parse_latency_file_line may not handle Windows line endings (src/benchmark.rs:258)

splitn(2, ',') on a line like "100,200\r" would try to parse "200\r" as a u64, which would fail with a confusing error. Would adding a .trim() on the input line before splitting be worth it for cross-platform robustness? Might not matter if latency files are always generated and consumed on the same system.

- Move SystemTime::now() capture to immediately after
  get_monotonic_time_ns() in both blocking and async server paths,
  eliminating minor drift from the should_buffer_latency branch
- Add .trim() to parse_latency_file_line() input for cross-platform
  robustness against Windows \r\n line endings
- All tests passing

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
@github-actions

Copy link
Copy Markdown

📈 Changed lines coverage: 88.89% (56/63)

🚨 Uncovered lines in this PR

  • src/benchmark.rs: 1549, 1557
  • src/benchmark_blocking.rs: 1143, 1233-1234, 1254, 1306

📊 Code Coverage Summary

File Line Coverage Uncovered Lines
src/benchmark.rs 83.64%
(506/605)
75, 78, 89, 93, 102, 105, 107, 124, 422, 427-432, 439-444, 511-514, 619, 703, 709-711, 715-717, 737, 806-808, 813, 834, 839, 857, 963, 967-970, 972, 981-984, 986, 1062, 1093, 1096, 1098-1099, 1108-1109, 1251, 1264, 1281, 1404, 1413, 1415-1416, 1419-1420, 1426-1427, 1432-1433, 1435, 1440-1441, 1445-1447, 1452-1453, 1456-1457, 1489, 1547-1551, 1553-1559, 1561, 1564, 1721, 1736
src/benchmark_blocking.rs 73.50%
(319/434)
97, 111, 127, 263, 369, 375-377, 380-382, 402, 434, 488, 587, 600, 614, 644-647, 732-735, 754, 758, 773, 815-817, 820, 823-825, 827, 830, 832-836, 838-839, 847-851, 853-857, 860-861, 865-866, 901, 950, 1029, 1040, 1070, 1073, 1138-1143, 1145, 1200-1203, 1208, 1221, 1224-1227, 1231, 1233-1236, 1238, 1240-1241, 1243-1244, 1247, 1249-1254, 1256, 1260-1261, 1263, 1265, 1289, 1301-1306, 1308, 1328-1331
src/cli.rs 92.39%
(85/92)
630, 729, 769, 771, 792-794
src/execution_mode.rs 100.00%
(14/14)
``
src/ipc/mod.rs 65.28%
(47/72)
115, 425, 427-430, 740-741, 756-757, 775-776, 807, 810, 813, 818, 845-846, 860, 862, 882, 884, 1007-1009
src/ipc/posix_message_queue.rs 46.09%
(59/128)
139-140, 213-215, 217, 224, 229, 332-335, 337, 345, 437, 441-442, 446, 449-452, 454-458, 539, 679, 782, 789-790, 807-808, 819-820, 831-832, 849-850, 906, 910-911, 914-919, 921-923, 927, 929-931, 933, 935-937, 941-943, 945-947, 994-995, 1017
src/ipc/posix_message_queue_blocking.rs 81.94%
(127/155)
172, 182, 221, 251-255, 274, 325, 368, 387-390, 416-418, 422-423, 425-426, 436, 455, 457-458, 460-461
src/ipc/shared_memory.rs 69.36%
(163/235)
61, 141, 145, 246-247, 257-258, 262, 390-391, 417-419, 421, 439-441, 443-444, 446-450, 467, 474, 480, 483-484, 488, 492, 496-497, 502-503, 666-667, 670-671, 674, 676, 681-682, 709-710, 713-714, 721-723, 725, 727-732, 734-735, 738-739, 741-745, 752, 782, 784-785, 787, 791
src/ipc/shared_memory_blocking.rs 78.87%
(209/265)
196-198, 200-201, 204-206, 209-210, 212, 217, 219, 223-225, 230, 238-240, 243-245, 248-249, 251, 254, 257-258, 261-262, 266-267, 269, 273-274, 276, 311-312, 378-379, 403-407, 498, 506, 556, 573, 660, 726, 789, 798, 808, 830
src/ipc/shared_memory_direct.rs 83.80%
(150/179)
372-375, 444-451, 455, 482, 506-509, 513-514, 556-557, 569, 598, 605-606, 629-630, 636
src/ipc/tcp_socket.rs 59.43%
(63/106)
31-32, 61, 96, 113-114, 118, 124-125, 129, 136-137, 141, 147-148, 152, 171-172, 175-177, 184-185, 188, 362-363, 366-367, 370-371, 376-377, 422, 429, 447-449, 478, 480-482, 484, 487
src/ipc/tcp_socket_blocking.rs 97.62%
(82/84)
134, 159
src/ipc/unix_domain_socket.rs 59.43%
(63/106)
29-30, 58, 93, 103, 122-123, 127, 133-134, 138, 145-146, 150, 156-157, 161, 180-181, 184-186, 193-194, 197, 346-347, 350-351, 354-355, 360-361, 412-414, 443, 445-447, 449, 452, 468
src/ipc/unix_domain_socket_blocking.rs 94.34%
(100/106)
276-277, 283-285, 287
src/logging.rs 100.00%
(13/13)
``
src/main.rs 46.11%
(166/360)
84-86, 88, 125-126, 136-140, 144-146, 148-149, 151-152, 172-175, 199-203, 211, 217, 220, 225-228, 233-234, 240, 246, 248-250, 252, 258-259, 265, 270, 273-274, 278, 280-281, 285-286, 288, 294, 298-299, 301-306, 308-309, 312, 321, 324-325, 328, 375-378, 385, 387-391, 394-397, 399-400, 402-403, 405, 407-413, 417, 419-422, 425, 429-431, 435, 437, 440, 444, 449-452, 458-459, 465-466, 472, 474-475, 479, 481, 486-488, 492, 495-496, 498-499, 504, 506-508, 512-513, 515, 522, 527-528, 530-535, 537-538, 542, 551, 554-555, 558, 560, 579, 586, 590-592, 594, 624-625, 633, 666, 717, 721, 724-727, 783-786, 823-824, 831-832, 835, 862-863, 866, 918-919, 923-926, 948, 975, 984, 989, 994-995
src/metrics.rs 79.79%
(150/188)
455-460, 493-494, 552, 558, 579-582, 732-734, 736, 768, 788, 833, 838, 881, 904, 923-924, 926-927, 930-932, 952, 980, 984, 1005, 1007-1008, 1013
src/results.rs 56.63%
(252/445)
726, 735-737, 739-740, 743-744, 747, 769, 772-773, 776, 778, 781, 785-790, 800-801, 804-809, 826, 838-839, 841, 843, 846-847, 849, 853, 880, 904-906, 909-910, 914-916, 919, 945, 950, 955, 961, 980, 982-983, 985, 987-991, 993, 995-996, 1030, 1071-1072, 1075, 1081-1082, 1086, 1090-1092, 1094-1095, 1119-1123, 1126-1129, 1132-1139, 1149-1150, 1169-1170, 1172-1176, 1178, 1195-1196, 1198-1203, 1205, 1223, 1225-1230, 1248, 1251, 1267-1268, 1283-1285, 1287-1289, 1291-1292, 1294-1295, 1297-1298, 1300, 1302-1303, 1305-1308, 1310-1312, 1314-1316, 1319, 1323-1324, 1332-1337, 1339-1340, 1344-1345, 1349-1351, 1353, 1357-1358, 1367-1370, 1374-1376, 1380, 1382, 1385, 1390-1391, 1396, 1403-1407, 1409, 1607-1608, 1828-1829, 1831-1832, 1837
src/results_blocking.rs 95.48%
(296/310)
489-490, 492-493, 544, 769, 774, 779, 815, 818-819, 827-828, 886
src/utils.rs 70.73%
(29/41)
71, 143, 147-149, 153, 159, 198-202
Total 73.46%
(2893/3938)

@mcurrier2

mcurrier2 commented Apr 10, 2026

Copy link
Copy Markdown
Collaborator Author

Accepted both suggestions Shawn.

Fix 1 — Wall-clock timestamp captured immediately (blocking path, same pattern in async):

Before:

let receive_time_ns = get_monotonic_time_ns();
let latency_ns = receive_time_ns.saturating_sub(message.timestamp);
if should_buffer_latency(latency_file_path.is_some(), message.id) {
let wall_now_ns = SystemTime::now() // <-- captured LATE
.duration_since(UNIX_EPOCH)
.unwrap_or_default()
.as_nanos() as u64;
let wall_send_ns = wall_now_ns.saturating_sub(latency_ns);
latency_buffer.push((wall_send_ns, latency_ns));
}
After:

let receive_time_ns = get_monotonic_time_ns();
let wall_now_ns = SystemTime::now() // <-- captured IMMEDIATELY
.duration_since(UNIX_EPOCH)
.unwrap_or_default()
.as_nanos() as u64;
let latency_ns = receive_time_ns.saturating_sub(message.timestamp);
if should_buffer_latency(latency_file_path.is_some(), message.id) {
let wall_send_ns = wall_now_ns.saturating_sub(latency_ns);
latency_buffer.push((wall_send_ns, latency_ns));
}

Fix 2 — Trim input for Windows line endings:

Before:

pub fn parse_latency_file_line(line: &str) -> anyhow::Result<(u64, u64)> {
let parts: Vec<&str> = line.splitn(2, ',').collect();

After:

pub fn parse_latency_file_line(line: &str) -> anyhow::Result<(u64, u64)> {
let line = line.trim();
let parts: Vec<&str> = line.splitn(2, ',').collect();

The remaining diff lines are cargo fmt reformatting (collapsing multi-line expressions that fit on one line).

@dustinblack dustinblack left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All previous review items addressed: rebased on main (including merged #108 and #109), file format cleanly integrated with #109's buffering, documentation overlap resolved, and sberg-rh's feedback incorporated.

CI is fully green. Approved.

@dustinblack dustinblack enabled auto-merge (squash) April 13, 2026 11:57

@sberg-rh sberg-rh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dustinblack dustinblack merged commit 2bbcd52 into main Apr 13, 2026
12 checks passed
@dustinblack dustinblack deleted the fix/streaming-timestamps branch April 13, 2026 13:20
sberg-rh added a commit that referenced this pull request Apr 14, 2026
PR #105 (Fix/streaming timestamps) added send_timestamp_ns parameter
to MessageLatencyRecord::new(). Update all 4 call sites in standalone
client to capture wall-clock timestamp at send time and pass it as
the 6th argument.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dustinblack pushed a commit that referenced this pull request May 11, 2026
PR #105 (Fix/streaming timestamps) added send_timestamp_ns parameter
to MessageLatencyRecord::new(). Update all 4 call sites in standalone
client to capture wall-clock timestamp at send time and pass it as
the 6th argument.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sberg-rh added a commit that referenced this pull request May 28, 2026
* feat(cli): Add standalone --server and --client modes with full reporting

Add the ability to run the benchmark server and client as independent
processes, enabling cross-environment IPC testing (e.g., host and
container). Relates to #11.

Standalone mode features:
- --server flag starts a server that listens for client connections
- --client flag connects to a running server with retry logic
  (100ms backoff, 30s timeout)
- Both async (Tokio) and blocking (std) execution modes supported
- Duration (-d) and message-count (-i) modes both supported
- Default transport endpoints work without extra flags
- Endpoint flags (--socket-path, --shared-memory-name,
  --message-queue-name) promoted to user-facing

Reporting integration:
- Full ResultsManager/MetricsCollector integration for structured
  output (JSON, streaming CSV, console summary with HDR percentiles)
- Server-side one-way latency measurement using monotonic clock
  (accurate for same-host and container scenarios)
- Round-trip latency with per-message streaming support

Code quality:
- Shared helpers: dispatch_server_message(), retry constants
- 25 tests covering CLI parsing, transport config, server dispatch,
  connection retry, shutdown, duration mode, one-way, round-trip
- Explicit MessageType::Shutdown on client disconnect

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Support send_delay and include_first_message in standalone client

- Add --send-delay support: inserts a configurable pause after each
  message send (blocking uses thread::sleep, async uses tokio::sleep)
- Add --include-first-message support: when false (default), sends a
  canary message before measurement to warm up the connection, matching
  the existing BenchmarkRunner behavior
- Applied to both one-way and round-trip tests in both blocking and
  async client paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: Eliminate per-message heap allocation in standalone client

Reuse a single Message struct across loop iterations instead of
calling Message::new() with payload.clone() on every send. The
message id and timestamp are updated in-place before each send.

This removes one Vec<u8> heap allocation per message in the
measurement loop, reducing allocation overhead that can skew
latency results, especially for small messages.

Applied to both one-way and round-trip tests in both blocking
and async client paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Add concurrency support for standalone client/server

Server-side multi-accept:
- TCP and UDS servers now accept multiple concurrent connections,
  spawning a handler thread per client with its own MetricsCollector
- Grace period after first client prevents premature server exit
- SHM and PMQ fall back to single-client mode with a warning
- Server aggregates one-way latency metrics across all handlers

Client-side multi-threaded execution:
- Blocking client spawns N worker threads, each with its own transport
  connection, MetricsCollector, and message loop
- Async client uses tokio::task::JoinSet for concurrent workers
- Results aggregated via MetricsCollector::aggregate_worker_metrics()
- Per-message streaming disabled for concurrent mode (aggregated only)

Transport additions:
- BlockingTcpSocket::from_stream() wraps pre-accepted TcpStream
- BlockingUnixDomainSocket::from_stream() wraps pre-accepted UnixStream

Shared helpers:
- handle_client_connection() -- per-client message dispatch and metrics
- aggregate_and_print_server_metrics() -- shared aggregation logic

Tests:
- test_standalone_concurrent_tcp_round_trip (3 concurrent clients)
- test_handle_client_connection_round_trip (dispatch correctness)
- test_handle_client_connection_one_way_metrics (metrics recording)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: Add coverage for concurrency, from_stream, and aggregation

- test_standalone_concurrent_tcp_one_way: multi-accept server with 2
  concurrent one-way clients, verifying server-side metrics recording
- test_tcp_from_stream_send_receive: BlockingTcpSocket::from_stream()
  full send/receive round-trip
- test_uds_from_stream_send_receive: BlockingUnixDomainSocket::from_stream()
  full send/receive round-trip (unix-only)
- test_concurrency_forced_to_one_for_shm: CLI parsing for SHM with
  concurrency > 1
- test_aggregate_and_print_empty_collectors: empty input edge case
- test_aggregate_and_print_single_collector: single collector with data

Total binary tests: 34.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Add async multi-accept server and remove unused parameter

- Add multi-accept support for async TCP and UDS servers, matching
  the blocking server's concurrency support. Uses tokio::net listeners
  with spawn_blocking for per-client handler threads.
- Remove unused _args parameter from run_standalone_server_async
- Replace inline latency printing in async server with shared
  print_server_one_way_latency helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Use OS-assigned ports in tests and document grace period

- Replace all 12 hardcoded test ports (18301-18314) with OS-assigned
  ports via get_free_port() helper (binds to port 0, extracts assigned
  port). Prevents port conflicts in parallel test runs and with other
  processes.
- Extract 2-second multi-accept grace period into
  SERVER_ACCEPT_GRACE_PERIOD constant with documentation explaining
  the behavior and limitation.
- Document the grace period in --server CLI help text so users know
  concurrent clients should connect within 2 seconds of each other.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Set streams to blocking mode after tokio into_std conversion

tokio::net::TcpStream::into_std() leaves the stream in non-blocking
mode (set by tokio for epoll/kqueue). The blocking transport's
read_exact/write_all calls then fail with WouldBlock errors,
causing immediate disconnection.

Fix: call set_nonblocking(false) on streams after into_std() in
both TCP and UDS async multi-accept servers.

Add test_standalone_async_concurrent_tcp_round_trip to exercise
the async multi-accept path (tokio accept + spawn_blocking +
from_stream + handle_client_connection).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: Strengthen test assertions for real-world correctness

- test_standalone_blocking_tcp_one_way: verify server received exact
  message count with correct sequential IDs, add shutdown message
- test_standalone_blocking_tcp_duration_round_trip: verify response
  IDs match requests, assert count > 10 for 200ms test, add shutdown
- test_standalone_blocking_tcp_duration_one_way: verify server received
  exact count with sequential IDs, assert count > 10 for 200ms test
- test_concurrency_forced_to_one_for_shm: test actual concurrency
  forcing logic instead of just CLI parsing
- test_standalone_concurrent_tcp_one_way: assert exact message count
  per handler instead of just "greater than zero"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Address round 2 review feedback

- Clean up garbled doc comment on async concurrent test (editing
  artifacts from multiple rewrites)
- Replace silent panic swallowing in async multi-accept servers:
  try_join_next().transpose() silently dropped JoinErrors from
  panicked handler tasks. Now logs warnings via warn!().
- Extract effective_concurrency() helper to deduplicate the
  concurrency-forcing logic (was copied in blocking client, async
  client, and test). Test now calls the actual helper instead of
  reimplementing the logic inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: Add real-world scenario tests for coverage and correctness

- test_standalone_large_payload_integrity: 4KB payloads with
  recognizable byte pattern, server echoes back, client verifies
  content byte-for-byte to catch corruption
- test_handle_client_connection_filters_canary: verifies warmup
  canary messages (id=u64::MAX) are excluded from one-way metrics
- test_handle_client_connection_mixed_message_types: interleaved
  OneWay and Request messages on a single connection, verifies
  correct metrics recording and response dispatch
- test_aggregate_and_print_multiple_collectors: aggregation across
  2 collectors with different latency distributions
- test_effective_concurrency_all_mechanisms: covers UDS, PMQ, SHM,
  TCP, and concurrency=1 edge case

Total binary tests: 40.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Set accepted streams to blocking mode in multi-accept servers

Accepted TCP/UDS streams inherit non-blocking mode from the
listener (set for the accept poll loop). The handler threads need
blocking mode for the transport's read_exact/write_all operations.

This is the blocking-server equivalent of the async into_std fix
in commit 8723429. Without this fix, standalone server handlers
immediately disconnect from clients.

Applies to both run_standalone_server_blocking_multi_accept_tcp
and _uds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Address review feedback - grace period, quiet flag, mutex safety

- Fix grace period bug: reset timer on every new connection, not just
  the first. Prevents premature server exit between one-way and
  round-trip test phases when using concurrency > 1. Applied to all
  four multi-accept servers (blocking TCP/UDS, async TCP/UDS).
- Honor --quiet flag in standalone server and client. When set,
  suppress all tracing output to stderr.
- Handle poisoned mutex gracefully: use unwrap_or_else(|e| e.into_inner())
  instead of unwrap() on mutex locks. If a handler thread panics while
  holding the lock, other threads can still push their metrics instead
  of cascade-panicking.
- Add defensive --shm-direct guard in standalone server and client:
  returns error if --shm-direct is used without --blocking. This is
  normally enforced by main() but the guard protects against future
  refactoring that might change the dispatch order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Address review feedback - multiple correctness and quality fixes

- Fix grace period bug: reset timer on every new connection, not just
  the first. Prevents premature server exit between one-way and
  round-trip test phases when using concurrency > 1. Applied to all
  four multi-accept servers (blocking TCP/UDS, async TCP/UDS).
- Honor --quiet flag in standalone server and client. When set,
  suppress all tracing output to stderr.
- Handle poisoned mutex gracefully: use unwrap_or_else(|e| e.into_inner())
  instead of unwrap() on mutex locks in handler threads.
- Add defensive --shm-direct guard in standalone server and client.
- Add socket buffer tuning (recv/send buffer sizes) to multi-accept
  TCP servers to match normal transport behavior.
- Fix integer division remainder: last worker now receives any extra
  messages when msg_count is not evenly divisible by concurrency.
- Document empty response payloads as intentional design matching
  existing benchmark runner behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Add receive_blocking_timed for accurate one-way latency

Add receive_blocking_timed() to the BlockingTransport trait that
captures a monotonic timestamp after raw bytes are read but before
bincode deserialization. This excludes deserialization overhead from
one-way latency measurements.

- Add default implementation on BlockingTransport trait (backward
  compatible -- captures timestamp after full receive)
- Override in TCP, UDS, and SHM blocking transports to place
  timestamp between raw I/O read and deserialization
- SHM-direct uses default (no bincode deserialization to exclude)
- Update handle_client_connection and standalone single-client
  server to use receive_blocking_timed

Impact is most visible with large payloads where deserialization
is non-trivial. 64KB one-way TCP test shows min latency dropped
from 41us (post-deserialize) to 14us (pre-deserialize), a ~27us
improvement representing the bincode deserialization time excluded
from measurement. Mean dropped 28% (73us to 52us) and P99 dropped
14% (132us to 113us).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Update MessageLatencyRecord calls for PR #105 compatibility

PR #105 (Fix/streaming timestamps) added send_timestamp_ns parameter
to MessageLatencyRecord::new(). Update all 4 call sites in standalone
client to capture wall-clock timestamp at send time and pass it as
the 6th argument.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: Move standalone logic from main.rs to library crate

Move ~3100 lines of standalone client/server code from the binary
crate (main.rs) into two new library modules, following the existing
flat-file convention (benchmark.rs/benchmark_blocking.rs pattern).

Structure:
- standalone_server.rs (1982 lines): constants, shared helpers,
  server dispatch, multi-accept TCP/UDS, async server paths
- standalone_client.rs (1146 lines): retry helpers, client dispatch,
  single/concurrent blocking and async paths
- main.rs reduced from ~4200 to ~1120 lines (thin dispatch layer)

Additional changes:
- Promote logging.rs from binary-private to library-public module
- Move set_affinity() to utils.rs as pub function
- All standalone functions now pub for tarpaulin coverage measurement
  and integration test access

No behavioral changes. All 374 tests pass. Benchmark comparison
across 3 runs confirms no performance regression (mean latencies
within 2-5% run-to-run variance).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Improve server resilience and error visibility

- Socket configuration failures (set_nonblocking, set_nodelay) in
  multi-accept servers now log a warning and skip the bad connection
  instead of crashing the entire server with ?
- Thread join panics in blocking multi-accept servers now logged
  with warn! instead of silently dropped with let _ =
- Streaming latency record failures in client now logged with
  debug! instead of silently swallowed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: Comprehensive coverage improvements for standalone modules

Server tests (standalone_server.rs, 82.7% coverage):
- test_multi_accept_tcp_server_direct: exercises multi-accept TCP directly
- test_single_server_direct: exercises blocking single-client directly
- test_server_blocking_dispatch: exercises dispatch logic
- test_server_blocking_dispatch_uds: exercises UDS dispatch branch
- test_async_multi_accept_tcp_full: exercises async multi-accept TCP
- test_async_single_server_path: exercises async single-client
- test_async_single_server_one_way_metrics: async one-way metrics
- test_async_multi_accept_uds_full: exercises async UDS multi-accept
- test_multi_accept_server_with_delayed_client: slow sender resilience
- test_multi_accept_server_duration_one_way: duration mode with multi-accept
- test_async_multi_accept_server_duration_one_way: async duration mode
- test_handle_client_connection_send_failure: client disconnect error path
- test_single_server_client_disconnect: single server send error path
- test_multi_accept_server_survives_bad_client: garbage input resilience
- test_handle_client_connection_garbage_input: deserialization error path
- test_run_standalone_server_full_dispatch: full entry point dispatch
- test_run_standalone_server_rejects_all_via_dispatch: 'all' validation
- test_run_standalone_server_rejects_shm_direct: shm-direct guard
- test_run_standalone_server_verbose: -vv logging level branches
- test_aggregate_server_metrics_from_handlers: real handler data
- test_print_server_one_way_latency_with_data/zero: print paths

Client tests (standalone_client.rs, 86.3% coverage):
- test_client_blocking_tcp_round_trip/one_way: single client paths
- test_client_blocking_tcp_duration_round_trip/one_way: duration mode
- test_client_blocking_tcp_concurrent_round_trip/one_way: concurrent
- test_client_async_single_round_trip/one_way: async single
- test_client_async_duration_round_trip/one_way: async duration
- test_client_async_concurrent_round_trip/one_way: async concurrent
- test_client_blocking_with_send_delay: send_delay round-trip branch
- test_client_blocking_one_way_with_send_delay: send_delay one-way branch
- test_client_blocking_with_streaming_output: JSON streaming
- test_client_blocking_combined_streaming: combined mode streaming
- test_client_blocking_csv_streaming: CSV streaming
- test_client_blocking_concurrent_duration_one_way: concurrent duration
- test_client_async_concurrent_duration_one_way: async concurrent duration
- test_run_standalone_client_full_dispatch: full entry point dispatch
- test_run_standalone_client_rejects_all_via_dispatch: 'all' validation
- test_run_standalone_client_rejects_shm_direct: shm-direct guard
- test_connect_async_with_retry_succeeds: async retry path

Also: changed tracing .init() to .try_init() with eprintln fallback
in both server and client for test compatibility.

Coverage: standalone_server 82.7%, standalone_client 86.3%, combined 84.8%
Total lib tests: 355.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: Reduce multi-accept server polling interval from 50ms to 5ms

Reduce the non-blocking accept loop sleep from 50ms to 5ms in both
TCP and UDS multi-accept servers. This cuts connection acceptance
latency by 10x with no portability concerns.

Discovered during hands-on validation testing of standalone concurrent
mode, where the 50ms polling interval was the primary contributor to
elevated tail latency under multi-client workloads.

Improvement with -c 4 concurrent clients:
- RT P95: -46% (65.9us -> 35.5us)
- RT P99: -49% (91.4us -> 46.9us)
- Throughput: +66% (94.9 -> 157.1 MB/s)

Single-client workloads also benefit from faster initial connection
acceptance (P99 improved 4-7% across all test modes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Address PR #110 review feedback (issues 1-11)

- Add 60s idle timeout to prevent server hang when no client connects
- Log canary/shutdown send errors (warn for canary, debug for shutdown)
- Wire concurrent streaming: workers collect MessageLatencyRecord and
  stream through ResultsManager after completion
- Reject multiple mechanisms in standalone mode with clear error
- Add debug logging on server receive errors for diagnostics
- Add per-worker warmup in concurrent mode (fixes metadata mismatch)
- Document two-phase reconnection behavior in concurrent mode
- Add SIGINT/SIGTERM signal handler via ctrlc crate for graceful
  server shutdown and resource cleanup
- Add receive_blocking_timed override for PMQ transport to exclude
  deserialization from one-way latency measurement
- Add 11 integration tests spawning separate server/client processes
- Fix misleading test comment on garbage input handler

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: CI compatibility for MSRV and lint checks

- Drop ctrlc "termination" feature (requires Rust 1.75+ due to nix
  static zeroed); SIGINT still handled, SIGTERM uses default OS behavior
- Remove unused BlockingTcpSocket import in canary failure test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Pin ctrlc to 3.4.5 for MSRV 1.70 compatibility

ctrlc 3.5+ uses std::mem::zeroed() in statics which requires Rust 1.75+.
Version 3.4.5 does not have this issue and supports Rust 1.69+.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Use cross-platform mechanisms in multiple-mechanism rejection tests

Tests used "tcp uds" but UDS is not a valid mechanism on Windows,
causing clap to reject the args before reaching our validation code.
Changed to "tcp shm" which exists on all platforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Use platform-independent temp dir in output file integration test

The test hardcoded /tmp/ which doesn't exist on Windows. Use
std::env::temp_dir() for cross-platform compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Address PR #110 follow-up review (4 additional findings)

- Add sentinel connection in concurrent client to prevent server from
  exiting between one-way and round-trip phases (keeps a connection
  open across both phases so the server's grace period check never
  triggers prematurely)
- Surface accept-loop errors: upgrade from debug to warn logging and
  return Err when accept fails before any client connects
- Fix duration precision loss: use as_secs_f64() instead of as_secs()
  when passing duration to spawned server process
- Propagate latency file write errors instead of silently discarding
- Add integration tests for concurrent both-tests scenario

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: Fix formatting issues caught by CI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix streaming output timestamps captured at record-creation time instead of message-send time

3 participants