Fix/streaming timestamps#105
Conversation
dustinblack
left a comment
There was a problem hiding this comment.
I closed #92 because fresh testing showed the problem was no longer reproducible. Issue #106 was then created the following day as the basis for this PR, but it contains no evidence of an actual problem — no reproduction steps, no sample output showing incorrect timestamps, and no before/after data.
I'd suggest closing this PR and #106 unless concrete evidence can be provided in #106 demonstrating that the specific code paths being fixed here actually produce incorrect timestamps on the current main branch — similar to what I provided in #92.
|
These are valid issues/problems. I added to issue #106 showing the problems and before/after scenarios. |
dustinblack
left a comment
There was a problem hiding this comment.
Thanks for adding the evidence to #106 — the before/after data clearly demonstrates the problem across round-trip, combined, and one-way code paths. The bugs are real and the fixes are technically sound. Withdrawing my earlier suggestion to close.
Technical assessment of the changes:
The fixes are correct. Moving SystemTime::now() capture from record-creation time to send time (via the send_timestamp_ns parameter on new()/new_combined()) is the right approach. The one-way server-side approximation (wall_now - latency) mixes clock domains but is clearly documented and is the best available without clock sync. The parse_latency_file_line() parser is clean, and the test coverage is thorough.
Before this can be approved, it needs:
-
Rebase on main. This branch is based on
e826b20, missing PRs #103 and #104. The clippy refactors and.cargo/audit.tomlare already on main and will drop out. CI failures should resolve. -
Coordinate file format with PR #109. This PR changes the server latency file from
latency_nstowall_send_ns,latency_ns. PR #109'swrite_latency_buffer()writes the old single-field format. Whichever merges second needs to match. Since #109 is closer to approval, consider whether this PR should rebase on top of #109 or vice versa. -
Remove documentation that overlaps with PR #108. The "Streaming Output Columns" and "Test Execution Order" README sections appear in both PRs. Whichever PR owns those sections, the other should drop them to avoid merge conflicts. Since those sections are most relevant here, I'd suggest keeping them in this PR and removing them from #108.
Remove sections that belong in the streaming-timestamps PR (#105), not this buffer-sizing PR: - "Streaming Output Columns" table and timestamp_ns accuracy note - "Test Execution Order" section - Streaming JSON/CSV description rewording - Round-trip CLI example comment expansion README diff now only contains buffer-sizing documentation changes: auto-sizing table, error prevention updates, and example buffer size correction. AI-assisted-by: Claude claude-4.6-opus-high-thinking Made-with: Cursor
…Hangs (#108) * fix: Use fixed 64KB buffer for standalone SHM to enable streaming Previously, standalone SHM mode calculated buffer size to fit ALL messages (e.g., 6.4MB for 50k messages). This caused the writer to dump everything instantly while the reader slowly drained using pthread_cond_timedwait with 500us polling timeouts, leading to huge accumulated latencies (~489ms for 64B). The fix uses a fixed 64KB buffer (or 2x message size if larger), matching container behavior. This enables proper streaming where the writer blocks when the buffer is full. Before: 489ms mean latency for 64B/50k messages After: 1.95ms mean latency for 64B/10k messages (blocking) 15.85ms mean latency for 64B/10k messages (async) Also updates test_transport_config_buffer_size_logic to expect the new fixed-buffer behavior for SHM while keeping the message-count sizing for TCP/UDS. Cherry-picked from container-to-container-ipc branch (3b49877). AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic) Made-with: Cursor * test/docs: Add buffer sizing and SHM backpressure tests, update documentation - Add 9 new tests covering SHM buffer sizing, backpressure, and condvar timed-wait behavior: - test_shm_large_message_buffer_sizing: verifies 2x msg size path when messages exceed 32KB (async) - test_shm_duration_mode_uses_fixed_buffer: verifies SHM gets 64KB in duration mode, not 1GB TCP/UDS default (async) - test_blocking_transport_config_buffer_size_logic: full buffer sizing test for blocking mode (SHM, TCP, PMQ, duration) - test_blocking_shm_duration_mode_uses_fixed_buffer: SHM 64KB in blocking duration mode - test_blocking_shm_large_message_buffer_sizing: 2x path (blocking) - test_backpressure_with_small_buffer: exercises timed condvar wait with 1KB buffer and 20 messages - test_payload_integrity_under_backpressure: byte-level payload verification through backpressure-induced blocking writes - test_ring_buffer_wrap_around_under_backpressure: write_pos wraps the circular buffer multiple times under backpressure - test_shutdown_detected_during_blocked_write: server closes while client is blocked waiting for buffer space - Update README.md Buffer Size Configuration with per-mechanism auto-sizing table and updated error prevention guidance - Update CONFIG.md SHM defaults from 8192 to 64KB (auto) and add automatic buffer sizing explanation - All tests passing, clippy clean AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic) Made-with: Cursor * test/docs: Add coverage tests and fix stale buffer sizing documentation - Add test_user_buffer_size_overrides_shm_default (async + blocking): verifies user-provided --buffer-size overrides SHM's 64KB default - Add test_shm_buffer_sizing_at_32kb_boundary (async + blocking): tests exact transition where 2*(msg_size+64) crosses 64KB threshold - Add test_high_volume_condvar_stress: 100 messages through 512-byte buffer to stress pthread_cond_timedwait retry loop - Update create_transport_config_internal doc comment in benchmark.rs to describe per-mechanism buffer sizing (SHM, PMQ, TCP/UDS) - Update Adaptive Buffer Sizing doc in benchmark_blocking.rs to describe per-mechanism behavior instead of vague description - Fix README.md example output: SharedMemory buffer size 10240000 -> 65536 to reflect new fixed 64KB auto-sizing - All 340 tests passing, zero clippy warnings AI-assisted-by: Claude claude-4.6-opus-high-thinking Made-with: Cursor * fix(msrv): Pin transitive dependencies to maintain Rust 1.70 compatibility The CI MSRV job removes Cargo.lock and resolves fresh dependencies. Several transitive dependencies recently bumped their MSRV above 1.70: - uuid 1.21+ requires Rust 1.85 → pinned to <1.21 - tempfile 3.25+ pulls getrandom >=0.3,<0.5 which resolves to 0.4.x (edition 2024, unparseable by Rust 1.70's cargo) → pinned to <3.25 - zmij 1.0.20+ requires Rust 1.71 → pinned to =1.0.19 - quote 1.0.45+ requires Rust 1.71 → pinned to =1.0.44 - syn 2.0.115+ requires Rust 1.71 → pinned to =2.0.114 - unicode-ident 1.0.23+ requires Rust 1.71 → pinned to =1.0.22 Verified: MSRV builds and tests pass both with and without Cargo.lock in a Rust 1.70 container. Local clippy, fmt, and tests all clean. AI-assisted-by: Claude claude-4.6-opus-high-thinking Made-with: Cursor * ci: trigger CI rebuild with updated MSRV dependency pins No code changes. Forces new CI run to pick up dependency pins from commit cd28295 (uuid <1.21, tempfile <3.25, zmij =1.0.19, quote =1.0.44, syn =2.0.114, unicode-ident =1.0.22). AI-assisted-by: Claude claude-4.6-opus-high-thinking Made-with: Cursor * fix: Remove out-of-scope condvar/polling code per review Remove container-IPC code that was out of scope for issue #107 (buffer sizing fix). This scopes the PR to items 4-5 only. Removed: - write_data_polling() and read_data_polling() fallback functions - pthread_cond_timedwait (reverted to pthread_cond_wait) - Broken-condvar detection (100-iteration/10ms heuristic) - Mutex-lock-failure fallbacks to polling paths - 30-second timeout counters (wait_count > 60000) - test_high_volume_condvar_stress test Restored from main (PR #104): - write_data_blocking() signature with timestamp_offset parameter so latency measurement excludes backpressure wait time - read_data_blocking() with clean pthread_cond_wait - Proper pthread_cond_signal in both write and read paths The two functional regressions cited in review are resolved: 1. Timestamp regression: write_data_polling() lacked timestamp_offset, but that function no longer exists. The only write path now refreshes the timestamp after the condvar wait. 2. Missing condvar signal: write_data_polling() never called pthread_cond_signal(&data_ready), but that function no longer exists. The only write path signals after every write. All tests passing (42/42). No clippy warnings. AI-assisted-by: Claude claude-4.6-opus-high-thinking Made-with: Cursor * docs: Remove out-of-scope README sections per review Remove sections that belong in the streaming-timestamps PR (#105), not this buffer-sizing PR: - "Streaming Output Columns" table and timestamp_ns accuracy note - "Test Execution Order" section - Streaming JSON/CSV description rewording - Round-trip CLI example comment expansion README diff now only contains buffer-sizing documentation changes: auto-sizing table, error prevention updates, and example buffer size correction. AI-assisted-by: Claude claude-4.6-opus-high-thinking Made-with: Cursor * fix: Unify message overhead constant across buffer sizing paths Replace hardcoded 64 and 32 values with the MESSAGE_OVERHEAD constant in both benchmark.rs and benchmark_blocking.rs: - TCP/UDS msg-count sizing: was hardcoded 64, now MESSAGE_OVERHEAD - SHM logging/validation: was hardcoded 32, now MESSAGE_OVERHEAD - Add comment explaining what MESSAGE_OVERHEAD covers: 8 (id) + 8 (timestamp) + 8 (bincode vec length) + 1 (message type) + 4 (ring buffer length prefix) = 29 bytes, rounded to 64 Addresses review feedback about inconsistent overhead values. AI-assisted-by: Claude claude-4.6-opus-high-thinking Made-with: Cursor
…ages Pass send timestamp to MessageLatencyRecord instead of capturing it when the record is created. This makes the timestamp represent when the message was sent, so gaps between timestamps now match the actual latency. Changes: - MessageLatencyRecord::new() and new_combined() now require send_timestamp_ns parameter instead of capturing SystemTime::now() internally - Added MessageLatencyRecord::current_timestamp_ns() helper for call sites that need to capture the current wall-clock time - Updated all streaming record creation sites in benchmark.rs and benchmark_blocking.rs to capture and pass send timestamps Cherry-picked from container-to-container-ipc branch (6cb8f9b). Host-container-specific changes excluded (host_container.rs not present). AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic) Made-with: Cursor
…n documentation - Add 3 new unit tests for MessageLatencyRecord timestamp handling: test_new_uses_provided_send_timestamp, test_new_combined_uses_provided_send_timestamp, test_current_timestamp_ns_returns_recent_value - Add 2 end-to-end streaming tests in benchmark.rs: test_one_way_streaming_captures_send_timestamp, test_round_trip_streaming_captures_send_timestamp - Document sequential one-way/round-trip test execution in README.md (new "Test Execution Order" section), CONFIG.md, dashboard README, and run() doc comments in benchmark.rs and benchmark_blocking.rs - All 332 tests pass (331 passed, 1 ignored), zero clippy warnings AI-assisted-by: Claude claude-4.6-opus-high-thinking Made-with: Cursor
…aths The previous commit (68afe07) added a send_timestamp_ns parameter to MessageLatencyRecord but only the blocking round-trip path was actually capturing timestamps at send time. The async round-trip, async combined, and all one-way paths were still using current_timestamp_ns() at record-creation time (post-test), causing all timestamps within a run to cluster into the same second. Changes: - Async round-trip: capture wall-clock timestamp inside the spawned client future before each send(), return Vec<(Duration, u64)> - Async combined: same pattern for one-way latency vector - One-way (async + blocking): server now writes "wall_send_ns,latency_ns" per line (wall_send_ns = wall_clock_now - latency); client readers parse and use the server-computed send timestamp - Add parse_latency_file_line() with 7 unit tests covering valid input, missing commas, empty lines, non-numeric values, and extra commas - Enhance existing end-to-end streaming tests to validate timestamps fall within the test execution window and are not all identical - Document streaming output column definitions in README.md including timestamp_ns semantics and accuracy note for one-way clock mixing All 265+ unit tests pass, clippy clean, no scope creep. AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic) Made-with: Cursor
dfbf5c5 to
c3c16a3
Compare
|
Done. Rebased onto current origin/main which includes PRs #103, #104, #108, and #109. The three commits that were already upstream (clippy refactors, uuid pin, audit.toml) dropped out automatically during rebase. CI failures from those should be resolved. Branch is now 3 commits ahead of main.
Resolved during rebase. PR #109 introduced write_latency_buffer() which buffered latencies in memory as Vec and wrote single-field lines (latency_ns). During rebase I updated the buffer type to Vec<(u64, u64)> storing (wall_send_ns, latency_ns) pairs, and updated write_latency_buffer() to write the two-field "wall_send_ns,latency_ns" format that parse_latency_file_line() expects. The should_buffer_latency() helper is preserved for canary message filtering. All existing latency buffer tests were updated to use the two-field format, including the round-trip parse test which now uses parse_latency_file_line() directly to verify end-to-end compatibility. All 42 tests pass.
Already handled. The "Streaming Output Columns" and "Test Execution Order" README sections were removed from PR #108 per reviewer feedback before it was merged. Those sections now exist only in this PR, which is where they belong. Verified after rebase — no duplication in the diff against main. |
|
A couple of minor observations — not blocking These are both minor and probably fine as-is, but wanted to flag them in case they're worth a quick look. 1. Wall-clock timestamp captured slightly late on server side (
Likely negligible for most benchmark scenarios, but curious if this was intentional or just ordering convenience. 2.
|
- Move SystemTime::now() capture to immediately after get_monotonic_time_ns() in both blocking and async server paths, eliminating minor drift from the should_buffer_latency branch - Add .trim() to parse_latency_file_line() input for cross-platform robustness against Windows \r\n line endings - All tests passing AI-assisted-by: Claude claude-4.6-opus-high-thinking Made-with: Cursor
📈 Changed lines coverage: 88.89% (56/63)🚨 Uncovered lines in this PR
📊 Code Coverage Summary
|
|
Accepted both suggestions Shawn. Fix 1 — Wall-clock timestamp captured immediately (blocking path, same pattern in async):Before: let receive_time_ns = get_monotonic_time_ns(); let receive_time_ns = get_monotonic_time_ns(); Fix 2 — Trim input for Windows line endings:Before: pub fn parse_latency_file_line(line: &str) -> anyhow::Result<(u64, u64)> { After: pub fn parse_latency_file_line(line: &str) -> anyhow::Result<(u64, u64)> { The remaining diff lines are cargo fmt reformatting (collapsing multi-line expressions that fit on one line). |
PR #105 (Fix/streaming timestamps) added send_timestamp_ns parameter to MessageLatencyRecord::new(). Update all 4 call sites in standalone client to capture wall-clock timestamp at send time and pass it as the 6th argument. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR #105 (Fix/streaming timestamps) added send_timestamp_ns parameter to MessageLatencyRecord::new(). Update all 4 call sites in standalone client to capture wall-clock timestamp at send time and pass it as the 6th argument. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(cli): Add standalone --server and --client modes with full reporting Add the ability to run the benchmark server and client as independent processes, enabling cross-environment IPC testing (e.g., host and container). Relates to #11. Standalone mode features: - --server flag starts a server that listens for client connections - --client flag connects to a running server with retry logic (100ms backoff, 30s timeout) - Both async (Tokio) and blocking (std) execution modes supported - Duration (-d) and message-count (-i) modes both supported - Default transport endpoints work without extra flags - Endpoint flags (--socket-path, --shared-memory-name, --message-queue-name) promoted to user-facing Reporting integration: - Full ResultsManager/MetricsCollector integration for structured output (JSON, streaming CSV, console summary with HDR percentiles) - Server-side one-way latency measurement using monotonic clock (accurate for same-host and container scenarios) - Round-trip latency with per-message streaming support Code quality: - Shared helpers: dispatch_server_message(), retry constants - 25 tests covering CLI parsing, transport config, server dispatch, connection retry, shutdown, duration mode, one-way, round-trip - Explicit MessageType::Shutdown on client disconnect Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: Support send_delay and include_first_message in standalone client - Add --send-delay support: inserts a configurable pause after each message send (blocking uses thread::sleep, async uses tokio::sleep) - Add --include-first-message support: when false (default), sends a canary message before measurement to warm up the connection, matching the existing BenchmarkRunner behavior - Applied to both one-way and round-trip tests in both blocking and async client paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: Eliminate per-message heap allocation in standalone client Reuse a single Message struct across loop iterations instead of calling Message::new() with payload.clone() on every send. The message id and timestamp are updated in-place before each send. This removes one Vec<u8> heap allocation per message in the measurement loop, reducing allocation overhead that can skew latency results, especially for small messages. Applied to both one-way and round-trip tests in both blocking and async client paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: Add concurrency support for standalone client/server Server-side multi-accept: - TCP and UDS servers now accept multiple concurrent connections, spawning a handler thread per client with its own MetricsCollector - Grace period after first client prevents premature server exit - SHM and PMQ fall back to single-client mode with a warning - Server aggregates one-way latency metrics across all handlers Client-side multi-threaded execution: - Blocking client spawns N worker threads, each with its own transport connection, MetricsCollector, and message loop - Async client uses tokio::task::JoinSet for concurrent workers - Results aggregated via MetricsCollector::aggregate_worker_metrics() - Per-message streaming disabled for concurrent mode (aggregated only) Transport additions: - BlockingTcpSocket::from_stream() wraps pre-accepted TcpStream - BlockingUnixDomainSocket::from_stream() wraps pre-accepted UnixStream Shared helpers: - handle_client_connection() -- per-client message dispatch and metrics - aggregate_and_print_server_metrics() -- shared aggregation logic Tests: - test_standalone_concurrent_tcp_round_trip (3 concurrent clients) - test_handle_client_connection_round_trip (dispatch correctness) - test_handle_client_connection_one_way_metrics (metrics recording) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: Add coverage for concurrency, from_stream, and aggregation - test_standalone_concurrent_tcp_one_way: multi-accept server with 2 concurrent one-way clients, verifying server-side metrics recording - test_tcp_from_stream_send_receive: BlockingTcpSocket::from_stream() full send/receive round-trip - test_uds_from_stream_send_receive: BlockingUnixDomainSocket::from_stream() full send/receive round-trip (unix-only) - test_concurrency_forced_to_one_for_shm: CLI parsing for SHM with concurrency > 1 - test_aggregate_and_print_empty_collectors: empty input edge case - test_aggregate_and_print_single_collector: single collector with data Total binary tests: 34. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Add async multi-accept server and remove unused parameter - Add multi-accept support for async TCP and UDS servers, matching the blocking server's concurrency support. Uses tokio::net listeners with spawn_blocking for per-client handler threads. - Remove unused _args parameter from run_standalone_server_async - Replace inline latency printing in async server with shared print_server_one_way_latency helper Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Use OS-assigned ports in tests and document grace period - Replace all 12 hardcoded test ports (18301-18314) with OS-assigned ports via get_free_port() helper (binds to port 0, extracts assigned port). Prevents port conflicts in parallel test runs and with other processes. - Extract 2-second multi-accept grace period into SERVER_ACCEPT_GRACE_PERIOD constant with documentation explaining the behavior and limitation. - Document the grace period in --server CLI help text so users know concurrent clients should connect within 2 seconds of each other. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Set streams to blocking mode after tokio into_std conversion tokio::net::TcpStream::into_std() leaves the stream in non-blocking mode (set by tokio for epoll/kqueue). The blocking transport's read_exact/write_all calls then fail with WouldBlock errors, causing immediate disconnection. Fix: call set_nonblocking(false) on streams after into_std() in both TCP and UDS async multi-accept servers. Add test_standalone_async_concurrent_tcp_round_trip to exercise the async multi-accept path (tokio accept + spawn_blocking + from_stream + handle_client_connection). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: Strengthen test assertions for real-world correctness - test_standalone_blocking_tcp_one_way: verify server received exact message count with correct sequential IDs, add shutdown message - test_standalone_blocking_tcp_duration_round_trip: verify response IDs match requests, assert count > 10 for 200ms test, add shutdown - test_standalone_blocking_tcp_duration_one_way: verify server received exact count with sequential IDs, assert count > 10 for 200ms test - test_concurrency_forced_to_one_for_shm: test actual concurrency forcing logic instead of just CLI parsing - test_standalone_concurrent_tcp_one_way: assert exact message count per handler instead of just "greater than zero" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Address round 2 review feedback - Clean up garbled doc comment on async concurrent test (editing artifacts from multiple rewrites) - Replace silent panic swallowing in async multi-accept servers: try_join_next().transpose() silently dropped JoinErrors from panicked handler tasks. Now logs warnings via warn!(). - Extract effective_concurrency() helper to deduplicate the concurrency-forcing logic (was copied in blocking client, async client, and test). Test now calls the actual helper instead of reimplementing the logic inline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: Add real-world scenario tests for coverage and correctness - test_standalone_large_payload_integrity: 4KB payloads with recognizable byte pattern, server echoes back, client verifies content byte-for-byte to catch corruption - test_handle_client_connection_filters_canary: verifies warmup canary messages (id=u64::MAX) are excluded from one-way metrics - test_handle_client_connection_mixed_message_types: interleaved OneWay and Request messages on a single connection, verifies correct metrics recording and response dispatch - test_aggregate_and_print_multiple_collectors: aggregation across 2 collectors with different latency distributions - test_effective_concurrency_all_mechanisms: covers UDS, PMQ, SHM, TCP, and concurrency=1 edge case Total binary tests: 40. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Set accepted streams to blocking mode in multi-accept servers Accepted TCP/UDS streams inherit non-blocking mode from the listener (set for the accept poll loop). The handler threads need blocking mode for the transport's read_exact/write_all operations. This is the blocking-server equivalent of the async into_std fix in commit 8723429. Without this fix, standalone server handlers immediately disconnect from clients. Applies to both run_standalone_server_blocking_multi_accept_tcp and _uds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Address review feedback - grace period, quiet flag, mutex safety - Fix grace period bug: reset timer on every new connection, not just the first. Prevents premature server exit between one-way and round-trip test phases when using concurrency > 1. Applied to all four multi-accept servers (blocking TCP/UDS, async TCP/UDS). - Honor --quiet flag in standalone server and client. When set, suppress all tracing output to stderr. - Handle poisoned mutex gracefully: use unwrap_or_else(|e| e.into_inner()) instead of unwrap() on mutex locks. If a handler thread panics while holding the lock, other threads can still push their metrics instead of cascade-panicking. - Add defensive --shm-direct guard in standalone server and client: returns error if --shm-direct is used without --blocking. This is normally enforced by main() but the guard protects against future refactoring that might change the dispatch order. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Address review feedback - multiple correctness and quality fixes - Fix grace period bug: reset timer on every new connection, not just the first. Prevents premature server exit between one-way and round-trip test phases when using concurrency > 1. Applied to all four multi-accept servers (blocking TCP/UDS, async TCP/UDS). - Honor --quiet flag in standalone server and client. When set, suppress all tracing output to stderr. - Handle poisoned mutex gracefully: use unwrap_or_else(|e| e.into_inner()) instead of unwrap() on mutex locks in handler threads. - Add defensive --shm-direct guard in standalone server and client. - Add socket buffer tuning (recv/send buffer sizes) to multi-accept TCP servers to match normal transport behavior. - Fix integer division remainder: last worker now receives any extra messages when msg_count is not evenly divisible by concurrency. - Document empty response payloads as intentional design matching existing benchmark runner behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: Add receive_blocking_timed for accurate one-way latency Add receive_blocking_timed() to the BlockingTransport trait that captures a monotonic timestamp after raw bytes are read but before bincode deserialization. This excludes deserialization overhead from one-way latency measurements. - Add default implementation on BlockingTransport trait (backward compatible -- captures timestamp after full receive) - Override in TCP, UDS, and SHM blocking transports to place timestamp between raw I/O read and deserialization - SHM-direct uses default (no bincode deserialization to exclude) - Update handle_client_connection and standalone single-client server to use receive_blocking_timed Impact is most visible with large payloads where deserialization is non-trivial. 64KB one-way TCP test shows min latency dropped from 41us (post-deserialize) to 14us (pre-deserialize), a ~27us improvement representing the bincode deserialization time excluded from measurement. Mean dropped 28% (73us to 52us) and P99 dropped 14% (132us to 113us). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Update MessageLatencyRecord calls for PR #105 compatibility PR #105 (Fix/streaming timestamps) added send_timestamp_ns parameter to MessageLatencyRecord::new(). Update all 4 call sites in standalone client to capture wall-clock timestamp at send time and pass it as the 6th argument. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: Move standalone logic from main.rs to library crate Move ~3100 lines of standalone client/server code from the binary crate (main.rs) into two new library modules, following the existing flat-file convention (benchmark.rs/benchmark_blocking.rs pattern). Structure: - standalone_server.rs (1982 lines): constants, shared helpers, server dispatch, multi-accept TCP/UDS, async server paths - standalone_client.rs (1146 lines): retry helpers, client dispatch, single/concurrent blocking and async paths - main.rs reduced from ~4200 to ~1120 lines (thin dispatch layer) Additional changes: - Promote logging.rs from binary-private to library-public module - Move set_affinity() to utils.rs as pub function - All standalone functions now pub for tarpaulin coverage measurement and integration test access No behavioral changes. All 374 tests pass. Benchmark comparison across 3 runs confirms no performance regression (mean latencies within 2-5% run-to-run variance). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Improve server resilience and error visibility - Socket configuration failures (set_nonblocking, set_nodelay) in multi-accept servers now log a warning and skip the bad connection instead of crashing the entire server with ? - Thread join panics in blocking multi-accept servers now logged with warn! instead of silently dropped with let _ = - Streaming latency record failures in client now logged with debug! instead of silently swallowed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: Comprehensive coverage improvements for standalone modules Server tests (standalone_server.rs, 82.7% coverage): - test_multi_accept_tcp_server_direct: exercises multi-accept TCP directly - test_single_server_direct: exercises blocking single-client directly - test_server_blocking_dispatch: exercises dispatch logic - test_server_blocking_dispatch_uds: exercises UDS dispatch branch - test_async_multi_accept_tcp_full: exercises async multi-accept TCP - test_async_single_server_path: exercises async single-client - test_async_single_server_one_way_metrics: async one-way metrics - test_async_multi_accept_uds_full: exercises async UDS multi-accept - test_multi_accept_server_with_delayed_client: slow sender resilience - test_multi_accept_server_duration_one_way: duration mode with multi-accept - test_async_multi_accept_server_duration_one_way: async duration mode - test_handle_client_connection_send_failure: client disconnect error path - test_single_server_client_disconnect: single server send error path - test_multi_accept_server_survives_bad_client: garbage input resilience - test_handle_client_connection_garbage_input: deserialization error path - test_run_standalone_server_full_dispatch: full entry point dispatch - test_run_standalone_server_rejects_all_via_dispatch: 'all' validation - test_run_standalone_server_rejects_shm_direct: shm-direct guard - test_run_standalone_server_verbose: -vv logging level branches - test_aggregate_server_metrics_from_handlers: real handler data - test_print_server_one_way_latency_with_data/zero: print paths Client tests (standalone_client.rs, 86.3% coverage): - test_client_blocking_tcp_round_trip/one_way: single client paths - test_client_blocking_tcp_duration_round_trip/one_way: duration mode - test_client_blocking_tcp_concurrent_round_trip/one_way: concurrent - test_client_async_single_round_trip/one_way: async single - test_client_async_duration_round_trip/one_way: async duration - test_client_async_concurrent_round_trip/one_way: async concurrent - test_client_blocking_with_send_delay: send_delay round-trip branch - test_client_blocking_one_way_with_send_delay: send_delay one-way branch - test_client_blocking_with_streaming_output: JSON streaming - test_client_blocking_combined_streaming: combined mode streaming - test_client_blocking_csv_streaming: CSV streaming - test_client_blocking_concurrent_duration_one_way: concurrent duration - test_client_async_concurrent_duration_one_way: async concurrent duration - test_run_standalone_client_full_dispatch: full entry point dispatch - test_run_standalone_client_rejects_all_via_dispatch: 'all' validation - test_run_standalone_client_rejects_shm_direct: shm-direct guard - test_connect_async_with_retry_succeeds: async retry path Also: changed tracing .init() to .try_init() with eprintln fallback in both server and client for test compatibility. Coverage: standalone_server 82.7%, standalone_client 86.3%, combined 84.8% Total lib tests: 355. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: Reduce multi-accept server polling interval from 50ms to 5ms Reduce the non-blocking accept loop sleep from 50ms to 5ms in both TCP and UDS multi-accept servers. This cuts connection acceptance latency by 10x with no portability concerns. Discovered during hands-on validation testing of standalone concurrent mode, where the 50ms polling interval was the primary contributor to elevated tail latency under multi-client workloads. Improvement with -c 4 concurrent clients: - RT P95: -46% (65.9us -> 35.5us) - RT P99: -49% (91.4us -> 46.9us) - Throughput: +66% (94.9 -> 157.1 MB/s) Single-client workloads also benefit from faster initial connection acceptance (P99 improved 4-7% across all test modes). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Address PR #110 review feedback (issues 1-11) - Add 60s idle timeout to prevent server hang when no client connects - Log canary/shutdown send errors (warn for canary, debug for shutdown) - Wire concurrent streaming: workers collect MessageLatencyRecord and stream through ResultsManager after completion - Reject multiple mechanisms in standalone mode with clear error - Add debug logging on server receive errors for diagnostics - Add per-worker warmup in concurrent mode (fixes metadata mismatch) - Document two-phase reconnection behavior in concurrent mode - Add SIGINT/SIGTERM signal handler via ctrlc crate for graceful server shutdown and resource cleanup - Add receive_blocking_timed override for PMQ transport to exclude deserialization from one-way latency measurement - Add 11 integration tests spawning separate server/client processes - Fix misleading test comment on garbage input handler Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: CI compatibility for MSRV and lint checks - Drop ctrlc "termination" feature (requires Rust 1.75+ due to nix static zeroed); SIGINT still handled, SIGTERM uses default OS behavior - Remove unused BlockingTcpSocket import in canary failure test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Pin ctrlc to 3.4.5 for MSRV 1.70 compatibility ctrlc 3.5+ uses std::mem::zeroed() in statics which requires Rust 1.75+. Version 3.4.5 does not have this issue and supports Rust 1.69+. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Use cross-platform mechanisms in multiple-mechanism rejection tests Tests used "tcp uds" but UDS is not a valid mechanism on Windows, causing clap to reject the args before reaching our validation code. Changed to "tcp shm" which exists on all platforms. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Use platform-independent temp dir in output file integration test The test hardcoded /tmp/ which doesn't exist on Windows. Use std::env::temp_dir() for cross-platform compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Address PR #110 follow-up review (4 additional findings) - Add sentinel connection in concurrent client to prevent server from exiting between one-way and round-trip phases (keeps a connection open across both phases so the server's grace period check never triggers prematurely) - Surface accept-loop errors: upgrade from debug to warn logging and return Err when accept fails before any client connects - Fix duration precision loss: use as_secs_f64() instead of as_secs() when passing duration to spawned server process - Propagate latency file write errors instead of silently discarding - Add integration tests for concurrent both-tests scenario Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: Fix formatting issues caught by CI Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
Brief description of changes
Type of Change
Testing
Checklist
PR: Fix Streaming Output Timestamps Across All Code Paths
Summary
Streaming output timestamps (
timestamp_nsin JSON/CSV per-messagerecords) were inaccurate across multiple code paths. Instead of
recording when each message was actually sent, timestamps were captured
at record-creation time — either post-test during batch iteration or
at file-read time. This caused all timestamps within a test run to
cluster into the same second, making time-series analysis of streaming
data meaningless.
Because this tool's primary purpose is precision measurement, incorrect
timestamps in the streaming output undermine the validity of per-message
latency records for any downstream analysis or visualization.
Branch:
fix/streaming-timestampsBase:
mainFiles changed: 11 (496 insertions, 60 deletions)
Root Cause
Commit
e0ff7fa(Oct 8, 2025, PR #82) refactored the benchmark runnerto collect latencies inside spawned async futures, then batch-create
MessageLatencyRecordobjects after the futures completed. This movedthe
SystemTime::now()call from inside the hot loop (where itreflected each message's send time) to the post-test iteration (where
it reflected the moment the record was created — all within the same
second).
The initial fix on this branch (
68afe07) changed theMessageLatencyRecordAPI to acceptsend_timestamp_nsas aparameter, but only the blocking round-trip path was actually capturing
timestamps at send time. Three other code path families remained broken.
Problems and Fixes
1. Async round-trip timestamps captured post-test
Problem: The client future returned
Vec<Duration>. After thefuture completed, the post-test loop called
current_timestamp_ns()at record-creation time. All records received the same timestamp.
Fix: Changed the future to capture
current_timestamp_ns()beforeeach
send()and returnVec<(Duration, u64)>. The post-test loopuses the captured timestamp.
Applies to: Both duration-based and count-based async round-trip
loops in
src/benchmark.rs.2. Async combined timestamps captured post-test
Problem: Same pattern as round-trip — the combined test (one-way +
round-trip in a single run) collected latency durations inside the
spawned future and created records post-test with stale timestamps.
Fix: Changed
one_way_latenciestoVec<(Duration, u64)>tocarry the wall-clock timestamp alongside each measurement. The
timestamp is captured before each
send()inside the future.Applies to: Both duration-based and count-based async combined
loops in
src/benchmark.rs.3. One-way timestamps from file-read time instead of send time
Problem: One-way tests use a server process that measures receive
latency and writes results to a temporary file. The original format
was one
latency_nsvalue per line. The client read this filepost-test and called
current_timestamp_ns()at file-read time —all records received the same timestamp.
Fix (server side): Both async and blocking server loops now write
wall_send_ns,latency_nsper line. The wall-clock send time iscomputed as
SystemTime::now() - latency_ns, approximating when themessage entered the IPC channel.
Fix (client side): A new
parse_latency_file_line()functionparses the two-field format. Both async and blocking file readers
use the parsed
wall_send_nsassend_timestamp_ns.Applies to:
src/main.rs(both server loops),src/benchmark.rs(async reader),
src/benchmark_blocking.rs(blocking reader).Known Limitations
Wall-clock / monotonic clock mixing in one-way path
For one-way tests, the server computes the send timestamp by
subtracting the measured monotonic latency from its current wall-clock
time. This mixes two clock domains:
latency_ns=monotonic_receive - monotonic_send(from message timestamp)
wall_send_ns=SystemTime::now() - latency_nsIf NTP adjusts the system clock between message send and receive, the
computed
wall_send_nswill be slightly off. This is the bestapproximation available without clock synchronization between the
client and server processes. The error is bounded by the magnitude
of any NTP adjustment during the test (typically microseconds).
Timestamp capture ordering
In the round-trip and combined futures,
current_timestamp_ns()iscaptured one instruction before
Instant::now(). The wall-clocktimestamp therefore slightly predates the monotonic measurement start.
The gap is single-digit nanoseconds — orders of magnitude below the
IPC latencies being measured.
Tests Added
Unit tests for
parse_latency_file_line(7 tests)test_parse_latency_file_line_valid"170...,42000"test_parse_latency_file_line_zeros"0,0"test_parse_latency_file_line_missing_commatest_parse_latency_file_line_emptytest_parse_latency_file_line_non_numeric_first"abc,789"test_parse_latency_file_line_non_numeric_second"123,xyz"test_parse_latency_file_line_extra_commas"1,2,3"Enhanced end-to-end streaming tests (2 tests)
Both
test_one_way_streaming_captures_send_timestampandtest_round_trip_streaming_captures_send_timestampnow validate:timestamp_nsvalues fall within the test execution window(
before_ns <= ts <= after_ns)Pre-existing timestamp API tests (3 tests in
results.rs)test_new_uses_provided_send_timestamptest_new_combined_uses_provided_send_timestamptest_current_timestamp_ns_returns_recent_valueDocumentation Added
README.md: Streaming Output Columns
Added a new "Streaming Output Columns" section documenting all six
per-message streaming columns with types, descriptions, and nullable
semantics:
timestamp_nsu64message_idu64mechanismstringmessage_sizeu64one_way_latency_nsu64/nullround_trip_latency_nsu64/nullIncludes a note on
timestamp_nsaccuracy for one-way tests(wall-clock / monotonic clock mixing).
Validation
Unit and Integration Tests
Functional Verification
3-second duration benchmarks across all mechanisms with streaming
output enabled. Timestamps validated to span the full test window:
All mechanisms show timestamps distributed across the full test
duration. PMQ delta is 45ms short due to backpressure from shallow
system queue depth (typically 10 messages).
Before/After: Timestamp Distribution
Before fix (main branch): All
timestamp_nsvalues in streamingoutput were within the same second, regardless of test duration. A
3-second test with 200,000+ messages would show all timestamps
clustering around a single epoch second.
After fix: Timestamps span the full test duration. Each message's
timestamp_nsreflects the approximate wall-clock time it was sent,enabling meaningful time-series analysis of per-message latency data.
Files Changed
src/benchmark.rswall_tsinside async round-trip and combined futures before eachsend(); update one-way file reader to parse newwall_send_ns,latency_nsformat; addparse_latency_file_line()with 7 unit tests; enhance 2 end-to-end streaming tests with timestamp validationsrc/benchmark_blocking.rsparse_latency_file_line()src/main.rswall_send_ns,latency_nsper line (waslatency_nsonly)README.mdsrc/results.rsMessageLatencyRecord::new()andnew_combined()acceptsend_timestamp_nsparameter; addcurrent_timestamp_ns()helper; 3 API testssrc/ipc/tcp_socket.rsis_some()/unwrap()toif let(clippy fix)src/ipc/unix_domain_socket.rsCargo.toml<1.21for MSRV compatibility.cargo/audit.tomlCONFIG.mdutils/dashboard/README.mdRisk Assessment
get_monotonic_time_ns()) is untouched. Only the wall-clockmetadata timestamp in streaming records is changed.
schema is unchanged (same columns, same types). The internal
server-to-client latency file format changed from
latency_nstowall_send_ns,latency_ns, but this file is ephemeral (created anddeleted within a single benchmark run) and never exposed to users.