Commit f7f8eb9
authored
Fix/buffer size overloading - Fix SHM Buffer Over-Sizing and Condvar Hangs (#108)
* fix: Use fixed 64KB buffer for standalone SHM to enable streaming
Previously, standalone SHM mode calculated buffer size to fit ALL
messages (e.g., 6.4MB for 50k messages). This caused the writer to
dump everything instantly while the reader slowly drained using
pthread_cond_timedwait with 500us polling timeouts, leading to huge
accumulated latencies (~489ms for 64B).
The fix uses a fixed 64KB buffer (or 2x message size if larger),
matching container behavior. This enables proper streaming where the
writer blocks when the buffer is full.
Before: 489ms mean latency for 64B/50k messages
After: 1.95ms mean latency for 64B/10k messages (blocking)
15.85ms mean latency for 64B/10k messages (async)
Also updates test_transport_config_buffer_size_logic to expect the
new fixed-buffer behavior for SHM while keeping the message-count
sizing for TCP/UDS.
Cherry-picked from container-to-container-ipc branch (3b49877).
AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor
* test/docs: Add buffer sizing and SHM backpressure tests, update documentation
- Add 9 new tests covering SHM buffer sizing, backpressure, and
condvar timed-wait behavior:
- test_shm_large_message_buffer_sizing: verifies 2x msg size path
when messages exceed 32KB (async)
- test_shm_duration_mode_uses_fixed_buffer: verifies SHM gets 64KB
in duration mode, not 1GB TCP/UDS default (async)
- test_blocking_transport_config_buffer_size_logic: full buffer
sizing test for blocking mode (SHM, TCP, PMQ, duration)
- test_blocking_shm_duration_mode_uses_fixed_buffer: SHM 64KB in
blocking duration mode
- test_blocking_shm_large_message_buffer_sizing: 2x path (blocking)
- test_backpressure_with_small_buffer: exercises timed condvar wait
with 1KB buffer and 20 messages
- test_payload_integrity_under_backpressure: byte-level payload
verification through backpressure-induced blocking writes
- test_ring_buffer_wrap_around_under_backpressure: write_pos wraps
the circular buffer multiple times under backpressure
- test_shutdown_detected_during_blocked_write: server closes while
client is blocked waiting for buffer space
- Update README.md Buffer Size Configuration with per-mechanism
auto-sizing table and updated error prevention guidance
- Update CONFIG.md SHM defaults from 8192 to 64KB (auto) and add
automatic buffer sizing explanation
- All tests passing, clippy clean
AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor
* test/docs: Add coverage tests and fix stale buffer sizing documentation
- Add test_user_buffer_size_overrides_shm_default (async + blocking):
verifies user-provided --buffer-size overrides SHM's 64KB default
- Add test_shm_buffer_sizing_at_32kb_boundary (async + blocking):
tests exact transition where 2*(msg_size+64) crosses 64KB threshold
- Add test_high_volume_condvar_stress: 100 messages through 512-byte
buffer to stress pthread_cond_timedwait retry loop
- Update create_transport_config_internal doc comment in benchmark.rs
to describe per-mechanism buffer sizing (SHM, PMQ, TCP/UDS)
- Update Adaptive Buffer Sizing doc in benchmark_blocking.rs to
describe per-mechanism behavior instead of vague description
- Fix README.md example output: SharedMemory buffer size 10240000 ->
65536 to reflect new fixed 64KB auto-sizing
- All 340 tests passing, zero clippy warnings
AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
* fix(msrv): Pin transitive dependencies to maintain Rust 1.70 compatibility
The CI MSRV job removes Cargo.lock and resolves fresh dependencies.
Several transitive dependencies recently bumped their MSRV above 1.70:
- uuid 1.21+ requires Rust 1.85 → pinned to <1.21
- tempfile 3.25+ pulls getrandom >=0.3,<0.5 which resolves to 0.4.x
(edition 2024, unparseable by Rust 1.70's cargo) → pinned to <3.25
- zmij 1.0.20+ requires Rust 1.71 → pinned to =1.0.19
- quote 1.0.45+ requires Rust 1.71 → pinned to =1.0.44
- syn 2.0.115+ requires Rust 1.71 → pinned to =2.0.114
- unicode-ident 1.0.23+ requires Rust 1.71 → pinned to =1.0.22
Verified: MSRV builds and tests pass both with and without Cargo.lock
in a Rust 1.70 container. Local clippy, fmt, and tests all clean.
AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
* ci: trigger CI rebuild with updated MSRV dependency pins
No code changes. Forces new CI run to pick up dependency pins
from commit cd28295 (uuid <1.21, tempfile <3.25, zmij =1.0.19,
quote =1.0.44, syn =2.0.114, unicode-ident =1.0.22).
AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
* fix: Remove out-of-scope condvar/polling code per review
Remove container-IPC code that was out of scope for issue #107
(buffer sizing fix). This scopes the PR to items 4-5 only.
Removed:
- write_data_polling() and read_data_polling() fallback functions
- pthread_cond_timedwait (reverted to pthread_cond_wait)
- Broken-condvar detection (100-iteration/10ms heuristic)
- Mutex-lock-failure fallbacks to polling paths
- 30-second timeout counters (wait_count > 60000)
- test_high_volume_condvar_stress test
Restored from main (PR #104):
- write_data_blocking() signature with timestamp_offset parameter
so latency measurement excludes backpressure wait time
- read_data_blocking() with clean pthread_cond_wait
- Proper pthread_cond_signal in both write and read paths
The two functional regressions cited in review are resolved:
1. Timestamp regression: write_data_polling() lacked timestamp_offset,
but that function no longer exists. The only write path now refreshes
the timestamp after the condvar wait.
2. Missing condvar signal: write_data_polling() never called
pthread_cond_signal(&data_ready), but that function no longer exists.
The only write path signals after every write.
All tests passing (42/42). No clippy warnings.
AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
* docs: Remove out-of-scope README sections per review
Remove sections that belong in the streaming-timestamps PR (#105),
not this buffer-sizing PR:
- "Streaming Output Columns" table and timestamp_ns accuracy note
- "Test Execution Order" section
- Streaming JSON/CSV description rewording
- Round-trip CLI example comment expansion
README diff now only contains buffer-sizing documentation changes:
auto-sizing table, error prevention updates, and example buffer
size correction.
AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
* fix: Unify message overhead constant across buffer sizing paths
Replace hardcoded 64 and 32 values with the MESSAGE_OVERHEAD
constant in both benchmark.rs and benchmark_blocking.rs:
- TCP/UDS msg-count sizing: was hardcoded 64, now MESSAGE_OVERHEAD
- SHM logging/validation: was hardcoded 32, now MESSAGE_OVERHEAD
- Add comment explaining what MESSAGE_OVERHEAD covers: 8 (id) +
8 (timestamp) + 8 (bincode vec length) + 1 (message type) +
4 (ring buffer length prefix) = 29 bytes, rounded to 64
Addresses review feedback about inconsistent overhead values.
AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor1 parent 3898cea commit f7f8eb9
7 files changed
Lines changed: 1028 additions & 329 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
36 | | - | |
| 36 | + | |
37 | 37 | | |
38 | 38 | | |
39 | 39 | | |
| |||
169 | 169 | | |
170 | 170 | | |
171 | 171 | | |
172 | | - | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
173 | 179 | | |
174 | 180 | | |
175 | 181 | | |
176 | | - | |
177 | | - | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
178 | 186 | | |
179 | 187 | | |
180 | 188 | | |
| |||
0 commit comments