Commit a72bedb
Perf/shm latency and compiler optimizations (#116)
* perf: SHM latency optimizations — close gap with reference C implementation
- Add .cargo/config.toml with target-cpu=native for optimal codegen
- Replace nix crate clock_gettime wrapper with direct libc call
- Capture receive timestamp inside receive_blocking() immediately after
condvar wake, matching C measurement point
- Eliminate redundant zero-fill: vec![0u8;N] → Vec::with_capacity + set_len
- Replace per-byte ring buffer copies with copy_nonoverlapping in blocking path
- Add #[inline] hints to all hot-path SHM functions
- Server loop uses transport-captured timestamp when available
Reduces SHM direct mode mean latency by ~12.5% (23.4µs → 20.5µs),
narrowing gap vs reference C benchmark from 25% to ~10%.
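
A minimal sketch of two items from the list above: zero-fill elimination and bulk ring buffer copies. It is not the code from shared_memory_blocking.rs. The function names `ring_read_bytewise` and `ring_read_bulk`, and the plain in-process slice standing in for the shared-memory ring, are assumptions made for illustration. The bulk variant writes into the spare capacity first and only then calls `set_len`, so no uninitialized byte is ever exposed.

```rust
use std::ptr;

/// Baseline: zero-fill the output, then copy one byte per iteration,
/// paying for a modulo on every byte.
fn ring_read_bytewise(ring: &[u8], pos: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; len];
    for i in 0..len {
        out[i] = ring[(pos + i) % ring.len()];
    }
    out
}

/// Optimized: allocate without zero-filling and copy in at most two
/// contiguous segments (up to the end of the ring, then the wrapped part).
fn ring_read_bulk(ring: &[u8], pos: usize, len: usize) -> Vec<u8> {
    let cap = ring.len();
    assert!(pos < cap && len <= cap, "read out of ring bounds");

    let first = len.min(cap - pos); // bytes available before the wrap point
    let rest = len - first; // bytes that wrapped to the start of the ring

    let mut out: Vec<u8> = Vec::with_capacity(len);
    // SAFETY: `out` is a fresh allocation with capacity >= len, so it cannot
    // overlap `ring`. Both source ranges are inside `ring` by the assert above,
    // and all `len` bytes are written before `set_len` makes them visible.
    unsafe {
        let dst = out.as_mut_ptr();
        ptr::copy_nonoverlapping(ring.as_ptr().add(pos), dst, first);
        ptr::copy_nonoverlapping(ring.as_ptr(), dst.add(first), rest);
        out.set_len(len);
    }
    out
}
```

Both functions return the same bytes. The bulk version skips the zero-fill pass and the per-byte index arithmetic.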
Co-authored-by: Cursor <cursoragent@cursor.com>
* perf: clarify receive timestamp comment in SHM-direct
Update comment on the receive-side clock_gettime call to better
describe why it's captured inside the mutex (matches reference C
approach for accurate latency measurement).
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: add detailed PERF comments to all optimized code paths
Document the rationale behind each optimization with inline comments:
- .cargo/config.toml: explain target-cpu=native and portability note
- mod.rs: explain direct libc vs nix crate clock_gettime, receive_time_ns field
- shared_memory_direct.rs: send/receive timestamp placement, zero-fill elimination
- shared_memory_blocking.rs: bulk copy_nonoverlapping vs byte-by-byte with before/after
- shared_memory.rs: inline hints on ring buffer hot-path functions
- main.rs: transport-level timestamp preference in both server loops
Co-authored-by: Cursor <cursoragent@cursor.com>
* style: fix cargo fmt formatting issues
- Collapse short copy_nonoverlapping calls to single line in shared_memory_blocking.rs
- Remove extra blank line in shared_memory_direct.rs
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address PR review — conditional timestamp placement and clock_gettime error handling
Place the SHM-direct receive timestamp inside or outside the mutex
depending on --send-delay. Latency benchmarks (send-delay > 0) capture
it inside the mutex, matching the measurement point of the reference C
implementation. Throughput benchmarks (no send-delay) capture it after
the mutex is unlocked, which eliminates the 22-31% regression at small
message sizes. The flag is derived automatically; no new user-facing
CLI options are added.
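
A minimal sketch of the conditional placement described above. It uses an in-process `Mutex`/`Condvar` queue and `std::time::Instant` as stand-ins for the shared-memory ring and the raw clock_gettime call. The `Channel` type and its fields are hypothetical and are not the API of BlockingSharedMemoryDirect.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::Instant;

/// Hypothetical stand-in for an SHM-direct transport.
struct Channel {
    queue: Mutex<VecDeque<Vec<u8>>>,
    ready: Condvar,
    /// true: timestamp while holding the lock (latency runs);
    /// false: timestamp after unlocking (throughput runs).
    precise_timestamps: bool,
}

impl Channel {
    fn new(precise_timestamps: bool) -> Self {
        Channel {
            queue: Mutex::new(VecDeque::new()),
            ready: Condvar::new(),
            precise_timestamps,
        }
    }

    fn send(&self, msg: Vec<u8>) {
        self.queue.lock().unwrap().push_back(msg);
        self.ready.notify_one();
    }

    fn receive_blocking(&self) -> (Vec<u8>, Instant) {
        let mut guard = self.queue.lock().unwrap();
        while guard.is_empty() {
            guard = self.ready.wait(guard).unwrap();
        }
        // Latency mode: read the clock right after the condvar wake,
        // before any copying, at the cost of a longer critical section.
        let inside = if self.precise_timestamps {
            Some(Instant::now())
        } else {
            None
        };
        let msg = guard.pop_front().expect("queue checked non-empty");
        drop(guard); // release the lock before any further work
        // Throughput mode: read the clock only after the lock is released,
        // so the sender is not blocked by the clock call.
        let timestamp = inside.unwrap_or_else(Instant::now);
        (msg, timestamp)
    }
}
```

The trade-off is where the clock read sits relative to the critical section: inside the lock it is closest to the wake-up, outside it keeps the lock hold time short.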
Add debug_assert! on all raw clock_gettime return values as cheap
insurance against silent failures.
Remove .cargo/config.toml (target-cpu=native) to restore binary
portability across CPU variants.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs+test: add precise_timestamps tests, CPU-optimized build docs, and target-cpu=native rationale
- Add 3 unit tests for BlockingSharedMemoryDirect::with_precise_timestamps():
constructor flag verification (true/false) and end-to-end receive with
precise_timestamps=true exercising the inside-mutex timestamp code path
- Add factory test verifying send_delay variants (None, ZERO, 10ms) are
accepted when creating SHM-direct transports
- Document SHM-direct conditional timestamp placement in README: adaptive
inside/outside-mutex receive timestamp based on --send-delay, with
latency vs throughput tradeoff explanation (22-31% regression context)
- Document CPU-optimized builds in README: rationale for removing
.cargo/config.toml (portability, CI cross-compilation risks across
NXP S32G/Qualcomm Ride SX4/Renesas R-Car S4), on-target builds with
RUSTFLAGS="-C target-cpu=native", per-platform cross-compile examples
- Update CONFIG.md Rust Compiler Optimizations section with callout
explaining why target-cpu=native must not be in repo-wide config,
add cross-compile example and link to README
- Fix pre-existing clippy lint: map_or -> is_some_and on send_delay wiring
- All tests passing, clippy clean, cargo fmt applied
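
A minimal sketch of how a precise-timestamps flag could be derived from the send delay, showing the `map_or` to `is_some_and` change named above. The function names are hypothetical, and the rule "non-zero delay means latency mode" is an assumption taken from the description of the None, ZERO, and 10ms variants. The actual wiring in the repository may differ.

```rust
use std::time::Duration;

/// Form that clippy flags: `map_or(false, ...)` on an Option.
fn precise_timestamps_map_or(send_delay: Option<Duration>) -> bool {
    send_delay.map_or(false, |d| !d.is_zero())
}

/// Equivalent form using `Option::is_some_and` (stable since Rust 1.70).
fn precise_timestamps_for(send_delay: Option<Duration>) -> bool {
    send_delay.is_some_and(|d| !d.is_zero())
}
```

Under this assumption, only a strictly positive delay selects the inside-mutex timestamp; `None` and `Duration::ZERO` both select the throughput path.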
AI-assisted-by: Claude Opus 4 (Anthropic)
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 50db84c commit a72bedb
8 files changed
Lines changed: 436 additions & 61 deletions