feat: venue decoder crates + SPSC benchmark overhaul #28

Merged
Milerius merged 38 commits into main from feat/venue-decoder-crates on Apr 7, 2026
Conversation


@Milerius Milerius commented Apr 7, 2026

Summary

  • Venue decoder crates: Add mantis-binance and mantis-polymarket crates with zero-alloc market data decoders
  • Fixed-point parsing: Add parse_decimal_bytes for zero-alloc hot-path price parsing
  • Transport refactor: Change feed callback from &str to &mut [u8] for in-place parsing
  • SPSC benchmark overhaul: Replace single-thread benchmarks with proper two-thread cross-core latency measurement
  • Queue performance optimizations: Colocated cache lines, bool API, BranchWrap, inline(always)

SPSC Benchmark Results (AMD Ryzen 7 PRO 8700GE, isolated cores)

| Queue | Language | Mean (TSC cycles) | Stability |
|---|---|---|---|
| Mantis SpscRingFast (BranchWrap) | Rust | 472 | ±8 |
| Mantis SpscRing (Pow2Masked) | Rust | 554 | ±14 |
| Mantis SpscRingCopy | Rust | 552 | ±11 |
| rigtorp::SPSCQueue | C++ | 510 best / 1315 median | ±400 |
| rtrb | Rust | 1090 best / 2345 median | ±750 |

At a 3.65 GHz TSC, 472 cycles ≈ 129 ns per cross-core message handoff.

Protocol: two threads on separate pinned cores, isolcpus, rdtsc with lfence, 48-byte messages, 1M operations, chrt -f 99, taskset -c 2,3. See benchmarks/RESULTS.md for full methodology.

Key optimizations that closed the Rust-C++ gap

1. **Colocated cache lines** — head + tail_cached on the same 64 B line (~400 cycles saved)
2. **Bool API** — `push() -> bool` + `pop(*mut T) -> bool` instead of `Result<T>` (~500 cycles vs rtrb)
3. **`&self` shared references** — `push_shared`/`pop_shared` avoid LLVM noalias interference
4. **BranchWrap** — branch-predictor-friendly index wrapping (~85 cycles vs AND mask)
5. **`inline(always)` + LTO** — guaranteed cross-crate inlining of hot paths

Test plan

  • cargo +nightly test — workspace tests pass
  • cargo +nightly clippy --all-targets --features alloc,std -- -D warnings — clean
  • cargo +nightly fmt --all --check — formatted
  • cd benchmarks/rust && cargo +nightly build --release — standalone bench builds
  • Remaining criterion benchmarks (fixed, seqlock, market_state, decode) still run

🤖 Generated with Claude Code

Milerius and others added 10 commits April 6, 2026 17:21
…ion simulator

Replace "BBO cross = fill" with queue-aware fill model:
- SimulatedExchange with FIFO queue tracking per price level
- Latency injection (configurable profiles: optimistic/realistic/pessimistic)
- Partial fills (trade_qty minus queue_ahead)
- Cancel/repost loses queue priority
- Post-only safety check at arrival-time book state
- Markout tracking (adverse selection at 100ms/500ms/1s/5s)
- PnL breakdown: execution vs MTM vs rebate estimate (rebates separate)
- Two modes only: dry_run (realistic simulation) and live

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rsing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmarks parse_decimal_bytes against rust_decimal, fixed crate,
fast-float, lexical-core, and stdlib f64 parse. Tests Polymarket
short prices (0.53) and Binance medium prices (67396.70).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Required for simd-json in-place parsing. read_text() becomes
read_bytes() writing into a reusable Vec<u8> (allocated once on IO
thread). Callback receives &mut [u8] — decoders can call
simd_json::from_slice directly without copying.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add workspace dependency entries (serde, serde_json, simd-json, tracing)
and create the mantis-binance crate with zero-allocation BinanceBookTicker
serde schema and spawn stub.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Zero-allocation decoder converting Binance bookTicker JSON into HotEvent
TopOfBook values using FixedI64 parse_decimal_bytes and InstrumentMeta
tick/lot conversion. Supports simd-json feature gate with serde_json
fallback. Includes 4 tests covering happy path, seq increment, malformed
input, and empty input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add mantis-polymarket workspace dependency and crate with zero-allocation
serde schema types for Polymarket book, price_change, and trade messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ice_change

Add registry-backed decoder that converts Polymarket WS messages into
HotEvent values with peek_type() for simd-json-safe type dispatch.
Includes 11 tests covering all message types, unknown tokens, and
malformed input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR removes legacy SPSC Criterion benchmarks and C++/Rust FFI contenders, adds a standalone cross-language benchmarks suite (Rust + C++), introduces new Binance and Polymarket decoder crates, changes transport WebSocket text APIs to a mutable byte-buffer API, adds byte-slice decimal parsing, and supplies many benchmark tooling scripts and CI/workflow adjustments.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **CI / Workflow**<br>`.github/workflows/bench.yml`, `.github/scripts/bench-report.sh` | Removed SPSC job steps and SPSC JSON inputs; bench-report invocation simplified to omit SPSC parameters; report-generation/normalization logic reduced to grouped renderings. |
| **Workspace & top-level config**<br>`Cargo.toml`, `crates/bench/Cargo.toml` | Excluded benchmarks/rust from workspace member resolution; added workspace crates/deps (mantis-binance, mantis-polymarket, serde*, simd-json, tracing, fast-float2, lexical-core); removed SPSC bench feature and replaced with decode. |
| **Removed SPSC infra & FFI**<br>`crates/bench/benches/spsc.rs`, `crates/bench/src/messages.rs`, `crates/bench/src/workloads.rs`, `crates/bench/src/{drogalis_ffi.rs,rigtorp_ffi.rs}`, `crates/bench/cpp/*_ffi.cpp` | Deleted legacy SPSC Criterion bench, message definitions, workloads, and C++/Rust FFI wrappers for rigtorp/drogalis. |
| **New standalone benchmarks (Rust)**<br>`benchmarks/rust/Cargo.toml`, `benchmarks/rust/src/*`, `benchmarks/rust/src/queues/*` | Added Rust SPSC bench crate: harness, Message48, rdtsc, CycleHistogram/stats, queue adapters (mantis-inline/mantis-copy/rtrb), and a CLI runner. |
| **New standalone benchmarks (C++)**<br>`benchmarks/cpp/CMakeLists.txt`, `benchmarks/cpp/src/*`, `benchmarks/cpp/src/queues/*` | Added C++ SPSC bench: CMake target, harness, Message48, rdtsc, CycleHistogram, and adapters for rigtorp/drogalis with CLI output JSON. |
| **Bench tooling & scripts**<br>`benchmarks/README.md`, `benchmarks/.gitignore`, `benchmarks/scripts/*` | Added README, .gitignore, and many scripts (run_bench, setup_machine, prepare_system, check_system, perf_*, compare.py, deploy_and_run, etc.). |
| **Bench decoding & fixed-point benches**<br>`crates/bench/benches/decode.rs`, `crates/bench/benches/fixed.rs`, `crates/bench/build.rs` | Added decode and fixed-point benchmarks; removed build logic for deleted C++ FFI contenders and adjusted features. |
| **Binance decoder crate**<br>`crates/binance/Cargo.toml`, `crates/binance/src/{lib.rs,decoder.rs,schema.rs,spawn.rs}` | New mantis-binance crate with zero-allocation schema and BinanceDecoder (optional simd-json); tests included. |
| **Polymarket decoder crate**<br>`crates/polymarket/Cargo.toml`, `crates/polymarket/src/market/*`, `crates/polymarket/src/lib.rs` | New mantis-polymarket crate with PolymarketMarketDecoder, schema, registry usage, and tests. |
| **Transport API & read path changes**<br>`crates/transport/src/ws.rs`, `crates/transport/src/feed.rs`, `crates/transport/src/binance/reference.rs`, `crates/transport/src/polymarket/market.rs`, `crates/transport/tests/*` | Switched text-based API (`read_text` / `FnMut(&str)`) to buffer-based API (`read_bytes(&mut Vec<u8>) -> usize` / `FnMut(&mut [u8])`); added reusable read buffer and updated tests and live logging. |
| **Decimal parsing**<br>`crates/fixed/src/parse.rs` | Added const `parse_decimal_bytes(&[u8])` and made `from_str_decimal` delegate to it; extended tests. |
| **Transport dependency tweak**<br>`crates/transport/Cargo.toml` | Changed rustls feature from ring to aws_lc_rs. |
| **Tests / logging adjustments**<br>`crates/transport/tests/live_feeds.rs`, `crates/transport/tests/feed_thread.rs` | Updated test closures to accept `&mut [u8]` and improved live-test logging to print up to 120 UTF-8 bytes or `<binary>`. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant FeedThread
    participant WsConnection
    participant Decoder
    participant Results
    rect rgba(200,200,255,0.5)
    Client->>FeedThread: spawn(on_message: FnMut(&mut [u8]))
    end
    FeedThread->>WsConnection: read_bytes(&mut buffer)
    WsConnection-->>FeedThread: bytes_read (usize)
    alt bytes_read > 0
        FeedThread->>Decoder: pass &mut buffer (in-place parse)
        Decoder-->>FeedThread: events / continue? (bool)
        FeedThread->>Results: record histograms / write JSON
    else bytes_read == 0
        FeedThread-->>FeedThread: no-data / continue loop
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • brohamgoham

Poem

🐰 I hopped through crates and scripts tonight,

moved benches out where Rust and C++ write;
bytes now dance where strings once used to sing,
histograms hum and decoders bring spring.
Hooray — CI’s carrots gleam and CI bell rings!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: venue decoder crates + SPSC benchmark overhaul' accurately and concisely summarizes the main changes: adding venue decoder crates and overhauling the SPSC benchmark infrastructure. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Milerius and others added 2 commits April 7, 2026 08:44
…module

Add benchmarks/rust/ as a standalone binary crate (excluded from workspace)
with CycleHistogram (4096 direct + 20 log-bucketed overflow), percentile
computation, and serializable BenchResult/LatencyResults/SystemInfo structs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…re suite

Replace all single-threaded push+pop benchmarks with proper two-thread
latency measurement using core pinning and rdtsc timestamping.

- Add benchmarks/rust/: standalone Rust binary with mantis-inline,
  mantis-copy, and rtrb adapters, real histogram percentiles, JSON output
- Add benchmarks/cpp/: standalone C++ binary with native rigtorp and
  drogalis adapters (no FFI), same protocol and JSON schema
- Add benchmarks/scripts/: system validation (check_system, prepare_system),
  perf profiling (perf_profile, perf_cache, perf_branch, perf_flamegraph),
  orchestration (run_bench, setup_machine, deploy_and_run), and compare.py
- Remove crates/bench/benches/spsc.rs and all FFI contender code
- Remove crossbeam (MPMC) from SPSC comparisons

Addresses methodology flaws identified by HFT University article:
single-thread measurement, FFI-handicapped contenders, forced copies,
MPMC-vs-SPSC confusion, and Gaussian percentile approximation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions Bot commented Apr 7, 2026

Benchmark Report

Commit: 47af874026ddb97aa862ef81bc5eb1a10c2a0974

Sequence Lock (mantis-seqlock)

Linux

CPU: AMD EPYC 7763 64-Core Processor | Arch: x86_64 | Compiler: rustc 1.96.0-nightly (bcded3316 2026-04-06)

seqlock_read_contended

Variant ns/op
u64 38.91
msg64 405.86

seqlock_read_uncontended

Variant ns/op
u64 0.63
msg64 1.42
msg128 2.58

seqlock_write

Variant ns/op
u64 1.56
msg64 1.87
msg128 3.12
macOS

CPU: Apple M1 (Virtual) | Arch: arm64 | Compiler: rustc 1.96.0-nightly (bcded3316 2026-04-06)

seqlock_read_contended

Variant ns/op
u64 77.22
msg64 82.35

seqlock_read_uncontended

Variant ns/op
u64 1.54
msg64 2.43
msg128 5.18

seqlock_write

Variant ns/op
u64 7.77
msg64 8.65
msg128 9.25

Fixed-Point Arithmetic (mantis-fixed)

Linux

CPU: AMD EPYC 7763 64-Core Processor | Arch: x86_64 | Compiler: rustc 1.96.0-nightly (bcded3316 2026-04-06)

fixed_checked_add

Variant ns/op
FixedI64_6_ 1.25
raw_i64 1.25

fixed_checked_div

Variant ns/op
trunc 4.98
round 5.92

fixed_checked_mul_trunc

Variant ns/op
D=4 3.91
D=2 4.14
D=6 4.34
D=8 4.35

fixed_decimal_parse

Variant ns/op
f64_roundtrip_short_0.53 12.93
f64_roundtrip_medium_67396.70 15.29
mantis_bytes_short_0.53 20.65
mantis_str_short_0.53 20.65
mantis_bytes_integer_67396 21.27
mantis_bytes_medium_67396.70 30.74
mantis_str_medium_67396.70 30.75
mantis_bytes_long_0.00012345 35.27

fixed_display

Variant ns/op
FixedI64_6_ 49.43

fixed_mul_round_vs_trunc

Variant ns/op
trunc 4.35
round 13.41

fixed_parse

Variant ns/op
short 18.84
integer_only 24.86
full_precision 34.77

fixed_rescale

Variant ns/op
D2_to_D8_widen 0.63
D6_to_D2_trunc 0.92
macOS

CPU: Apple M1 (Virtual) | Arch: arm64 | Compiler: rustc 1.96.0-nightly (bcded3316 2026-04-06)

fixed_checked_add

Variant ns/op
FixedI64_6_ 1.42
raw_i64 1.48

fixed_checked_div

Variant ns/op
trunc 4
round 4.78

fixed_checked_mul_trunc

Variant ns/op
D=8 2.08
D=6 2.32
D=4 2.5
D=2 3.1

fixed_decimal_parse

Variant ns/op
f64_roundtrip_short_0.53 10.2
f64_roundtrip_medium_67396.70 12.87
mantis_bytes_short_0.53 16
mantis_str_short_0.53 18.17
mantis_bytes_integer_67396 19.83
mantis_bytes_medium_67396.70 22.89
mantis_str_medium_67396.70 28.67
mantis_bytes_long_0.00012345 35.51

fixed_display

Variant ns/op
FixedI64_6_ 42.93

fixed_mul_round_vs_trunc

Variant ns/op
trunc 2.14
round 2.59

fixed_parse

Variant ns/op
short 14.74
integer_only 20.81
full_precision 29.91

fixed_rescale

Variant ns/op
D2_to_D8_widen 0.57
D6_to_D2_trunc 0.6

Market-State Engine (mantis-market-state)

Linux

CPU: AMD EPYC 7763 64-Core Processor | Arch: x86_64 | Compiler: rustc 1.96.0-nightly (bcded3316 2026-04-06)

market_state_array_book

Variant ns/op
best_bid 0.94
apply_delta 1.87

market_state_engine

Variant ns/op
micro_price 3.13
process_delta_mid_batch 4.06
process_delta_batch_end 19.61
book_imbalance_5 162.92
macOS

CPU: Apple M1 (Virtual) | Arch: arm64 | Compiler: rustc 1.96.0-nightly (bcded3316 2026-04-06)

market_state_array_book

Variant ns/op
best_bid 0.8
apply_delta 1.49

market_state_engine

Variant ns/op
micro_price 2.49
process_delta_mid_batch 3.39
process_delta_batch_end 37.69
book_imbalance_5 147.92

@Milerius force-pushed the feat/venue-decoder-crates branch from 3d1fa1c to 37b8138 on April 7, 2026 06:44
Milerius and others added 3 commits April 7, 2026 08:47
These files were committed before the .gitignore rule was added.
Remove them from tracking — they remain on disk but are ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update fastrand 2.4.0 -> 2.4.1 (yanked crate)
- Fix live_feeds test: use str::from_utf8 for &mut [u8] callback Display
- Fix mantis-binance doc: remove intra-doc link to HotEvent::TopOfBook

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 92.48879% with 67 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.57%. Comparing base (8cf8901) to head (1c472ad).
⚠️ Report is 41 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polymarket/src/market/decoder.rs | 94.97% | 17 Missing and 6 partials ⚠️ |
| crates/queue/src/handle.rs | 0.00% | 18 Missing ⚠️ |
| crates/queue/src/copy_ring/mod.rs | 0.00% | 6 Missing ⚠️ |
| crates/queue/src/presets.rs | 0.00% | 6 Missing ⚠️ |
| crates/binance/src/decoder.rs | 97.31% | 5 Missing ⚠️ |
| crates/platform/src/pad.rs | 76.47% | 3 Missing and 1 partial ⚠️ |
| crates/transport/src/ws.rs | 70.00% | 3 Missing ⚠️ |
| crates/queue/src/engine.rs | 98.80% | 1 Missing ⚠️ |
| crates/transport/src/polymarket/market.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #28      +/-   ##
==========================================
- Coverage   95.84%   95.57%   -0.27%     
==========================================
  Files          75       77       +2     
  Lines        9183    10009     +826     
  Branches     9183    10009     +826     
==========================================
+ Hits         8801     9566     +765     
- Misses        338      393      +55     
- Partials       44       50       +6     
| Files with missing lines | Coverage Δ |
|---|---|
| crates/core/src/lib.rs | 91.54% <100.00%> (+1.71%) ⬆️ |
| crates/fixed/src/parse.rs | 94.22% <100.00%> (+1.24%) ⬆️ |
| crates/queue/src/copy_ring/engine.rs | 99.68% <100.00%> (ø) |
| crates/queue/src/copy_ring/raw/mod.rs | 95.45% <ø> (ø) |
| crates/queue/src/raw/mod.rs | 100.00% <100.00%> (ø) |
| crates/queue/src/raw/slot.rs | 100.00% <100.00%> (ø) |
| crates/queue/src/storage.rs | 89.47% <ø> (ø) |
| crates/transport/src/binance/reference.rs | 91.48% <100.00%> (ø) |
| crates/transport/src/feed.rs | 89.14% <100.00%> (+0.08%) ⬆️ |
| crates/queue/src/engine.rs | 99.45% <98.80%> (-0.55%) ⬇️ |

... and 8 more

... and 1 file with indirect coverage changes


Milerius and others added 2 commits April 7, 2026 09:01
ring 0.17 fails to compile on nightly + macOS Xcode 16.4. The new
venue decoder crates (binance, polymarket) pull ring into the bench
crate transitively. Switch to aws-lc-rs which compiles cleanly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove dead SPSC normalize/render functions from bench-report.sh
- Update positional args (no longer expects spsc report as $1/$2)
- Fix bench.yml to pass correct args to bench-report.sh
- Remove stale comment in criterion-to-json.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/transport/src/polymarket/market.rs (1)

3-5: ⚠️ Potential issue | 🟡 Minor

Update docs to match byte-slice callback API.

The docs still describe “text frames,” but the callback now receives &mut [u8]. Please update wording to avoid API confusion.

✏️ Suggested doc fix
-//! forwards raw JSON text frames via the `FeedThread` callback.
+//! forwards raw JSON byte frames via the `FeedThread` callback.
@@
-/// The callback receives each raw JSON text frame (excluding `"PONG"`
+/// The callback receives each raw JSON byte frame (excluding `"PONG"`

Also applies to: 41-43

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/transport/src/polymarket/market.rs` around lines 3 - 5, Update the
module docs in crates/transport/src/polymarket/market.rs to describe the
callback API as receiving raw byte slices rather than “text frames”: replace
references to “text frames” with wording like “raw JSON byte slices (&mut [u8])”
and mention that the FeedThread callback is invoked with a mutable byte slice
containing the JSON payload; also apply the same wording change to the other doc
occurrences around lines 41-43 to avoid API confusion.
🧹 Nitpick comments (10)
crates/fixed/src/parse.rs (1)

93-135: Orphaned documentation block.

Lines 93-112 contain documentation that was originally for from_str_decimal, but this doc block is now orphaned — it precedes the new parse_decimal_bytes documentation (lines 113-135) and doesn't attach to any function. The from_str_decimal function at line 270 has its own documentation.

Consider removing lines 93-112 to avoid documentation duplication and confusion.

♻️ Proposed fix
 #[expect(
     clippy::cast_possible_truncation,
     reason = "i128-to-i64 cast is guarded by range check"
 )]
 impl<const D: u8> FixedI64<D> {
-    /// Parse a decimal string like `"1.23"` or `"-0.5"` into `FixedI64<D>`.
-    ///
-    /// Accepts:
-    /// - Optional sign prefix (`-` or `+`)
-    /// - Integer-only: `"123"`
-    /// - Decimal: `"1.23"`, `".5"`, `"0.5"`
-    /// - Leading zeros: `"007.50"`
-    ///
-    /// Rejects:
-    /// - Empty strings
-    /// - Exponent notation (`"1e5"`)
-    /// - Double dots (`"1..2"`)
-    /// - Non-digit characters
-    /// - Fractional digits exceeding D
-    ///
-    /// # Errors
-    ///
-    /// Returns [`ParseFixedError::InvalidFormat`] for malformed input,
-    /// [`ParseFixedError::Overflow`] if the value exceeds `i64` range,
-    /// or [`ParseFixedError::ExcessPrecision`] if fractional digits exceed D.
     /// Parse a decimal byte slice like `b"1.23"` or `b"-0.5"` into `FixedI64<D>`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/fixed/src/parse.rs` around lines 93 - 135, Remove the
duplicated/orphaned doc block that describes from_str_decimal which currently
sits immediately before the parse_decimal_bytes docs (the earlier block that
starts the duplicate description); delete that orphaned documentation so only
the parse_decimal_bytes doc (and the existing from_str_decimal doc attached to
the from_str_decimal function) remain, ensuring no duplicate or unattached docs
are left around parse_decimal_bytes or from_str_decimal.
benchmarks/scripts/prepare_system.sh (2)

87-88: Avoid variables in printf format string.

Using variables directly in the printf format string can cause issues if they contain % characters. Use %b to interpret escape sequences safely.

🧹 Proposed fix
-printf "\n${GRN}System prepared for benchmarking.${RST}\n"
+printf "\n%bSystem prepared for benchmarking.%b\n" "$GRN" "$RST"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/scripts/prepare_system.sh` around lines 87 - 88, The printf calls
printing colored status messages use variables directly in the format string
which can break if they contain % sequences; update the two printf invocations
(the lines that currently call printf "\n${GRN}System prepared for
benchmarking.${RST}\n" and printf "Run ./check_system.sh to verify.\n") to use a
safe format specifier (e.g. "%b\n" or "%s\n") and pass the composed string or
variables as arguments so escape sequences and % chars are interpreted safely
rather than treated as format directives.

18-21: Remove unused RED variable.

The RED variable is defined but never used in the script. This is flagged by shellcheck (SC2034).

🧹 Proposed fix
-RED='\033[0;31m'
 GRN='\033[0;32m'
 BLD='\033[1m'
 RST='\033[0m'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/scripts/prepare_system.sh` around lines 18 - 21, Remove the unused
RED variable definition: delete the RED='\033[0;31m' line (symbol RED) from the
script so only the used color variables (GRN, BLD, RST) remain; confirm there
are no other references to RED elsewhere in the script and run shellcheck to
verify SC2034 is resolved.
benchmarks/rust/src/queues/mantis_inline.rs (1)

69-74: Match the empty-pop error explicitly.

Err(_) => false treats every try_pop failure as "queue empty". If mantis_queue already has, or later adds, a non-empty error case, the harness will spin forever instead of surfacing it.

As per coding guidelines: "No wildcard matches — use explicit destructuring in pattern matching".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/rust/src/queues/mantis_inline.rs` around lines 69 - 74, The match
on self.0.try_pop() uses a wildcard Err(_) which hides non-empty errors; change
it to explicitly match the queue-empty variant (e.g., Err(<EmptyVariant>) =>
false) and handle any other Err(e) separately (for example by panicking or
returning/propagating the error) so unexpected errors are surfaced. Update the
match arms around the call to self.0.try_pop() to use the concrete empty variant
from the mantis_queue error enum and a separate Err(e) arm that
logs/panics/returns the error.
crates/polymarket/src/market/schema.rs (1)

1-1: Doc comment slightly overstates "zero-allocation".

The Vec<BookLevel<'a>> fields in PolymarketBookMsg allocate heap memory for the level vectors. Consider updating to "low-allocation" or "zero-copy string" to be more precise, since the borrowed &'a str fields do avoid per-field string allocations.

📝 Suggested doc update
-//! Zero-allocation serde structs for Polymarket WebSocket messages.
+//! Low-allocation serde structs for Polymarket WebSocket messages.
+//!
+//! String fields borrow from the input buffer; only `Vec` fields allocate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/polymarket/src/market/schema.rs` at line 1, The crate doc comment
claims "Zero-allocation serde structs", but PolymarketBookMsg contains
Vec<BookLevel<'a>> which allocates heap memory; change the module-level doc to a
more accurate phrase like "low-allocation" or "zero-copy string serde structs"
and mention that fields use borrowed & 'a str to avoid per-field string
allocations; update references to PolymarketBookMsg and BookLevel<'a> in the
comment to reflect this more precise wording.
benchmarks/cpp/src/harness.hpp (1)

75-106: consumer_done atomic is set but never read.

The consumer_done flag is stored on line 106 but never checked. The consumer_thread.join() on line 130 already ensures the consumer completes. Consider removing consumer_done to simplify the code.

🧹 Suggested cleanup
     CycleHistogram result_hist;
-    std::atomic<bool> consumer_done{false};

     // Spawn consumer thread
     std::thread consumer_thread([&]() {
         // ... existing code ...

         result_hist = histogram;
-        consumer_done.store(true, std::memory_order_release);
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/cpp/src/harness.hpp` around lines 75 - 106, The consumer_done
atomic is written by the consumer thread but never read anywhere; remove the
unused variable and its stores to simplify the logic: delete the
std::atomic<bool> consumer_done declaration and remove the
consumer_done.store(true, std::memory_order_release) call in the consumer thread
body (leave consumer_thread.join() and existing result_hist assignment intact),
ensuring no other code references consumer_done before committing.
benchmarks/cpp/src/main.cpp (2)

56-60: atoi/atoll silently return 0 on invalid input.

Invalid numeric arguments (e.g., --messages abc) will be silently accepted as 0, which could lead to confusing benchmark behavior. Consider using strtoul/strtoull with error checking, or at minimum validate that parsed values are positive where expected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/cpp/src/main.cpp` around lines 56 - 60, The parsing of numeric CLI
args (when match(...) and next() set args.producer_core, args.consumer_core,
args.messages, args.warmup, args.runs) uses atoi/atoll which silently yields 0
on invalid input; replace these with strtoul/strtoull calls, capture the endptr
and errno, validate that endptr consumed the whole token and the result is in
the expected range (>0 for counts/cores), and on error emit a clear message and
exit or set a safe default. Ensure you cast the validated unsigned long/unsigned
long long to the target types (unsigned / uint64_t) only after checking for
overflow/underflow.

91-97: Missing p99.99 percentile in output.

The summary prints p999 (99.9th percentile) but the comment mentions "percentile computation via histograms" with p99.99. Consider adding the p99.99 percentile for consistency with typical latency reporting, or clarify if this is intentional.

📊 Suggested enhancement
-    fprintf(stderr, "  run %u/%u: p50=%lu p99=%lu p999=%lu max=%lu mean=%.1f cycles/op\n",
+    fprintf(stderr, "  run %u/%u: p50=%lu p99=%lu p999=%lu p9999=%lu max=%lu mean=%.1f cycles/op\n",
             run, args.runs,
             static_cast<unsigned long>(hist.percentile(50.0)),
             static_cast<unsigned long>(hist.percentile(99.0)),
             static_cast<unsigned long>(hist.percentile(99.9)),
+            static_cast<unsigned long>(hist.percentile(99.99)),
             static_cast<unsigned long>(hist.max()),
             hist.mean());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/cpp/src/main.cpp` around lines 91 - 97, The fprintf summary
currently prints p50/p99/p999/max/mean using the fprintf call that references
hist.percentile(50.0), hist.percentile(99.0), hist.percentile(99.9), hist.max(),
and hist.mean(); update that fprintf invocation to include the 99.99th
percentile by calling hist.percentile(99.99) and adding a corresponding format
token and label (e.g., add "p99.99=%lu" or rename labels to clarify p99.9 vs
p99.99) so the arguments list includes static_cast<unsigned
long>(hist.percentile(99.99)) alongside the existing percentile casts (still
using run and args.runs, hist.max(), hist.mean()).
benchmarks/scripts/run_bench.sh (1)

29-39: Argument parsing can fail silently on missing values at end of args.

If the last argument is a flag like --messages without a value, shift 2 will fail due to set -u, but the error message from next() won't be shown since this is bash, not the C++ parser. This is acceptable given set -euo pipefail, but consider adding validation after parsing to ensure numeric args are valid.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/scripts/run_bench.sh` around lines 29 - 39, The argument parsing
loop can leave flags like --messages without values (causing silent failures
under set -euo pipefail); after the while/case block, add explicit validation
for the parsed variables (PRODUCER_CORE, CONSUMER_CORE, MESSAGES, WARMUP, RUNS)
to ensure they are set and contain valid integers (e.g., non-empty and matching
/^[0-9]+$/), and call usage (the existing usage function) with an error if any
check fails; also consider validating that no trailing single flag remained by
checking "$#" or the raw args before shifting to surface a clear error message
for missing values.
crates/bench/benches/decode.rs (1)

52-65: Consider adding throughput measurement for better cross-hardware comparison.

Per coding guidelines, benchmarks should export JSON for cross-hardware comparison. While Criterion handles JSON export, adding throughput annotations would make comparisons more meaningful across different hardware.

📈 Suggested enhancement
 fn bench_binance_decode(c: &mut Criterion) {
     let mut group = c.benchmark_group("decode/binance_book_ticker");
+    let json_size = binance_json().len();
+    group.throughput(criterion::Throughput::Bytes(json_size as u64));

     group.bench_function("full_pipeline", |b| {

Based on learnings: "All benchmarks must export JSON for cross-hardware comparison"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/bench/benches/decode.rs` around lines 52 - 65, The benchmark lacks a
throughput annotation for cross-hardware comparison; update bench_binance_decode
to call group.throughput(...) before group.bench_function, using an appropriate
Throughput unit (e.g., Throughput::Bytes with the JSON buffer size or
Throughput::Elements for per-item work) computed from binance_json() (or a
constant representing its length), and ensure the Throughput type is imported so
Criterion will include throughput in its JSON export when running the
full_pipeline benchmark.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benchmarks/cpp/src/stats.hpp`:
- Around line 22-39: The overflow bucket upper bound currently returns the
exclusive top of the bucket (and overshoots by one), so update
overflow_upper_bound to return the inclusive upper bound (i.e. (DIRECT_BUCKETS
<< (i+1)) - 1) for i < LOG_BUCKETS-1 and still return
numeric_limits<uint64_t>::max() for the final bucket; keep using DIRECT_BUCKETS
and LOG_BUCKETS to locate the calculation. Also review log_bucket usage to
ensure it is only called for values >= 4096 (shifted != 0) and leave its
bit-computation logic as-is; this guarantees percentiles coming from overflow_
report the bucket’s true inclusive top rather than an off-by-one/exclusive
value.
- Around line 3-10: Add the missing header for std::min: include <algorithm> at
the top of the file (stats.hpp) so std::min is available; update the existing
include block that currently has <array>, <cmath>, <cstdint>, <cstdio>,
<fstream>, <limits>, <sstream>, <string> to also include <algorithm> to avoid
relying on transitive includes.

In `@benchmarks/README.md`:
- Around line 97-103: The README example for cache profiling is missing the
required benchmark command argument for scripts/perf_cache.sh; update the second
code block so the perf_cache.sh invocation includes the benchmark command
placeholder (e.g., append " -- <bench-command...>" or a concrete example like "
-- ./scripts/bench.sh" after the flags) so that scripts/perf_cache.sh receives
the expected <bench-command...>; reference the scripts/perf_cache.sh invocation
in the README and mirror the same structure used in the perf_profile.sh example.

In `@benchmarks/rust/src/harness.rs`:
- Around line 13-20: The pin_to_core function should be rewritten to return
Result<(), String> and use let...else (or ok_or/ok_or_else) to early-return on
failures when calling core_affinity::get_core_ids and finding the target core,
and must check the bool result of core_affinity::set_for_current and return an
Err if it fails; update callers (where consumer and producer threads call
pin_to_core) to propagate or handle the Result (do not silently ignore
failures), and add a validation check where producer_core and consumer_core are
parsed/used to ensure they are distinct and return an error if they are equal so
cross-core measurements are not run with the same core IDs.

In `@benchmarks/rust/src/rdtsc.rs`:
- Around line 7-10: The unsafe block calling core::arch::x86_64::_mm_lfence()
and core::arch::x86_64::_rdtsc() lacks the required safety rationale; add a
single-line comment starting with "// SAFETY:" immediately above the unsafe
block that documents the invariants (this code relies on x86_64 architecture and
availability of RDTSC/LFENCE, requires caller to run on that ISA, and that these
intrinsics do not violate memory or UB assumptions), what guarantee is being
upheld (ordering and timestamp read semantics), and the failure mode if the
invariant is not met (undefined behavior or incorrect timing on non-x86_64).
Ensure the comment references the intrinsics _mm_lfence and _rdtsc so reviewers
can map the rationale to the unsafe calls.
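A sketch of the requested SAFETY rationale; the function name is illustrative and the non-x86_64 fallback exists only so the sketch builds everywhere (the real harness is x86_64-only):

```rust
#[cfg(target_arch = "x86_64")]
fn rdtsc_serialized() -> u64 {
    // SAFETY: x86_64-only code path — _mm_lfence and _rdtsc exist on every
    // x86_64 CPU this harness targets; lfence orders prior loads before the
    // timestamp is read, and neither intrinsic reads or writes memory, so no
    // aliasing or UB invariants apply. Running this on a non-x86_64 ISA would
    // be undefined behavior, which the cfg gate prevents.
    unsafe {
        core::arch::x86_64::_mm_lfence();
        core::arch::x86_64::_rdtsc()
    }
}

// Deterministic stand-in so the sketch compiles and runs on non-x86_64 hosts.
#[cfg(not(target_arch = "x86_64"))]
fn rdtsc_serialized() -> u64 {
    use std::sync::atomic::{AtomicU64, Ordering};
    static FAKE: AtomicU64 = AtomicU64::new(1);
    FAKE.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    let t1 = rdtsc_serialized();
    let t2 = rdtsc_serialized();
    assert!(t2 >= t1, "timestamps should be monotonic on one thread");
}
```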

In `@benchmarks/rust/src/stats.rs`:
- Around line 56-83: The percentile() routine currently returns the
exclusive/log-bucket upper bound (via overflow_upper_bound) which overstates
values (e.g. 5000→8192) and is off-by-one for documented buckets; update
percentile() so that when walking overflow (the overflow array / LOG_BUCKETS) it
returns the inclusive upper bound (overflow_upper_bound(i) - 1) and clamp that
value by self.max (use saturating_sub/min to avoid underflow), and similarly
apply the same inclusive-bound logic to the other percentile code path noted
(the duplicate at lines ~115-122) so exported tail latencies reflect the real
bucket value range (e.g., bucket 0 -> 4096..8191).
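The inclusive-bound-plus-clamp logic can be sketched as follows; the constants and helper names are assumptions mirroring the review, not the actual stats.rs fields:

```rust
const DIRECT_BUCKETS: u64 = 4096; // assumed value for illustration
const LOG_BUCKETS: usize = 16;    // assumed value for illustration

/// Exclusive upper boundary of overflow bucket `i` (u64::MAX for the last).
fn overflow_upper_bound(i: usize) -> u64 {
    if i >= LOG_BUCKETS - 1 {
        u64::MAX
    } else {
        DIRECT_BUCKETS << (i + 1)
    }
}

/// Inclusive top of overflow bucket `i`, clamped to the observed max.
fn bucket_value(i: usize, observed_max: u64) -> u64 {
    let ub = overflow_upper_bound(i);
    // Final bucket keeps u64::MAX; others drop to the inclusive top.
    let inclusive = if ub == u64::MAX { ub } else { ub - 1 };
    inclusive.min(observed_max)
}

fn main() {
    // Bucket 0 covers 4096..8191; a recorded 5000 must not report as 8192.
    assert_eq!(bucket_value(0, 5000), 5000);
    assert_eq!(bucket_value(0, u64::MAX), 8191);
}
```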

In `@benchmarks/scripts/compare.py`:
- Around line 28-33: group_by_implementation currently groups solely by
r["implementation"] which can mix runs with different metadata; update it to
either validate that key config fields (e.g., r["language"], r["capacity"],
r["message_size"], r["pinning"], r["host_cpu"] or similar keys present in
results) are identical for all entries in a group and raise an error if they
mismatch, or include those fields in the grouping key so runs are only collapsed
when all relevant config fields match; also apply the same validation/adjustment
where print_system_info or any other grouping logic aggregates by implementation
to ensure the “Runs per implementation” and reported CPU/core/message metadata
correspond to homogeneous benchmark configs.
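A sketch of the composite-grouping-key variant; the config field names are assumptions about the result JSON schema, not verified keys:

```python
from collections import defaultdict

# Hypothetical config fields that must match for runs to be aggregated.
CONFIG_FIELDS = ("language", "capacity", "message_size", "pinning")

def group_by_implementation(results):
    """Group runs only when implementation AND config fields all match."""
    groups = defaultdict(list)
    for r in results:
        key = (r["implementation"],) + tuple(r.get(f) for f in CONFIG_FIELDS)
        groups[key].append(r)
    return groups

runs = [
    {"implementation": "mantis", "language": "rust", "capacity": 1024,
     "message_size": 48, "pinning": "2,3"},
    {"implementation": "mantis", "language": "rust", "capacity": 4096,
     "message_size": 48, "pinning": "2,3"},
]
# Same implementation but different capacity -> two distinct groups.
assert len(group_by_implementation(runs)) == 2
```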

In `@benchmarks/scripts/deploy_and_run.sh`:
- Around line 70-73: The current rsync pulls all remote results into a single
directory and causes overwrites; update the target path to include a
host-specific namespace by creating and syncing into a host subdirectory (use
BENCH_DIR, REMOTE_HOST and REMOTE_DIR from the script) — e.g., ensure mkdir -p
"$BENCH_DIR/results/$REMOTE_HOST" and change the rsync destination to
"$BENCH_DIR/results/$REMOTE_HOST/" (alternatively, prepend REMOTE_HOST to
artifact filenames on the remote side before pulling) so each host’s results are
isolated.
- Around line 63-65: The remote invocation currently collapses BENCH_ARGS into a
single string causing re-splitting and injection; change the call so you do not
expand BENCH_ARGS into the remote shell string but instead stream/exec the
run_bench.sh on the remote host and pass BENCH_ARGS as separate positional
parameters so they arrive unchanged. Concretely, stop using "${BENCH_ARGS[*]}"
in the ssh command that references REMOTE_HOST and REMOTE_DIR and instead invoke
the remote bash with the script fed via stdin (or executed remotely) while
supplying BENCH_ARGS as separate arguments so run_bench.sh receives them as
"$@"; ensure the remote invocation uses the standard "--" convention to
terminate options and still performs the cd/chmod steps before running
run_bench.sh.
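A local simulation of the "feed the script via stdin, pass BENCH_ARGS as positional parameters" suggestion; `bash -s` stands in for the ssh invocation so the argv behavior is observable without a remote host:

```shell
run_bench_stdin() {
  # Stand-in for: ssh "$REMOTE_HOST" bash -s -- "$@" < run_bench.sh
  bash -s -- "$@" <<'EOF'
printf 'argc=%s\n' "$#"
printf '[%s]\n' "$@"
EOF
}

BENCH_ARGS=(--messages 1000000 --label "two words")
out="$(run_bench_stdin "${BENCH_ARGS[@]}")"
printf '%s\n' "$out"
```

Each array element arrives as one positional parameter, so "two words" stays a single argument instead of being re-split (over real ssh, additional quoting such as `printf %q` would still be needed — that caveat goes beyond the review text).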

In `@benchmarks/scripts/run_bench.sh`:
- Around line 117-120: Add an existence check for the compare.py script before
invoking it: test for the file at "$SCRIPT_DIR/compare.py" (using -f or -x as
appropriate) and if missing print a clear error to STDERR and exit non‑zero;
otherwise run python3 "$SCRIPT_DIR/compare.py" "$RESULTS_DIR" | tee
"$RESULTS_DIR/comparison.md". Reference the existing variables SCRIPT_DIR and
RESULTS_DIR and the compare.py filename so the check is colocated with the
current Step 7 invocation.
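A sketch of the guard, with mktemp directories standing in for the real SCRIPT_DIR/RESULTS_DIR so the missing-file branch is demonstrable:

```shell
run_comparison() {
  local script_dir="$1" results_dir="$2"
  if [ ! -f "$script_dir/compare.py" ]; then
    echo "error: $script_dir/compare.py not found" >&2
    return 1
  fi
  python3 "$script_dir/compare.py" "$results_dir" | tee "$results_dir/comparison.md"
}

SCRIPT_DIR="$(mktemp -d)"; RESULTS_DIR="$(mktemp -d)"  # stand-ins for the script's dirs
run_comparison "$SCRIPT_DIR" "$RESULTS_DIR" 2>/dev/null && status=ran || status=guarded
echo "$status"
```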

In `@benchmarks/scripts/setup_machine.sh`:
- Around line 11-25: The install_packages function currently omits git and curl
so subsequent curl and git clone calls will fail; update install_packages to
include git and curl in the package lists for each package manager branch
(apt-get: add git and curl to the sudo apt-get install line and linux-tools
list, dnf: add git and curl to sudo dnf install, pacman: add git and curl to
sudo pacman -Sy), and also update the fallback warning to mention git and curl
as required tools so a manual install reminder is accurate.

In `@crates/polymarket/src/market/decoder.rs`:
- Around line 61-65: The dispatcher currently treats "tick_size_change" as
unsupported (falls to _ => 0) so tick-size updates never reach the decoder; add
a branch for "tick_size_change" that calls a new handler (e.g.
self.decode_tick_size_change(buf, recv_ts, out)) in the same match where
decode_price_change/decode_trade/decode_book are called, and mirror that change
in the other dispatcher instance referenced around lines 516-521. Implement
decode_tick_size_change to parse the tick-size payload and update the instrument
metadata/state used by price_to_ticks/qty_to_lots (or notify the instrument
registry), ensuring subsequent conversions use the new tick size. Use the
existing function names (decode_price_change, decode_trade, decode_book) as
references when adding the "tick_size_change" branch and implement a
decode_tick_size_change method that updates the shared instrument map/state.
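A minimal sketch of the dispatcher branch; the handler signature and the metadata-update shape are assumptions, not the real decoder API:

```rust
use std::collections::HashMap;

struct Decoder {
    // Hypothetical shared metadata: asset_id -> tick size, consulted by
    // price_to_ticks-style conversions.
    tick_sizes: HashMap<String, f64>,
}

impl Decoder {
    fn dispatch(&mut self, msg_type: &str, asset: &str, payload: &str) -> usize {
        match msg_type {
            "price_change" | "trade" | "book" => 1, // existing handlers elided
            "tick_size_change" => self.decode_tick_size_change(asset, payload),
            _ => 0,
        }
    }

    fn decode_tick_size_change(&mut self, asset: &str, new_tick: &str) -> usize {
        // Update shared instrument state so later conversions use the new tick.
        if let Ok(t) = new_tick.parse::<f64>() {
            self.tick_sizes.insert(asset.to_string(), t);
        }
        0 // metadata update only; no hot events emitted
    }
}

fn main() {
    let mut d = Decoder { tick_sizes: HashMap::new() };
    let _ = d.dispatch("tick_size_change", "0xabc", "0.001");
    assert_eq!(d.tick_sizes.get("0xabc"), Some(&0.001));
}
```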
- Around line 170-245: The code currently computes `total` from raw JSON levels
and sets EventFlags::LAST_IN_BATCH inside the `for` loops using that raw count,
which misses cases where `parse_price_qty::<D>` continues or emission stops at
64; instead, stop setting LAST_IN_BATCH when constructing each
`HotEvent::book_delta` (only use EventFlags::IS_SNAPSHOT there), and after both
loops, if `count > 0` set the LAST_IN_BATCH flag on the last emitted event (e.g.
mutate or rebuild `out[count-1]` to OR in EventFlags::LAST_IN_BATCH); refer to
`parse_price_qty::<D>`, `HotEvent::book_delta`, `EventFlags::LAST_IN_BATCH`,
`out`, and `count` to locate and implement this change.

---

Outside diff comments:
In `@crates/transport/src/polymarket/market.rs`:
- Around line 3-5: Update the module docs in
crates/transport/src/polymarket/market.rs to describe the callback API as
receiving raw byte slices rather than “text frames”: replace references to “text
frames” with wording like “raw JSON byte slices (&mut [u8])” and mention that
the FeedThread callback is invoked with a mutable byte slice containing the JSON
payload; also apply the same wording change to the other doc occurrences around
lines 41-43 to avoid API confusion.

---

Nitpick comments:
In `@benchmarks/cpp/src/harness.hpp`:
- Around line 75-106: The consumer_done atomic is written by the consumer thread
but never read anywhere; remove the unused variable and its stores to simplify
the logic: delete the std::atomic<bool> consumer_done declaration and remove the
consumer_done.store(true, std::memory_order_release) call in the consumer thread
body (leave consumer_thread.join() and existing result_hist assignment intact),
ensuring no other code references consumer_done before committing.

In `@benchmarks/cpp/src/main.cpp`:
- Around line 56-60: The parsing of numeric CLI args (when match(...) and next()
set args.producer_core, args.consumer_core, args.messages, args.warmup,
args.runs) uses atoi/atoll which silently yields 0 on invalid input; replace
these with strtoul/strtoull calls, capture the endptr and errno, validate that
endptr consumed the whole token and the result is in the expected range (>0 for
counts/cores), and on error emit a clear message and exit or set a safe default.
Ensure you cast the validated unsigned long/unsigned long long to the target
types (unsigned / uint64_t) only after checking for overflow/underflow.
- Around line 91-97: The fprintf summary currently prints p50/p99/p999/max/mean
using the fprintf call that references hist.percentile(50.0),
hist.percentile(99.0), hist.percentile(99.9), hist.max(), and hist.mean();
update that fprintf invocation to include the 99.99th percentile by calling
hist.percentile(99.99) and adding a corresponding format token and label (e.g.,
add "p99.99=%lu" or rename labels to clarify p99.9 vs p99.99) so the arguments
list includes static_cast<unsigned long>(hist.percentile(99.99)) alongside the
existing percentile casts (still using run and args.runs, hist.max(),
hist.mean()).

In `@benchmarks/rust/src/queues/mantis_inline.rs`:
- Around line 69-74: The match on self.0.try_pop() uses a wildcard Err(_) which
hides non-empty errors; change it to explicitly match the queue-empty variant
(e.g., Err(<EmptyVariant>) => false) and handle any other Err(e) separately (for
example by panicking or returning/propagating the error) so unexpected errors
are surfaced. Update the match arms around the call to self.0.try_pop() to use
the concrete empty variant from the mantis_queue error enum and a separate
Err(e) arm that logs/panics/returns the error.
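A sketch of the explicit-variant match; `PopError::Empty` is a stand-in for whatever empty variant mantis_queue's try_pop actually exposes:

```rust
#[derive(Debug)]
enum PopError {
    Empty,
    Poisoned, // hypothetical non-empty failure for illustration
}

fn try_pop() -> Result<u64, PopError> {
    Err(PopError::Empty) // stand-in for self.0.try_pop()
}

fn pop_one() -> bool {
    match try_pop() {
        Ok(_) => true,
        Err(PopError::Empty) => false, // expected fast-path miss
        // Any other error is surfaced instead of silently swallowed.
        Err(e) => panic!("unexpected pop error: {e:?}"),
    }
}

fn main() {
    assert!(!pop_one());
}
```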

In `@benchmarks/scripts/prepare_system.sh`:
- Around line 87-88: The printf calls printing colored status messages use
variables directly in the format string which can break if they contain %
sequences; update the two printf invocations (the lines that currently call
printf "\n${GRN}System prepared for benchmarking.${RST}\n" and printf "Run
./check_system.sh to verify.\n") to use a safe format specifier (e.g. "%b\n" or
"%s\n") and pass the composed string or variables as arguments so escape
sequences and % chars are interpreted safely rather than treated as format
directives.
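The safe-format-specifier pattern in one line, using a message containing a literal `%` to show why variables must not land in the format string:

```shell
GRN='\033[0;32m'; RST='\033[0m'
msg="100% done"   # a '%' here would be misparsed as a format directive
# Unsafe: printf "${GRN}${msg}${RST}\n"
out="$(printf '%b%s%b\n' "$GRN" "$msg" "$RST")"   # %b expands escapes, %s is literal
printf '%s\n' "$out"
```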
- Around line 18-21: Remove the unused RED variable definition: delete the
RED='\033[0;31m' line (symbol RED) from the script so only the used color
variables (GRN, BLD, RST) remain; confirm there are no other references to RED
elsewhere in the script and run shellcheck to verify SC2034 is resolved.

In `@benchmarks/scripts/run_bench.sh`:
- Around line 29-39: The argument parsing loop can leave flags like --messages
without values (causing silent failures under set -euo pipefail); after the
while/case block, add explicit validation for the parsed variables
(PRODUCER_CORE, CONSUMER_CORE, MESSAGES, WARMUP, RUNS) to ensure they are set
and contain valid integers (e.g., non-empty and matching /^[0-9]+$/), and call
usage (the existing usage function) with an error if any check fails; also
consider validating that no trailing single flag remained by checking "$#" or
the raw args before shifting to surface a clear error message for missing
values.

In `@crates/bench/benches/decode.rs`:
- Around line 52-65: The benchmark lacks a throughput annotation for
cross-hardware comparison; update bench_binance_decode to call
group.throughput(...) before group.bench_function, using an appropriate
Throughput unit (e.g., Throughput::Bytes with the JSON buffer size or
Throughput::Elements for per-item work) computed from binance_json() (or a
constant representing its length), and ensure the Throughput type is imported so
Criterion will include throughput in its JSON export when running the
full_pipeline benchmark.

In `@crates/fixed/src/parse.rs`:
- Around line 93-135: Remove the duplicated/orphaned doc block that describes
from_str_decimal which currently sits immediately before the parse_decimal_bytes
docs (the earlier block that starts the duplicate description); delete that
orphaned documentation so only the parse_decimal_bytes doc (and the existing
from_str_decimal doc attached to the from_str_decimal function) remain, ensuring
no duplicate or unattached docs are left around parse_decimal_bytes or
from_str_decimal.

In `@crates/polymarket/src/market/schema.rs`:
- Line 1: The crate doc comment claims "Zero-allocation serde structs", but
PolymarketBookMsg contains Vec<BookLevel<'a>> which allocates heap memory;
change the module-level doc to a more accurate phrase like "low-allocation" or
"zero-copy string serde structs" and mention that fields use borrowed & 'a str
to avoid per-field string allocations; update references to PolymarketBookMsg
and BookLevel<'a> in the comment to reflect this more precise wording.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c5611b1f-af32-4fe6-ba47-6f3f631d6f80

📥 Commits

Reviewing files that changed from the base of the PR and between d48029a and 991982e.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • benchmarks/rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (62)
  • .github/workflows/bench.yml
  • Cargo.toml
  • benchmarks/.gitignore
  • benchmarks/README.md
  • benchmarks/cpp/CMakeLists.txt
  • benchmarks/cpp/src/harness.hpp
  • benchmarks/cpp/src/main.cpp
  • benchmarks/cpp/src/message.hpp
  • benchmarks/cpp/src/queues/drogalis.hpp
  • benchmarks/cpp/src/queues/rigtorp.hpp
  • benchmarks/cpp/src/rdtsc.hpp
  • benchmarks/cpp/src/stats.hpp
  • benchmarks/results/.gitkeep
  • benchmarks/rust/Cargo.toml
  • benchmarks/rust/src/harness.rs
  • benchmarks/rust/src/main.rs
  • benchmarks/rust/src/message.rs
  • benchmarks/rust/src/queues/mantis_copy.rs
  • benchmarks/rust/src/queues/mantis_inline.rs
  • benchmarks/rust/src/queues/mod.rs
  • benchmarks/rust/src/queues/rtrb_queue.rs
  • benchmarks/rust/src/rdtsc.rs
  • benchmarks/rust/src/stats.rs
  • benchmarks/scripts/check_system.sh
  • benchmarks/scripts/compare.py
  • benchmarks/scripts/deploy_and_run.sh
  • benchmarks/scripts/perf_branch.sh
  • benchmarks/scripts/perf_cache.sh
  • benchmarks/scripts/perf_flamegraph.sh
  • benchmarks/scripts/perf_profile.sh
  • benchmarks/scripts/prepare_system.sh
  • benchmarks/scripts/run_bench.sh
  • benchmarks/scripts/setup_machine.sh
  • crates/bench/Cargo.toml
  • crates/bench/benches/decode.rs
  • crates/bench/benches/fixed.rs
  • crates/bench/benches/spsc.rs
  • crates/bench/build.rs
  • crates/bench/cpp/drogalis_ffi.cpp
  • crates/bench/cpp/rigtorp_ffi.cpp
  • crates/bench/src/drogalis_ffi.rs
  • crates/bench/src/lib.rs
  • crates/bench/src/messages.rs
  • crates/bench/src/rigtorp_ffi.rs
  • crates/bench/src/workloads.rs
  • crates/binance/Cargo.toml
  • crates/binance/src/decoder.rs
  • crates/binance/src/lib.rs
  • crates/binance/src/schema.rs
  • crates/binance/src/spawn.rs
  • crates/fixed/src/parse.rs
  • crates/polymarket/Cargo.toml
  • crates/polymarket/src/lib.rs
  • crates/polymarket/src/market/decoder.rs
  • crates/polymarket/src/market/mod.rs
  • crates/polymarket/src/market/schema.rs
  • crates/polymarket/src/market/spawn.rs
  • crates/transport/src/binance/reference.rs
  • crates/transport/src/feed.rs
  • crates/transport/src/polymarket/market.rs
  • crates/transport/src/ws.rs
  • crates/transport/tests/feed_thread.rs
💤 Files with no reviewable changes (9)
  • crates/bench/build.rs
  • crates/bench/src/lib.rs
  • crates/bench/src/messages.rs
  • crates/bench/src/rigtorp_ffi.rs
  • crates/bench/src/drogalis_ffi.rs
  • crates/bench/cpp/rigtorp_ffi.cpp
  • crates/bench/src/workloads.rs
  • crates/bench/benches/spsc.rs
  • crates/bench/cpp/drogalis_ffi.cpp

Comment thread benchmarks/cpp/src/stats.hpp
Comment thread benchmarks/cpp/src/stats.hpp
Comment thread benchmarks/README.md
Comment thread benchmarks/rust/src/harness.rs Outdated
Comment on lines +13 to +20
fn pin_to_core(core_id: usize) {
let core_ids = core_affinity::get_core_ids().expect("failed to get core ids");
let target = core_ids
.into_iter()
.find(|c| c.id == core_id)
.unwrap_or_else(|| panic!("core {core_id} not found"));
core_affinity::set_for_current(target);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n benchmarks/rust/src/harness.rs | head -100

Repository: Milerius/Mantis

Length of output: 3634


🏁 Script executed:

rg "core_affinity" -A 3 -B 3 benchmarks/rust/

Repository: Milerius/Mantis

Length of output: 2074


🏁 Script executed:

# Check Cargo.toml to see the core_affinity version
find benchmarks/rust -name "Cargo.toml" -exec cat {} \;

Repository: Milerius/Mantis

Length of output: 433


🌐 Web query:

core_affinity 0.8 set_for_current return type rust

💡 Result:

In core_affinity 0.8.x, set_for_current returns a bool:

pub fn set_for_current(core_id: CoreId) -> bool

It returns false if setting the current thread’s CPU affinity fails. [1]

Source: [1]


Use let...else for early returns and handle set_for_current failure.

Line 19 ignores the bool return value of core_affinity::set_for_current, which fails silently. If affinity setup fails in the consumer thread (line 50), the thread is not actually pinned, but continues to signal readiness at line 58, causing the producer to proceed with unreliable cross-core latency measurements. Additionally, there is no validation that producer_core and consumer_core are distinct, so identical core IDs can also produce misleading results. Rewrite pin_to_core using let...else for early returns per Rust guidelines:

Suggested pattern
fn pin_to_core(core_id: usize) -> Result<(), String> {
    let core_ids = core_affinity::get_core_ids()
        .ok_or("failed to get core ids")?;
    let target = core_ids
        .into_iter()
        .find(|c| c.id == core_id)
        .ok_or_else(|| format!("core {core_id} not found"))?;
    if !core_affinity::set_for_current(target) {
        return Err("failed to set core affinity".to_string());
    }
    Ok(())
}

Then propagate errors in the consumer (line 50) and producer (line 83) call sites, and add validation that producer and consumer cores are distinct (line 27).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/rust/src/harness.rs` around lines 13 - 20, The pin_to_core
function should be rewritten to return Result<(), String> and use let...else (or
ok_or/ok_or_else) to early-return on failures when calling
core_affinity::get_core_ids and finding the target core, and must check the bool
result of core_affinity::set_for_current and return an Err if it fails; update
callers (where consumer and producer threads call pin_to_core) to propagate or
handle the Result (do not silently ignore failures), and add a validation check
where producer_core and consumer_core are parsed/used to ensure they are
distinct and return an error if they are equal so cross-core measurements are
not run with the same core IDs.

Comment thread benchmarks/rust/src/rdtsc.rs Outdated
Comment on lines +70 to +73
mkdir -p "$BENCH_DIR/results"
rsync -avz \
"$REMOTE_HOST:$REMOTE_DIR/results/" \
"$BENCH_DIR/results/"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n benchmarks/scripts/deploy_and_run.sh | head -100

Repository: Milerius/Mantis

Length of output: 3324


🏁 Script executed:

cat -n benchmarks/scripts/run_bench.sh | head -80

Repository: Milerius/Mantis

Length of output: 3259


🏁 Script executed:

cat -n benchmarks/scripts/run_bench.sh | tail -50

Repository: Milerius/Mantis

Length of output: 2170


🏁 Script executed:

find benchmarks -name "*.rs" -type f | head -20

Repository: Milerius/Mantis

Length of output: 367


🏁 Script executed:

grep -n "output" benchmarks/rust/src/main.rs | head -20

Repository: Milerius/Mantis

Length of output: 336


🏁 Script executed:

grep -n "output" benchmarks/cpp/src/main.cpp | head -20

Repository: Milerius/Mantis

Length of output: 362


🏁 Script executed:

sed -n '100,120p' benchmarks/cpp/src/main.cpp

Repository: Milerius/Mantis

Length of output: 716


🏁 Script executed:

cat -n benchmarks/scripts/compare.py | head -50

Repository: Milerius/Mantis

Length of output: 1913


Add host-specific namespace to prevent result overwrites across multiple hosts.

Lines 70–73 sync all remote results to the same local benchmarks/results/ directory. Since the Rust and C++ benchmark binaries generate fixed filenames like rust_mantis_copy_run_1.json and cpp_rigtorp_run_1.json (based only on queue/implementation name and run number), running this script sequentially against multiple hosts causes later hosts to overwrite earlier hosts' results. Data loss occurs because filenames contain no host identifier.

Sync into host-specific subdirectories (e.g., results/$REMOTE_HOST/) or include the hostname in artifact filenames before pulling results back.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/scripts/deploy_and_run.sh` around lines 70 - 73, The current rsync
pulls all remote results into a single directory and causes overwrites; update
the target path to include a host-specific namespace by creating and syncing
into a host subdirectory (use BENCH_DIR, REMOTE_HOST and REMOTE_DIR from the
script) — e.g., ensure mkdir -p "$BENCH_DIR/results/$REMOTE_HOST" and change the
rsync destination to "$BENCH_DIR/results/$REMOTE_HOST/" (alternatively, prepend
REMOTE_HOST to artifact filenames on the remote side before pulling) so each
host’s results are isolated.
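A local simulation of the host-namespace fix; `cp` stands in for the rsync pull, and the filename matches the fixed names cited in the comment above:

```shell
BENCH_DIR="$(mktemp -d)"   # stand-in for the script's BENCH_DIR
for REMOTE_HOST in hostA hostB; do
  remote_results="$(mktemp -d)"   # stand-in for $REMOTE_DIR/results on each host
  echo "$REMOTE_HOST" > "$remote_results/cpp_rigtorp_run_1.json"
  # Host-specific subdirectory prevents the second host overwriting the first.
  mkdir -p "$BENCH_DIR/results/$REMOTE_HOST"
  cp "$remote_results/cpp_rigtorp_run_1.json" "$BENCH_DIR/results/$REMOTE_HOST/"
done
cat "$BENCH_DIR/results/hostA/cpp_rigtorp_run_1.json"   # hostA's file survives
```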

Comment thread benchmarks/scripts/run_bench.sh
Comment thread benchmarks/scripts/setup_machine.sh
Comment thread crates/polymarket/src/market/decoder.rs
Comment thread crates/polymarket/src/market/decoder.rs Outdated
Milerius and others added 2 commits April 7, 2026 09:10
Binance: 5 new tests covering invalid price/qty parse failures.
Polymarket: 8 new tests covering invalid side, price, size, empty
book levels, skipped invalid levels, and default trade aggressor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/polymarket/src/market/decoder.rs (1)

170-245: ⚠️ Potential issue | 🟠 Major

LAST_IN_BATCH incorrectly computed from raw level count, not emitted count.

When parse_price_qty fails (line 182/216 continue) or the buffer fills at 64, the is_last check at lines 185/219 uses the raw total from JSON, not the actual number of successfully emitted events. This can result in no event having LAST_IN_BATCH set.

Example: bids = [invalid, valid], asks = [] → total = 2, but only 1 event emits. At count = 0, is_last = (0 + 1 == 2) = false, so the only emitted event lacks LAST_IN_BATCH.

Proposed fix: set LAST_IN_BATCH post-loop
         for (depth_idx, level) in msg.bids.iter().enumerate() {
             if count >= 64 {
                 break;
             }
             let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
                 continue;
             };
 
-            let is_last = count + 1 == total;
-            let flags = if is_last {
-                EventFlags::IS_SNAPSHOT | EventFlags::LAST_IN_BATCH
-            } else {
-                EventFlags::IS_SNAPSHOT
-            };
+            let flags = EventFlags::IS_SNAPSHOT;
 
             self.seq += 1;
             out[count] = HotEvent::book_delta(
                 recv_ts,
                 SeqNum::from_raw(self.seq),
                 instrument_id,
                 self.source_id,
                 flags,
                 BookDeltaPayload {
                     price,
                     qty,
                     side: Side::Bid,
                     action: UpdateAction::New,
                     depth: depth_idx as u8,
                     _pad: [0; 5],
                 },
             );
             count += 1;
         }
 
         for (depth_idx, level) in msg.asks.iter().enumerate() {
             if count >= 64 {
                 break;
             }
             let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
                 continue;
             };
 
-            let is_last = count + 1 == total;
-            let flags = if is_last {
-                EventFlags::IS_SNAPSHOT | EventFlags::LAST_IN_BATCH
-            } else {
-                EventFlags::IS_SNAPSHOT
-            };
+            let flags = EventFlags::IS_SNAPSHOT;
 
             self.seq += 1;
             out[count] = HotEvent::book_delta(
                 recv_ts,
                 SeqNum::from_raw(self.seq),
                 instrument_id,
                 self.source_id,
                 flags,
                 BookDeltaPayload {
                     price,
                     qty,
                     side: Side::Ask,
                     action: UpdateAction::New,
                     depth: depth_idx as u8,
                     _pad: [0; 5],
                 },
             );
             count += 1;
         }
 
+        if count > 0 {
+            out[count - 1].header.flags |= EventFlags::LAST_IN_BATCH;
+        }
+
         count
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/polymarket/src/market/decoder.rs` around lines 170 - 245, The
LAST_IN_BATCH flag is computed against the raw JSON total but some levels are
skipped (parse_price_qty) or truncated by the 64-entry limit, so update logic to
set LAST_IN_BATCH based on the actual emitted count: keep emitting as you do in
the bids/asks loops (functions/variables: parse_price_qty, out, count,
HotEvent::book_delta, self.seq, EventFlags::LAST_IN_BATCH), but remove/ignore
the is_last check inside each loop and after both loops, if count > 0 set the
LAST_IN_BATCH bit on the final emitted event in out[count-1] (preserving any
existing IS_SNAPSHOT flag) so the last successfully emitted event gets
LAST_IN_BATCH.
🧹 Nitpick comments (4)
.github/scripts/bench-report.sh (1)

8-13: Keep the static usage comment in sync with the new argument contract.

Runtime usage here is updated, but the header comment still documents the old 2-arg form. Updating that comment will prevent confusion when invoking the script manually.

Proposed doc-only fix
-# Usage: bench-report.sh <linux-json> <macos-json>
+# Usage: bench-report.sh <linux-seqlock> <macos-seqlock> [<linux-fixed> <macos-fixed>] [<linux-market-state> <macos-market-state>]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/scripts/bench-report.sh around lines 8 - 13, Update the script
header usage comment to match the new positional-argument contract used by the
variable assignments (linux_seqlock, macos_seqlock, linux_fixed, macos_fixed,
linux_market_state, macos_market_state): replace the old 2-argument usage line
with a single-line usage that documents required first two args and the optional
third/fourth and fifth/sixth pairs (e.g. bench-report.sh <linux-seqlock>
<macos-seqlock> [<linux-fixed> <macos-fixed>] [<linux-market-state>
<macos-market-state>]) so the top-of-file static comment is consistent with how
the script actually parses arguments.
crates/polymarket/src/market/decoder.rs (3)

291-294: Misleading SAFETY comment on safe code.

Line 292's // SAFETY: comment implies an unsafe block, but core::str::from_utf8 is a safe function. Per coding guidelines, // SAFETY: comments should only accompany unsafe blocks. Consider rewording to a regular comment explaining the ASCII assumption.

Suggested rewording
-    // SAFETY: JSON values in the "type" field are always ASCII
+    // Note: JSON "type" field values are ASCII, so UTF-8 conversion is infallible
     core::str::from_utf8(remaining.get(..end)?).ok()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/polymarket/src/market/decoder.rs` around lines 291 - 294, The comment
preceding the call to core::str::from_utf8 in decoder.rs incorrectly uses a "//
SAFETY:" marker (implying an unsafe block) even though from_utf8 is safe;
replace or reword that comment to a normal explanatory comment (e.g., "// JSON
values in the 'type' field are always ASCII") adjacent to the
core::str::from_utf8(remaining.get(..end)?).ok() call so it documents the ASCII
assumption without using a SAFETY label.

702-723: Test should verify LAST_IN_BATCH flag on the emitted event.

This test validates that invalid levels are skipped, but doesn't check that the single emitted event has LAST_IN_BATCH set. Adding this assertion would expose the LAST_IN_BATCH computation bug flagged earlier.

Suggested assertion
         // Only the valid level should be emitted
         assert_eq!(n, 1);
+        
+        // The sole emitted event should have LAST_IN_BATCH
+        assert!(out[0].header.flags.contains(EventFlags::LAST_IN_BATCH));
 
         if let EventBody::BookDelta(p) = out[0].body {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/polymarket/src/market/decoder.rs` around lines 702 - 723, Update the
test decode_book_skips_invalid_price_level to also assert the LAST_IN_BATCH flag
is set on the single emitted event: after decoding and before inspecting
EventBody, check the event's flags (e.g., out[0].flags or out[0].meta.flags)
includes EventFlags::LAST_IN_BATCH (or the equivalent bit/constant) so the test
verifies PolymarketMarketDecoder::decode correctly marks the only emitted
BookDelta as last in batch when make_out produces a single slot.

296-318: Consider using memchr crate for hot-path byte scanning.

Since peek_type is documented as a "hot-path optimization", the linear scans in memchr_find and memchr_byte could benefit from the SIMD-accelerated memchr crate. This is optional since the current implementation is correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/polymarket/src/market/decoder.rs` around lines 296 - 318, Replace the
manual linear scans in memchr_find and memchr_byte with the SIMD-accelerated
routines from the memchr crate: add memchr = "..." to Cargo.toml, change
memchr_byte to call memchr::memchr(byte, haystack) and change memchr_find to use
memchr::memmem::Finder::new(needle).find(haystack) (or memchr::memmem::find) to
locate the subslice; keep the same Option<usize> return semantics so callers
such as peek_type continue to work unchanged.
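For illustration, a minimal sketch of the drop-in wrappers, assuming the `memchr` crate is added to Cargo.toml. The crate calls (`memchr::memchr`, `memchr::memmem::find`) are noted in comments; the bodies below use std-only scans so the snippet is self-contained, but either implementation keeps the `Option<usize>` contract that `peek_type` relies on.

```rust
// Hypothetical wrappers mirroring memchr_find/memchr_byte from decoder.rs.
fn memchr_byte(byte: u8, haystack: &[u8]) -> Option<usize> {
    // With the crate: memchr::memchr(byte, haystack)
    haystack.iter().position(|&b| b == byte)
}

fn memchr_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // With the crate: memchr::memmem::find(haystack, needle)
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    // Locate the "type" key and its value separator in a raw JSON frame.
    let frame = br#"{"type":"book","asset_id":"abc"}"#;
    assert_eq!(memchr_find(frame, b"\"type\""), Some(1));
    assert_eq!(memchr_byte(b':', frame), Some(7));
}
```

Because the signatures are unchanged, swapping the std bodies for the crate calls is a one-line edit per function.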
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 690-700: The test function decode_book_empty_levels_returns_zero
contains a formatting issue around the raw byte literal
br#"{"type":"book","asset_id":"abc123","bids":[],"asks":[]}"#.to_vec() which
triggers rustfmt; run cargo +nightly fmt --all to auto-fix or manually reformat
the literal and surrounding lines so they follow rustfmt rules (ensure spacing
and line breaks around the br! literal and the to_vec() call are normalized) and
re-run tests; check the PolymarketMarketDecoder::<6>::new and decoder.decode
usages remain unchanged after formatting.

---

Duplicate comments:
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 170-245: The LAST_IN_BATCH flag is computed against the raw JSON
total but some levels are skipped (parse_price_qty) or truncated by the 64-entry
limit, so update logic to set LAST_IN_BATCH based on the actual emitted count:
keep emitting as you do in the bids/asks loops (functions/variables:
parse_price_qty, out, count, HotEvent::book_delta, self.seq,
EventFlags::LAST_IN_BATCH), but remove/ignore the is_last check inside each loop
and after both loops, if count > 0 set the LAST_IN_BATCH bit on the final
emitted event in out[count-1] (preserving any existing IS_SNAPSHOT flag) so the
last successfully emitted event gets LAST_IN_BATCH.

---

Nitpick comments:
In @.github/scripts/bench-report.sh:
- Around line 8-13: Update the script header usage comment to match the new
positional-argument contract used by the variable assignments (linux_seqlock,
macos_seqlock, linux_fixed, macos_fixed, linux_market_state,
macos_market_state): replace the old 2-argument usage line with a single-line
usage that documents required first two args and the optional third/fourth and
fifth/sixth pairs (e.g. bench-report.sh <linux-seqlock> <macos-seqlock>
[<linux-fixed> <macos-fixed>] [<linux-market-state> <macos-market-state>]) so
the top-of-file static comment is consistent with how the script actually parses
arguments.

In `@crates/polymarket/src/market/decoder.rs`:
- Around line 291-294: The comment preceding the call to core::str::from_utf8 in
decoder.rs incorrectly uses a "// SAFETY:" marker (implying an unsafe block)
even though from_utf8 is safe; replace or reword that comment to a normal
explanatory comment (e.g., "// JSON values in the 'type' field are always
ASCII") adjacent to the core::str::from_utf8(remaining.get(..end)?).ok() call so
it documents the ASCII assumption without using a SAFETY label.
- Around line 702-723: Update the test decode_book_skips_invalid_price_level to
also assert the LAST_IN_BATCH flag is set on the single emitted event: after
decoding and before inspecting EventBody, check the event's flags (e.g.,
out[0].flags or out[0].meta.flags) includes EventFlags::LAST_IN_BATCH (or the
equivalent bit/constant) so the test verifies PolymarketMarketDecoder::decode
correctly marks the only emitted BookDelta as last in batch when make_out
produces a single slot.
- Around line 296-318: Replace the manual linear scans in memchr_find and
memchr_byte with the SIMD-accelerated routines from the memchr crate: add memchr
= "..." to Cargo.toml, change memchr_byte to call memchr::memchr(byte, haystack)
and change memchr_find to use memchr::memmem::Finder::new(needle).find(haystack)
(or memchr::memmem::find) to locate the subslice; keep the same Option<usize>
return semantics so callers such as peek_type continue to work unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0148a23f-2ec0-41c9-bd57-fb841d81e8b1

📥 Commits

Reviewing files that changed from the base of the PR and between 991982e and 9ca2927.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • .github/scripts/bench-report.sh
  • .github/scripts/criterion-to-json.sh
  • .github/workflows/bench.yml
  • crates/binance/src/decoder.rs
  • crates/polymarket/src/market/decoder.rs
  • crates/transport/Cargo.toml
  • crates/transport/tests/live_feeds.rs
✅ Files skipped from review due to trivial changes (3)
  • .github/scripts/criterion-to-json.sh
  • .github/workflows/bench.yml
  • crates/transport/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/binance/src/decoder.rs

Comment thread crates/polymarket/src/market/decoder.rs
- Add missing <algorithm> include for std::min in C++ stats.hpp
- Add // SAFETY: comment to rdtsc unsafe block per project policy
- Assert core_affinity::set_for_current return value in harness
- Validate producer and consumer cores are distinct
- Fix README perf script examples (missing benchmark binary path)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/polymarket/src/market/decoder.rs (2)

701-722: Test doesn't assert on depth value.

This test verifies that invalid levels are skipped, but doesn't check the depth field of the emitted event. Adding assert_eq!(p.depth, ...) would clarify the expected semantic (should it be 0 or 1 when the first level is skipped?) and catch regressions if the depth logic changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/polymarket/src/market/decoder.rs` around lines 701 - 722, In the test
decode_book_skips_invalid_price_level, add an assertion that the emitted
BookDelta's depth is the expected value after the first (invalid) level is
skipped — e.g., assert that p.depth equals Depth::from_raw(0) (or the equivalent
zero-depth value used in your code) right after the existing assertions on
p.price, p.qty, and p.side so the test verifies depth semantics for
PolymarketMarketDecoder::<6>::new and EventBody::BookDelta.

296-318: Consider memchr crate for hot-path byte searches.

These naive O(n) / O(n*m) search implementations are called on every incoming message. The memchr crate provides SIMD-accelerated alternatives (memchr::memmem::find and memchr::memchr) that would improve throughput on this hot path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/polymarket/src/market/decoder.rs` around lines 296 - 318, The custom
byte/slice search helpers memchr_find and memchr_byte are used on the hot path
and should be replaced with SIMD-accelerated functions from the memchr crate to
improve throughput: swap memchr_find's implementation to call
memchr::memmem::find(haystack, needle) and memchr_byte to call
memchr::memchr(byte, haystack) (both return Option<usize> so adapt directly),
add memchr to Cargo.toml dependencies, and update the use/imports (e.g., use
memchr::memmem and memchr::memchr) so tests and callers of
memchr_find/memchr_byte keep the same signature and behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 177-209: The emitted BookDeltaPayload.depth currently uses the
enumerate index (depth_idx) which advances even when parse_price_qty::<D>
returns None, causing holes; fix by introducing a per-side emitted counter
(e.g., let mut bid_depth = 0u8) inside the bids loop and set
BookDeltaPayload.depth = bid_depth, incrementing bid_depth only when you
successfully parse and push an event; apply the same pattern (e.g., ask_depth)
to the asks loop and ensure you stop at count >= 64 as before and still
increment self.seq and count only when emitting an event.

---

Nitpick comments:
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 701-722: In the test decode_book_skips_invalid_price_level, add an
assertion that the emitted BookDelta's depth is the expected value after the
first (invalid) level is skipped — e.g., assert that p.depth equals
Depth::from_raw(0) (or the equivalent zero-depth value used in your code) right
after the existing assertions on p.price, p.qty, and p.side so the test verifies
depth semantics for PolymarketMarketDecoder::<6>::new and EventBody::BookDelta.
- Around line 296-318: The custom byte/slice search helpers memchr_find and
memchr_byte are used on the hot path and should be replaced with
SIMD-accelerated functions from the memchr crate to improve throughput: swap
memchr_find's implementation to call memchr::memmem::find(haystack, needle) and
memchr_byte to call memchr::memchr(byte, haystack) (both return Option<usize> so
adapt directly), add memchr to Cargo.toml dependencies, and update the
use/imports (e.g., use memchr::memmem and memchr::memchr) so tests and callers
of memchr_find/memchr_byte keep the same signature and behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3786a78b-5af8-41bc-a67c-e4e27ae84f31

📥 Commits

Reviewing files that changed from the base of the PR and between 9ca2927 and 4de2cea.

📒 Files selected for processing (1)
  • crates/polymarket/src/market/decoder.rs

Comment on lines +177 to +209
        for (depth_idx, level) in msg.bids.iter().enumerate() {
            if count >= 64 {
                break;
            }
            let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
                continue;
            };

            let is_last = count + 1 == total;
            let flags = if is_last {
                EventFlags::IS_SNAPSHOT | EventFlags::LAST_IN_BATCH
            } else {
                EventFlags::IS_SNAPSHOT
            };

            self.seq += 1;
            out[count] = HotEvent::book_delta(
                recv_ts,
                SeqNum::from_raw(self.seq),
                instrument_id,
                self.source_id,
                flags,
                BookDeltaPayload {
                    price,
                    qty,
                    side: Side::Bid,
                    action: UpdateAction::New,
                    depth: depth_idx as u8,
                    _pad: [0; 5],
                },
            );
            count += 1;
        }

⚠️ Potential issue | 🟡 Minor


depth field represents input array index, not emitted event position within each side.

When parse_price_qty fails, continue is used but the enumerate index still advances. If bids[0] fails and bids[1] succeeds, the emitted event has depth: 1 instead of the correct depth: 0. The same issue applies to asks.

Use a separate counter per side to track emitted position:

Fix
-        for (depth_idx, level) in msg.bids.iter().enumerate() {
+        let mut bid_depth: u8 = 0;
+        for level in &msg.bids {
             if count >= 64 {
                 break;
             }
             let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
                 continue;
             };
             // ... flags logic ...
             self.seq += 1;
             out[count] = HotEvent::book_delta(
                 // ...
                 BookDeltaPayload {
                     price,
                     qty,
                     side: Side::Bid,
                     action: UpdateAction::New,
-                    depth: depth_idx as u8,
+                    depth: bid_depth,
                     _pad: [0; 5],
                 },
             );
+            bid_depth += 1;
             count += 1;
         }

Apply the same pattern to the asks loop (lines 211+).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        for (depth_idx, level) in msg.bids.iter().enumerate() {
-            if count >= 64 {
-                break;
-            }
-            let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
-                continue;
-            };
-            let is_last = count + 1 == total;
-            let flags = if is_last {
-                EventFlags::IS_SNAPSHOT | EventFlags::LAST_IN_BATCH
-            } else {
-                EventFlags::IS_SNAPSHOT
-            };
-            self.seq += 1;
-            out[count] = HotEvent::book_delta(
-                recv_ts,
-                SeqNum::from_raw(self.seq),
-                instrument_id,
-                self.source_id,
-                flags,
-                BookDeltaPayload {
-                    price,
-                    qty,
-                    side: Side::Bid,
-                    action: UpdateAction::New,
-                    depth: depth_idx as u8,
-                    _pad: [0; 5],
-                },
-            );
-            count += 1;
-        }
+        let mut bid_depth: u8 = 0;
+        for level in &msg.bids {
+            if count >= 64 {
+                break;
+            }
+            let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
+                continue;
+            };
+            let is_last = count + 1 == total;
+            let flags = if is_last {
+                EventFlags::IS_SNAPSHOT | EventFlags::LAST_IN_BATCH
+            } else {
+                EventFlags::IS_SNAPSHOT
+            };
+            self.seq += 1;
+            out[count] = HotEvent::book_delta(
+                recv_ts,
+                SeqNum::from_raw(self.seq),
+                instrument_id,
+                self.source_id,
+                flags,
+                BookDeltaPayload {
+                    price,
+                    qty,
+                    side: Side::Bid,
+                    action: UpdateAction::New,
+                    depth: bid_depth,
+                    _pad: [0; 5],
+                },
+            );
+            bid_depth += 1;
+            count += 1;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/polymarket/src/market/decoder.rs` around lines 177 - 209, The emitted
BookDeltaPayload.depth currently uses the enumerate index (depth_idx) which
advances even when parse_price_qty::<D> returns None, causing holes; fix by
introducing a per-side emitted counter (e.g., let mut bid_depth = 0u8) inside
the bids loop and set BookDeltaPayload.depth = bid_depth, incrementing bid_depth
only when you successfully parse and push an event; apply the same pattern
(e.g., ask_depth) to the asks loop and ensure you stop at count >= 64 as before
and still increment self.seq and count only when emitting an event.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
benchmarks/cpp/src/stats.hpp (1)

33-40: ⚠️ Potential issue | 🟡 Minor

Use the inclusive overflow-bucket ceiling.

percentile() reports overflow_upper_bound() directly, but Line 39 currently returns the exclusive end of the documented range. Bucket 0 is described as 4096..8191, yet this code reports 8192, so any percentile that lands in overflow_ is overstated by one cycle.

🩹 Proposed fix
 inline uint64_t overflow_upper_bound(size_t i) {
     if (i >= LOG_BUCKETS - 1) {
         return std::numeric_limits<uint64_t>::max();
     }
     // bucket i covers [4096 << i, (4096 << (i+1)) - 1]
-    return static_cast<uint64_t>(DIRECT_BUCKETS) << (i + 1);
+    return (static_cast<uint64_t>(DIRECT_BUCKETS) << (i + 1)) - 1;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/cpp/src/stats.hpp` around lines 33 - 40, The function
overflow_upper_bound(size_t i) currently returns the exclusive end of the
documented bucket range (e.g., returns 8192 for bucket 0) which inflates
percentiles; change its non-max branch to return the inclusive upper bound by
subtracting one from the shifted value (i.e., use
(static_cast<uint64_t>(DIRECT_BUCKETS) << (i + 1)) - 1) while keeping the
existing numeric_limits<uint64_t>::max() for the final overflow bucket.
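The boundary behavior can be pinned down with a small sketch in Rust; `DIRECT_BUCKETS = 4096` comes from the review text, the `LOG_BUCKETS` count of 16 is a hypothetical stand-in, and the function implements the inclusive-ceiling variant from the proposed fix.

```rust
// Illustrative sketch of the inclusive overflow-bucket ceiling.
const DIRECT_BUCKETS: u64 = 4096;
const LOG_BUCKETS: usize = 16; // hypothetical bucket count

fn overflow_upper_bound(i: usize) -> u64 {
    if i >= LOG_BUCKETS - 1 {
        return u64::MAX; // final catch-all bucket
    }
    // bucket i covers [4096 << i, (4096 << (i + 1)) - 1] inclusive
    (DIRECT_BUCKETS << (i + 1)) - 1
}

fn main() {
    assert_eq!(overflow_upper_bound(0), 8191); // bucket 0 is 4096..=8191, not ..8192
    assert_eq!(overflow_upper_bound(1), 16383);
    assert_eq!(overflow_upper_bound(LOG_BUCKETS - 1), u64::MAX);
}
```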
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benchmarks/cpp/src/stats.hpp`:
- Around line 314-317: Replace the locale-dependent snprintf usage that formats
hist.mean() (the mean_buf/snprintf sequence and the os << "   
\"cycles_per_op_mean\"" emission) with a locale-independent formatter: include
<iomanip> and <locale>, create a std::ostringstream, imbue it with
std::locale::classic(), use std::fixed and std::setprecision(1) to stream
hist.mean() into the ostringstream, then write the resulting string to os for
the "cycles_per_op_mean" field; this ensures JSON uses a dot decimal separator
regardless of LC_NUMERIC.

In `@benchmarks/README.md`:
- Around line 13-20: The README currently claims using `lfence; rdtsc` and
same-CCD + `isolcpus` is sufficient, but it omits that synchronized/invariant
TSC is a hard prerequisite because `lfence` only serializes `rdtsc` locally and
does not eliminate cross-core TSC skew; update the benchmark docs to explicitly
require invariant/synchronized TSC (or mention using a kernel TSC sync feature)
and warn that producer/consumer timestamp deltas (`producer_rdtsc`,
`consumer_rdtsc`) can wrap or produce huge outliers on machines without it; keep
the existing same-CCD/isolation guidance but add a clear note adjacent to the
`lfence; rdtsc` and same-CCD/isolcpus lines stating this hard requirement and
recommended ways to verify or enable synchronized TSC.

---

Duplicate comments:
In `@benchmarks/cpp/src/stats.hpp`:
- Around line 33-40: The function overflow_upper_bound(size_t i) currently
returns the exclusive end of the documented bucket range (e.g., returns 8192 for
bucket 0) which inflates percentiles; change its non-max branch to return the
inclusive upper bound by subtracting one from the shifted value (i.e., use
(static_cast<uint64_t>(DIRECT_BUCKETS) << (i + 1)) - 1) while keeping the
existing numeric_limits<uint64_t>::max() for the final overflow bucket.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8a13487a-888e-4bfd-af46-b0cf048f0ad3

📥 Commits

Reviewing files that changed from the base of the PR and between 4de2cea and 23500a2.

📒 Files selected for processing (4)
  • benchmarks/README.md
  • benchmarks/cpp/src/stats.hpp
  • benchmarks/rust/src/harness.rs
  • benchmarks/rust/src/rdtsc.rs
✅ Files skipped from review due to trivial changes (1)
  • benchmarks/rust/src/rdtsc.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • benchmarks/rust/src/harness.rs

Comment on lines +314 to +317
    // Format mean with one decimal place
    char mean_buf[64];
    std::snprintf(mean_buf, sizeof(mean_buf), "%.1f", hist.mean());
    os << "    \"cycles_per_op_mean\": " << mean_buf << ",\n";


⚠️ Potential issue | 🟠 Major


Serialize cycles_per_op_mean with a locale-independent decimal point.

Line 316 uses std::snprintf("%.1f", ...), which follows LC_NUMERIC. On hosts with a comma decimal separator, this emits 12,3 and breaks JSON validity for the cross-hardware comparison tooling. Use std::ostringstream with std::locale::classic() to force the classic locale.

Add these includes at the top of the file:

#include <iomanip>
#include <locale>

Then update the code:

🩹 Proposed fix
-    // Format mean with one decimal place
-    char mean_buf[64];
-    std::snprintf(mean_buf, sizeof(mean_buf), "%.1f", hist.mean());
-    os << "    \"cycles_per_op_mean\": " << mean_buf << ",\n";
+    // Format mean with one decimal place using a locale-independent decimal point
+    std::ostringstream mean_os;
+    mean_os.imbue(std::locale::classic());
+    mean_os << std::fixed << std::setprecision(1) << hist.mean();
+    os << "    \"cycles_per_op_mean\": " << mean_os.str() << ",\n";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/cpp/src/stats.hpp` around lines 314 - 317, Replace the
locale-dependent snprintf usage that formats hist.mean() (the mean_buf/snprintf
sequence and the os << "    \"cycles_per_op_mean\"" emission) with a
locale-independent formatter: include <iomanip> and <locale>, create a
std::ostringstream, imbue it with std::locale::classic(), use std::fixed and
std::setprecision(1) to stream hist.mean() into the ostringstream, then write
the resulting string to os for the "cycles_per_op_mean" field; this ensures JSON
uses a dot decimal separator regardless of LC_NUMERIC.

Comment thread benchmarks/README.md
- Fix isolcpus parsing: use dedicated cmdline param parser instead of
  read_proc_field which expects colon-separated fields
- Fix LAST_IN_BATCH: set flag on last emitted event after both loops,
  not based on raw JSON level count (which ignores skipped/truncated levels)
- Add git/curl to setup_machine.sh bootstrap packages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
crates/polymarket/src/market/decoder.rs (1)

176-201: ⚠️ Potential issue | 🟠 Major

BookDeltaPayload.depth is incorrect when invalid levels are skipped.

At Line 176 and Line 203, depth is derived from enumerate() index. Because Line 180 and Line 207 can continue, emitted depths can have holes (e.g., first valid bid emitted with depth 1). This breaks the documented “0 = best level” contract.

Suggested fix
-        for (depth_idx, level) in msg.bids.iter().enumerate() {
+        let mut bid_depth: u8 = 0;
+        for level in &msg.bids {
             if count >= 64 {
                 break;
             }
             let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
                 continue;
             };

             self.seq += 1;
             out[count] = HotEvent::book_delta(
                 recv_ts,
                 SeqNum::from_raw(self.seq),
                 instrument_id,
                 self.source_id,
                 EventFlags::IS_SNAPSHOT,
                 BookDeltaPayload {
                     price,
                     qty,
                     side: Side::Bid,
                     action: UpdateAction::New,
-                    depth: depth_idx as u8,
+                    depth: bid_depth,
                     _pad: [0; 5],
                 },
             );
+            bid_depth = bid_depth.saturating_add(1);
             count += 1;
         }

-        for (depth_idx, level) in msg.asks.iter().enumerate() {
+        let mut ask_depth: u8 = 0;
+        for level in &msg.asks {
             if count >= 64 {
                 break;
             }
             let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
                 continue;
             };

             self.seq += 1;
             out[count] = HotEvent::book_delta(
                 recv_ts,
                 SeqNum::from_raw(self.seq),
                 instrument_id,
                 self.source_id,
                 EventFlags::IS_SNAPSHOT,
                 BookDeltaPayload {
                     price,
                     qty,
                     side: Side::Ask,
                     action: UpdateAction::New,
-                    depth: depth_idx as u8,
+                    depth: ask_depth,
                     _pad: [0; 5],
                 },
             );
+            ask_depth = ask_depth.saturating_add(1);
             count += 1;
         }

Also applies to: 203-228

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/polymarket/src/market/decoder.rs` around lines 176 - 201, The emitted
BookDeltaPayload.depth currently uses the enumerate() index (depth_idx) which
becomes incorrect when parse_price_qty(...) returns None and the loop continues;
change each bid and ask loop to maintain a separate depth counter (e.g.,
bid_depth / ask_depth initialized to 0) that is only incremented when you
successfully parse and emit a level so depths are contiguous (0 = best level)
and assign that counter to BookDeltaPayload.depth; apply the same fix to both
the bids loop (for (depth_idx, level) in msg.bids.iter().enumerate()) and the
asks loop so skipped/invalid levels do not create depth holes.
🧹 Nitpick comments (2)
crates/polymarket/src/market/decoder.rs (1)

61-66: Avoid wildcard match arms to comply with repo rule.

Both dispatch matches use _ => .... Please rewrite with explicit branching to satisfy the no-wildcard rule.

As per coding guidelines, No wildcard matches — use explicit destructuring in pattern matching.

Also applies to: 248-252

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/polymarket/src/market/decoder.rs` around lines 61 - 66, Replace the
wildcard arms in the message-type dispatches with explicit branches: in the
match over msg_type (the block that currently has "price_change",
"last_trade_price", "book" and `_ => 0`) change the `_` arm to a named binding
(e.g., other) and handle it explicitly by logging/handling the unrecognized
msg_type and returning 0; do the same for the other dispatch at the same file
(the match around lines 248-252). Specifically update the match in the decoder
(the function that calls self.decode_price_change / self.decode_trade /
self.decode_book) to replace `_ => 0` with an explicit arm like other => { /*
log unrecognized message type using other */ 0 } so there are no wildcard `_`
matches while preserving existing behavior.
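A minimal sketch of the named-binding arm the prompt describes; the message-type strings come from the review, while the numeric return values are stand-ins for the real `decode_*` calls.

```rust
// Sketch: replace `_ => 0` with a named binding so no `_` wildcard remains.
fn dispatch(msg_type: &str) -> usize {
    match msg_type {
        "price_change" => 1,     // stand-in for self.decode_price_change(..)
        "last_trade_price" => 2, // stand-in for self.decode_trade(..)
        "book" => 3,             // stand-in for self.decode_book(..)
        other => {
            // Unrecognized type handled explicitly; `other` is available
            // for logging/metrics instead of being silently discarded.
            let _unrecognized = other;
            0
        }
    }
}

fn main() {
    assert_eq!(dispatch("book"), 3);
    assert_eq!(dispatch("unknown"), 0);
}
```

The `other` binding preserves the existing fall-through behavior (returning 0) while keeping the value in scope for explicit handling.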
benchmarks/rust/src/stats.rs (1)

379-395: Lock the overflow-bucket contract with an exact assertion.

This only proves that p99.99 lands somewhere above the direct range. It still passes if a later cleanup changes the intentional exclusive-boundary behavior and Rust/C++ quietly drift. For this fixture, asserting h.percentile(99.99) == 131072 (or checking overflow_upper_bound() directly) would make that contract explicit.

Based on learnings, overflow_upper_bound(i) intentionally returns the exclusive top of each log bucket rather than an inclusive upper bound.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/rust/src/stats.rs` around lines 379-395: the test's assertion
is too weak; replace the loose range check with an exact contract check so
overflow behavior can't drift. In the histogram_overflow_buckets test, assert
that CycleHistogram::percentile(99.99) equals the expected exclusive overflow
upper bound (use assert_eq!(p9999, 131072)), or alternatively call
h.overflow_upper_bound(...) and assert that it returns 131072. Replace the
current assert!(p9999 > 4095, ...) with an exact equality assertion on
percentile (and/or overflow_upper_bound) to lock the overflow-bucket contract.
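
For concreteness, here is a minimal sketch of the exclusive-upper-bound contract the review is asking to pin down. `LogHistogram`, its bucket layout, and the fixture values are hypothetical stand-ins for the real `CycleHistogram`, not its actual API:

```rust
// Direct buckets give 1-cycle resolution below 4096; above that, log2
// buckets where bucket i covers [4096 << i, 4096 << (i + 1)). A percentile
// that lands in a log bucket reports the bucket's EXCLUSIVE top.
struct LogHistogram {
    direct: Vec<u64>, // one bucket per cycle value in 0..4096
    log: Vec<u64>,    // bucket i covers [4096 << i, 4096 << (i + 1))
    count: u64,
}

impl LogHistogram {
    fn new() -> Self {
        Self { direct: vec![0; 4096], log: vec![0; 20], count: 0 }
    }

    fn record(&mut self, cycles: u64) {
        if cycles < 4096 {
            self.direct[cycles as usize] += 1;
        } else {
            let i = (63 - cycles.leading_zeros() as usize) - 12;
            self.log[i] += 1;
        }
        self.count += 1;
    }

    // Exclusive top of log bucket i — the contract under test.
    fn overflow_upper_bound(&self, i: usize) -> u64 {
        4096u64 << (i + 1)
    }

    fn percentile(&self, p: f64) -> u64 {
        let rank = ((p / 100.0) * self.count as f64).ceil() as u64;
        let mut seen: u64 = 0;
        for (v, n) in self.direct.iter().enumerate() {
            seen += n;
            if seen >= rank { return v as u64; }
        }
        for (i, n) in self.log.iter().enumerate() {
            seen += n;
            if seen >= rank { return self.overflow_upper_bound(i); }
        }
        0
    }
}
```

With 9000 samples of 100 cycles plus one of 100 000 cycles, percentile(99.99) hits the log bucket covering [65536, 131072) and returns exactly 131072, which is what the exact assertion would lock in.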
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benchmarks/rust/src/stats.rs`:
- Line 181: the serialized `compiler` field is populated by calling
rustc_version() at runtime (the `compiler: rustc_version()` assignment in
stats.rs), which reports the runner's toolchain. Instead, capture the compiler
at build time via a build script and read that value at runtime: add a build.rs
that queries the compiler (e.g., using the rustc_version crate or `rustc
--version`) and emits it with
println!("cargo:rustc-env=BUILD_RUSTC_VERSION={}"), then replace calls to
rustc_version() in stats.rs (and the other spots around the 212-219 block) with
reads of the env var (env!("BUILD_RUSTC_VERSION") or std::env::var at runtime)
so the JSON contains the toolchain used to build the binary rather than the
runner's rustc.
- Around lines 19-20: the mean calculation can be corrupted by wrapped TSC
deltas (near u64::MAX). Modify the stats update path (the method that processes
a new cycles value, e.g., add() or the code that does self.sum += cycles) to
detect and ignore suspiciously large deltas before mutating sum/count (for
example: if cycles > (u64::MAX / 2) or another large threshold, treat it as a
wrapped/invalid sample and return early); min/max/count updates can stay
unchanged if you prefer. Also ensure mean() uses checked arithmetic
(checked_add/checked_div, or u128 accumulation) to avoid panics/wraps when
computing self.sum / self.count from the struct fields sum, min, max, count.
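
The wrap guard described above can be sketched as follows; `CycleStats` and the threshold are hypothetical stand-ins for the real accumulator in stats.rs:

```rust
// Sketch: drop wrapped TSC deltas before they poison the running sum,
// and accumulate in u128 so the mean can never wrap.
#[derive(Default)]
struct CycleStats {
    sum: u128,
    count: u64,
    min: u64,
    max: u64,
}

impl CycleStats {
    // Deltas above this are treated as wrapped/invalid samples.
    const WRAP_THRESHOLD: u64 = u64::MAX / 2;

    fn add(&mut self, cycles: u64) {
        if cycles > Self::WRAP_THRESHOLD {
            return; // ignore a wrapped TSC delta instead of corrupting the mean
        }
        if self.count == 0 {
            self.min = cycles; // first valid sample seeds min
        }
        self.min = self.min.min(cycles);
        self.max = self.max.max(cycles);
        self.sum += u128::from(cycles);
        self.count += 1;
    }

    fn mean(&self) -> u64 {
        if self.count == 0 { 0 } else { (self.sum / u128::from(self.count)) as u64 }
    }
}
```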

In `@crates/polymarket/src/market/decoder.rs`:
- Around line 272-284: peek_type currently only matches the exact bytes
"\"type\":\"" and so misses valid JSON with whitespace; update peek_type to
first locate the key bytes b"\"type\"" (using memchr_find), then advance past
that key, skip any ASCII whitespace bytes, assert/verify the next byte is b':',
skip any ASCII whitespace again, verify the next byte is the opening quote b'"',
and then use memchr_byte to find the closing quote and return the slice between
quotes; refer to peek_type, memchr_find, memchr_byte and the variables
pos/value_start/remaining when making the change.

---

Duplicate comments:
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 176-201: The emitted BookDeltaPayload.depth currently uses the
enumerate() index (depth_idx) which becomes incorrect when parse_price_qty(...)
returns None and the loop continues; change each bid and ask loop to maintain a
separate depth counter (e.g., bid_depth / ask_depth initialized to 0) that is
only incremented when you successfully parse and emit a level so depths are
contiguous (0 = best level) and assign that counter to BookDeltaPayload.depth;
apply the same fix to both the bids loop (for (depth_idx, level) in
msg.bids.iter().enumerate()) and the asks loop so skipped/invalid levels do not
create depth holes.
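
The contiguous-depth fix above can be sketched like this; `Level`, `parse_level`, and `emit_depths` are hypothetical stand-ins for the venue types and parse_price_qty, not the crate's API:

```rust
// A separate depth counter is incremented only on successful parse,
// so skipped levels never leave holes in the emitted depths.
struct Level<'a> { price: &'a str, qty: &'a str }

fn parse_level(l: &Level) -> Option<(i64, i64)> {
    // stand-in for the real fixed-point parser; rejects bad input
    Some((l.price.parse().ok()?, l.qty.parse().ok()?))
}

fn emit_depths(bids: &[Level]) -> Vec<(usize, i64, i64)> {
    let mut out = Vec::new();
    let mut bid_depth = 0usize; // NOT the enumerate() index
    for level in bids {
        let Some((px, qty)) = parse_level(level) else { continue };
        out.push((bid_depth, px, qty)); // depth 0 = best level, contiguous
        bid_depth += 1;
    }
    out
}
```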


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4f70fa48-1426-4937-a132-d0191d424b06

📥 Commits

Reviewing files that changed from the base of the PR and between 23500a2 and 11282f1.

📒 Files selected for processing (3)
  • benchmarks/rust/src/stats.rs
  • benchmarks/scripts/setup_machine.sh
  • crates/polymarket/src/market/decoder.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • benchmarks/scripts/setup_machine.sh

implementation: implementation.to_string(),
language: "Rust".to_string(),
version: env!("CARGO_PKG_VERSION").to_string(),
compiler: rustc_version(),

⚠️ Potential issue | 🟠 Major

Capture the build compiler, not the runner's rustc.

Every JSON result serializes this value, but rustc --version here reports whatever toolchain happens to be installed on the machine executing the benchmark. In the deploy/run workflow that can stamp results with the wrong compiler version or "unknown" even though the binary was built with a known toolchain.

Based on learnings, "All benchmarks must export JSON for cross-hardware comparison".

Also applies to: 212-219
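
A minimal sketch of the build-time capture, assuming the BUILD_RUSTC_VERSION env var name suggested in the review (the helper name is illustrative):

```rust
// Sketch of the core of a build.rs: query the compiler once at build
// time instead of on the benchmark runner. Falls back to "unknown"
// if `rustc` is not on PATH.
use std::process::Command;

fn build_rustc_version() -> String {
    Command::new("rustc")
        .arg("--version")
        .output()
        .ok()
        .and_then(|o| String::from_utf8(o.stdout).ok())
        .map(|s| s.trim().to_string())
        .unwrap_or_else(|| "unknown".to_string())
}

// In build.rs:
//   fn main() {
//       println!("cargo:rustc-env=BUILD_RUSTC_VERSION={}", build_rustc_version());
//   }
// In stats.rs, the serialized field would then become:
//   compiler: env!("BUILD_RUSTC_VERSION").to_string(),
```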


Comment on lines +272 to +284
fn peek_type(buf: &[u8]) -> Option<&str> {
// Search for the byte pattern: "type":"
let needle = b"\"type\":\"";
let pos = memchr_find(buf, needle)?;
let value_start = pos + needle.len();

// Find the closing quote
let remaining = buf.get(value_start..)?;
let end = memchr_byte(b'"', remaining)?;

// SAFETY: JSON values in the "type" field are always ASCII
core::str::from_utf8(remaining.get(..end)?).ok()
}

⚠️ Potential issue | 🟠 Major

peek_type currently rejects valid JSON formatting variants.

Line 274 hardcodes "\"type\":\""; valid JSON like {"type" : "book"} won’t match and gets dropped. This creates avoidable decode misses.

Suggested fix
 fn peek_type(buf: &[u8]) -> Option<&str> {
-    // Search for the byte pattern: "type":"
-    let needle = b"\"type\":\"";
-    let pos = memchr_find(buf, needle)?;
-    let value_start = pos + needle.len();
-
-    // Find the closing quote
-    let remaining = buf.get(value_start..)?;
-    let end = memchr_byte(b'"', remaining)?;
-
-    // SAFETY: JSON values in the "type" field are always ASCII
-    core::str::from_utf8(remaining.get(..end)?).ok()
+    let key = b"\"type\"";
+    let pos = memchr_find(buf, key)?;
+    let mut i = pos + key.len();
+    while matches!(buf.get(i), Some(b' ' | b'\n' | b'\r' | b'\t')) { i += 1; }
+    if *buf.get(i)? != b':' {
+        return None;
+    }
+    i += 1;
+    while matches!(buf.get(i), Some(b' ' | b'\n' | b'\r' | b'\t')) { i += 1; }
+    if *buf.get(i)? != b'"' {
+        return None;
+    }
+    i += 1;
+    let rest = buf.get(i..)?;
+    let end = memchr_byte(b'"', rest)?;
+    core::str::from_utf8(rest.get(..end)?).ok()
 }

Milerius and others added 9 commits April 7, 2026 10:31
core_affinity only sees cores in the process's default affinity mask,
which excludes isolcpus cores. Use sched_setaffinity(2) directly via
libc which can pin to any online core including isolated ones.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pthread_setaffinity_np may silently fail on isolated cores. Use
sched_setaffinity(2) directly with error checking, matching the
Rust harness approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
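The direct-syscall pinning described in these commits can be sketched as below. This is a Linux-only illustration that declares the glibc wrappers by hand instead of pulling in the libc crate; `CpuSet` mirrors glibc's 1024-bit cpu_set_t and error handling is minimal:

```rust
// Pin the calling thread with sched_setaffinity(2). Unlike crates that
// consult the process's default affinity mask, this can target any
// online core, including ones reserved via isolcpus.
#[repr(C)]
#[derive(Clone, Copy)]
struct CpuSet {
    bits: [u64; 16], // 1024 bits, matches glibc cpu_set_t
}

extern "C" {
    // glibc wrappers; pid 0 means the calling thread
    fn sched_setaffinity(pid: i32, cpusetsize: usize, mask: *const CpuSet) -> i32;
    fn sched_getaffinity(pid: i32, cpusetsize: usize, mask: *mut CpuSet) -> i32;
}

fn pin_to_core(core: usize) -> bool {
    let mut set = CpuSet { bits: [0; 16] };
    set.bits[core / 64] |= 1u64 << (core % 64);
    // returns 0 on success; check it rather than failing silently
    unsafe { sched_setaffinity(0, std::mem::size_of::<CpuSet>(), &set) == 0 }
}
```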
Both producer and consumer now run on spawned threads (not main).
Consumer records raw cycle deltas into a pre-allocated Vec/vector,
then builds the histogram after measurement completes. This avoids
cache pollution from histogram writes during the hot loop, which was
inflating C++ numbers by ~25x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No traits, no Arc, no Vec, no Result unwrapping. Uses leaked Box
for shared state, usize-cast for Send, inline asm rdtsc, and
sum += delta matching the HFT University run_latency() exactly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two &mut references to the same SpscRing from different threads is UB
per Rust aliasing rules. Use spsc_ring() split handles instead — they
access the engine through Arc<RingEngine> via &self, which is sound.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Inline RawRing with no Arc, no Result, no &mut self, no generics.
push() -> bool, pop(&mut out) -> bool. Stack-local via Box::leak.
Raw pointer field projections avoid Rust noalias interference.
Same algorithm as RingEngine but stripped to bare metal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…wrap

- Colocate head+tail_cached on same 64B cache line (producer-local)
- Colocate tail+head_cached on same 64B cache line (consumer-local)
- Use 64-byte alignment instead of 128-byte (x86_64 native cache line)
- Branch-based index wrapping (branch predictor > always-execute AND)
- Enable LTO + codegen-units=1 for cross-crate inlining
- pop() takes raw *mut Msg to avoid any &mut reference overhead

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l API, inline(always)

Benchmark-driven optimizations closing the gap with C++ rigtorp:

- Add CacheLine<T> with native alignment (64B x86, 128B ARM)
- Colocate head+tail_cached on producer CacheLine, tail+head_cached
  on consumer CacheLine — reduces hot-path cache footprint from 4
  lines to 2
- Add push() -> bool and unsafe pop(*mut T) -> bool to RingEngine,
  RawRing, Producer, Consumer — no Result overhead, no return-by-value
- Add BranchWrap index strategy (branch predictor > bitmask AND)
- Upgrade all hot-path functions to #[inline(always)]
- Add read_into slot operation for zero-copy pop into caller buffer

Fully backwards-compatible: existing try_push/try_pop Result API and
Arc-based split handles are unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iants

- Add push_shared/pop_shared on RawRing taking &self instead of &mut self
  to avoid LLVM noalias interference in two-thread benchmarks
- Raw bench now tests all 3 Rust queues: mantis-inline (push_shared),
  mantis-copy (push/pop &self), rtrb (push/pop Result)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
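The `&self` shared-reference idea and the branch-based wrap can be sketched together in a minimal ring. `MiniSpsc` is a hypothetical stand-in for RawRing's push_shared/pop_shared, stripped of the cached-index and cache-line-colocation tricks the real queue uses:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};

// Interior mutability via UnsafeCell means neither thread ever holds
// &mut to the ring, sidestepping LLVM noalias assumptions; wrap() is a
// predictable branch instead of an AND mask.
struct MiniSpsc<T, const N: usize> {
    buf: [UnsafeCell<MaybeUninit<T>>; N],
    head: AtomicUsize, // written by the producer
    tail: AtomicUsize, // written by the consumer
}

unsafe impl<T: Send, const N: usize> Sync for MiniSpsc<T, N> {}

impl<T, const N: usize> MiniSpsc<T, N> {
    fn new() -> Self {
        Self {
            buf: std::array::from_fn(|_| UnsafeCell::new(MaybeUninit::uninit())),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    #[inline(always)]
    fn wrap(i: usize) -> usize {
        if i == N { 0 } else { i } // BranchWrap: one well-predicted branch
    }

    #[inline(always)]
    fn push(&self, v: T) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let next = Self::wrap(head + 1);
        if next == self.tail.load(Ordering::Acquire) {
            return false; // full (one slot is sacrificed)
        }
        unsafe { (*self.buf[head].get()).write(v) };
        self.head.store(next, Ordering::Release);
        true
    }

    #[inline(always)]
    fn pop(&self, out: &mut T) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        if tail == self.head.load(Ordering::Acquire) {
            return false; // empty
        }
        *out = unsafe { (*self.buf[tail].get()).assume_init_read() };
        self.tail.store(Self::wrap(tail + 1), Ordering::Release);
        true
    }
}
```

The bool-returning push/pop mirrors the "Bool API" optimization from the PR summary: no `Result` construction or unwrapping on the hot path.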
@Milerius Milerius force-pushed the feat/venue-decoder-crates branch from f4603ab to a765d86 Compare April 7, 2026 13:09
- Add SpscRingFast preset using BranchWrap index strategy
- Upgrade InlineStorage::capacity and slot_ptr to inline(always)
- Raw bench now tests: Pow2Masked, BranchWrap, copy, and rtrb

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Milerius Milerius force-pushed the feat/venue-decoder-crates branch from da6e2bd to c7f2eba Compare April 7, 2026 13:15
Milerius and others added 7 commits April 7, 2026 15:25
- Remove dead benchmark modules (harness, message, queues, rdtsc, stats)
- Simplify main.rs to only use raw_bench protocol
- Fix clippy: backticks in docs, allow inline_always workspace-wide,
  use &raw mut in tests
- Add comprehensive RESULTS.md with methodology, hardware, configuration,
  and final benchmark numbers (472 cycles/op SpscRingFast)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace single-thread SPSC claims (<10ns) with measured cross-core
latency (130ns / 472 TSC cycles on isolated cores). Remove "same
ballpark as Optiver/LMAX" claim. Add link to benchmarks/RESULTS.md.
Update benchmark table, quick start, and highlights to reflect
actual measured performance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…o C++

Rust: consumer thread captures perf_event_open grouped counters
(instructions, branch misses, L1D misses, LLC misses) per-op.
Printed alongside cycles/op when available (Linux only).

C++: add --mode raw flag for sum/count protocol matching HFT
University. Tests both rigtorp AND drogalis in raw mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…parison

Identical algorithm to library SpscRingFast but with no Storage trait,
no IndexStrategy, no Instrumentation, no RawRing wrapper — pure
struct with inline push/pop. Shows the true floor of Rust SPSC latency
vs the library abstraction cost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two optimizations identified via assembly analysis:

1. Shadow tail_local/head_local on x86_64: LLVM won't fold a Relaxed
   load after a Release store to the same atomic. Cell shadow breaks
   the dependency chain. Expected: ~30-50 cycles/op improvement.

2. Combine rdtsc inside asm block: "shl rdx,32; or rax,rdx" produces
   2 instructions vs the previous 3 (shl + mov eax,eax + add).
   Expected: ~2-5 cycles/op improvement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
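The combined-rdtsc idea from optimization 2 can be sketched as below. This is an x86_64-only illustration (with a portable fallback so it compiles elsewhere), not the harness's exact code:

```rust
// Fold the EDX:EAX halves inside the asm block ("shl rdx,32; or rax,rdx")
// so the compiler sees a single u64 result instead of stitching the
// halves together itself; lfence orders the read as in the benchmark.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
fn rdtsc() -> u64 {
    let tsc: u64;
    unsafe {
        core::arch::asm!(
            "lfence",
            "rdtsc",
            "shl rdx, 32",
            "or rax, rdx",
            out("rax") tsc,
            out("rdx") _,
            options(nostack),
        );
    }
    tsc
}

#[cfg(not(target_arch = "x86_64"))]
fn rdtsc() -> u64 {
    // fallback so the sketch compiles off-x86; not a real cycle counter
    use std::time::{SystemTime, UNIX_EPOCH};
    SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_nanos() as u64
}
```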
@Milerius Milerius merged commit 04d93a4 into main Apr 7, 2026
19 of 20 checks passed