Close the class: missing-order reconciliation onto monotonic receipt time #4387
6fb1482 to 0e4d121
Hi @folknor, Thanks for the PR and the careful clock-axis cleanup. The direction still reads right to me, but I found one blocker before this lands.
I think this should be fixed here rather than as a follow-up. Either clear first and then stamp local activity for non-terminal order events like Let me know if anything is unclear.
`handle_missing_order` had a recency gate comparing the order's venue `ts_last` against `self.clock` now - a cross-axis compare that nautechsystems#4376 hardened against underflow but left on the trading clock. Under a custom live/sandbox clock factory the trading clock is not wall-paced (it can run accelerated or sit on a foreign epoch), so that window did not measure the real settling time it was meant to, and a corrupt far-future `ts_last` could stall the order's reconciliation.

Drop the venue-`ts_last` gate. The missing-order settling window is now solely the monotonic `order_local_activity` recency gate (the `RecencyMap` from the recency-map consolidation), which measures real receipt-time elapsed at any clock speed. This also removes the warn-and-defer arm nautechsystems#4376 added for a far-future `ts_last`: with the cross-axis gate gone there is no longer a failure mode to warn about - the order simply reconciles once the real grace expires.

Making local activity the sole gate exposed an ordering bug in the `LiveNode` dispatch path: acknowledgement events (`Accepted` et al) stamped local activity and then immediately wiped it via `clear_recon_tracking`, so a just-accepted order omitted by a lagging venue report could be falsely rejected as NOT_FOUND_AT_VENUE. The per-order-event tracking now lives in `ExecutionManager::observe_order_event`, which clears first and stamps after - matching the ordering `observe_execution_report` already used - and the node's batch accept/cancel arms are reordered the same way.

Audit of the remaining live timers for the same class: the cache-purge intervals are deliberately left on the ExecutionEngine clock-timer path so they stay controlled by the injected Clock for custom-clock callers; a comment and a conversion test now pin that. Data-engine, order emulator, and core timing stay domain/deterministic, and `snapshot_positions_interval_secs` is left as-is (no live monotonic replacement).

The missing-order test becomes a differential paused-time case: with a far-future venue `ts_last` present throughout, recent local activity defers, and after the monotonic grace expires reconciliation proceeds. It tracks the accepted event through `observe_order_event` - the exact `LiveNode` call - and a dedicated regression test covers the just-accepted-order deferral end to end.

Coded by an LLM.
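The ordering bug described in the commit message can be sketched in isolation. This is a minimal illustration, not the project's code: `Tracker` and its methods are hypothetical stand-ins that only borrow the name `clear_recon_tracking`, and it is an assumption here that clearing removes the local-activity stamp along with the rest of the per-order tracking.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical stand-in for per-order reconciliation tracking state.
#[derive(Default)]
struct Tracker {
    /// Monotonic receipt time of the last locally observed event, per order ID.
    local_activity: HashMap<String, Instant>,
    /// Reconciliation retry counters (illustrative only).
    retries: HashMap<String, u32>,
}

impl Tracker {
    /// Wipes all tracking for the order, including its activity stamp.
    fn clear_recon_tracking(&mut self, order_id: &str) {
        self.local_activity.remove(order_id);
        self.retries.remove(order_id);
    }

    /// Records local activity for the order at monotonic time `now`.
    fn stamp(&mut self, order_id: &str, now: Instant) {
        self.local_activity.insert(order_id.to_string(), now);
    }

    /// Buggy ordering: the fresh stamp is wiped immediately.
    fn observe_stamp_then_clear(&mut self, order_id: &str, now: Instant) {
        self.stamp(order_id, now);
        self.clear_recon_tracking(order_id);
    }

    /// Fixed ordering: clear stale state first, then record the new activity.
    fn observe_clear_then_stamp(&mut self, order_id: &str, now: Instant) {
        self.clear_recon_tracking(order_id);
        self.stamp(order_id, now);
    }

    /// True if an activity stamp survives for the order.
    fn has_activity_stamp(&self, order_id: &str) -> bool {
        self.local_activity.contains_key(order_id)
    }
}
```

With the buggy ordering no stamp survives an acknowledgement, so a recency gate that relies on the stamp has nothing to defer on. With the fixed ordering the stale counters are gone and the stamp remains.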
0e4d121 to 4e379e4
I went with your second option, generalized slightly: the per-order-event tracking now lives in On tests: the far-future
cjdsellers left a comment
Thank you for the follow-ups @folknor 👌
`handle_missing_order` still had a recency gate comparing the order's venue `ts_last` against `self.clock` now. #4376 hardened its subtraction against underflow but left it on the trading clock, because at the time there was no monotonic instant to move it to - it read a domain timestamp off the order.

A caller can hand a live/sandbox node a custom clock through the factory added in #4331, and that clock is not guaranteed to tick at wall rate - it can run accelerated or sit on a foreign epoch. So `self.clock`-now minus a venue `ts_last` was never real elapsed time, and the settling window it was meant to enforce shrank or stretched with the clock. A corrupt far-future `ts_last` (a double-scaled timestamp, say) could also park that order's reconciliation.
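To make the cross-axis problem concrete, here is a minimal sketch. `AcceleratedClock` and `venue_gate_defers` are hypothetical names invented for illustration and do not come from the codebase; the gate is assumed to defer while `clock_now - ts_last` is below a threshold, with a saturating subtraction as the underflow hardening.

```rust
/// Hypothetical trading clock that advances `speed` times faster than real
/// time, starting from an arbitrary epoch. All values are in nanoseconds.
struct AcceleratedClock {
    epoch_ns: u64,
    speed: u64,
}

impl AcceleratedClock {
    /// Clock reading after `real_elapsed_ns` of real time has passed.
    fn now_ns(&self, real_elapsed_ns: u64) -> u64 {
        self.epoch_ns + real_elapsed_ns * self.speed
    }
}

/// Venue-axis gate (the shape being removed): the order is treated as still
/// settling while `clock_now - ts_last` is below the threshold.
fn venue_gate_defers(clock_now_ns: u64, ts_last_ns: u64, threshold_ns: u64) -> bool {
    // Saturating subtraction avoids underflow, but a far-future `ts_last`
    // then yields zero elapsed time, so the gate keeps deferring.
    clock_now_ns.saturating_sub(ts_last_ns) < threshold_ns
}
```

At 10x speed, one real second reads as ten seconds on this clock, so a five-second window closes after half a real second. A `ts_last` far ahead of the clock makes the subtraction saturate to zero, and the order stays deferred however long it has really been missing.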
Drop the venue-`ts_last` gate. The missing-order settling window is already covered by the monotonic `order_local_activity` recency gate - the `RecencyMap` from #4386 - which measures real receipt-time elapsed at any clock speed. Both gates were trying to provide a grace period before declaring the order missing; the venue-`ts_last` version measured that grace on the wrong clock axis, while the local-activity `RecencyMap` measures the real settling window directly.

This also drops the warn-and-defer arm #4376 added for a far-future `ts_last`. With the cross-axis gate gone there is no longer a failure mode to warn about. A garbage `ts_last` no longer influences reconciliation timing at all - the order reconciles once the real monotonic grace expires, same as any other.
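A monotonic recency gate of this kind could look roughly like the sketch below. It is not the `RecencyMap` from #4386: `LocalActivity`, `record`, and `is_recent` are hypothetical names, and `now` is passed in explicitly (an assumption made so the example is deterministic) rather than read from a clock inside the type.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical monotonic recency tracker keyed by order ID.
struct LocalActivity {
    last_seen: HashMap<String, Instant>,
    grace: Duration,
}

impl LocalActivity {
    fn new(grace: Duration) -> Self {
        Self { last_seen: HashMap::new(), grace }
    }

    /// Records that an event for `order_id` was received at monotonic time `now`.
    fn record(&mut self, order_id: &str, now: Instant) {
        self.last_seen.insert(order_id.to_string(), now);
    }

    /// True while the order saw local activity within the grace window.
    /// Orders with no recorded activity are never considered recent.
    fn is_recent(&self, order_id: &str, now: Instant) -> bool {
        self.last_seen
            .get(order_id)
            .is_some_and(|seen| now.saturating_duration_since(*seen) < self.grace)
    }
}
```

Because both operands are `Instant` values, the elapsed time is real receipt-time elapsed regardless of how fast the trading clock runs, and no venue timestamp enters the comparison.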
The audit
The rest of the sweep is deciding, per live timer, whether it is a real-time window on the wrong clock or a domain/deterministic one that belongs on `self.clock`.

`ExecutionEngine::start_purge_timers` arms the cache-purge timers on the injected `Clock`, and that is correct: a custom-clock caller (Add caller-supplied clock factory seam for live/sandbox nodes #4331) wants purges to ride their clock, so purge cadence and retention advance with the injected clock rather than with wall time. `LiveNode` also dispatches purge checks from its monotonic maintenance loop, but the two share the same domain-time cutoff and are idempotent, so the engine timer remains part of the intended behavior. A comment at the conversion seam and a `From<LiveExecEngineConfig>` test now pin the pass-through so this is not "tidied" into breakage later.
Data-engine, order emulator, and core timing are domain/deterministic, not real-time settling windows.

`snapshot_positions_interval_secs` - left as-is. No live monotonic replacement to move it to.
Tests
The missing-order test becomes a differential paused-time case: with a far-future venue `ts_last` present the whole time, recent local activity defers, and after the monotonic grace expires reconciliation proceeds - so a regression back to the venue axis would fail it. The purge pass-through gets a conversion assertion.
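The shape of that differential case can be sketched without the real test harness. Everything here is hypothetical: `MissingOrder`, `Decision`, and `check_missing` are illustrative names, and instants are constructed by hand instead of using a paused runtime clock.

```rust
use std::time::{Duration, Instant};

/// Outcome of a missing-order check in this sketch.
#[derive(Debug, PartialEq)]
enum Decision {
    Defer,
    Reconcile,
}

/// Hypothetical order snapshot. `ts_last_ns` is the venue timestamp and is
/// deliberately ignored by the gate below.
struct MissingOrder {
    ts_last_ns: u64,
    last_local_activity: Option<Instant>,
}

/// Defers only while local activity is within the monotonic grace window.
fn check_missing(order: &MissingOrder, now: Instant, grace: Duration) -> Decision {
    match order.last_local_activity {
        Some(seen) if now.saturating_duration_since(seen) < grace => Decision::Defer,
        _ => Decision::Reconcile,
    }
}
```

Holding `ts_last_ns` at a far-future value across both checks is what makes the case differential: the first check one second after the activity should defer, the second check after the grace should reconcile, and a gate that consulted the venue timestamp would defer both times.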