Close the class: missing-order reconciliation onto monotonic receipt time #4387

Merged
cjdsellers merged 1 commit into nautechsystems:develop from folknor:sweep-realtime-clock-gates
Jul 5, 2026

Conversation

@folknor

@folknor folknor commented Jul 4, 2026

Collaborator

handle_missing_order still had a recency gate comparing the order's venue
ts_last against self.clock now. #4376 hardened its subtraction against
underflow but left it on the trading clock, because at the time there was no
monotonic instant to move it to - it read a domain timestamp off the order.

A caller can hand a live/sandbox node a custom clock through the factory added
in #4331, and that clock is not guaranteed to tick at wall rate - it can run
accelerated or sit on a foreign epoch. So self.clock-now minus a venue
ts_last was never real elapsed time, and the settling window it was meant to
enforce shrank or stretched with the clock. A corrupt far-future ts_last (a
double-scaled timestamp, say) could also park that order's reconciliation.

Drop the venue-ts_last gate. The missing-order settling window is already
covered by the monotonic order_local_activity recency gate - the RecencyMap
from #4386 - which measures real receipt-time elapsed at any clock speed. Both
gates were trying to provide a grace period before declaring the order missing;
the venue-ts_last version measured that grace on the wrong clock axis, while
the local-activity RecencyMap measures the real settling window directly.

This also drops the warn-and-defer arm #4376 added for a far-future ts_last.
With the cross-axis gate gone there is no longer a failure mode to warn about. A
garbage ts_last no longer influences reconciliation timing at all - the order
reconciles once the real monotonic grace expires, same as any other.
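
To make the two axes concrete, here is a minimal sketch, not the actual `RecencyMap` from #4386. `RecencySketch`, `cross_axis_elapsed_ns`, and the string order ids are hypothetical names for illustration, and the saturating subtraction is an assumption about how the #4376 hardening behaved. It shows a monotonic receipt-time gate next to the arithmetic of the removed cross-axis gate.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a recency map: order id -> monotonic receipt instant.
#[derive(Default)]
pub struct RecencySketch {
    last_seen: HashMap<String, Instant>,
}

impl RecencySketch {
    /// Stamp local activity for an order at the monotonic instant `now`.
    pub fn record(&mut self, order_id: &str, now: Instant) {
        self.last_seen.insert(order_id.to_string(), now);
    }

    /// True while the order saw local activity less than `grace` before `now`.
    /// An order with no stamp is never considered recent.
    pub fn is_recent(&self, order_id: &str, now: Instant, grace: Duration) -> bool {
        self.last_seen
            .get(order_id)
            .is_some_and(|seen| now.saturating_duration_since(*seen) < grace)
    }
}

/// The removed gate reduced to its arithmetic (assumed saturating):
/// trading-clock "now" minus a venue timestamp. The operands live on
/// different axes, so the result is not real elapsed time.
pub fn cross_axis_elapsed_ns(clock_now_ns: u64, venue_ts_last_ns: u64) -> u64 {
    clock_now_ns.saturating_sub(venue_ts_last_ns)
}
```

In this sketch a far-future `ts_last` pins `cross_axis_elapsed_ns` at zero, so a gate built on it would see the order as permanently recent, while `is_recent` depends only on monotonic instants and expires after `grace` regardless of what the venue timestamp says.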

The audit

The rest of the sweep is deciding, per live timer, whether it is a real-time
window on the wrong clock or a domain/deterministic one that belongs on
self.clock.

  • Cache-purge intervals - deliberately left on the clock-timer path.
    ExecutionEngine::start_purge_timers arms these on the injected Clock, and
    that is correct: a custom-clock caller (#4331) wants purges to ride their
    clock, so purge cadence and retention advance with the injected clock rather
    than with wall time. LiveNode also dispatches purge checks from its
    monotonic maintenance loop, but the two share the same domain-time cutoff and
    are idempotent, so the engine timer remains part of the intended behavior. A
    comment at the conversion seam and a From<LiveExecEngineConfig> test now pin
    the pass-through so this is not "tidied" into breakage later.
  • Data-engine cadences, the order emulator, core timing - left as-is. These
    are domain/deterministic, not real-time settling windows.
  • snapshot_positions_interval_secs - left as-is. No live monotonic
    replacement to move it to.
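
For the purge pass-through in the first bullet, a conversion test of roughly this shape would pin the behavior. This is a sketch under assumptions: `LiveConfigSketch`, `EngineConfigSketch`, and both field names are hypothetical stand-ins, not the real `LiveExecEngineConfig` definition.

```rust
/// Hypothetical live-side config; the field names are assumptions.
#[derive(Debug, Clone, Default)]
pub struct LiveConfigSketch {
    pub purge_closed_orders_interval_mins: Option<u32>,
    pub purge_closed_orders_buffer_mins: Option<u32>,
}

/// Hypothetical engine-side config that arms timers on the injected clock.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct EngineConfigSketch {
    pub purge_closed_orders_interval_mins: Option<u32>,
    pub purge_closed_orders_buffer_mins: Option<u32>,
}

impl From<LiveConfigSketch> for EngineConfigSketch {
    fn from(live: LiveConfigSketch) -> Self {
        // Deliberate pass-through: the engine's clock timers own purge cadence,
        // so a custom-clock caller's purges follow their clock. Do not clear
        // these fields here.
        Self {
            purge_closed_orders_interval_mins: live.purge_closed_orders_interval_mins,
            purge_closed_orders_buffer_mins: live.purge_closed_orders_buffer_mins,
        }
    }
}
```

An assertion that the converted config still carries both values fails if someone later clears them at the seam.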

Tests

The missing-order test becomes a differential paused-time case: with a
far-future venue ts_last present the whole time, recent local activity defers,
and after the monotonic grace expires reconciliation proceeds - so a regression
back to the venue axis would fail it. The purge pass-through gets a conversion
assertion.

Stacked on #4386.

@folknor folknor force-pushed the sweep-realtime-clock-gates branch from 6fb1482 to 0e4d121 on July 5, 2026 08:21
@cjdsellers

Member

Hi @folknor,

Thanks for the PR and the careful clock-axis cleanup. The direction still reads right to me, but I found one blocker before this lands.

handle_missing_order now makes order_local_activity the only grace gate. In the normal LiveNode path for ExecutionEvent::Order(OrderEventAny::Accepted(_)), though, the node records local activity and then immediately calls clear_recon_tracking, which removes that same activity mark. Several adapters emit accepts through this path, so a just-accepted order can still reach missing-order reconciliation with no activity stamp and be rejected when open_check_open_only=false and the venue report lags or omits it.

I think this should be fixed here rather than as a follow-up. Either clear first and then stamp local activity for non-terminal order events like Accepted / Updated, or split clear_recon_tracking so ack cleanup does not remove the local activity timestamp. It would also be good to add a regression test that follows the LiveNode ordering, rather than manually calling record_local_activity after applying Accepted.

Let me know if anything is unclear.

`handle_missing_order` had a recency gate comparing the order's venue
`ts_last` against `self.clock` now - a cross-axis compare that nautechsystems#4376
hardened against underflow but left on the trading clock. Under a
custom live/sandbox clock factory the trading clock is not wall-paced
(it can run accelerated or sit on a foreign epoch), so that window did
not measure the real settling time it was meant to, and a corrupt
far-future `ts_last` could stall the order's reconciliation.

Drop the venue-`ts_last` gate. The missing-order settling window is now
solely the monotonic `order_local_activity` recency gate (the
`RecencyMap` from the recency-map consolidation), which measures real
receipt-time elapsed at any clock speed. This also removes the
warn-and-defer arm nautechsystems#4376 added for a far-future `ts_last`: with the
cross-axis gate gone there is no longer a failure mode to warn about -
the order simply reconciles once the real grace expires.

Making local activity the sole gate exposed an ordering bug in the
`LiveNode` dispatch path: acknowledgement events (`Accepted` et al)
stamped local activity and then immediately wiped it via
`clear_recon_tracking`, so a just-accepted order omitted by a lagging
venue report could be falsely rejected as NOT_FOUND_AT_VENUE. The
per-order-event tracking now lives in
`ExecutionManager::observe_order_event`, which clears first and stamps
after - matching the ordering `observe_execution_report` already used -
and the node's batch accept/cancel arms are reordered the same way.

Audit of the remaining live timers for the same class: the cache-purge
intervals are deliberately left on the ExecutionEngine clock-timer path
so they stay controlled by the injected Clock for custom-clock callers;
a comment and a conversion test now pin that. Data-engine, order
emulator, and core timing stay domain/deterministic, and
`snapshot_positions_interval_secs` is left as-is (no live monotonic
replacement).

The missing-order test becomes a differential paused-time case: with a
far-future venue `ts_last` present throughout, recent local activity
defers, and after the monotonic grace expires reconciliation proceeds.
It tracks the accepted event through `observe_order_event` - the exact
`LiveNode` call - and a dedicated regression test covers the
just-accepted-order deferral end to end.

Coded by an LLM.
@folknor folknor force-pushed the sweep-realtime-clock-gates branch from 0e4d121 to 4e379e4 on July 5, 2026 12:03
@folknor

folknor commented Jul 5, 2026

Collaborator Author

I went with your second option, generalized slightly: the per-order-event tracking now lives in ExecutionManager::observe_order_event, which does the ack cleanup first and stamps local activity after, so the stamp survives clear_recon_tracking. This matches the clear-then-stamp ordering observe_execution_report already used, so the LiveNode event path is no longer the odd one out. The node's OrderAcceptedBatch / OrderCanceledBatch arms are reordered the same way.

On tests: the far-future ts_last test now tracks the accepted event through observe_order_event - the exact call the node dispatch makes - rather than manually stamping after applying Accepted, and there is a new dedicated regression test (test_check_open_orders_defers_for_just_accepted_order) covering the case you described: open_check_open_only=false, venue response omits the just-accepted order, first check defers, and reconciliation proceeds once the grace expires. Both fail against the previous ordering.
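
The ordering difference reduces to a few lines. This is a minimal sketch with hypothetical types, not the actual `ExecutionManager` code: `TrackingSketch`, its fields, and the two `observe_*` method names are illustrative only, and it assumes the ack cleanup removes the activity stamp along with other tracking state.

```rust
use std::collections::{HashMap, HashSet};
use std::time::Instant;

/// Hypothetical tracking state; field names are illustrative only.
#[derive(Default)]
pub struct TrackingSketch {
    local_activity: HashMap<String, Instant>,
    pending_recon: HashSet<String>,
}

impl TrackingSketch {
    /// Stamp monotonic local activity for an order.
    pub fn record_local_activity(&mut self, order_id: &str, now: Instant) {
        self.local_activity.insert(order_id.to_string(), now);
    }

    /// Ack cleanup: drops reconciliation tracking, including the activity stamp.
    pub fn clear_recon_tracking(&mut self, order_id: &str) {
        self.pending_recon.remove(order_id);
        self.local_activity.remove(order_id);
    }

    /// Previous ordering: stamp, then clear. The stamp never survives.
    pub fn observe_stamp_then_clear(&mut self, order_id: &str, now: Instant) {
        self.record_local_activity(order_id, now);
        self.clear_recon_tracking(order_id);
    }

    /// Fixed ordering: clear first, stamp after, so the stamp survives.
    pub fn observe_clear_then_stamp(&mut self, order_id: &str, now: Instant) {
        self.clear_recon_tracking(order_id);
        self.record_local_activity(order_id, now);
    }

    /// Whether the grace gate would find a stamp for this order.
    pub fn has_activity(&self, order_id: &str) -> bool {
        self.local_activity.contains_key(order_id)
    }
}
```

With stamp-then-clear, `has_activity` is false immediately after an accept, so a grace gate that relies on the stamp has nothing to defer on. With clear-then-stamp the stamp is present and the gate can hold the order for the full grace period.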

@cjdsellers cjdsellers left a comment

Member

Thank you for the follow-ups @folknor 👌

@cjdsellers cjdsellers merged commit 89becc7 into nautechsystems:develop Jul 5, 2026
25 checks passed
@folknor folknor deleted the sweep-realtime-clock-gates branch July 5, 2026 13:37
2 participants