
feat(coprocessor): drift detection in gw-listener (M1)#2096

Open
Eikix wants to merge 17 commits into main from coproc-drift-heal

@Eikix Eikix commented Mar 11, 2026

Summary

  • Add opt-in drift detection to gw-listener by polling CiphertextCommits events
  • Fetch the expected coprocessor tx-sender set from GatewayConfig when drift detection is enabled
  • Track per-handle submissions in memory until the handle is complete or stale
  • Persist the earliest open CiphertextCommits block and rebuild open-handle state from chain logs after restart
  • Detect multiple digest variants for a handle and log the sender breakdown
  • Compare consensus digests against the local ciphertext_digest row when local digests already exist
  • Alert if all expected coprocessors submitted but no consensus event was observed
  • Alert if consensus was reached but some expected coprocessors never submitted within a grace window
  • Metrics: drift_detected, consensus_timeout, missing_submission
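The per-handle checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `HandleSubmissions`, its fields, and its method names are hypothetical, senders are modeled as plain strings, and the grace window for missing submissions is left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical alias; the real digest type in gw-listener may differ.
type Digest = [u8; 32];

/// In-memory state for one open handle (illustrative only).
#[derive(Default)]
struct HandleSubmissions {
    /// Digest variant -> set of senders that submitted it.
    by_digest: BTreeMap<Digest, BTreeSet<String>>,
    /// Set once a consensus event for this handle has been observed.
    consensus_seen: bool,
}

impl HandleSubmissions {
    /// Record one submission from `sender`.
    fn record(&mut self, sender: &str, digest: Digest) {
        self.by_digest
            .entry(digest)
            .or_default()
            .insert(sender.to_owned());
    }

    /// Drift: more than one digest variant was submitted for this handle.
    fn has_drift(&self) -> bool {
        self.by_digest.len() > 1
    }

    /// All senders seen so far, across every digest variant.
    fn submitters(&self) -> BTreeSet<&str> {
        self.by_digest
            .values()
            .flatten()
            .map(String::as_str)
            .collect()
    }

    /// Expected senders that have not submitted yet.
    fn missing<'a>(&self, expected: &'a BTreeSet<String>) -> Vec<&'a str> {
        let seen = self.submitters();
        expected
            .iter()
            .map(String::as_str)
            .filter(|s| !seen.contains(*s))
            .collect()
    }

    /// Every expected sender submitted, yet no consensus event was observed.
    fn consensus_timed_out(&self, expected: &BTreeSet<String>) -> bool {
        !self.consensus_seen && self.missing(expected).is_empty()
    }
}
```

In this sketch the sender breakdown for a drift log is simply the `by_digest` map, and a handle would be dropped from memory once it is complete or stale.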

Scope Notes

  • Restart recovery persists only one watermark: the earliest open CiphertextCommits block
  • Open-handle state is rebuilt from chain logs on startup; it is not stored in Postgres as structured detector state
  • Consensus mismatch remains stateless because it compares consensus against the local DB row
  • This PR does not pause processing, fetch canonical ciphertexts, or wipe/recompute state
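To make the single-watermark idea concrete, here is a small sketch of how such a watermark could be derived and used on restart. The function names and the `HashMap` layout are assumptions for illustration, and the fallback to the chain head when nothing is persisted is a guess rather than documented behavior of this PR.

```rust
use std::collections::HashMap;

/// Hypothetical alias for a ciphertext handle.
type Handle = [u8; 32];

/// Earliest block at which a still-open handle was first seen.
/// Returns `None` when no handle is open, so there is nothing to persist.
fn earliest_open_block(open: &HashMap<Handle, u64>) -> Option<u64> {
    open.values().copied().min()
}

/// Block to start replaying chain logs from after a restart.
/// Assumption: with no persisted watermark, start at the current head.
fn replay_start(persisted_watermark: Option<u64>, chain_head: u64) -> u64 {
    persisted_watermark.unwrap_or(chain_head)
}
```

Replaying logs from the watermark revisits handles that have already completed, but it lets open-handle state be rebuilt without storing structured detector state in Postgres.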

Test plan

  • SQLX_OFFLINE=true cargo build -p gw-listener
  • SQLX_OFFLINE=true cargo test -p gw-listener
  • Manual: run gw-listener with --ciphertext-commits-address and --gateway-config-address
  • Manual: restart gw-listener while a handle is still open and verify tracking resumes from the persisted watermark
  • Manual: verify /metrics exposes drift_detected, consensus_timeout, and missing_submission

closes https://github.com/zama-ai/fhevm-internal/issues/1147

@cla-bot cla-bot bot added the cla-signed label Mar 11, 2026
@Eikix Eikix force-pushed the coproc-drift-heal branch from 637dca2 to a76cffe Compare March 11, 2026 13:18
@Eikix Eikix marked this pull request as ready for review March 11, 2026 18:04
@Eikix Eikix requested review from a team as code owners March 11, 2026 18:04
Eikix added 14 commits March 11, 2026 19:06
Compare local ciphertext digests against on-chain consensus from the
CiphertextCommits contract. Enabled via --ciphertext-commits-address.

Adds early-warning logging when peer submissions diverge and structured
warn logs when local digest mismatches consensus. Four new Prometheus
counters provide observability. Detection-only — no recovery action.
…ft checks

- P1: Propagate DB errors from handle_consensus instead of swallowing
  them, so the block is retried via the outer backoff loop.
- P2: Queue unresolved consensus events (local digest not yet computed)
  in a bounded pending queue (10k cap) and retry each block tick, so
  late-arriving local digests are still checked.
- Refactor comparison logic into try_resolve_consensus shared by both
  the initial check and the retry path.
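The bounded pending queue described in P2 might look something like the sketch below. Only the 10k cap and the retry-per-block-tick behavior come from the commit message; the type names, the `compare` helper (a stand-in, not the PR's `try_resolve_consensus`), and the evict-oldest policy when the queue is full are assumptions.

```rust
use std::collections::VecDeque;

/// Hypothetical alias used for both handles and digests.
type Bytes32 = [u8; 32];

/// Cap taken from the commit message.
const PENDING_CAP: usize = 10_000;

/// A consensus event whose local digest was not available yet.
struct Pending {
    handle: Bytes32,
    consensus: Bytes32,
}

enum Outcome {
    Match,
    Mismatch,
    Unresolved,
}

/// Compare a consensus digest with the local digest, if one exists.
fn compare(local: Option<Bytes32>, consensus: Bytes32) -> Outcome {
    match local {
        None => Outcome::Unresolved,
        Some(d) if d == consensus => Outcome::Match,
        Some(_) => Outcome::Mismatch,
    }
}

struct PendingQueue {
    items: VecDeque<Pending>,
    cap: usize,
}

impl PendingQueue {
    fn new(cap: usize) -> Self {
        Self { items: VecDeque::new(), cap }
    }

    fn len(&self) -> usize {
        self.items.len()
    }

    /// Queue an unresolved event. Assumption: when full, the oldest
    /// entry is evicted and returned so the caller can log it.
    fn push(&mut self, item: Pending) -> Option<Pending> {
        let evicted = if self.items.len() >= self.cap {
            self.items.pop_front()
        } else {
            None
        };
        self.items.push_back(item);
        evicted
    }

    /// Retry every queued event; keep only the still-unresolved ones.
    /// Returns the number of mismatches found in this pass.
    fn retry<F>(&mut self, lookup: F) -> usize
    where
        F: Fn(&Bytes32) -> Option<Bytes32>,
    {
        let mut mismatches = 0;
        self.items.retain(|p| match compare(lookup(&p.handle), p.consensus) {
            Outcome::Unresolved => true,
            Outcome::Match => false,
            Outcome::Mismatch => {
                mismatches += 1;
                false
            }
        });
        mismatches
    }
}
```

A listener would construct the queue with `PendingQueue::new(PENDING_CAP)` and call `retry` once per block tick, with `lookup` standing in for the local digest query.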
@Eikix Eikix force-pushed the coproc-drift-heal branch from 515f6a5 to 56d267f Compare March 11, 2026 18:07
@zama-ai zama-ai deleted a comment from claude bot Mar 11, 2026
mergify bot commented Mar 11, 2026

🧪 CI Insights

Here's what we observed from your CI run for d114dac.

🟢 All jobs passed!

But CI Insights is watching 👀
