@@ -53,36 +53,18 @@ and backlog slope.

 ### Testnet incident runbook (slow lane)

-Use this when testnet decryptions stall and you suspect slow-lane side effects.
+Use this only when the slow lane is the likely cause of the stall (do not disable it blindly); all of these gates should hold:
+- `rate(host_listener_slow_lane_marked_chains_total[5m])` sustained and high,
+- TFHE completion throughput flat/low,
+- tfhe-worker logs show repeated no-progress/fallback,
+- no DB/RPC/host-listener outage explains the stall.

-1) Diagnose before action (do not disable blindly)
-
-- Slow lane is expected to throttle heavy dependent traffic; disabling it during an active DoS can make things worse.
-- Check all three gates in the same 10–15 min window:
-  - `rate(host_listener_slow_lane_marked_chains_total[5m]) > 0` and sustained,
-  - TFHE completion stays flat/low,
-  - tfhe-worker logs show repeated no-progress/fallback,
-  - and there is no DB/RPC/host-listener outage explaining the stall.
-
-2) If all gates hold, disable slow lane in Argo
-
-- Set `--dependent-ops-max-per-chain=0` on **all** host-listener types together:
-  - `main`,
-  - `poller`,
-  - `catchup`.
-- Roll out all three together.
-
-3) Reassess after rollout
-
-- Keep following COP-RB01 health/catchup/capacity checks.
-- If recovery is unclear, keep standard incident flow and investigate other root causes.
-
-Operational note: off mode promotes chains to fast at startup (advisory-lock serialized, batched updates).
+If all gates hold, set `--dependent-ops-max-per-chain=0` in Argo for all host-listener types (`main`, `poller`, `catchup`) and roll out together.
+Then continue COP-RB01 checks and reassess recovery.

 ### Local stack notes

-Minimal deterministic checks:
-
+Quick local validation:
 ```bash
 cd coprocessor/fhevm-engine
 cargo test -p host-listener --test host_listener_integration_tests \
@@ -91,20 +73,6 @@ cargo test -p host-listener --test host_listener_integration_tests \
 test_slow_lane_off_mode_promotes_all_chains_on_startup_locally -- --nocapture
 ```

-Before any stack-level slow-lane validation, ensure key bootstrap is healthy:
-
-```bash
-docker logs --since=20m coprocessor-gw-listener | rg -n 'ActivateKey event successful'
-docker logs --since=20m coprocessor-sns-worker | rg -n 'Fetched keyset|No keys available'
-docker exec -i coprocessor-and-kms-db psql -U postgres -d coprocessor -c "
-SELECT tenant_id, COALESCE(SUM(octet_length(lo.data)), 0) AS sns_pk_bytes
-FROM tenants t
-LEFT JOIN pg_largeobject lo ON lo.loid = t.sns_pk
-GROUP BY tenant_id;"
-```
-
-If bootstrap is not complete, restart only `coprocessor-gw-listener` and re-check.
-
 ## Events in FHEVM

 ### Blockchain Events
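The slow-lane gate check in the runbook can be scripted roughly as follows. This is a minimal sketch, not part of the repo. `PROM_URL`, the helper names `sustained_marking_rate` and `gates_hold`, the 15-minute subquery window at 1-minute resolution, and any rate threshold you pass are all assumptions to adapt. The script interprets "sustained" as the lowest 5m marking rate over that window, summed across host-listener series. The TFHE completion, tfhe-worker and outage gates remain operator judgments, passed in as `yes`/`no`. The script needs `curl` and `jq`, but only for the Prometheus query.

```bash
# Hypothetical helpers for the slow-lane gate check; not part of the repo.
# Assumes PROM_URL points at a Prometheus server that scrapes host-listener,
# and that curl and jq are installed (only sustained_marking_rate uses them).

# Print the lowest summed 5m marking rate seen over the last 15 minutes
# (PromQL subquery at 1m resolution). A high minimum means marking stayed
# high for the whole window, not just during a spike.
# If the query fails or returns no series, it prints nothing or "0".
sustained_marking_rate() {
  curl -fsS "${PROM_URL:?set PROM_URL}/api/v1/query" \
    --data-urlencode 'query=min_over_time(sum(rate(host_listener_slow_lane_marked_chains_total[5m]))[15m:1m])' |
    jq -r '.data.result[0].value[1] // "0"'
}

# gates_hold RATE THRESHOLD COMPLETION_FLAT WORKER_NO_PROGRESS OTHER_OUTAGE
# Succeeds (exit 0) only when every gate holds; the last three are yes/no.
gates_hold() {
  local rate=$1 threshold=$2 completion_flat=$3 no_progress=$4 other_outage=$5
  # Shell arithmetic is integer-only, so compare the float rate with awk.
  # An empty rate becomes 0, so a failed query fails the gate.
  awk -v r="$rate" -v t="$threshold" 'BEGIN { exit !(r + 0 > t + 0) }' || return 1
  [ "$completion_flat" = yes ] || return 1
  [ "$no_progress" = yes ] || return 1
  [ "$other_outage" = no ] || return 1
}
```

For example, `gates_hold "$(sustained_marking_rate)" 0.5 yes yes no && echo 'all gates hold'` prints only when every gate holds. The `0.5` chains/s threshold is a placeholder, not a recommended value; derive it from your own baseline. Because an empty or failed query counts as a rate of 0, the check fails closed and never suggests disabling slow lane on missing data.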