Commit 4f05b54

docs(host-listener): trim slow-lane runbook and local notes
1 parent 45adc63 commit 4f05b54

1 file changed: 8 additions, 40 deletions

coprocessor/fhevm-engine/host-listener/README.md

Lines changed: 8 additions & 40 deletions
@@ -53,36 +53,18 @@ and backlog slope.
 
 ### Testnet incident runbook (slow lane)
 
-Use this when testnet decryptions stall and you suspect slow-lane side effects.
+Use only when slow lane is likely the cause (do not disable blindly):
+- `rate(host_listener_slow_lane_marked_chains_total[5m])` sustained high,
+- completion throughput flat/low,
+- tfhe-worker shows repeated no-progress/fallback,
+- no DB/RPC/host-listener outage explains the stall.
 
-1) Diagnose before action (do not disable blindly)
-
-- Slow lane is expected to throttle heavy dependent traffic; disabling it during an active DoS can make things worse.
-- Check all three gates in the same 10–15 min window:
-- `rate(host_listener_slow_lane_marked_chains_total[5m]) > 0` and sustained,
-- TFHE completion stays flat/low,
-- tfhe-worker logs show repeated no-progress/fallback,
-- and there is no DB/RPC/host-listener outage explaining the stall.
-
-2) If all gates hold, disable slow lane in Argo
-
-- Set `--dependent-ops-max-per-chain=0` on **all** host-listener types together:
-- `main`,
-- `poller`,
-- `catchup`.
-- Roll out all three together.
-
-3) Reassess after rollout
-
-- Keep following COP-RB01 health/catchup/capacity checks.
-- If recovery is unclear, keep standard incident flow and investigate other root causes.
-
-Operational note: off mode promotes chains to fast at startup (advisory-lock serialized, batched updates).
+If all gates hold, set `--dependent-ops-max-per-chain=0` in Argo for all host-listener types (`main`, `poller`, `catchup`) and roll out together.
+Then continue COP-RB01 checks and reassess recovery.
 
 ### Local stack notes
 
-Minimal deterministic checks:
-
+Quick local validation:
 ```bash
 cd coprocessor/fhevm-engine
 cargo test -p host-listener --test host_listener_integration_tests \
@@ -91,20 +73,6 @@ cargo test -p host-listener --test host_listener_integration_tests \
 test_slow_lane_off_mode_promotes_all_chains_on_startup_locally -- --nocapture
 ```
 
-Before any stack-level slow-lane validation, ensure key bootstrap is healthy:
-
-```bash
-docker logs --since=20m coprocessor-gw-listener | rg -n 'ActivateKey event successful'
-docker logs --since=20m coprocessor-sns-worker | rg -n 'Fetched keyset|No keys available'
-docker exec -i coprocessor-and-kms-db psql -U postgres -d coprocessor -c "
-SELECT tenant_id, COALESCE(SUM(octet_length(lo.data)), 0) AS sns_pk_bytes
-FROM tenants t
-LEFT JOIN pg_largeobject lo ON lo.loid = t.sns_pk
-GROUP BY tenant_id;"
-```
-
-If bootstrap is not complete, restart only `coprocessor-gw-listener` and re-check.
-
 ## Events in FHEVM
 
 ### Blockchain Events
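
Reviewer's note on the diff above: the trimmed runbook reduces to an all-or-nothing rule over four gates. The sketch below shows that rule as a small shell helper. It is not part of this commit or the repository. The function name `slow_lane_decision` and its four 0/1 arguments are hypothetical, and each gate would still have to be judged by an operator from metrics and logs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an illustration only, not in the repository).
# Encodes the runbook rule: disable slow lane only when ALL four gates hold.
# Each argument is 1 (gate holds) or 0 (gate does not hold), in this order:
#   $1  marked-chains rate is sustained high
#   $2  completion throughput is flat/low
#   $3  tfhe-worker shows repeated no-progress/fallback
#   $4  no DB/RPC/host-listener outage explains the stall
slow_lane_decision() {
  if [ "$#" -ne 4 ]; then
    echo "usage: slow_lane_decision <0|1> <0|1> <0|1> <0|1>" >&2
    return 2
  fi
  local gate
  for gate in "$@"; do
    if [ "$gate" != "1" ]; then
      echo "keep"     # at least one gate fails: do not disable blindly
      return 0
    fi
  done
  echo "disable"      # all gates hold: set --dependent-ops-max-per-chain=0
}
```

Under these assumptions, `slow_lane_decision 1 1 1 1` prints `disable`, and any call with a `0` prints `keep`. This matches the runbook's "do not disable blindly" wording.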
