| 1 | +# Ciphertext Drift E2E Implementation Plan |
| 2 | + |
| 3 | +## Overview |
| 4 | +Add one focused end-to-end test path that proves the Rust `gw-listener` drift detector fires on an intentionally corrupted ciphertext digest in the existing e2e stack. Keep it compatible with the current `2-of-2` coprocessor workflow and reuse the existing Hardhat tests instead of building a new scenario from scratch. |
| 5 | + |
| 6 | +## Goals |
| 7 | +- Enable the Rust drift detector in the existing e2e coprocessor stack. |
| 8 | +- Add one deterministic drift-injection workflow that fits the current `fhevm-cli test` flow. |
| 9 | +- Verify detector output via metrics, not brittle log matching. |
| 10 | + |
| 11 | +## Non-Goals |
| 12 | +- No new general-purpose fault-injection framework. |
| 13 | +- No changes to the existing consensus watchdog behavior outside the dedicated drift test path. |
| 14 | +- No new CI profile or topology change beyond what the current workflow already deploys. |
| 15 | + |
| 16 | +## Assumptions and Constraints |
| 17 | +- Current workflow deploys `2` coprocessors with threshold `2`. |
| 18 | +- In `2-of-2`, one corrupted coprocessor means no consensus; this test should assert drift detection, not successful consensus. |
| 19 | +- The existing JS `consensusWatchdog` would fail this scenario by design, so the dedicated drift test run must disable it. |
| 20 | +- Host-side scripts can use `docker exec` against the shared Postgres container. |
| 21 | + |
| 22 | +## Requirements |
| 23 | + |
| 24 | +### Functional |
| 25 | +- `coprocessor-gw-listener` in e2e must receive `ciphertext_commits` and `gateway_config` addresses. |
| 26 | +- A script must mutate one ready-but-unsent `ciphertext_digest` row on a chosen coprocessor DB. |
| 27 | +- A dedicated `fhevm-cli test` target must orchestrate the injection and run an existing Hardhat test that produces ciphertext work. |
| 28 | +- The test must fail if the Rust `gw-listener` drift metric does not increment. |
| 29 | + |
| 30 | +### Non-Functional |
| 31 | +- Keep the change surface small and local to the existing e2e workflow. |
| 32 | +- Avoid adding new runtime services or test dependencies. |
| 33 | +- Keep the workflow deterministic enough for CI. |
| 34 | + |
| 35 | +## Technical Design |
| 36 | + |
| 37 | +### Data Model |
| 38 | +- No schema changes. |
| 39 | +- Reuse `ciphertext_digest(handle, ciphertext, ciphertext128, txn_is_sent, created_at)`. |
| 40 | + |
| 41 | +### Architecture |
| 42 | +- Extend the e2e `coprocessor-gw-listener` command with: |
| 43 | + - `--ciphertext-commits-address` |
| 44 | + - `--gateway-config-address` |
| 45 | +- Add a host-side drift injector script that: |
| 46 | + - targets one coprocessor DB |
| 47 | + - waits for a new ready row with `txn_is_sent = false` |
| 48 | + - flips one byte in `ciphertext` |
| 49 | +- Add a host-side runner script that: |
| 50 | + - stops one transaction sender |
| 51 | + - starts the injector |
| 52 | + - runs one existing Hardhat test (selected by grep) with the JS watchdog disabled for that process |
| 53 | + - restarts the transaction sender |
| 54 | + - checks `gw-listener` metrics for `coprocessor_gw_listener_drift_detected_counter` |
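The "flips one byte in `ciphertext`" step above could look roughly like the sketch below. It assumes `handle` and `ciphertext` are `bytea` columns and that corrupting the first byte is enough to change the digest. The function names are hypothetical and are not taken from the actual injector script.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not taken from the repository.

# Flip every bit of the first byte of a non-empty hex string ("00ab" -> "ffab").
flip_first_byte_hex() {
  local hex="$1"
  printf '%02x%s\n' "$(( 0x${hex:0:2} ^ 0xff ))" "${hex:2}"
}

# Print an UPDATE that flips the first byte of `ciphertext` for one handle.
# Assumes bytea columns; set_byte/get_byte and the integer XOR operator `#`
# are standard PostgreSQL. The txn_is_sent guard keeps already-sent rows intact.
build_flip_sql() {
  local handle_hex="$1"
  cat <<SQL
UPDATE ciphertext_digest
SET ciphertext = set_byte(ciphertext, 0, get_byte(ciphertext, 0) # 255)
WHERE handle = decode('${handle_hex}', 'hex') AND txn_is_sent = false;
SQL
}
```

The generated statement could then be piped into `psql` through `docker exec -i` against the shared Postgres container. The container name, database name, and credentials depend on the e2e stack and are deliberately left out here.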
| 55 | + |
| 56 | +--- |
| 57 | + |
| 58 | +## Implementation Plan |
| 59 | + |
| 60 | +### Serial Dependencies (Must Complete First) |
| 61 | + |
| 62 | +#### Phase 0: E2E Wiring |
| 63 | +**Prerequisite for:** Drift injection and runner script |
| 64 | + |
| 65 | +| Task | Description | Output | |
| 66 | +|------|-------------|--------| |
| 67 | +| 0.1 | Add detector address args to e2e `coprocessor-gw-listener` compose command | Drift detector enabled in e2e stack | |
| 68 | +| 0.2 | Add `GATEWAY_CONFIG_ADDRESS` to coprocessor env template used for multi-copro copies | Address available in all coprocessor env files | |
| 69 | +| 0.3 | Add a small plan file documenting the scoped implementation | `PLAN.md` | |
| 70 | + |
| 71 | +--- |
| 72 | + |
| 73 | +### Parallel Workstreams |
| 74 | + |
| 75 | +#### Workstream A: Drift Injection |
| 76 | +**Dependencies:** Phase 0 |
| 77 | +**Can parallelize with:** Workstream B |
| 78 | + |
| 79 | +| Task | Description | Output | |
| 80 | +|------|-------------|--------| |
| 81 | +| A.1 | Add a host-side script to wait for a new ready `ciphertext_digest` row in one coprocessor DB | `inject-coprocessor-drift.sh` | |
| 82 | +| A.2 | Mutate one byte in `ciphertext` for the selected handle and print the handle | Deterministic DB-level drift injection | |
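Task A.1 amounts to a bounded polling loop. The following is a minimal sketch of such a loop, assuming the query command prints a handle on stdout once a ready row exists and prints nothing otherwise. `poll_until_output` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command repeatedly until it prints something,
# or give up once the timeout (in whole seconds) has elapsed.
poll_until_output() {
  local timeout_s="$1" interval_s="$2"
  shift 2
  local deadline=$(( SECONDS + timeout_s )) out
  while (( SECONDS < deadline )); do
    # Treat query errors like "no row yet" so a briefly unavailable DB
    # does not abort the wait.
    out="$("$@" 2>/dev/null)" || true
    if [[ -n "$out" ]]; then
      printf '%s\n' "$out"
      return 0
    fi
    sleep "$interval_s"
  done
  return 1
}
```

In the injector, the wrapped command would be a `docker exec ... psql -At -c "<query>"` call that selects one handle with `txn_is_sent = false`. The exact definition of a "ready" row (for example, both digest columns being non-null) is an assumption that the real script has to settle.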
| 83 | + |
| 84 | +#### Workstream B: Test Orchestration |
| 85 | +**Dependencies:** Phase 0 |
| 86 | +**Can parallelize with:** Workstream A |
| 87 | + |
| 88 | +| Task | Description | Output | |
| 89 | +|------|-------------|--------| |
| 90 | +| B.1 | Add a host-side runner that pauses one transaction sender, launches the injector, runs one existing Hardhat test, and checks metrics | `run-ciphertext-drift-e2e.sh` | |
| 91 | +| B.2 | Add one `fhevm-cli test ciphertext-drift` entrypoint that calls the runner | Existing CLI can trigger the new path | |
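One detail of task B.1 worth sketching is that the paused transaction sender should be restarted even when the Hardhat run fails. The sketch below assumes the sender runs as a Docker container that can be stopped and started by name. `with_container_stopped` and the `DOCKER` override are hypothetical and exist only to keep the example testable.

```bash
#!/usr/bin/env bash
# Allow the docker binary to be swapped out (e.g. for a stub in tests).
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: stop a container, run a command, then always start the
# container again. Returns the exit status of the wrapped command.
with_container_stopped() {
  local container="$1"
  shift
  local status=0
  "$DOCKER" stop "$container" >/dev/null
  "$@" || status=$?
  "$DOCKER" start "$container" >/dev/null
  return "$status"
}
```

The runner would wrap the injector launch and the Hardhat invocation in a single function and pass that function as the command. How the JS watchdog is disabled for that process is specific to the test suite and is not shown.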
| 92 | + |
| 93 | +--- |
| 94 | + |
| 95 | +### Merge Phase |
| 96 | + |
| 97 | +#### Phase 1: Validation |
| 98 | +**Dependencies:** Workstreams A, B |
| 99 | + |
| 100 | +| Task | Description | Output | |
| 101 | +|------|-------------|--------| |
| 102 | +| 1.1 | Run targeted checks on the changed scripts and CLI wiring | Verified local changes | |
| 103 | +| 1.2 | Document how to run the new e2e path and what it proves | Clear operator/developer usage | |
| 104 | + |
| 105 | +--- |
| 106 | + |
| 107 | +## Testing and Validation |
| 108 | + |
| 109 | +- Verify `coprocessor-gw-listener` command in e2e compose includes both new addresses. |
| 110 | +- Verify the injector script can discover and mutate a new unsent row for a chosen DB. |
| 111 | +- Verify the runner script disables the JS consensus watchdog only for this one intentional drift run. |
| 112 | +- Verify the runner fails if `coprocessor_gw_listener_drift_detected_counter` does not increase. |
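The last check can be expressed as a before/after comparison of the counter. This is a minimal sketch, assuming the listener exposes metrics in the Prometheus text format without trailing timestamps. `counter_value` and `assert_counter_increased` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sum all samples of one counter from Prometheus text exposition on stdin.
# Matches both `name value` and `name{labels} value`; prints 0 if absent.
counter_value() {
  local name="$1"
  awk -v name="$name" '
    $1 == name || index($1, name "{") == 1 { sum += $NF; found = 1 }
    END { printf "%d\n", (found ? sum : 0) }
  '
}

# Fail with a message on stderr unless the counter strictly increased.
assert_counter_increased() {
  local before="$1" after="$2"
  if (( after > before )); then
    return 0
  fi
  echo "drift counter did not increase (before=$before, after=$after)" >&2
  return 1
}
```

The runner would capture `before` prior to the Hardhat run and `after` once the transaction sender has been restarted, for example by piping `curl -s` on the listener's metrics endpoint into `counter_value coprocessor_gw_listener_drift_detected_counter`. The endpoint host and port are stack-specific and not assumed here.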
| 113 | + |
| 114 | +## Rollout and Migration |
| 115 | + |
| 116 | +- No migration. |
| 117 | +- Hard cutover for the new e2e path: once merged, `fhevm-cli test ciphertext-drift` becomes the supported drift test entrypoint. |
| 118 | + |
| 119 | +## Verification Checklist |
| 120 | + |
| 121 | +- `bash -n test-suite/fhevm/scripts/inject-coprocessor-drift.sh` |
| 122 | +- `bash -n test-suite/fhevm/scripts/run-ciphertext-drift-e2e.sh` |
| 123 | +- `rg -n "ciphertext-commits-address|gateway-config-address" test-suite/fhevm/docker-compose/coprocessor-docker-compose.yml` |
| 124 | +- `rg -n "ciphertext-drift" test-suite/fhevm/fhevm-cli` |
| 125 | + |
| 126 | +## Risk Assessment |
| 127 | + |
| 128 | +| Risk | Likelihood | Impact | Mitigation | |
| 129 | +|------|------------|--------|------------| |
| 130 | +| The injector runs before a ready row exists | Medium | Medium | Poll for a new ready row and gate on `txn_is_sent = false` |
| 131 | +| The existing JS watchdog fails the intentional drift run | High | High | Disable it only for the dedicated drift test command | |
| 132 | +| Metric polling hits the wrong listener | Low | Medium | Check a specific gw-listener container and assert the exact counter name | |
| 133 | + |
| 134 | +## Open Questions |
| 135 | + |
| 136 | +- [ ] Whether we want a second scenario later for `3-of-5` where consensus still succeeds with one bad coprocessor. |
| 137 | + |
| 138 | +## Decision Log |
| 139 | + |
| 140 | +| Decision | Rationale | Alternatives Considered | |
| 141 | +|----------|-----------|------------------------| |
| 142 | +| Keep the first e2e on `2-of-2` | Matches current workflow with the least change | Adding a new CI topology now | |
| 143 | +| Use DB-level digest mutation | Smallest realistic fault injection point | Patching workers or faking on-chain events | |
| 144 | +| Assert via Rust metrics | More stable than log text matching | Parsing logs or relying only on the JS watchdog | |