Commit 3d776f7

Add ciphertext drift e2e workflow
1 parent 56d267f commit 3d776f7

7 files changed: 280 additions, 1 deletion

.github/workflows/test-suite-e2e-tests.yml

Lines changed: 5 additions & 0 deletions
@@ -233,6 +233,11 @@ jobs:
         run: |
           ./fhevm-cli test hcu-block-cap
 
+      - name: Ciphertext drift test
+        working-directory: test-suite/fhevm
+        run: |
+          ./fhevm-cli test ciphertext-drift
+
       - name: Host listener poller test
         working-directory: test-suite/fhevm
         run: |

PLAN.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
+# Ciphertext Drift E2E Implementation Plan
+
+## Overview
+Add one focused end-to-end test path that proves the Rust `gw-listener` drift detector fires on an intentionally corrupted ciphertext digest in the existing e2e stack. Keep it compatible with the current `2-of-2` coprocessor workflow and reuse the existing Hardhat tests instead of building a new scenario from scratch.
+
+## Goals
+- Enable the Rust drift detector in the existing e2e coprocessor stack.
+- Add one deterministic drift-injection workflow that fits the current `fhevm-cli test` flow.
+- Verify detector output via metrics, not brittle log matching.
+
+## Non-Goals
+- No new general-purpose fault-injection framework.
+- No changes to the existing consensus watchdog behavior outside the dedicated drift test path.
+- No new CI profile or topology change beyond what the current workflow already deploys.
+
+## Assumptions and Constraints
+- Current workflow deploys `2` coprocessors with threshold `2`.
+- In `2-of-2`, one corrupted coprocessor means no consensus; this test should assert drift detection, not successful consensus.
+- The existing JS `consensusWatchdog` would intentionally fail this scenario, so the dedicated drift test run must disable it.
+- Host-side scripts can use `docker exec` against the shared Postgres container.
+
+## Requirements
+
+### Functional
+- `coprocessor-gw-listener` in e2e must receive `ciphertext_commits` and `gateway_config` addresses.
+- A script must mutate one ready-but-unsent `ciphertext_digest` row on a chosen coprocessor DB.
+- A dedicated `fhevm-cli test` target must orchestrate the injection and run an existing Hardhat test that produces ciphertext work.
+- The test must fail if the Rust `gw-listener` drift metric does not increment.
+
+### Non-Functional
+- Keep the change surface small and local to the existing e2e workflow.
+- Avoid adding new runtime services or test dependencies.
+- Keep the workflow deterministic enough for CI.
+
+## Technical Design
+
+### Data Model
+- No schema changes.
+- Reuse `ciphertext_digest(handle, ciphertext, ciphertext128, txn_is_sent, created_at)`.
+
+### Architecture
+- Extend the e2e `coprocessor-gw-listener` command with:
+  - `--ciphertext-commits-address`
+  - `--gateway-config-address`
+- Add a host-side drift injector script that:
+  - targets one coprocessor DB
+  - waits for a new ready row with `txn_is_sent = false`
+  - flips one byte in `ciphertext`
+- Add a host-side runner script that:
+  - stops one transaction sender
+  - starts the injector
+  - runs one existing Hardhat grep with the JS watchdog disabled for that process
+  - restarts the transaction sender
+  - checks `gw-listener` metrics for `coprocessor_gw_listener_drift_detected_counter`
+
+---
+
+## Implementation Plan
+
+### Serial Dependencies (Must Complete First)
+
+#### Phase 0: E2E Wiring
+**Prerequisite for:** Drift injection and runner script
+
+| Task | Description | Output |
+|------|-------------|--------|
+| 0.1 | Add detector address args to e2e `coprocessor-gw-listener` compose command | Drift detector enabled in e2e stack |
+| 0.2 | Add `GATEWAY_CONFIG_ADDRESS` to coprocessor env template used for multi-copro copies | Address available in all coprocessor env files |
+| 0.3 | Add a small plan file documenting the scoped implementation | `PLAN.md` |
+
+---
+
+### Parallel Workstreams
+
+#### Workstream A: Drift Injection
+**Dependencies:** Phase 0
+**Can parallelize with:** Workstream B
+
+| Task | Description | Output |
+|------|-------------|--------|
+| A.1 | Add a host-side script to wait for a new ready `ciphertext_digest` row in one coprocessor DB | `inject-coprocessor-drift.sh` |
+| A.2 | Mutate one byte in `ciphertext` for the selected handle and print the handle | Deterministic DB-level drift injection |
+
+#### Workstream B: Test Orchestration
+**Dependencies:** Phase 0
+**Can parallelize with:** Workstream A
+
+| Task | Description | Output |
+|------|-------------|--------|
+| B.1 | Add a host-side runner that pauses one transaction sender, launches the injector, runs one existing Hardhat test, and checks metrics | `run-ciphertext-drift-e2e.sh` |
+| B.2 | Add one `fhevm-cli test ciphertext-drift` entrypoint that calls the runner | Existing CLI can trigger the new path |
+
+---
+
+### Merge Phase
+
+#### Phase 1: Validation
+**Dependencies:** Workstreams A, B
+
+| Task | Description | Output |
+|------|-------------|--------|
+| 1.1 | Run targeted checks on the changed scripts and CLI wiring | Verified local changes |
+| 1.2 | Document how to run the new e2e path and what it proves | Clear operator/developer usage |
+
+---
+
+## Testing and Validation
+
+- Verify `coprocessor-gw-listener` command in e2e compose includes both new addresses.
+- Verify the injector script can discover and mutate a new unsent row for a chosen DB.
+- Verify the runner script disables the JS consensus watchdog only for this one intentional drift run.
+- Verify the runner fails if `coprocessor_gw_listener_drift_detected_counter` does not increase.
+
+## Rollout and Migration
+
+- No migration.
+- Hard cutover for the new e2e path: once merged, `fhevm-cli test ciphertext-drift` becomes the supported drift test entrypoint.
+
+## Verification Checklist
+
+- `bash -n test-suite/fhevm/scripts/inject-coprocessor-drift.sh`
+- `bash -n test-suite/fhevm/scripts/run-ciphertext-drift-e2e.sh`
+- `rg -n "ciphertext-commits-address|gateway-config-address" test-suite/fhevm/docker-compose/coprocessor-docker-compose.yml`
+- `rg -n "ciphertext-drift" test-suite/fhevm/fhevm-cli`
+
+## Risk Assessment
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|------------|--------|------------|
+| The injector races before a ready row exists | Medium | Medium | Poll for a new ready row and gate on `txn_is_sent = false` |
+| The existing JS watchdog fails the intentional drift run | High | High | Disable it only for the dedicated drift test command |
+| Metric polling hits the wrong listener | Low | Medium | Check a specific gw-listener container and assert the exact counter name |
+
+## Open Questions
+
+- [ ] Whether we want a second scenario later for `3-of-5` where consensus still succeeds with one bad coprocessor.
+
+## Decision Log
+
+| Decision | Rationale | Alternatives Considered |
+|----------|-----------|------------------------|
+| Keep the first e2e on `2-of-2` | Matches current workflow with the least change | Adding a new CI topology now |
+| Use DB-level digest mutation | Smallest realistic fault injection point | Patching workers or faking on-chain events |
+| Assert via Rust metrics | More stable than log text matching | Parsing logs or relying only on the JS watchdog |
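
Reviewer note: the plan's injector depends on telling rows that already existed apart from rows created by the test run. A minimal sketch of that selection step follows, assuming handles are passed around as newline-separated hex strings. The function name `first_new_handle` is hypothetical and is not part of the commit.

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): print the first candidate handle
# that does not appear in the baseline list, or return 1 if there is none.
# Both arguments are newline-separated lists of hex-encoded handles.
first_new_handle() {
  local baseline="$1" candidates="$2" handle
  while IFS= read -r handle; do
    # Skip empty lines, e.g. when the candidate list itself is empty.
    [ -z "$handle" ] && continue
    # -F fixed string, -x whole-line match, -q quiet: exact membership test.
    if printf '%s\n' "$baseline" | grep -Fxq -- "$handle"; then
      continue
    fi
    printf '%s\n' "$handle"
    return 0
  done <<< "$candidates"
  return 1
}
```

Taking the baseline snapshot before any test traffic starts is what keeps the selection tied to the current run. A handle left over from an earlier run is ignored even if it is still unsent.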

test-suite/fhevm/docker-compose/coprocessor-docker-compose.yml

Lines changed: 2 additions & 0 deletions
@@ -111,6 +111,8 @@ services:
       - --verify-proof-req-database-channel=event_zkpok_new_work
       - --gw-url=${GATEWAY_WS_URL}
       - --input-verification-address=${INPUT_VERIFICATION_ADDRESS}
+      - --ciphertext-commits-address=${CIPHERTEXT_COMMITS_ADDRESS}
+      - --gateway-config-address=${GATEWAY_CONFIG_ADDRESS}
       - --kms-generation-address=${KMS_GENERATION_ADDRESS}
       - --error-sleep-initial-secs=1
       - --error-sleep-max-secs=10
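
Reviewer note: PLAN.md verifies this wiring with `rg`. Where ripgrep is not installed, the same check can be approximated with plain `grep`. The sketch below assumes a fixed-string match on the flag name is enough; the function name `compose_has_flags` is hypothetical.

```shell
#!/bin/bash
# Hypothetical check (not in the commit): succeed only if every flag given
# after the file name appears somewhere in that file.
compose_has_flags() {
  local file="$1"
  shift
  local flag
  for flag in "$@"; do
    # "--" stops option parsing so flags that start with "--" are treated
    # as patterns rather than grep options.
    if ! grep -Fq -- "$flag" "$file"; then
      echo "missing flag: $flag" >&2
      return 1
    fi
  done
}
```

It could be called as `compose_has_flags test-suite/fhevm/docker-compose/coprocessor-docker-compose.yml --ciphertext-commits-address --gateway-config-address`. This confirms only that the flag text is present, not that the variables resolve to non-empty addresses at runtime.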

test-suite/fhevm/env/staging/.env.coprocessor

Lines changed: 1 addition & 0 deletions
@@ -43,4 +43,5 @@ GATEWAY_WS_URL=ws://gateway-node:8546
 TX_SENDER_PRIVATE_KEY=0x8f82b3f482c19a95ac29c82cf048c076ed0de2530c64a73f2d2d7d1e64b5cc6e
 INPUT_VERIFICATION_ADDRESS=0x3b12Fc766Eb598b285998877e8E90F3e43a1F8d2
 CIPHERTEXT_COMMITS_ADDRESS=0xeAC2EfFA07844aB326D92d1De29E136a6793DFFA
+GATEWAY_CONFIG_ADDRESS=0x576Ea67208b146E63C5255d0f90104E25e3e04c7
 KMS_GENERATION_ADDRESS=0x35760912360E875DA50D40a74305575c23D55783

test-suite/fhevm/fhevm-cli

Lines changed: 6 additions & 1 deletion
@@ -80,7 +80,7 @@ function usage {
   echo -e " ${YELLOW}deploy${RESET} ${CYAN}[--build] [--local] [--coprocessors N] [--coprocessor-threshold T]${RESET} Deploy fhevm stack"
   echo -e " ${YELLOW}pause${RESET} ${CYAN}[CONTRACTS]${RESET} Pause specific contracts (host|gateway)"
   echo -e " ${YELLOW}unpause${RESET} ${CYAN}[CONTRACTS]${RESET} Unpause specific contracts (host|gateway)"
-  echo -e " ${YELLOW}test${RESET} ${CYAN}[TYPE]${RESET} Run tests (input-proof|user-decryption|public-decryption|delegated-user-decryption|random|random-subset|operators|erc20|hcu-block-cap|debug)"
+  echo -e " ${YELLOW}test${RESET} ${CYAN}[TYPE]${RESET} Run tests (input-proof|ciphertext-drift|user-decryption|public-decryption|delegated-user-decryption|random|random-subset|operators|erc20|hcu-block-cap|debug)"
   echo -e " ${YELLOW}smoke${RESET} ${CYAN}[PROFILE]${RESET} Run multicoproc smoke profile (multi-2-2|multi-3-5)"
   echo -e " ${YELLOW}upgrade${RESET} ${CYAN}[SERVICE]${RESET} Upgrade specific service (host|gateway|connector|coprocessor|relayer|test-suite)"
   echo -e " ${YELLOW}clean${RESET} Remove all containers and volumes"
@@ -102,6 +102,7 @@ function usage {
   echo -e " ${PURPLE}./fhevm-cli smoke multi-2-2${RESET}"
   echo -e " ${PURPLE}./fhevm-cli smoke multi-3-5${RESET}"
   echo -e " ${PURPLE}./fhevm-cli test input-proof${RESET}"
+  echo -e " ${PURPLE}./fhevm-cli test ciphertext-drift${RESET}"
   echo -e " ${PURPLE}./fhevm-cli test input-proof --no-hardhat-compile${RESET}"
   echo -e " ${PURPLE}./fhevm-cli test user-decryption ${RESET}"
   echo -e " ${PURPLE}./fhevm-cli test public-decrypt-http-ebool ${RESET}"
@@ -350,6 +351,10 @@ case $COMMAND in
         log_message="${LIGHT_BLUE}${BOLD}[TEST] INPUT PROOF (uint64)${RESET}"
         docker_args+=("-g" "test add 42 to uint64 input and decrypt")
         ;;
+      ciphertext-drift)
+        echo -e "${LIGHT_BLUE}${BOLD}[TEST] CIPHERTEXT DRIFT${RESET}"
+        exec "${SCRIPT_DIR}/scripts/run-ciphertext-drift-e2e.sh"
+        ;;
       user-decryption)
         log_message="${LIGHT_BLUE}${BOLD}[TEST] USER DECRYPTION${RESET}"
         docker_args+=("-g" "test user decrypt")

test-suite/fhevm/scripts/inject-coprocessor-drift.sh

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+#!/bin/bash
+set -euo pipefail
+
+INSTANCE_INDEX="${1:-1}"
+TIMEOUT_SECONDS="${DRIFT_INJECT_TIMEOUT_SECONDS:-180}"
+POLL_INTERVAL_SECONDS="${DRIFT_INJECT_POLL_INTERVAL_SECONDS:-2}"
+POSTGRES_CONTAINER="${POSTGRES_CONTAINER:-coprocessor-and-kms-db}"
+POSTGRES_USER="${POSTGRES_USER:-postgres}"
+POSTGRES_PASSWORD="${POSTGRES_PASSWORD:-postgres}"
+
+if ! [[ "$INSTANCE_INDEX" =~ ^[0-9]+$ ]]; then
+  echo "instance index must be a non-negative integer" >&2
+  exit 1
+fi
+
+db_name="coprocessor"
+if [ "$INSTANCE_INDEX" -gt 0 ]; then
+  db_name="coprocessor_${INSTANCE_INDEX}"
+fi
+
+psql_query() {
+  docker exec -e PGPASSWORD="$POSTGRES_PASSWORD" "$POSTGRES_CONTAINER" \
+    psql -U "$POSTGRES_USER" -d "$db_name" -t -A -c "$1"
+}
+
+baseline_handles="$(psql_query "SELECT encode(handle, 'hex') FROM ciphertext_digest;" || true)"
+deadline=$((SECONDS + TIMEOUT_SECONDS))
+
+while [ "$SECONDS" -lt "$deadline" ]; do
+  ready_handles="$(psql_query "SELECT encode(handle, 'hex') FROM ciphertext_digest WHERE txn_is_sent = false AND ciphertext IS NOT NULL AND ciphertext128 IS NOT NULL ORDER BY created_at DESC;")"
+  while IFS= read -r handle_hex; do
+    [ -z "$handle_hex" ] && continue
+    if printf '%s\n' "$baseline_handles" | grep -Fxq "$handle_hex"; then
+      continue
+    fi
+
+    psql_query "UPDATE ciphertext_digest SET ciphertext = set_byte(ciphertext, 0, get_byte(ciphertext, 0) # 1) WHERE handle = decode('${handle_hex}', 'hex') AND txn_is_sent = false;"
+    echo "$handle_hex"
+    exit 0
+  done <<< "$ready_handles"
+
+  sleep "$POLL_INTERVAL_SECONDS"
+done
+
+echo "timed out waiting for a new ready ciphertext_digest row in ${db_name}" >&2
+exit 1
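
Reviewer note: the `UPDATE` above corrupts the digest with `set_byte(ciphertext, 0, get_byte(ciphertext, 0) # 1)`. In PostgreSQL `#` is integer bitwise XOR, so the statement toggles the lowest bit of the first byte. The sketch below reproduces the same transformation on a hex string in pure bash, which makes the effect easy to check without a database. The function name `flip_first_byte_lsb` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): XOR the lowest bit of the first
# byte of a hex-encoded value, mirroring
#   set_byte(ciphertext, 0, get_byte(ciphertext, 0) # 1)
flip_first_byte_lsb() {
  local hex="$1"
  # Require at least one whole byte, i.e. a non-empty even-length hex string.
  if ! [[ "$hex" =~ ^([0-9a-fA-F]{2})+$ ]]; then
    echo "expected a non-empty, even-length hex string" >&2
    return 1
  fi
  # 16#NN parses NN as base 16; ^ is XOR in bash arithmetic.
  local first=$(( 16#${hex:0:2} ^ 1 ))
  # Re-emit the flipped byte as two lowercase hex digits, then the rest.
  printf '%02x%s\n' "$first" "${hex:2}"
}
```

The XOR is its own inverse: applying it twice to the same row restores the original digest. The injector avoids that by exiting after its first successful `UPDATE`.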

test-suite/fhevm/scripts/run-ciphertext-drift-e2e.sh

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+#!/bin/bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+FAULTY_INSTANCE_INDEX="${FAULTY_INSTANCE_INDEX:-1}"
+FAULTY_TX_SENDER_CONTAINER="${FAULTY_TX_SENDER_CONTAINER:-coprocessor${FAULTY_INSTANCE_INDEX}-transaction-sender}"
+TEST_CONTAINER="${TEST_CONTAINER:-fhevm-test-suite-e2e-debug}"
+GREP_PATTERN="${GREP_PATTERN:-test user input uint64 \\(non-trivial\\)}"
+METRIC_NAME="coprocessor_gw_listener_drift_detected_counter"
+METRIC_TIMEOUT_SECONDS="${DRIFT_METRIC_TIMEOUT_SECONDS:-180}"
+METRIC_POLL_INTERVAL_SECONDS="${DRIFT_METRIC_POLL_INTERVAL_SECONDS:-2}"
+HANDLE_FILE="$(mktemp)"
+injector_pid=""
+
+metric_total() {
+  local total=0
+  local container
+  while IFS= read -r container; do
+    [ -z "$container" ] && continue
+    local value
+    value="$(docker exec "$container" curl -fsS http://127.0.0.1:9100/metrics 2>/dev/null | awk -v metric="$METRIC_NAME" '$1 == metric {sum += $2} END {print sum + 0}')"
+    total=$((total + value))
+  done < <(docker ps --format '{{.Names}}' | grep -E '^coprocessor([0-9]+)?-gw-listener$' || true)
+  echo "$total"
+}
+
+wait_for_metric_increment() {
+  local baseline="$1"
+  local deadline=$((SECONDS + METRIC_TIMEOUT_SECONDS))
+  while [ "$SECONDS" -lt "$deadline" ]; do
+    local current
+    current="$(metric_total)"
+    if [ "$current" -gt "$baseline" ]; then
+      echo "$current"
+      return 0
+    fi
+    sleep "$METRIC_POLL_INTERVAL_SECONDS"
+  done
+  return 1
+}
+
+cleanup() {
+  if [ -n "$injector_pid" ] && kill -0 "$injector_pid" >/dev/null 2>&1; then
+    kill "$injector_pid" >/dev/null 2>&1 || true
+  fi
+  docker start "$FAULTY_TX_SENDER_CONTAINER" >/dev/null 2>&1 || true
+  rm -f "$HANDLE_FILE"
+}
+
+trap cleanup EXIT
+
+baseline_metric="$(metric_total)"
+docker stop "$FAULTY_TX_SENDER_CONTAINER" >/dev/null
+
+"${SCRIPT_DIR}/inject-coprocessor-drift.sh" "$FAULTY_INSTANCE_INDEX" > "$HANDLE_FILE" &
+injector_pid=$!
+
+docker exec \
+  -e GATEWAY_RPC_URL= \
+  -e CIPHERTEXT_COMMITS_ADDRESS= \
+  -e INPUT_VERIFICATION_ADDRESS= \
+  "$TEST_CONTAINER" \
+  ./run-tests.sh -n staging -g "$GREP_PATTERN"
+
+wait "$injector_pid"
+injector_pid=""
+handle_hex="$(cat "$HANDLE_FILE")"
+
+docker start "$FAULTY_TX_SENDER_CONTAINER" >/dev/null
+
+if ! updated_metric="$(wait_for_metric_increment "$baseline_metric")"; then
+  echo "drift metric did not increase after injecting handle ${handle_hex}" >&2
+  exit 1
+fi
+
+echo "drift detected for handle ${handle_hex} (${METRIC_NAME}: ${baseline_metric} -> ${updated_metric})"
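
Reviewer note: `metric_total` relies on a small `awk` filter to pull the counter out of the Prometheus text exposition output. The sketch below isolates that filter so it can be exercised against canned input. The function name `sum_metric` is hypothetical, and the sample lines in the usage note are invented for illustration rather than captured from a real listener.

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): read Prometheus text format on
# stdin and print the sum of all samples whose first field is exactly the
# given metric name. Prints 0 when nothing matches.
sum_metric() {
  # "sum + 0" forces numeric output so an empty match prints 0, not "".
  awk -v metric="$1" '$1 == metric {sum += $2} END {print sum + 0}'
}
```

For example, `printf 'coprocessor_gw_listener_drift_detected_counter 2\n' | sum_metric coprocessor_gw_listener_drift_detected_counter` prints `2`. Because the match is on the whole first field, a labeled series such as `name{label="x"} 3` is not counted. The commit does not show whether the real counter carries labels, so that is worth confirming against the listener's actual `/metrics` output.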
