Commit 0c8ed1e

Merge branch 'codex/fhevm-orchestration-parity-refactor' into claude/test-fhEVM-bun-cli-1Fsxq
2 parents e1734b9 + cdff7de commit 0c8ed1e

19 files changed: +6024 / -1398 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ logs/
 !.env.sample
 # Allow shared env templates in subprojects
 !test-suite/e2e/.env.devnet
+!test-suite/fhevm/env/staging/.env.versions.example
 # If you have .env files specific to subprojects, e.g. subproject/.env,
 # the .env rule above will catch them.

gateway-contracts/tasks/addHostChains.ts

Lines changed: 5 additions & 0 deletions
@@ -40,6 +40,11 @@ task("task:addHostChainsToGatewayConfig")
     // Add host chains
     const gatewayConfig = await hre.ethers.getContractAt("GatewayConfig", proxyAddress, deployer);
     for (const hostChain of hostChains) {
+      const isAlreadyRegistered = await gatewayConfig.isHostChainRegistered(hostChain.chainId);
+      if (isAlreadyRegistered) {
+        console.log(`Host chain ${hostChain.chainId} already registered, skipping.`);
+        continue;
+      }
       await gatewayConfig.addHostChain(hostChain);
     }
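Review note: the hunk above makes the task safe to re-run by checking registration before adding. A minimal sketch of that check-then-add pattern follows. `FakeGatewayConfig` is a hypothetical in-memory stand-in, not the real contract binding; it is synchronous for brevity (the real calls return promises), and it assumes that adding a duplicate fails, which the diff does not confirm.

```typescript
// Hypothetical host chain shape; only chainId is taken from the diff.
interface HostChain {
  chainId: number;
  name: string;
}

// Hypothetical in-memory stand-in for the GatewayConfig contract.
class FakeGatewayConfig {
  private readonly registered = new Map<number, HostChain>();
  public addCalls = 0;

  isHostChainRegistered(chainId: number): boolean {
    return this.registered.has(chainId);
  }

  addHostChain(hostChain: HostChain): void {
    // Assumption: a duplicate add fails (modeled here as a thrown error).
    if (this.registered.has(hostChain.chainId)) {
      throw new Error(`Host chain ${hostChain.chainId} already registered`);
    }
    this.registered.set(hostChain.chainId, hostChain);
    this.addCalls += 1;
  }
}

// Check-then-add loop mirroring the hunk: skip chains that already exist.
// Returns the chain IDs that were skipped.
function addHostChainsIdempotently(config: FakeGatewayConfig, hostChains: HostChain[]): number[] {
  const skipped: number[] = [];
  for (const hostChain of hostChains) {
    if (config.isHostChainRegistered(hostChain.chainId)) {
      skipped.push(hostChain.chainId);
      continue;
    }
    config.addHostChain(hostChain);
  }
  return skipped;
}
```

Running the loop twice against the same config adds each chain once and skips everything on the second pass, which is what lets a resumed deploy repeat this step.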

gateway-contracts/tasks/addPausers.ts

Lines changed: 7 additions & 2 deletions
@@ -7,7 +7,7 @@ import { getRequiredEnvVar, loadGatewayAddresses } from "./utils/loadVariables";
 // for local testing. By default, we use the PAUSER_SET_ADDRESS env var, as done in deployment
 task("task:addGatewayPausers")
   .addParam("useInternalPauserSetAddress", "If internal PauserSet address should be used", false, types.boolean)
-  .setAction(async function ({ useInternalGatewayConfigAddress }, hre) {
+  .setAction(async function ({ useInternalPauserSetAddress }, hre) {
     await hre.run("compile:specific", { contract: "contracts/immutable" });
     console.log("Adding pausers to PauserSet contract");

@@ -21,14 +21,19 @@ task("task:addGatewayPausers")
       pausers.push(getRequiredEnvVar(`PAUSER_ADDRESS_${idx}`));
     }

-    if (useInternalGatewayConfigAddress) {
+    if (useInternalPauserSetAddress) {
       loadGatewayAddresses();
     }
     const pauserSetAddress = getRequiredEnvVar("PAUSER_SET_ADDRESS");

     // Add pauser(s)
     const pauserSet = await hre.ethers.getContractAt("PauserSet", pauserSetAddress, deployer);
     for (const pauser of pausers) {
+      const isAlreadyPauser = await pauserSet.isPauser(pauser);
+      if (isAlreadyPauser) {
+        console.log(`Pauser ${pauser} already registered, skipping.`);
+        continue;
+      }
       await pauserSet.addPauser(pauser);
     }
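Review note: besides the idempotency check, this file fixes a parameter-name mismatch. The task declares `useInternalPauserSetAddress`, but the old action destructured `useInternalGatewayConfigAddress`, a key that is never passed, so the flag was always `undefined` and `loadGatewayAddresses()` never ran. A minimal sketch of the effect, using hypothetical helper names and a plain object in place of the task arguments:

```typescript
// Task arguments arrive as an object keyed by the declared parameter names.
type TaskArgs = Record<string, unknown>;

// Before the fix: destructures a key the task never defines.
function readFlagBefore({ useInternalGatewayConfigAddress }: TaskArgs): unknown {
  return useInternalGatewayConfigAddress;
}

// After the fix: destructures the key declared via addParam.
function readFlagAfter({ useInternalPauserSetAddress }: TaskArgs): unknown {
  return useInternalPauserSetAddress;
}

// Hypothetical arguments for an invocation that enables the flag.
const exampleTaskArgs: TaskArgs = { useInternalPauserSetAddress: true };

console.log(readFlagBefore(exampleTaskArgs)); // the mismatched name never sees the flag
console.log(readFlagAfter(exampleTaskArgs)); // the corrected name does
```

Because a missing key destructures to `undefined` rather than raising an error, the old code failed silently, which is why the rename matters even though no test would have crashed.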

test-suite/README.md

Lines changed: 286 additions & 0 deletions
@@ -19,8 +19,16 @@ KMS can be configured to two modes:
 - [Quickstart](#quickstart)
 - [Forcing Local Builds](#wip---forcing-local-builds---build)
 - [Local Developer Optimizations](#local-developer-optimizations)
+- [CLI Reference](#cli-reference)
+- [Telemetry Checks](#telemetry-checks)
 - [Resuming a Deployment](#resuming-a-deployment)
 - [Deploying a Single Step](#deploying-a-single-step)
+- [Orchestration Source of Truth](#orchestration-source-of-truth)
+- [Troubleshooting Deploy Failures](#troubleshooting-deploy-failures)
+- [Behavior Parity Tests](#behavior-parity-tests)
+- [CLI Parity Diff Tests](#cli-parity-diff-tests)
+- [Validation Protocol (PR / Manual)](#validation-protocol-pr--manual)
+- [Docker Project Isolation](#docker-project-isolation)
 - [Security Policy](#security-policy)
 - [Handling Sensitive Data](#handling-sensitive-data)
 - [Environment Files](#environment-files)
@@ -44,6 +52,12 @@ cd test-suite/fhevm
 # Deploy with local BuildKit cache (disables provenance attestations)
 ./fhevm-cli deploy --local

+# Deploy and fail if telemetry services are not visible in Jaeger
+./fhevm-cli deploy --build --telemetry-smoke
+
+# Deploy with versions scraped from the public testnet matrix
+./fhevm-cli deploy --network testnet
+
 # Deploy with threshold 2 out of 2 coprocessors (local multicoprocessor mode)
 ./fhevm-cli deploy --coprocessors 2 --coprocessor-threshold 2

@@ -72,8 +86,29 @@ cd test-suite/fhevm

 # Clean up
 ./fhevm-cli clean
+
+# Hard purge for reproducible A/B runs
+./fhevm-cli clean --purge
+```
+
+If you prefer shorter commands with Bun scripts, you can run the same CLI via:
+
+```sh
+cd test-suite/fhevm
+bun run deploy --network testnet
+bun run test input-proof
+bun run telemetry-smoke
+bun run clean --purge-build-cache
 ```

+All `clean` purge flags are fhevm-scoped:
+- `--purge-images` removes images referenced by fhevm compose services.
+- `--purge-build-cache` removes the local Buildx cache directory (`.buildx-cache` by default, or `FHEVM_BUILDX_CACHE_DIR` if set).
+- `--purge-local-cache` is a compatibility alias for `--purge-build-cache`.
+- `--purge-networks` removes Docker networks labeled with the active compose project only.
+
+For `deploy --coprocessors N` with `N > 1`, `cast` (Foundry) must be installed locally to derive per-coprocessor accounts from `MNEMONIC`.
+
 ### WIP - Forcing Local Builds (`--build`)

 ⚠️ **IMPORTANT: THIS FEATURE IS STILL A WORK IN PROGRESS!** ⚠️
@@ -110,12 +145,63 @@ For faster local iteration, use `--local` to enable a local BuildKit cache (stor
 ./fhevm-cli deploy --local
 ```

+For code-path validation, prefer `--build --local` so your local changes are rebuilt while keeping warm cache layers.
+
+To align local versions with currently deployed environments, you can ask deploy to scrape the public version dashboard:
+
+```sh
+./fhevm-cli deploy --network testnet
+./fhevm-cli deploy --network mainnet
+```
+
+Notes:
+- This is best-effort scraping from the public Grafana dashboard DOM.
+- It applies known service version env vars (coprocessor services, kms-connector services, `CORE_VERSION`) before deployment.
+- Contract/relayer versions continue to use local defaults unless explicitly overridden.
+- If your Chromium path is custom, set `FHEVM_GRAFANA_CHROMIUM_BIN=/path/to/chromium`.
+- For deterministic testing, set `FHEVM_GRAFANA_DASHBOARD_HTML_FILE=/path/to/dashboard.html`.
+
 When running tests and you know your Hardhat artifacts are already up to date, you can skip compilation:

 ```sh
 ./fhevm-cli test input-proof --no-hardhat-compile
 ```

+### CLI Reference
+
+For agent workflows, prefer explicit command+flag forms from this table.
+
+| Command | Flags | Notes |
+| --- | --- | --- |
+| `deploy` | `--build` | Build buildable services before `up -d`. |
+| `deploy` | `--local` / `--dev` | Enable local BuildKit cache (`.buildx-cache` by default). |
+| `deploy` | `--network testnet\|mainnet` | Apply version profile from public dashboard before deploy. |
+| `deploy` | `--coprocessors <n>` | Configure local coprocessor topology size (`n`, max `5`). |
+| `deploy` | `--coprocessor-threshold <t>` | Override topology threshold (`t <= n`). |
+| `deploy` | `--resume <step>` | Redeploy from a specific step onward. |
+| `deploy` | `--only <step>` | Redeploy only one step. |
+| `deploy` | `--telemetry-smoke` | Run Jaeger service smoke-check after deployment. |
+| `deploy` | `--strict-otel` | Fail if OTEL endpoint expects Jaeger and Jaeger is not running. |
+| `test` | `-n, --network <name>` | Test-runtime network selection (default: `staging`). |
+| `test` | `-g, --grep <pattern>` | Override test grep pattern. |
+| `test` | `-v, --verbose` | Verbose test output. |
+| `test` | `-r, --no-relayer` | Disable Rust relayer in tests. |
+| `test` | `--no-hardhat-compile` | Skip compile when artifacts are already up-to-date. |
+| `clean` | `--purge` | Shorthand for `--purge-images --purge-build-cache --purge-networks`. |
+| `clean` | `--purge-images` | Remove images referenced by fhevm compose services only. |
+| `clean` | `--purge-build-cache` | Remove local Buildx cache dir (`.buildx-cache` or `FHEVM_BUILDX_CACHE_DIR`). |
+| `clean` | `--purge-networks` | Remove Docker networks labeled with the active compose project only. |
+| `clean` | `--purge-local-cache` | Alias of `--purge-build-cache` (kept for compatibility). |
+| `pause` / `unpause` | `host` or `gateway` | Contract pause controls. |
+| `upgrade` | `<service>` | Restart selected service compose stack. |
+| `logs` | `<service>` | Stream container logs for one service. |
+| `telemetry-smoke` | _none_ | Validate required Jaeger services are present. |
+
+Notes:
+- `--network` on `deploy` selects a **version profile** (`testnet`/`mainnet`).
+- `--network` on `test` selects a **test runtime network** (default `staging`).
+- They are intentionally different and command-scoped.
+
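Review note: the `clean` rows in the table above describe `--purge` as shorthand for three flags and `--purge-local-cache` as an alias. A minimal sketch of how that resolution could work is below. `CleanOptions` and `resolveCleanFlags` are hypothetical names for illustration; this is not the code in `scripts/bun/cli.ts`.

```typescript
// Hypothetical resolved form of the documented clean flags.
interface CleanOptions {
  purgeImages: boolean;
  purgeBuildCache: boolean;
  purgeNetworks: boolean;
}

// Expand the documented shorthand and alias into the three base options.
function resolveCleanFlags(argv: string[]): CleanOptions {
  const flags = new Set(argv);
  const purgeAll = flags.has("--purge");
  return {
    purgeImages: purgeAll || flags.has("--purge-images"),
    // --purge-local-cache is documented as an alias of --purge-build-cache.
    purgeBuildCache:
      purgeAll || flags.has("--purge-build-cache") || flags.has("--purge-local-cache"),
    purgeNetworks: purgeAll || flags.has("--purge-networks"),
  };
}
```

Resolving aliases in one place keeps the purge implementation unaware of how the user spelled the flag.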
 ### Resuming a deployment

 If a deploy fails mid-way, you can resume from a specific step without tearing down containers or regenerating `.env` files:
@@ -132,6 +218,9 @@ When resuming:
 - Services **before** the resume step are preserved (containers + volumes kept)
 - Services **from** the resume step onwards are torn down and redeployed

+Multicoprocessor safety note:
+- If you change multicoprocessor topology (`--coprocessors` and/or `--coprocessor-threshold`) while using `--resume` from `coprocessor` or later, the CLI intentionally forces resume from `minio` to reset key material coherently across all coprocessors.
+
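Review note: the safety rule above amounts to a comparison against the step order. A minimal sketch follows, assuming a shortened step list; the real list and its order live in `scripts/bun/manifest.ts` and are not shown in this diff, so `STEPS` and `effectiveResumeStep` are hypothetical.

```typescript
// Hypothetical, shortened step order. Only the step names appear in the README;
// the relative order beyond "minio comes before coprocessor" is an assumption.
const STEPS = ["minio", "core", "coprocessor", "gateway-sc"] as const;
type Step = (typeof STEPS)[number];

// If the topology changed and the requested step is coprocessor or later,
// fall back to minio so key material is regenerated for every coprocessor.
function effectiveResumeStep(requested: Step, topologyChanged: boolean): Step {
  const atOrAfterCoprocessor = STEPS.indexOf(requested) >= STEPS.indexOf("coprocessor");
  return topologyChanged && atOrAfterCoprocessor ? "minio" : requested;
}
```

Steps before `coprocessor` are left alone in this sketch because a resume from there already redeploys the coprocessor step and everything after it.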
 ### Deploying a single step

 To redeploy only a single service without touching others:
@@ -149,6 +238,203 @@ You can combine `--only` or `--resume` with other flags:
 ./fhevm-cli deploy --only gateway-sc --build --local
 ```

+### Telemetry Checks
+
+The coprocessor env now ensures `OTEL_EXPORTER_OTLP_ENDPOINT` is present.
+If it is missing, deploy defaults it to `http://jaeger:4317` in `.env.coprocessor.local`.
+
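Review note: the defaulting behavior described above can be pictured as a small text patch on the env file contents. The sketch below is an assumption about the shape of that logic, not the deploy script itself: `ensureOtelEndpoint` is a hypothetical helper, and it treats any existing assignment (even an empty one) as "present".

```typescript
const OTEL_KEY = "OTEL_EXPORTER_OTLP_ENDPOINT";
const DEFAULT_OTEL_ENDPOINT = "http://jaeger:4317";

// Return the env file text unchanged if the key is already assigned,
// otherwise append the documented default on its own line.
function ensureOtelEndpoint(envText: string): string {
  const hasKey = envText
    .split("\n")
    .some((line) => line.trim().startsWith(`${OTEL_KEY}=`));
  if (hasKey) {
    return envText;
  }
  // Avoid gluing the new entry onto an unterminated last line.
  const separator = envText === "" || envText.endsWith("\n") ? "" : "\n";
  return `${envText}${separator}${OTEL_KEY}=${DEFAULT_OTEL_ENDPOINT}\n`;
}
```

Keeping the function pure (text in, text out) makes the defaulting easy to cover in the behavior tests without touching the filesystem.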
+Use strict endpoint validation (requires Jaeger to be up first):
+
+```sh
+./fhevm-cli deploy --strict-otel
+```
+
+Run smoke validation on demand:
+
+```sh
+./fhevm-cli telemetry-smoke
+```
+
+`telemetry-smoke` retries for a short warm-up window before failing, to reduce false negatives while traces are still starting up.
+
+Or include it in deploy:
+
+```sh
+./fhevm-cli deploy --telemetry-smoke
+```
+
+### Orchestration source of truth
+
+The orchestration is Bun-first with shell entrypoint wrappers:
+
+- `test-suite/fhevm/fhevm-cli`
+- `test-suite/fhevm/scripts/deploy-fhevm-stack.sh`
+
+Canonical deploy/test metadata now lives in one TypeScript source:
+
+- `test-suite/fhevm/scripts/bun/manifest.ts`
+
+Runtime implementation:
+
+- `test-suite/fhevm/scripts/bun/cli.ts`
+- `test-suite/fhevm/scripts/bun/process.ts`
+
+Compatibility snapshots used for parity verification:
+
+- `test-suite/fhevm/fhevm-cli.legacy`
+- `test-suite/fhevm/scripts/deploy-fhevm-stack.legacy.sh`
+
+You can force legacy mode explicitly with:
+
+```sh
+FHEVM_CLI_IMPL=legacy ./fhevm-cli deploy
+```
+
+Version updates do not require editing many per-service vars manually.
+You can override them in one place:
+
+- `FHEVM_STACK_VERSION` (gateway/host/coprocessor/kms-connector/test-suite)
+- `FHEVM_CORE_VERSION`
+- `FHEVM_RELAYER_VERSION` (relayer + relayer-migrate)
+
+These can be set as environment variables, or in an optional file:
+
+- `test-suite/fhevm/env/staging/.env.versions`
+- (template: `test-suite/fhevm/env/staging/.env.versions.example`)
+
+Example:
+
+```sh
+FHEVM_STACK_VERSION=v0.12.0-rc.1 \
+FHEVM_CORE_VERSION=v0.14.0-rc.1 \
+FHEVM_RELAYER_VERSION=v0.10.0-rc.1 \
+./fhevm-cli deploy
+```
+
+### Troubleshooting deploy failures
+
+When deploy fails, the script now surfaces explicit hints for common operational failure modes.
+
+- OOM-killed critical service:
+  - Symptom: failure includes `looks OOM-killed`.
+  - Action: increase Docker memory and resume from the failed step, for example:
+    - `./fhevm-cli deploy --resume coprocessor`
+
+- Key bootstrap / CRS not ready:
+  - Symptom: failure includes `Detected key-bootstrap-not-ready state`.
+  - Action: wait for keygen/CRS generation to settle, then resume from gateway contracts:
+    - `./fhevm-cli deploy --resume gateway-sc`
+
+- Gateway helper image export conflict (`already exists`):
+  - Symptom: build fails while starting gateway contracts.
+  - Action: deploy now auto-retries once after removing conflicting `gateway-contracts` tags.
+  - Manual fallback for repeated collisions:
+    - `./fhevm-cli clean --purge-images --purge-build-cache`
+
+### Behavior parity tests
+
+A behavior-level shell test suite validates deploy orchestration outcomes (ordering, `--resume`, `--only`, build semantics, env patch timing, actionable failure hints, strict OTEL checks, purge flags, and telemetry smoke checks).
+
+Run it with:
+
+```sh
+./test-suite/fhevm/scripts/tests/deploy-fhevm-stack.behavior.sh
+```
+
+### CLI parity diff tests
+
+A dry-run parity harness executes legacy Bash and Bun CLI flows under the same mocked Docker environment, then diffs command traces and exit codes for sampled command cases.
+
+Run it with:
+
+```sh
+./test-suite/fhevm/scripts/tests/fhevm-cli-parity-diff.sh
+```
+
+### Validation protocol (PR / manual)
+
+Use this when you want high confidence that the CLI is safe and functionally correct.
+
+#### 1) Quick confidence protocol (10-20 min)
+
+```sh
+cd test-suite/fhevm
+
+# Optional but strongly recommended: isolate this run from any other local docker tests.
+export FHEVM_DOCKER_PROJECT=fhevm-pr
+
+# Fresh start
+./fhevm-cli clean --purge
+
+# Bring up stack with testnet profile + telemetry gate
+./fhevm-cli deploy --resume core --network testnet --telemetry-smoke
+
+# Fast deterministic suite
+./fhevm-cli test input-proof --no-hardhat-compile
+./fhevm-cli test user-decryption --no-hardhat-compile
+./fhevm-cli test erc20 --no-hardhat-compile
+
+# Operational commands
+./fhevm-cli pause host
+./fhevm-cli unpause host
+./fhevm-cli pause gateway
+./fhevm-cli unpause gateway
+./fhevm-cli upgrade coprocessor
+
+# Cleanup
+./fhevm-cli clean
+```
+
+#### 2) Full protocol (same matrix used for end-to-end QA)
+
+```sh
+cd /path/to/fhevm
+bash /private/tmp/fhevm_full_qa.sh
+```
+
+Expected outcome:
+- No `[QA][FAIL]` in the generated summary log
+- Final line is `[QA][DONE] artifact_dir=...`
+
+Inspect results:
+
+```sh
+ART_DIR=$(cat /tmp/fhevm-full-qa-last-artifacts.txt)
+cat "$ART_DIR/summary.log"
+```
+
+#### 3) Safety check for cleanup scope (no side effects)
+
+Use sentinels to confirm clean/purge commands do not affect unrelated docker resources.
+
+```sh
+docker run -d --name qa-clean-sentinel alpine:3.20 sleep 36000
+docker network create qa-clean-sentinel-net
+
+cd test-suite/fhevm
+./fhevm-cli clean --purge
+
+docker ps -a --filter name='^qa-clean-sentinel$' --format '{{.Names}}'
+docker network ls --filter name='^qa-clean-sentinel-net$' --format '{{.Name}}'
+
+docker rm -f qa-clean-sentinel
+docker network rm qa-clean-sentinel-net
+```
+
+Both sentinel checks must still return their names after clean/purge.
+
+### Docker project isolation
+
+To avoid collisions with other local Docker-based test workflows, set a dedicated compose project name:
+
+```sh
+export FHEVM_DOCKER_PROJECT=fhevm-dev
+cd test-suite/fhevm
+./fhevm-cli deploy
+```
+
+All deploy/clean/purge operations are then scoped to that project.
+
 ## Security policy

 ### Handling sensitive data

test-suite/fhevm/env/staging/.env.coprocessor

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ FHE_KEY_ID=421c8116661b2150a46badd3956564ad8d4981718d30fa66a36342bee1b13dbf
 # host_listener uses RPC_WS_URL (ws/wss), poller uses RPC_HTTP_URL (http/https)
 RPC_HTTP_URL=http://host-node:8545
 RPC_WS_URL=ws://host-node:8545
+OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317
 CHAIN_ID=12345
 ACL_CONTRACT_ADDRESS=0x05fD9B5EFE0a996095f42Ed7e77c390810CF660c
 FHEVM_EXECUTOR_CONTRACT_ADDRESS=0xcCAe95fF1d11656358E782570dF0418F59fA40e1
