@@ -19,8 +19,16 @@ KMS can be configured to two modes:
1919 - [ Quickstart] ( #quickstart )
2020 - [ Forcing Local Builds] ( #wip---forcing-local-builds---build )
2121 - [ Local Developer Optimizations] ( #local-developer-optimizations )
22+ - [ CLI Reference] ( #cli-reference )
23+ - [ Telemetry Checks] ( #telemetry-checks )
2224 - [ Resuming a Deployment] ( #resuming-a-deployment )
2325 - [ Deploying a Single Step] ( #deploying-a-single-step )
26+ - [ Orchestration Source of Truth] ( #orchestration-source-of-truth )
27+ - [ Troubleshooting Deploy Failures] ( #troubleshooting-deploy-failures )
28+ - [ Behavior Parity Tests] ( #behavior-parity-tests )
29+ - [ CLI Parity Diff Tests] ( #cli-parity-diff-tests )
30+ - [ Validation Protocol (PR / Manual)] ( #validation-protocol-pr--manual )
31+ - [ Docker Project Isolation] ( #docker-project-isolation )
2432 - [ Security Policy] ( #security-policy )
2533 - [ Handling Sensitive Data] ( #handling-sensitive-data )
2634 - [ Environment Files] ( #environment-files )
@@ -44,6 +52,12 @@ cd test-suite/fhevm
4452# Deploy with local BuildKit cache (disables provenance attestations)
4553./fhevm-cli deploy --local
4654
55+ # Deploy and fail if telemetry services are not visible in Jaeger
56+ ./fhevm-cli deploy --build --telemetry-smoke
57+
58+ # Deploy with versions scraped from the public testnet matrix
59+ ./fhevm-cli deploy --network testnet
60+
4761# Deploy with threshold 2 out of 2 coprocessors (local multicoprocessor mode)
4862./fhevm-cli deploy --coprocessors 2 --coprocessor-threshold 2
4963
@@ -72,8 +86,29 @@ cd test-suite/fhevm
7286
7387# Clean up
7488./fhevm-cli clean
89+
90+ # Hard purge for reproducible A/B runs
91+ ./fhevm-cli clean --purge
92+ ```
93+
94+ If you prefer shorter commands with Bun scripts, you can run the same CLI via:
95+
96+ ``` sh
97+ cd test-suite/fhevm
98+ bun run deploy --network testnet
99+ bun run test input-proof
100+ bun run telemetry-smoke
101+ bun run clean --purge-build-cache
75102```
76103
104+ All ` clean ` purge flags are fhevm-scoped:
105+ - ` --purge-images ` removes images referenced by fhevm compose services.
106+ - ` --purge-build-cache ` removes the local Buildx cache directory (` .buildx-cache ` by default, or ` FHEVM_BUILDX_CACHE_DIR ` if set).
107+ - ` --purge-local-cache ` is a compatibility alias for ` --purge-build-cache ` .
108+ - ` --purge-networks ` removes Docker networks labeled with the active compose project only.
109+
110+ For ` deploy --coprocessors N ` with ` N > 1 ` , ` cast ` (Foundry) must be installed locally to derive per-coprocessor accounts from ` MNEMONIC ` .
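
A preflight check along these lines can catch a missing `cast` before a multicoprocessor deploy starts. This is a minimal sketch: the helper name `require_cast_for_multicoprocessor` is hypothetical and not part of `fhevm-cli`, and the CLI may perform its own check differently.

```sh
# Hypothetical preflight helper (not part of fhevm-cli).
# Succeeds when the requested coprocessor count can be deployed
# with the tools currently on PATH.
require_cast_for_multicoprocessor() {
  n="$1"
  # A single coprocessor needs no per-coprocessor account derivation.
  if [ "$n" -le 1 ]; then
    return 0
  fi
  # For N > 1, Foundry's cast must be installed locally.
  if ! command -v cast >/dev/null 2>&1; then
    echo "cast (Foundry) is required for --coprocessors $n" >&2
    return 1
  fi
}
```

For example, `require_cast_for_multicoprocessor 2 && ./fhevm-cli deploy --coprocessors 2 --coprocessor-threshold 2` only deploys when `cast` is available.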
111+
77112### WIP - Forcing Local Builds (` --build ` )
78113
79114⚠️ ** IMPORTANT: THIS FEATURE IS STILL A WORK IN PROGRESS!** ⚠️
@@ -110,12 +145,63 @@ For faster local iteration, use `--local` to enable a local BuildKit cache (stor
110145./fhevm-cli deploy --local
111146```
112147
148+ For code-path validation, prefer ` --build --local ` so your local changes are rebuilt while keeping warm cache layers.
149+
150+ To align local versions with currently deployed environments, you can ask deploy to scrape the public version dashboard:
151+
152+ ``` sh
153+ ./fhevm-cli deploy --network testnet
154+ ./fhevm-cli deploy --network mainnet
155+ ```
156+
157+ Notes:
158+ - This is best-effort scraping from the public Grafana dashboard DOM.
159+ - It applies known service version env vars (coprocessor services, kms-connector services, ` CORE_VERSION ` ) before deployment.
160+ - Contract/relayer versions continue to use local defaults unless explicitly overridden.
161+ - If your Chromium path is custom, set ` FHEVM_GRAFANA_CHROMIUM_BIN=/path/to/chromium ` .
162+ - For deterministic testing, set ` FHEVM_GRAFANA_DASHBOARD_HTML_FILE=/path/to/dashboard.html ` .
163+
113164When running tests and you know your Hardhat artifacts are already up to date, you can skip compilation:
114165
115166``` sh
116167./fhevm-cli test input-proof --no-hardhat-compile
117168```
118169
170+ ### CLI Reference
171+
172+ For agent workflows, prefer explicit command+flag forms from this table.
173+
174+ | Command | Flags | Notes |
175+ | --- | --- | --- |
176+ | ` deploy ` | ` --build ` | Build buildable services before ` up -d ` . |
177+ | ` deploy ` | ` --local ` / ` --dev ` | Enable local BuildKit cache (` .buildx-cache ` by default). |
178+ | ` deploy ` | ` --network testnet\|mainnet ` | Apply version profile from public dashboard before deploy. |
179+ | ` deploy ` | ` --coprocessors <n> ` | Configure local coprocessor topology size (` n ` , max ` 5 ` ). |
180+ | ` deploy ` | ` --coprocessor-threshold <t> ` | Override topology threshold (` t <= n ` ). |
181+ | ` deploy ` | ` --resume <step> ` | Redeploy from a specific step onward. |
182+ | ` deploy ` | ` --only <step> ` | Redeploy only one step. |
183+ | ` deploy ` | ` --telemetry-smoke ` | Run Jaeger service smoke-check after deployment. |
184+ | ` deploy ` | ` --strict-otel ` | Fail if OTEL endpoint expects Jaeger and Jaeger is not running. |
185+ | ` test ` | ` -n, --network <name> ` | Test-runtime network selection (default: ` staging ` ). |
186+ | ` test ` | ` -g, --grep <pattern> ` | Override test grep pattern. |
187+ | ` test ` | ` -v, --verbose ` | Verbose test output. |
188+ | ` test ` | ` -r, --no-relayer ` | Disable Rust relayer in tests. |
189+ | ` test ` | ` --no-hardhat-compile ` | Skip compile when artifacts are already up-to-date. |
190+ | ` clean ` | ` --purge ` | Shorthand for ` --purge-images --purge-build-cache --purge-networks ` . |
191+ | ` clean ` | ` --purge-images ` | Remove images referenced by fhevm compose services only. |
192+ | ` clean ` | ` --purge-build-cache ` | Remove local Buildx cache dir (` .buildx-cache ` or ` FHEVM_BUILDX_CACHE_DIR ` ). |
193+ | ` clean ` | ` --purge-networks ` | Remove Docker networks labeled with the active compose project only. |
194+ | ` clean ` | ` --purge-local-cache ` | Alias of ` --purge-build-cache ` (kept for compatibility). |
195+ | ` pause ` / ` unpause ` | ` host ` or ` gateway ` | Contract pause controls. |
196+ | ` upgrade ` | ` <service> ` | Restart selected service compose stack. |
197+ | ` logs ` | ` <service> ` | Stream container logs for one service. |
198+ | ` telemetry-smoke ` | _none_ | Validate required Jaeger services are present. |
199+
200+ Notes:
201+ - ` --network ` on ` deploy ` selects a ** version profile** (` testnet ` /` mainnet ` ).
202+ - ` --network ` on ` test ` selects a ** test runtime network** (default ` staging ` ).
203+ - They are intentionally different and command-scoped.
204+
119205### Resuming a deployment
120206
121207If a deploy fails mid-way, you can resume from a specific step without tearing down containers or regenerating ` .env ` files:
@@ -132,6 +218,9 @@ When resuming:
132218- Services ** before** the resume step are preserved (containers + volumes kept)
133219- Services ** from** the resume step onwards are torn down and redeployed
134220
221+ Multicoprocessor safety note:
222+ - If you change multicoprocessor topology (` --coprocessors ` and/or ` --coprocessor-threshold ` ) while using ` --resume ` from ` coprocessor ` or later, the CLI intentionally forces resume from ` minio ` to reset key material coherently across all coprocessors.
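
The override rule above can be sketched as a small decision function. This is an illustration only: the helper name `effective_resume_step` is hypothetical, and the ordered step list is passed in as an argument because the real step order lives in the CLI and is not reproduced here.

```sh
# Hypothetical helper: decide which step a resume should actually start from.
#   $1  requested resume step
#   $2  "yes" if the coprocessor topology changed since the last deploy
#   $3  space-separated deploy steps in order (placeholder supplied by the caller)
effective_resume_step() {
  requested="$1"
  topology_changed="$2"
  ordered_steps="$3"
  # Without a topology change, the requested step is used as-is.
  if [ "$topology_changed" != "yes" ]; then
    echo "$requested"
    return 0
  fi
  reached_coprocessor="no"
  for step in $ordered_steps; do
    if [ "$step" = "coprocessor" ]; then
      reached_coprocessor="yes"
    fi
    if [ "$step" = "$requested" ]; then
      # Resuming at coprocessor or later is forced back to minio.
      if [ "$reached_coprocessor" = "yes" ]; then
        echo "minio"
      else
        echo "$requested"
      fi
      return 0
    fi
  done
  echo "unknown step: $requested" >&2
  return 1
}
```

With an illustrative (not authoritative) order such as `"minio core coprocessor gateway-sc"`, a changed topology turns a requested `gateway-sc` resume into a `minio` resume, while `core` is left untouched.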
223+
135224### Deploying a single step
136225
137226To redeploy only a single service without touching others:
@@ -149,6 +238,203 @@ You can combine `--only` or `--resume` with other flags:
149238./fhevm-cli deploy --only gateway-sc --build --local
150239```
151240
241+ ### Telemetry Checks
242+
243+ The coprocessor env now ensures ` OTEL_EXPORTER_OTLP_ENDPOINT ` is present.
244+ If it is missing, deploy defaults it to ` http://jaeger:4317 ` in ` .env.coprocessor.local ` .
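
The defaulting behavior can be pictured with the sketch below. It assumes the env file holds plain `KEY=VALUE` lines and ends with a newline; the helper name `ensure_otel_endpoint` is hypothetical, and the actual deploy logic may differ.

```sh
# Hypothetical helper: append a default OTLP endpoint to an env file
# only when the variable is not already defined there.
# Assumes the file (if present) ends with a newline.
ensure_otel_endpoint() {
  env_file="$1"
  default_endpoint="${2:-http://jaeger:4317}"
  if ! grep -q '^OTEL_EXPORTER_OTLP_ENDPOINT=' "$env_file" 2>/dev/null; then
    printf 'OTEL_EXPORTER_OTLP_ENDPOINT=%s\n' "$default_endpoint" >> "$env_file"
  fi
}
```

Calling `ensure_otel_endpoint .env.coprocessor.local` twice leaves a single entry, and an endpoint that is already set is never overwritten.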
245+
246+ Use strict endpoint validation (requires Jaeger to be up first):
247+
248+ ``` sh
249+ ./fhevm-cli deploy --strict-otel
250+ ```
251+
252+ Run smoke validation on demand:
253+
254+ ``` sh
255+ ./fhevm-cli telemetry-smoke
256+ ```
257+
258+ ` telemetry-smoke ` retries for a short warm-up window before failing, to reduce false negatives while services are still starting to emit traces.
259+
260+ Or include it in deploy:
261+
262+ ``` sh
263+ ./fhevm-cli deploy --telemetry-smoke
264+ ```
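
The warm-up retry that `telemetry-smoke` performs follows a common pattern, sketched here generically. The helper name `retry_with_delay` and its attempt and delay parameters are assumptions for illustration; the real retry window is defined by the CLI.

```sh
# Hypothetical retry wrapper: run a command until it succeeds
# or the allowed number of attempts is used up.
#   $1  maximum number of attempts
#   $2  delay in seconds between attempts
#   $@  command to run
retry_with_delay() {
  attempts="$1"
  delay_seconds="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay_seconds"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry_with_delay 5 3 ./fhevm-cli telemetry-smoke` would add an outer retry loop around the smoke check in a CI script.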
265+
266+ ### Orchestration source of truth
267+
268+ The orchestration is Bun-first with shell entrypoint wrappers:
269+
270+ - ` test-suite/fhevm/fhevm-cli `
271+ - ` test-suite/fhevm/scripts/deploy-fhevm-stack.sh `
272+
273+ Canonical deploy/test metadata now lives in one TypeScript source:
274+
275+ - ` test-suite/fhevm/scripts/bun/manifest.ts `
276+
277+ Runtime implementation:
278+
279+ - ` test-suite/fhevm/scripts/bun/cli.ts `
280+ - ` test-suite/fhevm/scripts/bun/process.ts `
281+
282+ Compatibility snapshots used for parity verification:
283+
284+ - ` test-suite/fhevm/fhevm-cli.legacy `
285+ - ` test-suite/fhevm/scripts/deploy-fhevm-stack.legacy.sh `
286+
287+ You can force legacy mode explicitly with:
288+
289+ ``` sh
290+ FHEVM_CLI_IMPL=legacy ./fhevm-cli deploy
291+ ```
292+
293+ Version updates do not require editing each per-service version variable by hand.
294+ Three umbrella variables override them in one place:
295+
296+ - ` FHEVM_STACK_VERSION ` (gateway/host/coprocessor/kms-connector/test-suite)
297+ - ` FHEVM_CORE_VERSION `
298+ - ` FHEVM_RELAYER_VERSION ` (relayer + relayer-migrate)
299+
300+ These can be set as environment variables, or in an optional file:
301+
302+ - ` test-suite/fhevm/env/staging/.env.versions `
303+ - (template: ` test-suite/fhevm/env/staging/.env.versions.example ` )
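
The optional versions file might look like the sketch below. It assumes the file uses plain `KEY=VALUE` lines like other env files; the version values are illustrative release-candidate tags, not recommendations, and the shipped `.env.versions.example` template remains the authoritative reference.

```sh
# Sketch of test-suite/fhevm/env/staging/.env.versions (format assumed: KEY=VALUE).
# Applies to gateway, host, coprocessor, kms-connector, and test-suite.
FHEVM_STACK_VERSION=v0.12.0-rc.1
# Applies to the KMS core.
FHEVM_CORE_VERSION=v0.14.0-rc.1
# Applies to relayer and relayer-migrate.
FHEVM_RELAYER_VERSION=v0.10.0-rc.1
```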
304+
305+ Example:
306+
307+ ``` sh
308+ FHEVM_STACK_VERSION=v0.12.0-rc.1 \
309+ FHEVM_CORE_VERSION=v0.14.0-rc.1 \
310+ FHEVM_RELAYER_VERSION=v0.10.0-rc.1 \
311+ ./fhevm-cli deploy
312+ ```
313+
314+ ### Troubleshooting deploy failures
315+
316+ When deploy fails, the script now surfaces explicit hints for common operational failure modes.
317+
318+ - OOM-killed critical service:
319+ - Symptom: failure includes ` looks OOM-killed ` .
320+ - Action: increase Docker memory and resume from the failed step, for example:
321+ - ` ./fhevm-cli deploy --resume coprocessor `
322+
323+ - Key bootstrap / CRS not ready:
324+ - Symptom: failure includes ` Detected key-bootstrap-not-ready state ` .
325+ - Action: wait for keygen/CRS generation to settle, then resume from gateway contracts:
326+ - ` ./fhevm-cli deploy --resume gateway-sc `
327+
328+ - Gateway helper image export conflict (` already exists ` ):
329+ - Symptom: build fails while starting gateway contracts.
330+ - Action: deploy now auto-retries once after removing conflicting ` gateway-contracts ` tags.
331+ - Manual fallback for repeated collisions:
332+ - ` ./fhevm-cli clean --purge-images --purge-build-cache `
333+
334+ ### Behavior parity tests
335+
336+ A behavior-level shell test suite validates deploy orchestration outcomes (ordering, ` --resume ` , ` --only ` , build semantics, env patch timing, actionable failure hints, strict OTEL checks, purge flags, and telemetry smoke checks).
337+
338+ Run it with:
339+
340+ ``` sh
341+ ./test-suite/fhevm/scripts/tests/deploy-fhevm-stack.behavior.sh
342+ ```
343+
344+ ### CLI parity diff tests
345+
346+ A dry-run parity harness executes legacy Bash and Bun CLI flows under the same mocked Docker environment, then diffs command traces and exit codes for sampled command cases.
347+
348+ Run it with:
349+
350+ ``` sh
351+ ./test-suite/fhevm/scripts/tests/fhevm-cli-parity-diff.sh
352+ ```
353+
354+ ### Validation protocol (PR / manual)
355+
356+ Use this when you want high confidence that the CLI is safe and functionally correct.
357+
358+ #### 1) Quick confidence protocol (10-20 min)
359+
360+ ``` sh
361+ cd test-suite/fhevm
362+
363+ # Optional but strongly recommended: isolate this run from any other local docker tests.
364+ export FHEVM_DOCKER_PROJECT=fhevm-pr
365+
366+ # Fresh start
367+ ./fhevm-cli clean --purge
368+
369+ # Bring up stack with testnet profile + telemetry gate
370+ ./fhevm-cli deploy --resume core --network testnet --telemetry-smoke
371+
372+ # Fast deterministic suite
373+ ./fhevm-cli test input-proof --no-hardhat-compile
374+ ./fhevm-cli test user-decryption --no-hardhat-compile
375+ ./fhevm-cli test erc20 --no-hardhat-compile
376+
377+ # Operational commands
378+ ./fhevm-cli pause host
379+ ./fhevm-cli unpause host
380+ ./fhevm-cli pause gateway
381+ ./fhevm-cli unpause gateway
382+ ./fhevm-cli upgrade coprocessor
383+
384+ # Cleanup
385+ ./fhevm-cli clean
386+ ```
387+
388+ #### 2) Full protocol (same matrix used for end-to-end QA)
389+
390+ ``` sh
391+ cd /path/to/fhevm
392+ bash /private/tmp/fhevm_full_qa.sh
393+ ```
394+
395+ Expected outcome:
396+ - No ` [QA][FAIL] ` in the generated summary log
397+ - Final line is ` [QA][DONE] artifact_dir=... `
398+
399+ Inspect results:
400+
401+ ``` sh
402+ ART_DIR=$(cat /tmp/fhevm-full-qa-last-artifacts.txt)
403+ cat "$ART_DIR/summary.log"
404+ ```
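
The two expected-outcome conditions can also be checked mechanically. The sketch below assumes the summary log is plain text with one marker per line; the helper name `qa_summary_ok` is hypothetical and not part of the QA script.

```sh
# Hypothetical checker for the expected outcome of the full QA run.
# Succeeds only if the log has no failure marker and ends with the
# completion marker.
qa_summary_ok() {
  summary="$1"
  # Condition 1: no [QA][FAIL] anywhere in the log (fixed-string match).
  if grep -qF '[QA][FAIL]' "$summary"; then
    return 1
  fi
  # Condition 2: the final line starts with "[QA][DONE] artifact_dir=".
  case "$(tail -n 1 "$summary")" in
    '[QA][DONE] artifact_dir='*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `qa_summary_ok "$ART_DIR/summary.log" && echo "QA passed"`.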
405+
406+ #### 3) Safety check for cleanup scope (no side effects)
407+
408+ Use sentinels to confirm clean/purge commands do not affect unrelated docker resources.
409+
410+ ``` sh
411+ docker run -d --name qa-clean-sentinel alpine:3.20 sleep 36000
412+ docker network create qa-clean-sentinel-net
413+
414+ cd test-suite/fhevm
415+ ./fhevm-cli clean --purge
416+
417+ docker ps -a --filter name='^qa-clean-sentinel$' --format '{{.Names}}'
418+ docker network ls --filter name='^qa-clean-sentinel-net$' --format '{{.Name}}'
419+
420+ docker rm -f qa-clean-sentinel
421+ docker network rm qa-clean-sentinel-net
422+ ```
423+
424+ Both sentinel checks must still return their names after clean/purge.
425+
426+ ### Docker project isolation
427+
428+ To avoid collisions with other local Docker-based test workflows, set a dedicated compose project name:
429+
430+ ``` sh
431+ export FHEVM_DOCKER_PROJECT=fhevm-dev
432+ cd test-suite/fhevm
433+ ./fhevm-cli deploy
434+ ```
435+
436+ All deploy/clean/purge operations are then scoped to that project.
437+
152438## Security policy
153439
154440### Handling sensitive data