refactor(test-suite): rewrite fhevm orchestration in bun with parity checks #1986
🧪 CI Insights
Here's what we observed from your CI run for ed4ae52.
🟢 All jobs passed! But CI Insights is watching 👀
|
Pushed a hardening pass on top of v0:
Validation run:
Commit: |
|
@claude /pr-review Please review with one goal in mind: improved local UX while preserving deploy behavior parity. Context:
Review objective: Acceptance criteria to verify (pass/fail, with evidence):
Local-UX-specific checks:
Output format required:
|
|
Claude finished @Eikix's task in 11m 44s
PR Review: refactor(test-suite): fhevm orchestration in Bun with parity checks
Review scope: Local orchestration refactor/hardening in
Review objective: Validate that local developer experience is measurably better without regressing existing deploy behavior semantics.
I read all 15 changed files line-by-line and ran three parallel review passes (bug hunt, error handling audit, acceptance criteria verification). Results below.
Findings (ordered by severity)
P1 -
|
| # | Criterion | Verdict | Evidence |
|---|---|---|---|
| 1 | Existing workflows remain equivalent | Pass | All commands verified identical via line-by-line comparison. Parity diff test (fhevm-cli-parity-diff.sh) validates command traces + exit codes for 12 sampled flows. All 14 test grep patterns match exactly. |
| 2 | Resume/single-step semantics preserved | Pass | --resume/--only parsing, validation, mutual exclusion all match legacy. Reverse teardown order identical. MinIO IP patch timing validated by behavior test (assert_order). Skipped minio step still fetches IP when container is running. |
| 3 | Single canonical source, no duplicated registries | Pass | manifest.ts is the single source of truth for Bun. Legacy files are explicitly labeled compatibility snapshots and validated via parity diff tests. DEPLOYMENT_STEPS (13 steps), VERSION_ENTRIES (18 entries), TEST_TYPE_CONFIG (14 types) all match legacy exactly. .env.versions + groupOverrideEnv is additive Bun-only functionality. |
| 4 | Operational failures are actionable | Pass | OOM → "looks OOM-killed" + resume command. Key-bootstrap → "Detected key-bootstrap-not-ready" + resume to gateway-sc. OTEL → defaults missing endpoint, --strict-otel fails fast. All three validated by behavior tests. |
| 5 | Cleanup supports reproducible A/B runs | Pass | Default clean = safe compose down only. --purge = all three granular flags. --purge-images, --purge-build-cache, --purge-networks individually available. Behavior test validates all prune commands. Caveat: --purge-images is system-wide, not scoped (see P1 finding). |
| 6 | Safe migration path | Pass | fhevm-cli wrapper: Bun-first → command -v bun check → fallback to legacy. FHEVM_CLI_IMPL=legacy override works. Same pattern in deploy-fhevm-stack.sh. All 4 legacy files preserved intact. |
| 7 | Docs reflect actual behavior and flags | Pass | README covers all new flags (--strict-otel, --telemetry-smoke, --purge, --purge-images, --purge-build-cache, --purge-networks, --local, --no-hardhat-compile, telemetry-smoke command). Resume step list matches DEPLOYMENT_STEPS exactly (13 steps in order). Troubleshooting sections match actual error messages. |
Local-UX-specific checks
| Check | Verdict | Evidence |
|---|---|---|
| deploy --strict-otel fails fast if Jaeger not running | Pass | ensureCoprocessorTelemetryEnv with validateReachability=true checks isContainerRunningExact("jaeger") and throws actionable error. Behavior test validates. |
| telemetry-smoke verifies Jaeger service set | Pass | Queries http://localhost:16686/api/services, compares against TELEMETRY_REQUIRED_JAEGER_SERVICES (6 services including host-listener-poller). Actionable failure text with service names. |
| gateway-contracts:* already exists conflict has mitigation/retry | Pass | detectGatewayImageConflict regex + removeGatewayConflictImages + single retry. Caveat: removal failures are silent (see P2 finding). |
| Coprocessor readiness includes poller service | Pass | manifest.ts line 97: { service: "coprocessor-host-listener-poller", state: "running" } is in coprocessor step's serviceChecks. |
| clean --purge / granular flags discoverable and documented | Pass | Usage help (cli.ts line 83) lists all flags. README documents all variants with examples. |
Recommendation
Ready to merge with two recommended P1 fixes (non-blocking but strongly encouraged as fast follow-ups):
- Guard usage() behind error type: prevents help text from drowning actionable failure hints, which is the primary UX improvement this PR aims to deliver.
- Scope docker image prune -af, or add an explicit warning, since --purge-images being system-wide is surprising behavior for a project-scoped tool.
The P2 items (readEnvValue quote stripping, silent gateway image removal, telemetry smoke false positives) are real but low-probability in practice and can be addressed in a follow-up.
Overall assessment: This is a well-structured refactor with excellent parity validation. The Bun rewrite is faithful to the legacy behavior (verified by both manual comparison and automated diff tests), the manifest-based architecture is a clear maintainability improvement, and the new operational features (OOM hints, OTEL validation, gateway conflict retry, granular purge) are properly tested and documented.
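A minimal sketch of the first P1 recommendation (guarding usage() behind an error type), assuming a dedicated error class for argument problems. The class name CliUsageError is borrowed from the later re-review in this thread; renderFailure and the usage string are hypothetical, and the real cli.ts may be structured differently.

```typescript
// Hypothetical sketch: only argument errors should trigger the usage banner.
class CliUsageError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CliUsageError";
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Placeholder help text; the real usage() output is much longer.
function usage(): string {
  return "Usage: fhevm-cli <command> [options]";
}

// Returns the lines that would be printed for a given failure.
function renderFailure(error: unknown): string[] {
  const message = error instanceof Error ? error.message : String(error);
  const lines = [`[ERROR] ${message}`];
  if (error instanceof CliUsageError) {
    // Argument errors: the help text is useful context.
    lines.push(usage());
  }
  // Runtime failures (OOM, key-bootstrap, ...) keep only the actionable hint.
  return lines;
}
```

With this split, a runtime failure hint is the last thing the operator sees instead of being followed by the full help text.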
|
Synced branch with What I changed during conflict resolution:
Validation rerun after conflict resolution:
Note:
|
|
Addressed Claude review findings in Implemented fixes:
Added/updated behavior tests:
Validation rerun:
|
|
@claude /pr-review Please run another review pass on this PR. Context for rerun:
Please verify acceptance criteria again with emphasis on:
|
|
Claude finished @Eikix's task in 2m 24s —— View job PR Review (Re-review Pass) —
|
| # | Criterion | Verdict | Evidence |
|---|---|---|---|
| 1 | Existing workflows remain equivalent | Pass | All commands (deploy/up, test, pause/unpause, upgrade, logs, clean) verified identical via line-by-line comparison. Parity diff test validates 12 sampled flows. fhevm-cli wrapper correctly falls back to legacy for smoke command and --coprocessors/--coprocessor-threshold flags (lines 14–25). |
| 2 | Resume/single-step semantics preserved | Pass | --resume/--only parsing, validation, mutual exclusion all match legacy. Reverse teardown order identical (cleanupFromStep line 704: reverse loop). MinIO IP patch timing validated by assert_order in behavior tests. Skipped minio step still fetches IP when container is running (line 875–877). |
| 3 | Single canonical source, no duplicated registries | Pass | manifest.ts is the single source of truth for Bun. Legacy shell manifests (deploy-manifest.sh, version-manifest.sh) are compatibility snapshots validated via parity diff. Step counts match exactly: 13 deployment steps, 18 version entries, 14 test types. LOCAL_CACHE_SERVICES (15 entries) matches legacy exactly. |
| 4 | Operational failures are now actionable | Pass | OOM → "looks OOM-killed" + resume command (line 431). Key-bootstrap → "Detected key-bootstrap-not-ready" + resume to gateway-sc (line 436). OTEL → defaults missing endpoint, --strict-otel fails fast (line 360–364). New: runtime failures no longer print usage text (verified by assert_not_contains "Usage:" in behavior tests). |
| 5 | Cleanup supports reproducible A/B runs | Pass | Default clean = safe compose down + remove-orphans (line 1141). --purge = all three granular flags. Network purge is now scoped to fhevm_ prefix (line 1149). --purge-images and --purge-build-cache are system-wide but explicitly opt-in. |
| 6 | Safe migration path | Pass | fhevm-cli wrapper: Bun-first → should_fallback_to_legacy() check for unimplemented flows → command -v bun check → fallback to legacy. FHEVM_CLI_IMPL=legacy override works. Same pattern in deploy-fhevm-stack.sh. All 4 legacy files preserved intact. New fallback paths added for --coprocessors/--coprocessor-threshold flags (lines 19–25 of fhevm-cli). |
| 7 | Docs reflect actual behavior and flags | Pass | README covers all new flags, resume step list matches DEPLOYMENT_STEPS exactly (13 steps in order), troubleshooting sections match actual error messages. Multicoprocessor deploy flow preserved in quickstart examples (line 56). |
Local-UX-specific checks (re-verified)
| Check | Verdict | Evidence |
|---|---|---|
| deploy --strict-otel fails fast if Jaeger not running | Pass | ensureCoprocessorTelemetryEnv with validateReachability=true checks isContainerRunningExact("jaeger") (line 360) and throws actionable error with docker compose command to start tracing. |
| telemetry-smoke verifies Jaeger service set | Pass | Queries http://localhost:16686/api/services, compares against TELEMETRY_REQUIRED_JAEGER_SERVICES (6 services: host-listener, host-listener-poller, tfhe-worker, txn-sender, sns-executor, zkproof-worker). Actionable failure text with missing service names. |
| gateway-contracts:* already exists conflict has mitigation/retry | Pass | detectGatewayImageConflict regex + removeGatewayConflictImages + single retry (lines 534–605). |
| Coprocessor readiness includes poller service | Pass | manifest.ts line 97: { service: "coprocessor-host-listener-poller", state: "running" }. |
| clean --purge / granular flags discoverable and documented | Pass | Usage help (cli.ts line 85), README examples, behavior test (test_clean_purge_invokes_prunes). |
| Runtime failures don't print usage text | Pass (New) | CliUsageError path ensures only argument errors print usage. Verified by assert_not_contains "Usage:" in OOM and key-bootstrap tests. |
| Quoted OTEL endpoint accepted | Pass (New) | readEnvValue now strips quotes. Verified by test_quoted_otel_endpoint_is_accepted. |
Recommendation
Ready to merge. ✅
All three prior review findings (P1 usage() noise, P2 readEnvValue quotes, P3 sleep spawning) have been correctly fixed and are covered by new behavior tests. The remaining docker image prune -af scope concern is acceptable as a P3 follow-up given the explicit opt-in nature of the flag.
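For reference, the readEnvValue quote fix can be pictured roughly as below. This is a hedged sketch, not the code in cli.ts: the function here takes the file contents as a string, and stripMatchingQuotes is a hypothetical helper name.

```typescript
// Hypothetical helper: removes one pair of matching single or double quotes.
function stripMatchingQuotes(raw: string): string {
  const value = raw.trim();
  if (value.length >= 2) {
    const first = value.charAt(0);
    const last = value.charAt(value.length - 1);
    if ((first === '"' || first === "'") && first === last) {
      return value.slice(1, -1);
    }
  }
  return value;
}

// Sketch of reading KEY=value from env-file contents (assumed signature).
function readEnvValue(fileContents: string, key: string): string | undefined {
  for (const line of fileContents.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank lines and comments.
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    if (trimmed.slice(0, eq).trim() === key) {
      return stripMatchingQuotes(trimmed.slice(eq + 1));
    }
  }
  return undefined;
}
```

Only a matching pair is stripped, so a value with a stray leading quote is returned unchanged rather than silently altered.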
Summary of what's been validated:
- 15 files, ~3,883 lines added, read line-by-line across two review passes
- 7/7 acceptance criteria pass
- 6/6 local UX checks pass (+ 2 new checks for fix verification)
- Prior P1 and P2 findings fully resolved with test coverage
- No regressions found in deploy/test/clean flows
- Safe migration path preserved with new fallbacks for multicoprocessor flags
|
@claude /pr-review Quick final pass please on top of commit Focus only on the incremental hardening from this commit:
Goal: ensure no regressions and confirm UX is clearer for local operators. |
|
Claude finished @Eikix's task in 2m 1s —— View job PR Review (Final Pass) — Incremental hardening on
|
| Area | Status | Evidence |
|---|---|---|
| Deploy/up flow | No regression | test_default_flow_and_env_patch unchanged, still validates full 13-step flow |
| Resume/only semantics | No regression | test_resume_preserves_prior_steps_and_restarts_tail and test_only_runs_single_step unchanged |
| Test flows | No regression | No changes to test command paths |
| Clean (default) | No regression | Default clean path unchanged — only purgeImages and purgeBuildCache branches have new logWarn calls |
| Parity diff | No regression | Parity test file unchanged in this commit |
| CI | All green | 22/22 jobs passed, 2 skipped (expected: test-suite-orchestrate-e2e-tests and claude-review) |
Verdict
No findings requiring changes. Ready to merge. ✅
This commit cleanly resolves the remaining P1 and P2 items from prior reviews:
- Telemetry smoke false positives → fixed with 6-attempt retry loop + progress logging
- Gateway conflict silent failures → fixed with per-tag warning + Docker error details
- System-wide purge scope → fixed with runtime warnings + README caveat
All three changes are well-tested, well-documented, and introduce no regressions to existing flows.
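The retry behavior described for telemetry-smoke can be sketched as follows. The service names and the 6-attempt count come from this thread; the 5-second delay, the injected listServices callback, and the function names are assumptions for illustration, not the actual cli.ts code.

```typescript
// Required services as listed in the review (TELEMETRY_REQUIRED_JAEGER_SERVICES).
const REQUIRED_SERVICES: readonly string[] = [
  "host-listener",
  "host-listener-poller",
  "tfhe-worker",
  "txn-sender",
  "sns-executor",
  "zkproof-worker",
];

// Pure comparison: which required services has Jaeger not reported yet?
function missingServices(required: readonly string[], reported: readonly string[]): string[] {
  const seen = new Set(reported);
  return required.filter((name) => !seen.has(name));
}

// Hypothetical retry loop; listServices would wrap the Jaeger /api/services query.
async function telemetrySmoke(
  listServices: () => Promise<string[]>,
  attempts = 6,
  delayMs = 5000, // assumed warm-up delay between attempts
): Promise<void> {
  let missing: string[] = [...REQUIRED_SERVICES];
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    missing = missingServices(REQUIRED_SERVICES, await listServices());
    if (missing.length === 0) return;
    console.log(`telemetry-smoke ${attempt}/${attempts}: waiting for ${missing.join(", ")}`);
    if (attempt < attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Missing Jaeger services: ${missing.join(", ")}`);
}
```

Retrying avoids the false positive where a service has started but has not yet exported its first span.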
|
@claude /pr-review Please review latest commit Scope of this pass:
Please focus on:
Output requested:
|
|
Claude finished @Eikix's task in 4m 1s —— View job PR Review (Incremental Pass) — commit
|
| Concern | Status | Evidence |
|---|---|---|
| --resume semantics preserved | No impact | applyNetworkProfileVersions runs at line 1076, before cleanup/env-prep. --resume skip logic at line 1101 is unchanged. Version env vars are in process.env before any compose step runs, including resumed steps. |
| --only semantics preserved | No impact | Same reasoning: version overrides happen before the step loop. |
| Env patch timing | Correct | Ordering in deploy(): (1) applyNetworkProfileVersions sets process.env, (2) prepareAllEnvFiles copies base files to .local, (3) ensureCoprocessorTelemetryEnv patches OTEL. The process.env overrides take precedence over --env-file values in docker compose (docker compose spec: host env > env-file). MinIO IP patch timing unchanged. |
| Existing non-network deploys | No impact | applyNetworkProfileVersions is gated behind if (options.networkProfile). When --network is not passed, networkProfile is undefined and no scraping runs. |
| Version summary display | Correct | printVersionSummary at line 1098 reads from process.env[version.envVar] ?? version.defaultValue. Scraped versions will appear correctly in the summary output. |
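The lookup quoted above (process.env[version.envVar] ?? version.defaultValue) can be illustrated with a small sketch. The env map is passed in explicitly so the example stays self-contained. It also assumes that profile overrides simply overwrite existing entries, which this thread does not confirm, and applyProfileOverrides and the EXAMPLE_VERSION name are hypothetical.

```typescript
interface VersionEntry {
  envVar: string;
  defaultValue: string;
}

type Env = Record<string, string | undefined>;

// Assumption: scraped profile versions overwrite whatever is already in the env.
function applyProfileOverrides(env: Env, overrides: Record<string, string>): Env {
  return { ...env, ...overrides };
}

// Mirrors the lookup quoted in the review: env value first, manifest default otherwise.
function resolveVersion(env: Env, entry: VersionEntry): string {
  return env[entry.envVar] ?? entry.defaultValue;
}

// Illustrative entries; the variable names and values are placeholders.
const entries: VersionEntry[] = [
  { envVar: "COPROCESSOR_DB_MIGRATION_VERSION", defaultValue: "v0.0.0-default" },
  { envVar: "EXAMPLE_VERSION", defaultValue: "v0.0.0-default" },
];

const env = applyProfileOverrides({}, { EXAMPLE_VERSION: "v1.2.3-scraped" });
const summary = entries.map((entry) => `${entry.envVar}=${resolveVersion(env, entry)}`);
```

Because `??` only falls back on undefined or null, an env var set to an empty string would be kept as-is rather than replaced by the default.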
Behavior tests assessment
| Test | Coverage | Verdict |
|---|---|---|
| test_deploy_network_profile_applies_versions | Validates fixture-based mainnet scraping, asserts 8 overrides applied, checks specific version strings in output | Good: uses FHEVM_GRAFANA_DASHBOARD_HTML_FILE to avoid real network calls |
| test_deploy_network_profile_rejects_invalid_value | Validates --network foo produces actionable error with allowed values, confirms Usage: is shown (argument error path) | Good |
| Missing: testnet profile test | Only mainnet is tested; testnet panel parsing is not exercised | Minor gap: the fixture includes testnet data but no test asserts against it |
| Missing: Chromium unavailability test | No test for the case where both FHEVM_GRAFANA_DASHBOARD_HTML_FILE and Chromium are unavailable | Minor gap: this is the most common failure path for new users |
Docs assessment
File: test-suite/README.md — 17 lines added
- Quickstart example (./fhevm-cli deploy --network testnet) is clear and discoverable
- Notes section correctly frames this as "best-effort scraping"
- FHEVM_GRAFANA_CHROMIUM_BIN and FHEVM_GRAFANA_DASHBOARD_HTML_FILE env vars are documented
- Contract/relayer version scoping is explicitly called out ("continue to use local defaults unless explicitly overridden")
One missing note: The docs don't mention that --network requires a Chromium/Chrome binary to be available (either system-installed or via Playwright). The FHEVM_GRAFANA_CHROMIUM_BIN hint is there but the prerequisite isn't stated.
Recommendation
Ready to merge with two recommended P1 fixes (non-blocking but strongly encouraged as fast follow-ups):
- Warn when FHEVM_GRAFANA_CHROMIUM_BIN path doesn't exist: silent fallthrough is confusing for users who explicitly configured this.
- Add --no-sandbox to Chromium args: headless Chrome in Docker/CI as root is a common scenario and the failure is non-actionable without this flag.
The P2 items (duplicate service registry, missing COPROCESSOR_DB_MIGRATION_VERSION, --network + --build interaction, regex fragility) are design trade-offs that are acceptable given the "best-effort" framing in docs and the fixture-based testing escape hatch.
Overall assessment: This is a well-scoped incremental feature. The scraping approach is inherently fragile (coupled to Grafana's DOM), but this is explicitly acknowledged in the README, and the FHEVM_GRAFANA_DASHBOARD_HTML_FILE fixture path provides a deterministic fallback. The version mapping correctly flows through process.env to docker compose, existing deploy semantics are unaffected, and the behavior tests cover the happy path and validation path effectively.
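The first recommendation (warn instead of silently falling through) might look like the sketch below. The function name, the candidate list, and the injected exists/warn callbacks are hypothetical; only the FHEVM_GRAFANA_CHROMIUM_BIN variable comes from this PR.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical resolver: prefer the configured binary, warn if it is missing,
// then fall back to the first auto-detected candidate that exists.
function resolveChromiumBin(
  env: Env,
  candidates: readonly string[],
  exists: (path: string) => boolean,
  warn: (message: string) => void,
): string | undefined {
  const configured = env.FHEVM_GRAFANA_CHROMIUM_BIN;
  if (configured) {
    if (exists(configured)) return configured;
    // The user explicitly configured this path, so say why it is being ignored.
    warn(`FHEVM_GRAFANA_CHROMIUM_BIN=${configured} does not exist; falling back to auto-detection`);
  }
  return candidates.find((candidate) => exists(candidate));
}
```

In a real CLI, exists would typically be fs.existsSync and warn the logger's warning function; injecting them keeps the sketch testable without touching the filesystem.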
|
Follow-up for disk-growth issue from local runs: Pushed What changed:
Validation:
This addresses the silent local cache accumulation path that can balloon disk usage over repeated |
|
@claude /pr-review Please run a fresh review on latest HEAD (includes Focus:
Output format:
|
|
Claude finished @Eikix's task in 3m 59s —— View job PR Review (Fresh Pass) — latest HEAD (
|
| Concern | Status | Evidence |
|---|---|---|
| --resume ordering | ✅ No regression | cleanupFromStep (line 912) collects components from resume index to end, then iterates in reverse for cleanup (line 928). Services before resume point are preserved. Behavior test test_resume_preserves_prior_steps_and_restarts_tail validates both positive (kms-connector onwards deployed) and negative (minio/core/etc. not touched) cases with ordering assertions. |
| --only isolation | ✅ No regression | cleanupSingleStep (line 935) cleans only the specified step's component. shouldSkipStep (line 622) returns true for all other steps. Behavior test test_only_runs_single_step validates coprocessor is the only step touched. |
| --resume/--only mutual exclusion | ✅ Correct | Line 1071–1072: usageError("Cannot use --resume and --only together") |
| Env patch timing | ✅ Correct | Deploy function (line 1090–1151) order: (1) parse args, (2) apply network profile versions to process.env, (3) configure local build env, (4) cleanup, (5) prepareAllEnvFiles(), which copies base → .local, (6) ensureCoprocessorTelemetryEnv(), which patches OTEL into already-created .local file, (7) deploy steps. MinIO IP patch happens both after minio step (line 1141–1143) AND when minio is skipped but running (line 1126–1128). Behavior test test_default_flow_and_env_patch validates minio inspect occurs before coprocessor compose up. |
| Network profile timing | ✅ Correct | applyNetworkProfileVersions at line 1093 sets process.env BEFORE prepareAllEnvFiles. Since docker compose reads env vars from both --env-file and host env (host env takes precedence per compose spec), scraped versions override file defaults. |
| --build only on buildable steps | ✅ Correct | Line 1138: const useBuild = options.forceBuild && step.buildable. Non-buildable steps (minio, core, kms-signer, database) always get plain up -d. Behavior test test_build_flag_applies_only_to_buildable_steps validates. |
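The resume/only/build semantics in the table above can be condensed into a small planning function. This is a sketch under stated assumptions: planDeploy and its types are hypothetical, and the step list in the usage note is an illustrative subset rather than the real 13-step manifest.

```typescript
interface Step {
  name: string;
  buildable: boolean;
}

interface DeployOptions {
  resume?: string;
  only?: string;
  forceBuild: boolean;
}

interface DeployPlan {
  teardown: string[]; // cleaned up in this order (reverse of deploy order)
  run: { name: string; useBuild: boolean }[];
}

function planDeploy(steps: readonly Step[], options: DeployOptions): DeployPlan {
  if (options.resume && options.only) {
    throw new Error("Cannot use --resume and --only together");
  }
  const target = options.only ?? options.resume;
  const index = target === undefined ? 0 : steps.findIndex((step) => step.name === target);
  if (index === -1) {
    throw new Error(`Unknown step: ${target}`);
  }
  // --only selects exactly one step; --resume selects the tail from that step on.
  const selected = options.only ? steps.slice(index, index + 1) : steps.slice(index);
  return {
    teardown: selected.map((step) => step.name).reverse(),
    run: selected.map((step) => ({
      name: step.name,
      useBuild: options.forceBuild && step.buildable, // --build only on buildable steps
    })),
  };
}
```

For example, with steps minio, core, coprocessor, kms-connector (in that assumed order), resuming from coprocessor tears down kms-connector then coprocessor and leaves minio and core untouched.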
Cleanup Hardening Verification
| Feature | Status | Evidence |
|---|---|---|
| --purge is shorthand for all 4 purge flags | ✅ Correct | parseCleanArgs line 1384–1388: sets all four booleans to true. |
| --purge-local-cache removes .buildx-cache | ✅ Correct | Line 1440–1448: fs.rmSync(cacheRoot, { recursive: true, force: true }). Respects FHEVM_BUILDX_CACHE_DIR env var. Behavior test test_clean_purge_invokes_prunes creates a .buildx-cache/coprocessor/index.json, runs clean --purge, and asserts the directory is gone. |
| System-wide purge warnings | ✅ Correct | Lines 1431, 1436: logWarn messages explicitly state "system-wide" scope before executing prunes. Behavior test asserts both warning messages appear in output. |
| Network purge is fhevm-scoped | ✅ Correct | Line 1424: if (network.startsWith(`${PROJECT}`)), so it only removes fhevm* networks. |
| Default clean is safe | ✅ Correct | Line 1416: docker compose -p fhevm down -v --remove-orphans only; no purge behavior without explicit flags. |
| .buildx-cache in .gitignore | ✅ Correct | .gitignore line 43: .buildx-cache/ |
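As a rough illustration of the flag handling verified above, the sketch below shows --purge expanding to the four granular flags and a prefix filter for project networks. The flag names come from this PR; the option field names, the error message, and the trailing underscore in the prefix are assumptions.

```typescript
interface CleanOptions {
  purgeImages: boolean;
  purgeBuildCache: boolean;
  purgeNetworks: boolean;
  purgeLocalCache: boolean;
}

// Sketch of clean-argument parsing; not the actual parseCleanArgs from cli.ts.
function parseCleanArgs(args: readonly string[]): CleanOptions {
  const options: CleanOptions = {
    purgeImages: false,
    purgeBuildCache: false,
    purgeNetworks: false,
    purgeLocalCache: false,
  };
  for (const arg of args) {
    switch (arg) {
      case "--purge": // shorthand: enable every granular purge flag
        options.purgeImages = true;
        options.purgeBuildCache = true;
        options.purgeNetworks = true;
        options.purgeLocalCache = true;
        break;
      case "--purge-images":
        options.purgeImages = true;
        break;
      case "--purge-build-cache":
        options.purgeBuildCache = true;
        break;
      case "--purge-networks":
        options.purgeNetworks = true;
        break;
      case "--purge-local-cache":
        options.purgeLocalCache = true;
        break;
      default:
        throw new Error(`Unknown option for clean: ${arg}`);
    }
  }
  return options;
}

// Keeps only networks that belong to the compose project (assumed "<project>_" prefix).
function selectProjectNetworks(networks: readonly string[], project: string): string[] {
  return networks.filter((network) => network.startsWith(`${project}_`));
}
```

With no flags, every field stays false, which matches the "default clean is safe" row.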
CLI Help Completeness Audit
| Flag | In usage() | In README table | In Examples | Status |
|---|---|---|---|---|
| deploy --build | ✅ line 93 | ✅ line 159 | ✅ line 117 | Complete |
| deploy --local/--dev | ✅ line 94 | ✅ line 160 | ✅ line 118 | Complete |
| deploy --network | ✅ line 95 | ✅ line 161 | ✅ line 120 | Complete |
| deploy --resume | ✅ line 96 | ✅ line 162 | ✅ line 121 | Complete |
| deploy --only | ✅ line 97 | ✅ line 163 | ✅ line 122 | Complete |
| deploy --telemetry-smoke | ✅ line 98 | ✅ line 164 | ✅ line 119 | Complete |
| deploy --strict-otel | ✅ line 99 | ✅ line 165 | - | Complete (no example needed) |
| test -v/--verbose | ✅ line 102 | ✅ line 168 | - | Complete |
| test -n/--network | ✅ line 103 | ✅ line 166 | ✅ line 127 | Complete |
| test -g/--grep | ✅ line 104 | ✅ line 167 | - | Complete |
| test -r/--no-relayer | ✅ line 105 | ✅ line 169 | - | Complete |
| test --no-hardhat-compile | ✅ line 106 | ✅ line 170 | ✅ line 124 | Complete |
| clean --purge | ✅ line 109 | ✅ line 171 | ✅ line 131 | Complete |
| clean --purge-images | ✅ line 110 | ✅ line 172 | - | Complete |
| clean --purge-build-cache | ✅ line 111 | ✅ line 173 | - | Complete |
| clean --purge-networks | ✅ line 112 | ✅ line 174 | - | Complete |
| clean --purge-local-cache | ✅ line 113 | ✅ line 175 | ✅ line 132 | Complete |
| telemetry-smoke command | ✅ line 89 | ✅ line 179 | ✅ line 130 | Complete |
Result: All flags are documented in all three locations (help, README table, and examples where applicable). The --network semantic distinction (deploy = version profile, test = runtime network) is explicitly called out in both README (lines 182–184) and help (lines 95, 103).
README Agent-Oriented Usability
| Criterion | Status | Evidence |
|---|---|---|
| Command reference table is machine-parseable | ✅ | Markdown table with consistent `command |
| Explicit flag semantics for --network disambiguation | ✅ | Lines 182–184: "They are intentionally different and command-scoped." |
| Resume step list matches manifest exactly | ✅ | README line 195 lists 13 steps in order, matches DEPLOYMENT_STEPS in manifest.ts exactly |
| Troubleshooting sections give actionable commands | ✅ | OOM → deploy --resume coprocessor, key-bootstrap → deploy --resume gateway-sc, gateway conflict → clean --purge-images --purge-build-cache |
| System-wide purge caveat documented | ✅ | Line 92: explicit warning about --purge-images and --purge-build-cache being system-wide |
| Local buildx cache caveat documented | ✅ | Line 93: --purge also removes the local Buildx cache directory |
| Orchestration source of truth documented | ✅ | Lines 244–290: clear hierarchy (manifest.ts → cli.ts → wrappers → legacy) |
| Legacy override path documented | ✅ | Line 268: FHEVM_CLI_IMPL=legacy ./fhevm-cli deploy |
| Version override examples | ✅ | Lines 283–290: env var override example |
Recommendation
Ready to merge. ✅
The cleanup hardening (--purge-local-cache, buildx cache removal, system-wide warnings) is well-implemented and tested. CLI help is complete for all deploy/test/clean flags. README command reference is accurate and agent-friendly. Deploy semantics (--resume, --only, env patch timing) are preserved and validated by both behavior tests and parity diff tests.
One recommended P2 fix as a fast follow-up:
- Resolve --purge-local-cache path relative to FHEVM_DIR instead of CWD to prevent silent no-ops when invoked from unexpected directories.
The other P2/P3 items (duplicate VERSION_ENTRIES display, Docker version regex anchors, incomplete test type listing in help, README caveat for multicoprocessor fallback) are minor and can be addressed in follow-ups.
Overall assessment: This is a mature refactor with excellent parity validation infrastructure. The 13 behavior tests + 12 parity diff cases provide strong coverage. The cleanup hardening adds meaningful safety guardrails (system-wide warnings, scoped network purge, buildx cache purge) that improve reproducibility for local A/B testing workflows.
|
Quick validation protocol for reviewers (simple path):

    cd test-suite/fhevm
    export FHEVM_DOCKER_PROJECT=fhevm-pr
    ./fhevm-cli clean --purge
    ./fhevm-cli deploy --resume core --network testnet --telemetry-smoke
    ./fhevm-cli test input-proof --no-hardhat-compile
    ./fhevm-cli test user-decryption --no-hardhat-compile
    ./fhevm-cli test erc20 --no-hardhat-compile
    ./fhevm-cli deploy --resume coprocessor --coprocessors 2 --coprocessor-threshold 2
    ./fhevm-cli test input-proof-compute-decrypt --no-hardhat-compile
    ./fhevm-cli clean --purge

What this should prove quickly:
Full end-to-end matrix was rerun on latest changes and is green:
@claude please re-review this PR with focus on local UX and operational safety:
|
|
Claude finished @Eikix's task in 4m 9s
PR Re-Review: Local UX & Operational Safety Focus
Scope: Fresh review at latest HEAD, focused on 5 areas per @Eikix's request. All 16 changed files read line-by-line.
Findings (ordered by severity)
P1 -
|
| Cleanup path | Scope | Safe? | Evidence |
|---|---|---|---|
| Default clean | docker compose -p fhevm down -v --remove-orphans | ✅ Project-scoped | Only removes containers/volumes in the fhevm compose project (cli.ts line 1792) |
| --purge-networks | Only fhevm_* networks | ✅ Project-scoped | network.startsWith(`${PROJECT}_`) guard at line 1800 |
| --purge-images | docker compose down --rmi all per fhevm component | ✅ Project-scoped | Uses purgeProjectImages() → purgeComponentImages() which runs docker compose -p fhevm -f <component>.yml down --rmi all per component (lines 1196–1206). Only removes images defined in fhevm compose files. Previous P1 (system-wide docker image prune -af) is now fixed. |
| --purge-build-cache / --purge-local-cache | Removes .buildx-cache/ directory | ✅ Directory-scoped | purgeLocalBuildxCache() removes only the specific .buildx-cache dir relative to FHEVM_DIR (line 1222). Not system-wide. |
| FHEVM_DOCKER_PROJECT env var | | ❌ Not wired | See P1 finding above |
Verdict: Pass (with P1 caveat about FHEVM_DOCKER_PROJECT). All cleanup paths are properly fhevm-scoped. The system-wide prune issue from prior reviews has been fully resolved.
Area 2: Multicoprocessor behavior and resume ordering semantics
--coprocessors N --coprocessor-threshold T flow verified:
- Argument validation (cli.ts lines 1432–1440):
  - T > N → usageError ✅
  - N > MAX_LOCAL_COPROCESSORS (5) → usageError ✅
  - Behavior test test_multicoprocessor_flags_are_validated covers these ✅
- Env configuration (cli.ts configureMulticoprocessorEnvs, lines 657–727):
  - NUM_COPROCESSORS and COPROCESSOR_THRESHOLD written to both gateway-sc.local and host-sc.local ✅
  - Per-coprocessor COPROCESSOR_TX_SENDER_ADDRESS_N, COPROCESSOR_SIGNER_ADDRESS_N, COPROCESSOR_S3_BUCKET_URL_N derived from cast wallet using COPROCESSOR_ACCOUNT_INDICES (mnemonic indices 5, 8, 9, 10, 11) ✅
  - Additional instance env files (.env.coprocessor.N.local) created with unique DATABASE_URL and TX_SENDER_PRIVATE_KEY ✅
  - Behavior test test_multicoprocessor_env_and_extra_instances validates all of the above ✅
- Deploy-time additional instances (cli.ts lines 1511–1515):
  - After the coprocessor step, loop idx = 1 to coprocessorCount - 1
  - Each instance gets generated compose file (coprocessor-N.generated.yml) with service name renaming (coprocessor- → coprocessorN-)
  - DB migration → runtime services, with per-service readiness checks ✅
- Resume ordering (cli.ts lines 1236–1257):
  - cleanupFromStep collects components from resume index to end, then iterates in reverse order for teardown (line 1252: for (let i = components.length - 1; i >= 0; i -= 1))
  - shouldSkipStep correctly skips all steps before the resume index
  - MinIO IP patch still applied when minio step is skipped but container is running (line 1496–1498)
  - Behavior test test_resume_preserves_prior_steps_and_restarts_tail validates: prior services untouched, teardown in reverse order, minio IP fetch before subsequent compose ups ✅
- --only semantics (cli.ts lines 1259–1269):
  - Only the specified step's component is cleaned and redeployed
  - All other steps skipped
  - Behavior test test_only_runs_single_step validates ✅
- --resume/--only mutual exclusion (cli.ts lines 1428–1430):
  - usageError("Cannot use --resume and --only together") ✅
- Multicoprocessor cleanup on clean (cli.ts lines 1147–1148, 616–633):
  - cleanupComponent("coprocessor") calls cleanupAdditionalCoprocessorInstances() first, which discovers indices from both env files and running containers, then tears down each instance ✅
Note on fallback behavior: The fhevm-cli wrapper (lines 14–18) falls back to legacy for smoke command only. Previous versions also fell back for --coprocessors/--coprocessor-threshold but that fallback was removed — these flags are now handled by the Bun implementation. This is correct and validated by the behavior tests.
Verdict: Pass. Multicoprocessor topology, env derivation, resume/only ordering, and cleanup are all correctly implemented and tested.
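The validation and instance-naming rules from Area 2 can be sketched as below. The limit of 5, the T > N rule, and the file-name patterns come from the review; the function names, the error messages, the positive-integer check, and the reading of N as the loop index are assumptions.

```typescript
const MAX_LOCAL_COPROCESSORS = 5;

// Throws on invalid --coprocessors N / --coprocessor-threshold T combinations.
function validateTopology(count: number, threshold: number): void {
  if (!Number.isInteger(count) || count < 1) {
    throw new Error("--coprocessors must be a positive integer"); // assumed rule
  }
  if (!Number.isInteger(threshold) || threshold < 1) {
    throw new Error("--coprocessor-threshold must be a positive integer"); // assumed rule
  }
  if (count > MAX_LOCAL_COPROCESSORS) {
    throw new Error(`--coprocessors cannot exceed ${MAX_LOCAL_COPROCESSORS}`);
  }
  if (threshold > count) {
    throw new Error("--coprocessor-threshold cannot exceed --coprocessors");
  }
}

interface ExtraInstance {
  index: number;
  composeFile: string;
  envFile: string;
}

// Additional instances are idx = 1 .. count - 1; the base coprocessor step covers the first.
function additionalInstances(count: number): ExtraInstance[] {
  const instances: ExtraInstance[] = [];
  for (let idx = 1; idx <= count - 1; idx += 1) {
    instances.push({
      index: idx,
      composeFile: `coprocessor-${idx}.generated.yml`,
      envFile: `.env.coprocessor.${idx}.local`,
    });
  }
  return instances;
}

// Renames a base service for instance idx: coprocessor-foo becomes coprocessor<idx>-foo.
function renameService(service: string, idx: number): string {
  return service.replace(/^coprocessor-/, `coprocessor${idx}-`);
}
```

Validating before any env files are written keeps a bad flag combination on the argument-error path, so it prints usage and never leaves a half-configured topology behind.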
Area 3: Telemetry/OTEL handling and error actionability
| Feature | Implementation | Actionable? | Evidence |
|---|---|---|---|
| Missing OTEL endpoint | ensureCoprocessorTelemetryEnv() defaults to http://jaeger:4317 if OTEL_EXPORTER_OTLP_ENDPOINT is missing | ✅ Warns: Missing OTEL_EXPORTER_OTLP_ENDPOINT. Defaulting to http://jaeger:4317. | cli.ts line 820 |
| Quoted OTEL endpoint | readEnvValue strips single/double quotes | ✅ No error for "http://jaeger:4317" | cli.ts lines 495–500; behavior test test_quoted_otel_endpoint_is_accepted |
| Invalid OTEL endpoint | new URL(endpoint) validates, throws with specific message | ✅ Invalid OTEL_EXPORTER_OTLP_ENDPOINT value | cli.ts lines 823–828 |
| --strict-otel | Checks isContainerRunningExact("jaeger"), fails fast if not running | ✅ Error: Telemetry endpoint http://jaeger:4317 is configured but Jaeger is not running. Start tracing first: docker compose -f docker-compose/tracing-docker-compose.yml up -d | cli.ts line 830–834; behavior test test_strict_otel_requires_jaeger |
| telemetry-smoke | Queries Jaeger API, compares against 6 required services, retries 6×5s=30s warm-up | ✅ Reports missing services by name: Missing Jaeger services: host-listener-poller, ... | cli.ts lines 1567–1607 |
| OOM-killed | Detects exit code 137 and OOMKilled=true, provides resume command | ✅ looks OOM-killed + ./fhevm-cli deploy --resume coprocessor | cli.ts lines 907–910; behavior test test_oom_failure_is_actionable |
| Key-bootstrap-not-ready | Detects 7 patterns (CrsNotGenerated, KeygenNotRequested, etc.) | ✅ Detected key-bootstrap-not-ready state + ./fhevm-cli deploy --resume gateway-sc | cli.ts lines 892–915; behavior test test_key_bootstrap_failure_is_actionable |
| Runtime errors don't print help | CliUsageError class distinguishes argument errors from runtime failures | ✅ Only argument errors print Usage: banner | cli.ts lines 37, 155, 1892; behavior tests assert_not_contains "Usage:" |
Required Jaeger services include poller:
TELEMETRY_REQUIRED_JAEGER_SERVICES (manifest.ts lines 334–341):
host-listener, host-listener-poller, tfhe-worker, txn-sender, sns-executor, zkproof-worker
And coprocessor readiness checks include the poller service:
{ service: "coprocessor-host-listener-poller", state: "running" }
(manifest.ts line 99) ✅
Verdict: Pass. All telemetry/OTEL paths produce actionable output with concrete next steps.
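To make the OOM and key-bootstrap rows concrete, here is a hedged sketch of how a failure could be mapped to an actionable hint. The ContainerFailure shape and failureHint name are hypothetical. Only two of the seven key-bootstrap patterns are named in this thread, so only those two appear, and the sketch assumes both OOM signals (exit code 137 and OOMKilled=true) must be present.

```typescript
interface ContainerFailure {
  exitCode: number;
  oomKilled: boolean; // e.g. the OOMKilled field from docker inspect
  logs: string;
}

// Only the two patterns named in the review; the real list has seven.
const KEY_BOOTSTRAP_PATTERNS: readonly RegExp[] = [/CrsNotGenerated/, /KeygenNotRequested/];

// Returns a next-step hint for known failure modes, or undefined if unrecognized.
function failureHint(step: string, failure: ContainerFailure): string | undefined {
  if (failure.exitCode === 137 && failure.oomKilled) {
    return `${step} looks OOM-killed. Retry with: ./fhevm-cli deploy --resume ${step}`;
  }
  if (KEY_BOOTSTRAP_PATTERNS.some((pattern) => pattern.test(failure.logs))) {
    return "Detected key-bootstrap-not-ready state. Retry with: ./fhevm-cli deploy --resume gateway-sc";
  }
  return undefined;
}
```

Returning undefined for unrecognized failures lets the caller fall back to the raw error instead of printing a misleading hint.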
Area 4: Docs/help clarity for humans and agent-driven usage
CLI help completeness audit:
| Flag | In usage() | In README | In Examples |
|---|---|---|---|
| deploy --build | ✅ line 97 | ✅ line 172 | ✅ line 123 |
| deploy --local/--dev | ✅ line 98 | ✅ line 173 | ✅ line 124 |
| deploy --network | ✅ line 99 | ✅ line 174 | ✅ line 126 |
| deploy --coprocessors | ✅ line 100 | ✅ line 175 | ✅ line 127 |
| deploy --coprocessor-threshold | ✅ line 101 | ✅ line 176 | ✅ line 127 |
| deploy --resume | ✅ line 102 | ✅ line 177 | ✅ line 128 |
| deploy --only | ✅ line 103 | ✅ line 178 | ✅ line 129 |
| deploy --telemetry-smoke | ✅ line 104 | ✅ line 179 | ✅ line 125 |
| deploy --strict-otel | ✅ line 105 | ✅ line 180 | - |
| test -v/-n/-g/-r/--no-hardhat-compile | ✅ lines 108–112 | ✅ lines 181–185 | ✅ lines 131, 134 |
| clean --purge* flags | ✅ lines 115–119 | ✅ lines 186–190 | ✅ lines 138–139 |
| telemetry-smoke | ✅ line 93 | ✅ line 194 | ✅ line 137 |
Agent-oriented UX:
- --network disambiguation is clear: deploy = version profile, test = runtime network (README lines 197–199) ✅
- Resume step list matches DEPLOYMENT_STEPS exactly (README lines 210–211, 13 steps in order) ✅
- Troubleshooting sections give concrete commands (README lines 307–325) ✅
- CLI reference table is machine-parseable markdown (README lines 170–194) ✅
- Orchestration source of truth hierarchy documented (README lines 259–296) ✅
- bun run shortcuts available via package.json ✅
Minor gap: `usage()` line 89 lists 9 test types but `TEST_TYPE_CONFIG` has 14 entries. Missing from help: `input-proof-compute-decrypt`, `public-decrypt-http-ebool`, `public-decrypt-http-mixed`, `paused-host-contracts`, `paused-gateway-contracts`. These are discoverable from the README examples but not from `fhevm-cli help`.
Verdict: Pass. Docs and help are comprehensive and accurate. Minor gap in test type discoverability from help text.
Area 5: Regression risk vs legacy behavior
Parity validation infrastructure:
- `fhevm-cli-parity-diff.sh` runs 12 sampled flows (deploy, deploy --build, resume, only, pause, unpause, test input-proof, test operators, test debug, upgrade, logs, clean) through both legacy and Bun implementations with mocked Docker, then diffs command logs and exit codes ✅
- `deploy-fhevm-stack.behavior.sh` runs 16 tests covering default flow, resume/only, build semantics, OOM/key-bootstrap hints, strict OTEL, network profiles, quoted env values, multicoprocessor topology, clean/purge, telemetry smoke, and a comprehensive flag/command matrix ✅
Safe migration path:
- `fhevm-cli` wrapper: Bun → legacy fallback if `bun` not found ✅
- `FHEVM_CLI_IMPL=legacy` override works ✅
- `smoke` command falls back to legacy (not yet implemented in Bun) ✅
- All 4 legacy files preserved (`fhevm-cli.legacy`, `deploy-fhevm-stack.legacy.sh`, `deploy-manifest.sh`, `version-manifest.sh`) ✅
Key behavioral equivalences verified:
- Step ordering: 13 steps in identical order ✅
- Compose project name: `-p fhevm` in all compose calls ✅
- Env file handling: base → `.local` copy pattern preserved ✅
- MinIO IP patch timing: minio IP fetched after minio step, before coprocessor ✅
- Build flag: `--build` only on `buildable: true` steps ✅
- Service checks: identical service names and expected states between `DEPLOYMENT_STEPS` (manifest.ts) and `FHEVM_DEPLOYMENT_MANIFEST` (deploy-manifest.sh) ✅
One regression risk to watch: The Bun CLI now handles --coprocessors / --coprocessor-threshold directly (previously fell back to legacy). The fhevm-cli wrapper no longer has special-case fallback for these flags (lines 9–19 only fall back for smoke). This is intentional per the fix comments, and the behavior test test_multicoprocessor_env_and_extra_instances covers the Bun path. However, if a user had previously relied on legacy multicoprocessor behavior with additional features not yet ported to Bun, they would need FHEVM_CLI_IMPL=legacy.
Verdict: Pass. Parity infrastructure is solid. Migration path is safe with explicit legacy override.
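The migration path above amounts to a small decision rule in the wrapper. The real `fhevm-cli` wrapper is a shell script; the sketch below restates its fallback rules as a pure TypeScript function purely for illustration. `chooseImplementation` and `WrapperInputs` are hypothetical names, and the ordering of the checks is an assumption.

```typescript
type Impl = "bun" | "legacy";

// Hypothetical input shape for illustration only.
interface WrapperInputs {
  command: string;        // first CLI argument, e.g. "deploy" or "smoke"
  implOverride?: string;  // value of FHEVM_CLI_IMPL, if set
  bunAvailable: boolean;  // whether a `bun` binary was found on PATH
}

function chooseImplementation({ command, implOverride, bunAvailable }: WrapperInputs): Impl {
  if (implOverride === "legacy") return "legacy"; // explicit operator override
  if (!bunAvailable) return "legacy";             // no Bun runtime installed
  if (command === "smoke") return "legacy";       // not yet implemented in Bun
  return "bun";
}
```

Under this rule, `--coprocessors` and `--coprocessor-threshold` no longer trigger a fallback, which matches the regression note above: users who need legacy multicoprocessor behavior must set `FHEVM_CLI_IMPL=legacy` explicitly.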
Summary
| # | Criterion | Verdict |
|---|---|---|
| 1 | Cleanup/purge scope cannot impact unrelated docker setups | Pass (all paths are project/component-scoped) |
| 2 | Multicoprocessor behavior and resume ordering semantics | Pass (topology config, env derivation, resume/only, cleanup all correct and tested) |
| 3 | Telemetry/OTEL handling and error actionability | Pass (all failure modes produce actionable messages with next steps) |
| 4 | Docs/help clear for humans and agents | Pass (comprehensive and accurate) |
| 5 | Regression risk vs legacy behavior | Pass (strong parity infrastructure, safe migration path) |
Recommendation
Ready to merge. ✅
One P1 to address (as a follow-up or inline fix):
`FHEVM_DOCKER_PROJECT` env var is referenced in the validation protocol but not supported by the code. Either wire it into `manifest.ts` (one-line fix: `export const PROJECT = process.env.FHEVM_DOCKER_PROJECT ?? "fhevm";`) or remove it from the validation protocol documentation.
One P2 observation:
`--purge-build-cache` and `--purge-local-cache` are functionally identical. Consider consolidating to reduce confusion.
The core claims in this PR are validated: cleanup is safe and project-scoped, multicoprocessor semantics are correct, telemetry errors are actionable, and behavioral parity with legacy is maintained through automated testing.
|
Pushed latest updates in What changed in this push:
@claude please run a fresh review on this latest head commit with focus on:
|
|
Claude finished @Eikix's task in 4m 21s —— View job PR Review (Fresh Pass) — latest HEAD (
|
| Mechanism | Status | Evidence |
|---|---|---|
| All compose commands use `-p PROJECT` | ✅ Scoped | Every `docker compose` invocation includes `-p ${PROJECT}` (manifest.ts line 37: defaults to "fhevm", overridable via `FHEVM_DOCKER_PROJECT`). |
| Container listing uses project label filter | ✅ Scoped | listProjectContainers() (line 808): --filter label=com.docker.compose.project=${PROJECT} |
| Container removal uses managed whitelist | ✅ Scoped | cleanupManagedProjectContainers() (line 867): regex whitelist `^(fhevm-minio |
| `--purge-images` uses compose `--rmi all` | ✅ Scoped | purgeComponentImages() (line 1775): `docker compose -p PROJECT -f <compose> down --rmi all`. This only removes images referenced by the compose file, NOT `docker image prune -af`. This is a significant improvement from the prior version that used system-wide `docker image prune -af`. |
| `--purge-networks` uses project label filter | ✅ Scoped | clean() (line 2611): `docker network ls --filter label=com.docker.compose.project=${PROJECT}`. Only fhevm-project networks are removed. |
| `--purge-build-cache` / `--purge-local-cache` | ✅ Scoped | purgeLocalBuildxCache() (line 1822): removes `.buildx-cache` directory (resolved relative to `FHEVM_DIR`). This is a project-local directory, not a Docker system-wide operation. |
| `FHEVM_DOCKER_PROJECT` validation | ✅ Correct | manifest.ts line 36: validated with `/^[a-zA-Z0-9][a-zA-Z0-9_.-]*$/`, falls back to "fhevm" if invalid. |
| Behavior test validates no system-wide prune | ✅ Covered | test_clean_purge_invokes_prunes (behavior test line 733–734): assert_not_contains "docker image prune -af" and assert_not_contains "docker builder prune -af". |
Verdict: Clean/purge safety is excellent. All purge operations are now scoped to fhevm compose services/project — no system-wide Docker prune commands remain. The prior P1 (docker image prune -af) has been replaced with compose-scoped --rmi all. The README sentinel safety check protocol (lines 402–420) provides a clear manual verification procedure for cautious operators.
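The scoping rules in this table can be summarized as: resolve one validated project name, then pass it to every Docker call. The sketch below assumes the regex and the `"fhevm"` default quoted from `manifest.ts`; the helper names `resolveProject`, `composeArgs`, and `networkListArgs` are hypothetical and do not reflect the actual module layout.

```typescript
const DEFAULT_PROJECT = "fhevm";
// Pattern as quoted in the review for manifest.ts line 36.
const PROJECT_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9_.-]*$/;

// Falls back to the default when the variable is unset or fails validation.
function resolveProject(env: Record<string, string | undefined>): string {
  const candidate = env["FHEVM_DOCKER_PROJECT"];
  return candidate !== undefined && PROJECT_PATTERN.test(candidate) ? candidate : DEFAULT_PROJECT;
}

// Arguments for `docker`, always carrying the project scope (hypothetical helper).
function composeArgs(project: string, composeFile: string, ...rest: string[]): string[] {
  return ["compose", "-p", project, "-f", composeFile, ...rest];
}

// Lists only networks labeled with this compose project (hypothetical helper).
function networkListArgs(project: string): string[] {
  return ["network", "ls", "--filter", `label=com.docker.compose.project=${project}`, "--format", "{{.Name}}"];
}
```

For example, `composeArgs(resolveProject({}), "docker-compose.yml", "down", "--rmi", "all")` yields a compose-scoped image purge rather than a system-wide prune; the compose file name here is a placeholder.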
Focus Area 2: Multicoprocessor Resume/Test Correctness
| Concern | Status | Evidence |
|---|---|---|
| Threshold validation | ✅ Correct | cli.ts line 2058: coprocessorThresholdOverride > coprocessorCount → usageError. |
| Max coprocessor bound | ✅ Correct | cli.ts line 2064: coprocessorCount > MAX_LOCAL_COPROCESSORS (5) → usageError. |
| `cast` availability check | ✅ Correct | cli.ts line 963: checks `cast --version` before trying account derivation, throws clear error if missing. |
| Account index bounds | ✅ Correct | cli.ts line 968: coprocessorCount > COPROCESSOR_ACCOUNT_INDICES.length → throws error. |
| Per-instance env file creation | ✅ Correct | cli.ts lines 990–993: copies base coprocessor env to .env.coprocessor.{idx}.local, overrides DATABASE_URL and TX_SENDER_PRIVATE_KEY per instance. |
| Per-instance compose generation | ✅ Correct | cli.ts line 999: createGeneratedCoprocessorCompose(instanceIdx) generates coprocessor-{idx}.generated.yml. |
| Resume + multicoprocessor forces full reset | ✅ Safe (conservative) | cli.ts line 2106: when resumeStep >= coprocessor and multicoprocessor requested, forces resume from "minio" with warning. This ensures key material coherence. See P2 finding for nuance. |
| Cleanup includes additional instances | ✅ Correct | cli.ts line 852–863: cleanupAdditionalCoprocessorInstances() iterates all generated compose files and cleans each. |
| Stale env file cleanup | ✅ Correct | cli.ts line 903–910: removeStaleAdditionalCoprocessorEnvFiles() removes leftover per-instance envs when topology changes to n=1. |
| Gateway/host env topology propagation | ✅ Correct | cli.ts lines 948–951: NUM_COPROCESSORS and COPROCESSOR_THRESHOLD written to both gateway-sc and host-sc env files. |
| Behavior test coverage | ✅ Good | test_multicoprocessor_flags_are_validated (threshold > count rejection), test_multicoprocessor_env_and_extra_instances (n=2/t=2 topology: account derivation, per-instance env, generated compose). |
Verdict: Multicoprocessor implementation is correct and well-tested. The key safety invariant (coherent key material for new topology) is enforced by the forced minio reset. Per-instance env files, compose files, and cleanup are all properly handled.
One gap worth noting: No automated test covers multicoprocessor + --resume interaction (the forced reset to minio). This is validated by the resolveEffectiveResumeStep logic but would benefit from an explicit test case.
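A test for that interaction could target logic shaped like the sketch below, which combines the topology bounds and the forced reset described in the table. `MAX_LOCAL_COPROCESSORS` and `resolveEffectiveResumeStep` are names from the review; the three-step `STEPS` list is an illustrative subset (the real manifest has 13 steps), its ordering is assumed, and the error messages are invented.

```typescript
const MAX_LOCAL_COPROCESSORS = 5;
// Illustrative subset only; the ordering shown here is an assumption.
const STEPS = ["minio", "database", "coprocessor"] as const;
type Step = (typeof STEPS)[number];

// Rejects topologies the review lists as usage errors.
function validateTopology(count: number, threshold: number): void {
  if (!Number.isInteger(count) || count < 1 || count > MAX_LOCAL_COPROCESSORS) {
    throw new Error(`--coprocessors must be between 1 and ${MAX_LOCAL_COPROCESSORS}`);
  }
  if (!Number.isInteger(threshold) || threshold < 1 || threshold > count) {
    throw new Error("--coprocessor-threshold must be between 1 and the coprocessor count");
  }
}

// Resuming at or after the coprocessor step with more than one coprocessor
// restarts from "minio" so key material matches the new topology.
function resolveEffectiveResumeStep(requested: Step, count: number): Step {
  const atOrAfterCoprocessor = STEPS.indexOf(requested) >= STEPS.indexOf("coprocessor");
  return count > 1 && atOrAfterCoprocessor ? "minio" : requested;
}
```

The conservative choice trades a longer resume for coherence; resuming from a later step would be faster but risks mixing keys from the previous topology.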
Focus Area 3: Docs Clarity for Human and Agent Users
| Criterion | Status | Evidence |
|---|---|---|
| CLI help lists all flags | ✅ Complete | All 17 deploy/test/clean flags appear in usage() (lines 89–122) with dedicated Deploy/Test/Clean option sections. |
| README CLI reference table | ✅ Complete | All flags documented with notes (lines 173–197). Agent-oriented instruction at line 171: "For agent workflows, prefer explicit command+flag forms from this table." |
| `--network` disambiguation | ✅ Clear | Lines 200–202: explicit note that `--network` on deploy ≠ `--network` on test, "They are intentionally different and command-scoped." |
| Validation protocol | ✅ Excellent | Three-tier protocol (lines 350–420): (1) quick confidence protocol (10-20 min), (2) full QA matrix, (3) safety check for cleanup scope with sentinel containers. All three are copy-pasteable. |
| Docker project isolation | ✅ Clear | Dedicated section (lines 422–432) with example. Quick protocol starts with export FHEVM_DOCKER_PROJECT=fhevm-pr. |
| Purge scoping documented | ✅ Clear | Lines 104–107: explicit statement that all purge flags are fhevm-scoped. Each flag's scope described. |
| Multicoprocessor prerequisites | ✅ Documented | Line 109: cast requirement for --coprocessors N > 1 stated directly after quickstart. |
| Troubleshooting actionability | ✅ Good | OOM, key-bootstrap, and gateway conflict all have symptom → action pairs with exact resume commands (lines 310–328). |
| Source of truth hierarchy | ✅ Clear | Lines 262–308: file hierarchy (manifest.ts → cli.ts → wrappers → legacy) and version override strategy documented. |
| Legacy fallback path | ✅ Documented | Line 283–287: FHEVM_CLI_IMPL=legacy override with example. |
One minor observation: The help output (line 92) lists only 9 test types but TEST_TYPE_CONFIG in manifest.ts has 14 entries. The command matrix test (behavior test line 785) exercises all 14, which is good. Users relying only on fhevm-cli help won't discover types like input-proof-compute-decrypt, paused-host-contracts, paused-gateway-contracts. Not blocking — the README examples show additional types.
Verdict: Documentation is excellent for both human and agent use. The three-tier validation protocol is particularly strong — it provides copy-pasteable sequences for quick confidence, full QA, and cleanup safety verification. The sentinel-based safety check is a creative and practical approach to proving cleanup isolation.
Regression Risk Assessment
| Area | Risk | Rationale |
|---|---|---|
| Existing deploy/test/clean flows | None | All 13 deployment steps preserved in order. Parity diff tests validate command traces and exit codes against legacy. |
| Resume/only semantics | None | Behavior tests cover default flow, resume from mid-stack, only-step isolation. MinIO IP patch timing validated by ordering assertions. |
| Env patch timing | None | Deploy function ordering is deterministic: network profile → local build → env prep → telemetry env → legacy compat → multicoprocessor → deploy loop. |
| Legacy fallback | None | fhevm-cli wrapper (line 14): smoke command falls back to legacy. Bun-not-found falls back to legacy. FHEVM_CLI_IMPL=legacy override works. |
| CI | None | All CI jobs green per Mergify report. |
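The parity check referenced in this table compares command traces and exit codes between the two implementations. The actual harness is the shell script `fhevm-cli-parity-diff.sh`; the following is only a hedged TypeScript sketch of the comparison idea, with `FlowTrace` and `diffTraces` as hypothetical names.

```typescript
// Hypothetical record of one sampled flow run against mocked Docker.
interface FlowTrace {
  commands: string[]; // commands the CLI attempted, in order
  exitCode: number;
}

// Returns human-readable differences; an empty array means parity.
function diffTraces(legacy: FlowTrace, bun: FlowTrace): string[] {
  const differences: string[] = [];
  if (legacy.exitCode !== bun.exitCode) {
    differences.push(`exit code: legacy=${legacy.exitCode} bun=${bun.exitCode}`);
  }
  const length = Math.max(legacy.commands.length, bun.commands.length);
  for (let i = 0; i < length; i++) {
    if (legacy.commands[i] !== bun.commands[i]) {
      differences.push(
        `command ${i + 1}: legacy=${legacy.commands[i] ?? "<none>"} bun=${bun.commands[i] ?? "<none>"}`,
      );
    }
  }
  return differences;
}
```

Comparing position by position means reordered commands count as drift, which is the property that matters for step ordering and env patch timing.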
Recommendation
Ready to merge. ✅
This commit (3eea4e3c) represents a significant improvement to clean/purge safety compared to the earlier version reviewed. The most important change: --purge-images now uses compose-scoped docker compose down --rmi all instead of the system-wide docker image prune -af that was flagged as P1 in the initial review. All purge operations are now correctly scoped to fhevm compose resources.
Summary of what's been validated across all review passes:
- 19 files, ~5,900 lines added, read line-by-line across 5 review iterations
- Clean/purge: all 4 purge flags are fhevm-scoped (no system-wide Docker operations remain)
- Multicoprocessor: topology config, account derivation, per-instance env/compose, cleanup all correct
- Resume semantics: preserved with conservative safety (multicoprocessor forces full reset)
- Docs: comprehensive, agent-oriented, includes sentinel-based safety verification protocol
- Behavior tests: 15 test functions covering all major paths including purge scoping negative tests
- Parity diff: 12 sampled flows validated against legacy implementation
- No regressions found in any deploy/test/clean flow
P2 follow-ups (non-blocking):
- Consolidate `--purge-build-cache`/`--purge-local-cache` (they're identical)
- Consider less aggressive resume adjustment for multicoprocessor (database instead of minio)
|
Addressed Claude follow-ups in What I changed:
Self-review / validation:
@claude /pr-review
|
|
Claude finished @Eikix's task in 5m 46s —— View job PR Review (Incremental Pass) — commit
|
| Area | Status | Evidence |
|---|---|---|
| Clean default (no flags) | No regression | clean() line 2607: cleanupKnownStack(true) runs regardless of flags. Purge paths are gated behind if checks. |
| `--purge` expansion | Correct | parseCleanArgs line 2576–2580: sets 3 booleans. No purgeLocalCache in the set; it correctly maps to purgeBuildCache. |
| `--purge-local-cache` alias | Correct | parseCleanArgs line 2594–2596: maps to purgeBuildCache = true. Identical effect to `--purge-build-cache`. |
| Deploy resume semantics | No regression | resolveEffectiveResumeStep logic unchanged — only the warning message text was improved. |
| Test type parsing | No regression | parseTestArgs still validates against TEST_TYPE_CONFIG keys. Dynamic help listing reads from the same source. |
| Parity diff | Passing | Per validation: fhevm-cli-parity-diff.sh ✅ |
| Behavior tests | Passing | Per validation: deploy-fhevm-stack.behavior.sh ✅ (17 tests) |
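The alias handling in this table can be sketched as a parser with three option fields, where `--purge-local-cache` shares a branch with `--purge-build-cache`. `parseCleanArgs` and `purgeBuildCache` are named in the review; the other two field names and the error message are assumptions made for illustration.

```typescript
// Three fields, as the review describes; purgeImages/purgeNetworks are assumed names.
interface CleanOptions {
  purgeImages: boolean;
  purgeNetworks: boolean;
  purgeBuildCache: boolean;
}

function parseCleanArgs(args: string[]): CleanOptions {
  const options: CleanOptions = { purgeImages: false, purgeNetworks: false, purgeBuildCache: false };
  for (const arg of args) {
    switch (arg) {
      case "--purge": // shorthand: enables every purge path
        options.purgeImages = true;
        options.purgeNetworks = true;
        options.purgeBuildCache = true;
        break;
      case "--purge-images":
        options.purgeImages = true;
        break;
      case "--purge-networks":
        options.purgeNetworks = true;
        break;
      case "--purge-build-cache":
      case "--purge-local-cache": // alias: same field, same code path
        options.purgeBuildCache = true;
        break;
      default:
        throw new Error(`Unknown clean option: ${arg}`);
    }
  }
  return options;
}
```

Keeping the alias as a second `case` label rather than a fourth field means downstream code cannot treat the two flags differently by accident.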
Cross-reference: All prior review findings status
| Finding | Status |
|---|---|
| P1: `usage()` on all errors | ✅ Fixed (commit e50e38d5): `CliUsageError` path |
| P1: `docker image prune -af` system-wide | ✅ Fixed (commit 3eea4e3c): compose-scoped `--rmi all` |
| P1: `FHEVM_DOCKER_PROJECT` not wired | ✅ Fixed (commit 3eea4e3c): manifest.ts reads env |
| P2: `readEnvValue` quote stripping | ✅ Fixed (commit e50e38d5) |
| P2: `--purge-build-cache` / `--purge-local-cache` redundancy | ✅ Fixed (this commit cdff7de2): alias relationship clarified |
| P2: `--purge-local-cache` CWD resolution | ✅ Fixed (commit 3eea4e3c): resolves relative to `FHEVM_DIR` |
| P2: Multicoprocessor+resume messaging | ✅ Improved (this commit cdff7de2): intentional reset semantics explained |
| P3: Help test type discoverability | ✅ Fixed (this commit cdff7de2): dynamic from `TEST_TYPE_CONFIG` |
| P3: Sleep spawning external process | ✅ Fixed (commit e50e38d5): `Atomics.wait` |
| P3: Missing Linux Chromium paths | Open (P3, non-blocking) |
| P3: `htmlDecode` limited entity set | Open (P3, non-blocking) |
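The `Atomics.wait` fix listed in this table refers to a standard way of blocking the current thread without spawning an external `sleep` process. A minimal sketch of that technique follows; `sleepSync` is a hypothetical name and the actual helper in the PR may differ.

```typescript
// Blocks the calling thread for roughly the given duration.
// Works in Bun and Node; browsers disallow Atomics.wait on the main thread.
function sleepSync(milliseconds: number): void {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  // The cell holds 0 and nothing ever notifies it, so wait() returns
  // "timed-out" once the timeout elapses.
  Atomics.wait(cell, 0, 0, milliseconds);
}
```

Because the wait is synchronous, it suits a CLI that polls container state in a plain loop, at the cost of blocking the event loop while it sleeps.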
Recommendation
Ready to merge. ✅
All three focus areas have been correctly addressed:
- Clean flag UX consistency: `--purge-local-cache` is now explicitly an alias of `--purge-build-cache`. Single code path (`purgeBuildCache`), clean type definition (3 fields), alias labeled in help/README/table. No duplication.
- Multicoprocessor+resume forced-reset: Warning message now explains what, why, and intent. README documents the safety invariant. Dedicated behavior test (`test_multicoprocessor_resume_forces_minio_reset`) validates the forced reset path.
- Help/docs test type discoverability: `usage()` now dynamically lists all 14 test types from `TEST_TYPE_CONFIG`. Behavior test asserts previously-missing types appear in help output. Full 14-type matrix test exercises all types.
No regressions found. All prior P1/P2 findings from the review history are resolved. Only P3 cosmetic/edge-case items remain open (Linux Chromium paths, HTML entity decoding, synopsis line width) — none of which affect correctness or local UX.
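Generating the help listing from the same table that drives parsing is what keeps help and behavior from drifting apart. The sketch below assumes a config keyed by test type name; the type names shown appear in the review, but the `grep` values, the `TestTypeConfig` shape, and `renderTestTypeHelp` are hypothetical.

```typescript
// Hypothetical config shape; the real TEST_TYPE_CONFIG has 14 entries.
interface TestTypeConfig {
  grep: string; // placeholder pattern, not the real value
}

const TEST_TYPE_CONFIG: Record<string, TestTypeConfig> = {
  "input-proof": { grep: "placeholder-input-proof" },
  "input-proof-compute-decrypt": { grep: "placeholder-compute-decrypt" },
  "paused-host-contracts": { grep: "placeholder-paused-host" },
  "paused-gateway-contracts": { grep: "placeholder-paused-gateway" },
};

// Builds the help section from the config keys, so a new test type
// shows up in help as soon as it is added to the table.
function renderTestTypeHelp(config: Record<string, TestTypeConfig>): string {
  return ["Test types:", ...Object.keys(config).map((name) => `  ${name}`)].join("\n");
}

// The parser validates against the same keys the help text lists.
function isKnownTestType(config: Record<string, TestTypeConfig>, name: string): boolean {
  return Object.prototype.hasOwnProperty.call(config, name);
}
```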
Run 40 scenarios (S01-S40) testing the Bun CLI from codex/fhevm-orchestration-parity-refactor against all 5 personas (new dev, dapp dev, infra/telemetry, CI operator, multi-stack user).
Key findings:
- CLI uses deploy/clean (not up/down as spec assumed)
- No trace subcommand or --no-tracing flag exists
- --network testnet/mainnet Grafana scrape needs --no-sandbox
- FHEVM_DOCKER_PROJECT rejects uppercase (Docker Compose rule)
- Argument parsing, validation, resume logic all work correctly
- Clean operations are idempotent and project-scoped
- 5-cycle restart loop is deterministic with no leaks
Verdict: BLOCKED on command naming parity and missing trace subcommand; CLI logic itself is architecturally sound.
https://claude.ai/code/session_011ktTrkokU6NHgPsiipatnb
|
@claude /pr-review Scope for this review:
What to verify:
Validation already run locally:
Output format:
|
|
Claude finished @Eikix's task in 5m 41s.
PR Review: Module split refactor + latest HEAD
Review scope: Module split (no intended behavior changes), Bun CLI command module extraction, deploy/network-profile extraction, test fixture hardening. Also covers
I read all 12 Bun module files line-by-line and both test harnesses, tracing the full import graph and DI wiring. Results below.
Module Architecture SummaryThe monolithic Import graph is strictly acyclic ✅ — leaf nodes ( Findings (ordered by severity)P2 —
|
| # | Criterion | Verdict | Evidence |
|---|---|---|---|
| 1 | No behavior drift vs legacy orchestration | Pass | Deploy function structure, env patch ordering, compose step logic all identical. |
| 2 | `--resume`/`--only`/env patch timing unchanged | Pass | All three deploy-option handlers are extracted line-for-line. DI wiring verified. |
| 3 | Network profile fetch/cache preserved exactly | Pass | Full extraction with identical Grafana API flow, cache merge semantics, and fallback behavior. |
| 4 | Module boundaries coherent, no circular/implicit coupling | Pass | Acyclic import graph. Explicit DI interfaces. No circular deps. |
| 5 | Tests provide robust parity coverage after split | Pass | Wildcard glob copy. 17 behavior tests + 12 parity diff tests all pass per author validation. |
Recommendation
Ready to merge. ✅
The module split is a clean, well-executed structural refactor with no behavior changes. The dependency-injection pattern via factory functions (createCommandHandlers, createDeployOptionHandlers, createNetworkProfileHandlers) is idiomatic and testable. The import graph is acyclic. Test fixture hardening with glob patterns eliminates maintenance burden for future module additions.
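As an illustration of the factory-function pattern praised here, the sketch below shows how a command module can receive its dependencies instead of importing them, which keeps the import graph acyclic and lets tests pass fakes. `createCommandHandlers` and `RunResult` are names from the review; the fields of `RunResult`, the `CommandDeps` interface, and the `logs` handler body are assumptions.

```typescript
// Assumed shape; the review names RunResult but does not list its fields.
interface RunResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Hypothetical dependency interface supplied by the coordinator module.
interface CommandDeps {
  run(args: string[]): RunResult;
  log(message: string): void;
}

// The factory closes over its dependencies; the module imports nothing from cli.ts.
function createCommandHandlers(deps: CommandDeps) {
  return {
    logs(service: string): number {
      const result = deps.run(["docker", "logs", service]);
      if (result.exitCode !== 0) {
        deps.log(`logs failed for ${service}: ${result.stderr}`);
      }
      return result.exitCode;
    },
  };
}
```

A test can then construct handlers with a fake `run` that records its arguments, with no Docker daemon involved.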
P2 items for follow-up:
1. Remove duplicate `isContainerRunning`/`isContainerRunningExact` (identical functions)
2. Consolidate `CommandResult`/`RunResult` type duplication
3. Decide whether gateway-contracts idempotency changes belong in this PR or a separate one
P3 items:
4. Remove unused `isJaegerRunning()` wrapper
5. Follow-up extraction of deploy orchestration from `cli.ts` into `command-deploy.ts` (acknowledged by author)
|
Addressed Claude’s module-split follow-ups in What I changed:
Validation rerun after patch:
|
|
@claude /pr-review Please review latest commit Focus:
Validation rerun locally:
|
|
Claude finished @Eikix's task in 3m 41s —— View job PR Review (Incremental Pass) — commit
|
| # | Criterion | Verdict | Evidence |
|---|---|---|---|
| 1 | Duplicate helper cleanup correct | Pass | isContainerRunning and isJaegerRunning fully removed. Single isContainerRunningExact definition with 4 direct + DI-based usage sites. |
| 2 | Type consolidation correct | Pass | CommandResult removed. RunResult from process.ts is the single source of truth, imported by command-contracts.ts. |
| 3 | Scope restored | Pass | git diff origin/main...HEAD returns empty for all 4 gateway/host task files. |
| 4 | Relayer keyurl parity preserved | Pass | Default http://fhevm-relayer:3000/v2 fallback, IP-based retry, reachability verification, 24×5s retry window. Behavior test covers hostname-failure → IP-fallback path. |
| 5 | No test command flow drift | Pass | runTests dispatch, arg parsing, relayer readiness gate, --no-hardhat-compile, retryOnProofRejected, and NO_RELAYER env injection all structurally identical to prior versions. |
Recommendation
Ready to merge. ✅
All four review focus areas are clean:
- Duplicate functions and types are fully consolidated (single `isContainerRunningExact`, single `RunResult`)
- Gateway/host task files are untouched; PR scope is strictly test-suite orchestration
- Relayer keyurl readiness preserves legacy default with a well-tested IP fallback retry path
- Test command flow has no behavior drift
No new P0/P1/P2 findings. The only P3 is a minor redundant name resolution call — cosmetic, non-blocking.
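For readers unfamiliar with the relayer readiness gate, the retry window described in the table (24 attempts, 5 seconds apart, hostname first and then an IP-based URL) can be sketched as below. This is an assumption-laden illustration: `waitForKeyUrl`, the injected `probe` and `sleep` callbacks, and the order in which candidates are tried are hypothetical, and only the default URL and the 24×5s window come from the review.

```typescript
type Probe = (url: string) => boolean; // true when the URL is reachable
type Sleep = (milliseconds: number) => void;

// Tries each candidate URL per attempt and returns the first reachable one.
function waitForKeyUrl(
  candidates: string[],
  probe: Probe,
  sleep: Sleep,
  attempts = 24,
  delayMs = 5000,
): string {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    for (const url of candidates) {
      if (probe(url)) return url;
    }
    // No pause after the final attempt; fail immediately instead.
    if (attempt < attempts) sleep(delayMs);
  }
  throw new Error(`keyurl not reachable after ${attempts} attempts`);
}
```

Injecting `probe` and `sleep` keeps the loop testable without a network or real delays, for example by passing `["http://fhevm-relayer:3000/v2", ipBasedUrl]` with a fake probe that only accepts the IP form.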
|
Addressed the remaining P3 nit from Claude in Change:
Validation rerun:
|
|
Fixed the telemetry port mismatch reported in local repro ( Commit: What changed:
Behavior now:
Validation:
|
49f7d7f to
2977715
Compare
|
will pause this effort, reconvene with QA team + Amina and let's see what we do |
2977715 to
ed4ae52
Compare
Summary
This PR ports `test-suite/fhevm` orchestration from shell-heavy logic to Bun/TypeScript in the monorepo, while keeping legacy entrypoints available during migration.
Goal: improve maintainability and reduce CLI flakiness without changing intended operator workflows.
What Changed
1) Bun orchestration runtime (core)
- `test-suite/fhevm/scripts/bun/*`
- `cli.ts` (orchestration coordinator)
- `manifest.ts` (single source of truth for steps/services/versions)
- `deploy-options.ts`, `network-profile.ts`
- `deploy-env.ts` (env preparation + compatibility sync)
- `deploy-runtime.ts` (compose/wait/bootstrap runtime orchestration)
- `local-build.ts` (local build cache + worktree build-context handling)
- `command-test.ts`, `command-clean.ts`, `command-trace.ts`, `command-admin.ts`
2) Entrypoints + migration safety
- `test-suite/fhevm/fhevm-cli`
- `test-suite/fhevm/scripts/deploy-fhevm-stack.sh`
- `test-suite/fhevm/fhevm-cli.legacy`
- `test-suite/fhevm/scripts/deploy-fhevm-stack.legacy.sh`
3) Compose/env wiring updates
- `setup-kms-signer-address.sh` integration points used by deploy flow.
4) Parity/behavior harnesses
- `test-suite/fhevm/scripts/tests/deploy-fhevm-stack.behavior.sh`
- `test-suite/fhevm/scripts/tests/fhevm-cli-parity-diff.sh`
5) Docs/help updates
- `test-suite/README.md` and CLI help so deploy/test/clean flags and semantics are explicit.
User-Facing Behavior Covered
- `up`/`deploy`, `test`, `pause`/`unpause`, `upgrade`, `logs`, `clean`, `trace`, `telemetry-smoke`.
- `--resume` and `--only`.
- `--local`) keeps BuildKit cache behavior and now handles git worktrees by creating a temporary build context.
- `deploy --network testnet|mainnet`.
- `clean --purge-local-cache` support (alias of build-cache purge).
Out of Scope
Validation
Review Guide (fast path)
Review commit-by-commit in this order:
1. `feat(test-suite): introduce bun orchestration CLI with legacy fallback`
2. `feat(test-suite): wire compose manifests for project-scoped orchestration`
3. `test(test-suite): add bun-vs-legacy parity and behavior harnesses`
4. `docs(test-suite): document bun orchestration workflow and QA guidance`
Tracking
Refs: https://github.com/zama-ai/fhevm-internal/issues/1042