feat(host-contracts): implement context-aware KMSVerifier #2028

Merged
obatirou merged 12 commits into feat/kms-context-switch from context-aware-kms-verifier
Feb 27, 2026

@obatirou
Contributor

@obatirou obatirou commented Feb 25, 2026

Closes https://github.com/zama-ai/fhevm-internal/issues/1062

Add per-context signer sets and thresholds so that old KMS contexts remain verifiable after governance rotates signers via defineNewContext. Includes destroyKmsContext, v0/v1 extraData parsing, reinitializeV2 migration path, legacy view redirection, and IKMSVerifier interface updates.

@cla-bot cla-bot bot added the cla-signed label Feb 25, 2026
@mergify

mergify bot commented Feb 25, 2026

🧪 CI Insights

Here's what we observed from your CI run for 8774604.

🟢 All jobs passed!

But CI Insights is watching 👀

@obatirou obatirou force-pushed the context-aware-kms-verifier branch from 719134c to abc448e Compare February 26, 2026 08:46
@obatirou obatirou changed the title feat(host-contracts): implement context-aware signer storage and veri… feat(host-contracts): implement context-aware KMSVerifier Feb 26, 2026
@obatirou obatirou force-pushed the context-aware-kms-verifier branch 2 times, most recently from 91aa0da to 92fece8 Compare February 26, 2026 09:19
Add per-context signer sets and thresholds so that old KMS contexts
remain verifiable after governance rotates signers via defineNewContext.
Includes destroyKmsContext, v0/v1 extraData parsing, reinitializeV2
migration path, legacy view redirection, and IKMSVerifier interface
updates.
@obatirou obatirou force-pushed the context-aware-kms-verifier branch from 92fece8 to f668449 Compare February 26, 2026 09:33
Remove redundant tests and merge overlapping ones (55 → 47) without
losing coverage. Removed tests were either strict subsets of other
tests or exercised identical code paths.
@obatirou
Contributor Author

@claude /pr-review

@claude

claude bot commented Feb 26, 2026

Claude finished @obatirou's task in 7m 8s


PR Review: Context-Aware KMSVerifier

  • Pre-review validation
  • Gather context (PR details, guidelines, linked issue)
  • Run parallel review agents (bug hunting, guidelines, error handling)
  • Validate findings
  • Post final review

Summary

[AUTOMATED]

This PR implements context-aware KMS verification, allowing per-context signer sets and thresholds so that old KMS contexts remain verifiable after governance rotates signers. Two independent bug-hunting passes (opus), an error handling audit, and a test coverage analysis were run in parallel.

No critical or high-severity bugs were found. The core security properties are correctly implemented:

  • ✅ Context isolation (cross-context signer rejection via per-context mappings)
  • ✅ EIP-712 signature binding (extraData is part of the signed digest, preventing context ID swapping)
  • ✅ Transient storage cleanup on all non-reverting paths (reverts auto-rollback tstore)
  • ✅ Upgrade safety (reinitializer(3) guard, intentional __EIP712_init omission in reinitializeV2)
  • ✅ Storage layout (new fields appended at offsets +3 through +7, no collision with legacy slots)
  • ✅ Access control on all state-mutating functions (onlyACLOwner)
  • ✅ Assembly correctness (mload(add(extraData, 33)) correctly reads 32 bytes at content offset 1)

🟡 Design Observations (Non-blocking)

1. setThreshold mutates a live context's threshold, breaking the context immutability invariant (Medium, Confidence: 85)

KMSVerifier.sol L201-205

setThreshold modifies the threshold of the current context in-place without creating a new context. Unlike defineNewContext (which increments the counter and preserves old contexts), setThreshold retroactively changes the rules. Any in-flight decryption proofs signed under the old threshold (via v0 or v1 extraData) will be rejected if the threshold was raised. This breaks the immutability invariant that the rest of the context-aware design carefully preserves.

Consider whether setThreshold should instead call defineNewContext with the existing signers and the new threshold, preserving the old threshold for in-flight proofs.
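
The difference between the two options can be sketched with a toy registry. This is a minimal Python model, not the contract: `ContextRegistry`, `set_threshold_in_place`, `set_threshold_as_new_context`, and the choice of 1 as the first context ID are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Context:
    signers: frozenset
    threshold: int


class ContextRegistry:
    """Toy model of per-context signer sets; not the KMSVerifier contract."""

    def __init__(self, signers, threshold):
        self.current_id = 1  # assumed starting ID, for illustration only
        self.contexts = {1: Context(frozenset(signers), threshold)}

    def define_new_context(self, signers, threshold):
        # Described behaviour: bump the counter and keep old contexts intact.
        self.current_id += 1
        self.contexts[self.current_id] = Context(frozenset(signers), threshold)
        return self.current_id

    def set_threshold_in_place(self, threshold):
        # Mutates the live context, so proofs already signed under it are affected.
        self.contexts[self.current_id].threshold = threshold

    def set_threshold_as_new_context(self, threshold):
        # Suggested alternative: same signers, new threshold, new context ID.
        signers = self.contexts[self.current_id].signers
        return self.define_new_context(signers, threshold)

    def meets_threshold(self, context_id, unique_valid_signatures):
        return unique_valid_signatures >= self.contexts[context_id].threshold


in_place = ContextRegistry({"a", "b", "c"}, threshold=2)
in_place.set_threshold_in_place(3)
print(in_place.meets_threshold(1, 2))  # False: a 2-signature proof no longer passes

rotated = ContextRegistry({"a", "b", "c"}, threshold=2)
new_id = rotated.set_threshold_as_new_context(3)
print(rotated.meets_threshold(1, 2))  # True: the old context keeps its threshold
```

In the second variant a proof signed against context 1 stays verifiable, at the cost of one extra context ID per threshold change.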

2. setThreshold emits NewContextSet — potentially misleading for indexers (Low, Confidence: 90)

KMSVerifier.sol L204

setThreshold emits NewContextSet when only the threshold changes (no new context ID is created). Downstream indexers keying on NewContextSet may interpret this as a context rotation. Consider a dedicated ThresholdUpdated event or documenting this clearly.

3. Inconsistent return-vs-revert in signature verification (Low, Informational, Confidence: 95)

KMSVerifier.sol L497-537

_verifySignaturesDigestForContext reverts for invalid context, zero sigs, below-threshold count, and invalid signer — but returns false if duplicates reduce the unique count below threshold. This asymmetry is by design (matching existing behavior, confirmed by test_VerifyDecryptionEIP712KMSSignaturesFailAsExpectedIfSameSignerIsUsedTwice), but the natspec should document this explicitly for integrators.
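
For integrators, the asymmetry can be pictured as follows. This is a hedged Python sketch, not the Solidity implementation: reverts are modelled as a `VerificationRevert` exception (a hypothetical name), signature recovery is skipped in favour of a list of already-recovered addresses, and the order of the checks is an assumption.

```python
class VerificationRevert(Exception):
    """Stands in for a Solidity revert in this sketch."""


def verify_for_context(contexts, context_id, recovered_signers):
    """contexts maps context ID -> (signer set, threshold).

    recovered_signers lists the addresses recovered from the submitted
    signatures, in submission order.
    """
    if context_id not in contexts:
        raise VerificationRevert("invalid context")
    signers, threshold = contexts[context_id]
    if not recovered_signers:
        raise VerificationRevert("zero signatures")
    if len(recovered_signers) < threshold:
        raise VerificationRevert("signature count below threshold")

    unique = set()
    for signer in recovered_signers:
        if signer not in signers:
            raise VerificationRevert("invalid signer")
        unique.add(signer)

    # Duplicates do not revert: they only shrink the unique count,
    # so the call returns False when that count falls below the threshold.
    return len(unique) >= threshold


contexts = {1: ({"a", "b", "c"}, 2)}
print(verify_for_context(contexts, 1, ["a", "b"]))  # True
print(verify_for_context(contexts, 1, ["a", "a"]))  # False, not an exception
```

A caller therefore has to handle both outcomes: a boolean result and a revert.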

4. v0 extraData TOCTOU in batched transactions (Low, Informational, Confidence: 85)

KMSVerifier.sol L464-470

v0 extraData (0x00 or empty) resolves to currentKmsContextId at call time. In batched/multicall scenarios, if defineNewContext is called earlier in the same transaction, subsequent v0-routed verifications would resolve to the new context. Using v1 extraData with explicit context IDs eliminates this. Not a bug per se, but worth documenting for integrators.
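
A small model of the effect, assuming a simplified resolver (`resolve_context_id` is a hypothetical helper that skips all validation): v0 follows whatever the current context is at call time, while v1 pins the ID carried in the payload.

```python
def resolve_context_id(extra_data: bytes, current_context_id: int) -> int:
    """Simplified resolution: v0 follows the current context, v1 pins an ID."""
    if len(extra_data) == 0 or extra_data[0] == 0x00:
        return current_context_id
    return int.from_bytes(extra_data[1:33], "big")


current = 1
v0 = b"\x00"
v1 = b"\x01" + (1).to_bytes(32, "big")

before = (resolve_context_id(v0, current), resolve_context_id(v1, current))
current += 1  # a defineNewContext-style rotation earlier in the same batch
after = (resolve_context_id(v0, current), resolve_context_id(v1, current))

print(before)  # (1, 1)
print(after)   # (2, 1)
```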


🧪 Test Coverage Gaps (Recommended additions)

The test suite is thorough with 40+ tests covering context management, verification, migration, and edge cases. However, three high-priority gaps were identified:

1. v1 extraData with trailing bytes not tested (High priority)

The contract explicitly documents forward-compatibility at L474-475: "Trailing bytes after byte 33 are ignored for forward-compatibility." There's test_V0ExtraDataWithTrailingBytesUsesCurrentContext but no v1 equivalent. A test with abi.encodePacked(uint8(0x01), contextId, uint256(12345)) would exercise this.
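
The parsing rules described in this review (v0 is empty or starts with 0x00, v1 is 0x01 followed by a 32-byte context ID, trailing bytes are ignored) can be sketched in Python as below. The function name, the single `ExtraDataError` exception, and the big-endian reading of the ID are assumptions made for illustration; the contract's actual error names per case are not reproduced here.

```python
class ExtraDataError(Exception):
    """Hypothetical stand-in for the contract's extraData reverts."""


def extract_kms_context_id(extra_data: bytes, current_context_id: int) -> int:
    if len(extra_data) == 0 or extra_data[0] == 0x00:
        # v0: empty or leading 0x00 resolves to the current context,
        # regardless of any bytes that follow.
        return current_context_id
    if extra_data[0] == 0x01:
        if len(extra_data) < 33:
            raise ExtraDataError("v1 payload shorter than 33 bytes")
        # Bytes 1..32 hold the context ID; anything after byte 33 is ignored.
        return int.from_bytes(extra_data[1:33], "big")
    raise ExtraDataError(f"unsupported version {extra_data[0]}")


v1 = bytes([0x01]) + (7).to_bytes(32, "big")
trailing = (12345).to_bytes(32, "big")
print(extract_kms_context_id(v1 + trailing, current_context_id=9))  # 7
```

The last line mirrors the suggested test payload: the trailing 32 bytes do not change the resolved context ID.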

2. getContextSignersAndThresholdFromExtraData has zero direct tests (High priority)

This is the new external function called by FHE.sol. It is never called directly in the test file — only indirectly through verification flow. Specifically untested: revert on destroyed contexts (L344-345), revert on non-existent contexts, v0 path, v1 path, malformed v1, unsupported version.

3. reinitializeV2 after initializeFromEmptyProxy not tested (High priority)

Both use reinitializer(3). There's test_ReinitializeV2CannotBeCalledTwice for double-reinitializeV2, but no test verifying that reinitializeV2() reverts on a contract already initialized via initializeFromEmptyProxy().

Medium priority gaps:

  • reinitializeV2 ownership test (non-owner should be rejected)
  • initializeFromEmptyProxy with duplicate/null signers or invalid thresholds
  • Threshold == signers count edge case (requires all signatures)
  • Multiple context transitions (5+ contexts) stress test

✅ Verified Correct

| Concern | Result |
| --- | --- |
| _extractKmsContextId assembly (mload(add(extraData, 33))) | ✅ Correct: reads 32 bytes at memory offset 33 (after 32-byte length + 1-byte version) |
| reinitializeV2 omitting __EIP712_init | ✅ Intentional: V2 already initialized EIP-712 domain; annotation confirms |
| Context ID overflow | ✅ Not a concern: Solidity 0.8+ reverts on overflow; ~10⁷⁷ rotations needed |
| ERC-7201 storage layout | ✅ No collisions: new fields at offsets +3 through +7 |
| FHE.sol ↔ KMSVerifier consistency | ✅ Same EIP-712 digest; single getContextSignersAndThresholdFromExtraData call replaces two separate calls |
| Transient storage reentrancy | ✅ Safe: cleanup on all non-revert paths; EVM revert rolls back tstore |
| _setContextSigners append-only enforcement | ✅ Enforced implicitly: second call for same context would revert with KMSAlreadySigner |
| destroyKmsContext storage cleanup | ✅ Not needed: context IDs are monotonic, never reused |
| Signature malleability | ✅ Handled by OZ ECDSA.recover (KMSVerifier) and FhevmECDSA (FHE.sol) |
| calldata → memory conversion in getContextSignersAndThresholdFromExtraData | ✅ Solidity handles automatically |

Verdict

The contract implementation is solid and well-tested. The core security properties (context isolation, signature binding, upgrade safety) are correctly implemented. The design observation about setThreshold mutating live contexts is worth discussing but is a design choice, not a bug. The three high-priority test gaps would strengthen confidence and are recommended before merge.

LGTM 🟢 — with the suggested test coverage improvements.


@zama-ai zama-ai deleted a comment from claude bot Feb 26, 2026
… external API surface

- Add v1 extraData trailing bytes test (symmetric with existing v0 test)
- Add getContextSignersAndThresholdFromExtraData happy path covering v0,
  v1, and v1-against-old-context return values
- Add destroyed context revert test for the same function
@obatirou obatirou marked this pull request as ready for review February 26, 2026 10:55
@obatirou obatirou requested review from a team as code owners February 26, 2026 10:55
Rename the revert in _extractKmsContextId to DeserializingExtraDataFail
so callers can distinguish a malformed extraData payload from a truncated
decryption proof envelope (DeserializingDecryptionProofFail).
…fier tests

Factor out the repeated context rotation setup (upgrade with 3 signers,
then define a second context with signer3) into a shared helper used
by 7 tests.
…y helper in KMSVerifier tests

Factor out the repeated fresh-proxy deploy+upgrade block used by all
4 reinitializeV2 tests into a shared helper.
Contributor

@isaacdecoded isaacdecoded left a comment

LGTM 👍

@obatirou obatirou merged commit 5f2a07b into feat/kms-context-switch Feb 27, 2026
52 checks passed
@obatirou obatirou deleted the context-aware-kms-verifier branch February 27, 2026 14:39
@obatirou
Contributor Author

Just for posterity: isValidKmsContext was removed because the Connector will consider a new context activated as soon as it sees the request, rather than relying on a KMSVerifier call for verification.

isaacdecoded added a commit that referenced this pull request Mar 20, 2026
* feat(host-contracts): implement context-aware KMSVerifier (#2028)

* feat(kms-connector): context-aware extraData handling for decryption (#2032)

* chore(kms-connector): rename fhe module to handle

* chore(kms-connector): add and use helper function

* chore(kms-connector): add kms_context table

* chore(kms-connector): prepare ethereum listener

* feat(kms-connector): kms context validation

* chore(kms-connector): kms context tests

* chore(kms-connector): ethereum listener termination

* feat(gateway-contracts): implement context-aware KMS node configs and decryption

* feat: implement context-aware KMS node configs and decryption

* chore(gateway-contracts): apply a few arguments renaming

* fix(gateway-contracts): refresh rust bindings

* chore(gateway-contracts): reuse setter methods and adjust NatSpecs

* chore(gateway-contracts): refresh rust bindings

* refactor: apply suggested naming

* refactor(gateway-contracts): apply suggested renaming

* refactor: revert updateKmsContext naming

* refactor(gateway-contracts): enable decryption upgrade workflow

* chore(gateway-contracts): refresh bindings

* chore(test-suite): introduce getExtraData() method from SDK

* chore(test-suite): restore missed user decrypt ebool test case

* feat(kms-connector): propagate empty extra_data for 0x00

* feat(kms-connector): propagate empty extra_data for 0x00

* chore(kms-connector): add TODO comment for the workaround and upgrade quinn-proto

* chore(kms-connector): add TODO comment for the workaround and upgrade quinn-proto

* chore(kms-connector): use dedicated core config for tests

---------

Co-authored-by: Simon Eudeline <simon.eudeline@zama.ai>

* chore(test-suite): upgrade relayer-sdk version

* chore(test-suite): upgrade test-suite version in fhevm-cli

---------

Co-authored-by: Oba <obatirou@gmail.com>
Co-authored-by: Simon E. <simon.eudeline@zama.ai>
mergify bot pushed a commit that referenced this pull request Mar 23, 2026
* feat(host-contracts): implement context-aware KMSVerifier (#2028)

* feat(kms-connector): context-aware extraData handling for decryption (#2032)

* chore(kms-connector): rename fhe module to handle

* chore(kms-connector): add and use helper function

* chore(kms-connector): add kms_context table

* chore(kms-connector): prepare ethereum listener

* feat(kms-connector): kms context validation

* chore(kms-connector): kms context tests

* chore(kms-connector): ethereum listener termination

* feat(gateway-contracts): implement context-aware KMS node configs and decryption

* feat: implement context-aware KMS node configs and decryption

* chore(gateway-contracts): apply a few arguments renaming

* fix(gateway-contracts): refresh rust bindings

* chore(gateway-contracts): reuse setter methods and adjust NatSpecs

* chore(gateway-contracts): refresh rust bindings

* refactor: apply suggested naming

* refactor(gateway-contracts): apply suggested renaming

* refactor: revert updateKmsContext naming

* refactor(gateway-contracts): enable decryption upgrade workflow

* fix(host-contracts): suppress NewContextSet event on init/reinit (#2040)

fix(host-contracts): suppress NewContextSet event during init/reinit

Extract _defineContext internal helper so init and reinit paths set
context state without emitting NewContextSet, preventing spurious
events that cause context/epoch ID drift in KMS core.

* chore(kms-connector): helm chart update (#2097)

* chore(coprocessor): remove legacy tfhe-worker gRPC path (#1982)

* chore(coprocessor): remove legacy tfhe-worker grpc path

* fix(tfhe-worker): resolve clippy dead_code in bench/test utils

* refactor(tfhe-worker): remove unused computation module

* test(tfhe-worker): cap event operator coverage at uint64

* fix(coprocessor): address review noise and typos

* chore(tfhe-worker): reduce bench fmt churn in dex migration

* chore(tfhe-worker): revert formatting-only bench_id wraps

* chore(tfhe-worker): remove remaining bench format-only churn

* bench(tfhe-worker): restore dex workload parity with legacy grpc

* test(tfhe-worker): restore non-ignored coverage after grpc removal

* test(tfhe-worker): deduplicate operator event coverage

* test(tfhe-worker): harden event test stability

* test(tfhe-worker): run full event type matrix in CI

* test(tfhe-worker): default full event matrix with mode logging

* test(tfhe-worker): simplify event matrix selection

* docs(tfhe-worker): document event test matrix modes

* test(tfhe-worker): expand random event tests across types

* test(tfhe-worker): restore random type matrix parity

* test(tfhe-worker): use query! in invalid operation event test

* fix(bench): stabilize benchmark pipeline after grpc refactor

* fix(bench): allow dex setup trivial encrypt handles

* charts: bump coprocessor chart version

* tfhe-worker: propagate gpu feature to test-harness

* test(tfhe-worker): allow dependent schedule setup handle

* test(tfhe-worker): fix event test matrix CI regressions

* refactor(tfhe-worker): deduplicate test helpers and remove dead code

- Migrate operators_from_events.rs to use shared event_helpers
  (setup_event_harness, next_handle, to_ty, tfhe_event, log_with_tx)
- Remove duplicate test_invalid_operation_marks_error (kept in errors.rs)
- Move wait_for_error to event_helpers for shared use
- Extract TEST_CHAIN_ID const, remove debug eprintln calls
- Remove 16 dead CoprocessorError variants from types.rs

* refactor(tfhe-worker): destructure EventHarness to reduce PR diff

Destructure setup_event_harness() return into {app, pool, listener_db}
so variable names match the original code, minimising the review diff.

* chore(tfhe-worker): remove dead deps and batch event test waits

Remove 6 Cargo dependencies that were only used by the deleted gRPC
server (sha3, lru, rayon, tfhe-zk-pok, regex, actix-web).

Restructure 4 event tests (unary, cast, if-then-else, rand) to use
batch-then-wait pattern: insert all events first, call
wait_until_all_allowed_handles_computed once, then verify. This
eliminates ~200 redundant waits in CI, saving ~10 minutes of sleep.
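
The batch-then-wait restructuring can be illustrated with a toy harness. This is a Python sketch under stated assumptions, not the Rust test code: `FakeHarness`, its methods, and the negation used as the "computation" are hypothetical placeholders that only count how often a wait happens.

```python
class FakeHarness:
    """Hypothetical stand-in for the event test harness; counts waits."""

    def __init__(self):
        self.pending = []
        self.results = {}
        self.waits = 0

    def insert_event(self, handle, operand):
        self.pending.append((handle, operand))

    def wait_until_all_computed(self):
        # One wait drains everything inserted so far.
        self.waits += 1
        for handle, operand in self.pending:
            self.results[handle] = -operand  # placeholder for the real computation
        self.pending.clear()


def run_one_wait_per_event(harness, cases):
    for handle, operand in cases:
        harness.insert_event(handle, operand)
        harness.wait_until_all_computed()
        assert harness.results[handle] == -operand


def run_batch_then_wait(harness, cases):
    for handle, operand in cases:
        harness.insert_event(handle, operand)
    harness.wait_until_all_computed()  # single wait for the whole batch
    for handle, operand in cases:
        assert harness.results[handle] == -operand


cases = [(f"h{i}", i) for i in range(50)]
slow, fast = FakeHarness(), FakeHarness()
run_one_wait_per_event(slow, cases)
run_batch_then_wait(fast, cases)
print(slow.waits, fast.waits)  # 50 1
```

Both variants verify the same results; only the number of waits differs.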

Also remove unnecessary pub(super) from test_fhe_rand_events.

* refactor(tfhe-worker): address PR review feedback

- Upgrade as_scalar_uint to accept &BigInt directly
- Deduplicate helpers in operators_from_events.rs (delete
  insert_tfhe_event, allow_handle, as_scalar_uint copies; use
  event_helpers versions)
- Delete redundant test_fhe_rand_events (subset of random.rs tests)
- Expand test_op_trivial_encrypt to cover all supported types with
  edge-case values
- Add 5 error test scenarios: circular dependency, too many inputs,
  scalar division by zero, binary boolean inputs, unary boolean inputs

* fix(tfhe-worker): replace validation-time error tests with execution-time ones

Remove 3 error tests (circular dependency, too many inputs, scalar div
by zero) that trigger validation-time errors in check_fhe_operand_types.
These errors propagate via ? without being persisted to the DB, causing
an infinite retry loop in event-driven mode.

Replace with test_type_mismatch_error (FheAdd on uint8 + uint16) which
passes validation but properly fails at execution time with
UnsupportedFheTypes.

The validation-path error propagation is tracked as a separate issue.

* docs: update FHE computation diagram to reflect event-driven architecture

Replace the obsolete AsyncCompute gRPC flow with the current
host-listener event-driven architecture in the sequence diagram.

* fix(tfhe-worker): fix GPU test failures in error and random bounded tests

test_coprocessor_computation_errors: Replace Cast-to-type-255 with
FheSub on mismatched types (uint32 + uint64).  The old test panicked on
the GPU path during memory reservation in trivial_encrypt_be_bytes,
preventing the error from being persisted to the DB.  Type-mismatch
errors return a proper Result::Err on both CPU and GPU.

test_fhe_random_bounded: Use per-type bounds from the old gRPC test
instead of upper_bound=1.  The 0-random-bits edge case (bound=1)
behaves differently on GPU vs CPU.  Also check bool results as
true/false rather than assuming a specific numeric value, since CPU
and GPU produce different deterministic outputs for the same seed.

* docs(tfhe-worker): fix stale README heading after gRPC removal

The server was removed; only the background worker remains.

* test(coprocessor): strengthen error and random bounded test assertions (#2029)

Error tests now assert the specific error message instead of only
checking is_error == true.

The bounded random test now generates two samples per type with different
seeds and asserts they differ, catching any constant-output RNG
implementation including always-zero.
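
The idea behind the strengthened assertion can be sketched as follows. This is an illustrative Python version, not the Rust test: `assert_bounded_and_varied`, `always_zero`, and `toy_rng` are hypothetical names, and `toy_rng` is a deterministic stand-in rather than a real random generator.

```python
def assert_bounded_and_varied(sample, upper_bound, seed_a, seed_b):
    """Draw two samples with different seeds; require bounds and variation."""
    first = sample(seed_a, upper_bound)
    second = sample(seed_b, upper_bound)
    assert 0 <= first < upper_bound and 0 <= second < upper_bound, "out of bounds"
    assert first != second, "different seeds produced identical output"


def always_zero(seed, upper_bound):
    return 0  # a broken generator that the check should reject


def toy_rng(seed, upper_bound):
    # Deterministic stand-in, not a real RNG: mixes the seed with an odd multiplier.
    return (seed * 2654435761) % upper_bound


assert_bounded_and_varied(toy_rng, 2**16, seed_a=1, seed_b=2)  # passes
```

Passing `always_zero` instead of `toy_rng` makes the second assertion fail, which is the constant-output case the commit message describes. With a small bound, two honest samples can collide by chance, so the bound and seeds need to be chosen with that in mind.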

Closes zama-ai/fhevm-internal#1077

* fix(coprocessor): force compress/decompress for all ciphertexts (#2036)

* fix(coprocessor): decompress all ciphertexts per operation

* fix(coprocessor): sanity-check that only scalars are uncompressed

* fix(coprocessor): add compressed ct type

* fix(coprocessor): propagate DecompressionError

* fix(host-contracts): add domain separator and prev block hash to handle hashing (#2014)

* fix(host-contracts): add missing domain separator when hashing to construct handles

* fix(host-contracts): update rust bindings

* feat(host-contracts): add previous block hash to the hashes used to generate computed handles

* ci(host-contracts, gateway-contracts): auto-detect contract upgrades (#2037)

* ci(host-contracts): add cross-version upgrade test workflow (#1097)

Add CI that deploys host-contracts from v0.11.0 via Docker, then
upgrades each contract (ACL, FHEVMExecutor, KMSVerifier, InputVerifier,
HCULimit) to the current branch using hardhat upgrade tasks.

Unlike the gateway-contracts equivalent, all upgrade steps are enabled.

* fix(ci): correct misleading CHAIN_ID_GATEWAY comment

* ci(host-contracts): skip upgrades for contracts with unchanged reinitializer

Only FHEVMExecutor (2→3) and HCULimit (2→3) actually bumped their
REINITIALIZER_VERSION between v0.11.0 and current. ACL (3→3),
KMSVerifier (2→2), and InputVerifier (3→3) are unchanged, so their
reinitializeV2 would revert with InvalidInitialization.

* ci: add actionlint ignore for host-contracts-upgrade-tests

Same constant-condition exemption as gateway-contracts-upgrade-tests,
needed for the `if: false` on skipped upgrade steps.

* ci(host-contracts): auto-detect contract upgrades via REINITIALIZER_VERSION

Replace 5 hardcoded upgrade steps (with manual `if: false` guards) with
a single loop that compares REINITIALIZER_VERSION between the previous
release and current code. Only contracts whose version actually changed
are upgraded.

- Add upgrade-manifest.json as single source of truth for upgradeable contracts
- Extract PREVIOUS_RELEASE_TAG to env var (one place to bump per release)
- Remove actionlint exemption (no more constant `if: false` conditions)

Refs: zama-ai/fhevm-internal#379

* ci(gateway-contracts): auto-detect contract upgrades via REINITIALIZER_VERSION

Apply the same auto-detection pattern from host-contracts to gateway:
replace 7 hardcoded upgrade steps (all with `if: false`) with a single
loop that compares REINITIALIZER_VERSION between the previous release
and current code.

- Add upgrade-manifest.json listing all 7 upgradeable gateway contracts
- Extract PREVIOUS_RELEASE_TAG to env var (one place to bump per release)
- Remove last actionlint exemption for constant `if: false` conditions

Closes: zama-ai/fhevm-internal#379

* fix(ci): skip upgrade for contracts not present in previous release

A contract that's new (not in the previous release tag) has no previous
deployment to upgrade from. Without this guard, the loop would attempt
an upgrade because the missing file defaults to version 0, which differs
from the current version.

Verified against the v0.9.8 → v0.10.0 gateway cycle: ProtocolPayment
is correctly skipped (didn't exist in v0.9.8), matching the original
manual `if: false` behavior.

* ci: align deployment check steps between host and gateway workflows

* ci: rename PREVIOUS_RELEASE_TAG to UPGRADE_FROM_TAG, bump gateway to v0.11.0

* ci: add inline comments to upgrade loop for readability

* ci: verify contract versions with cast call after upgrades

* ci: assert getVersion() matches expected version from source constants

* ci: replace associative arrays with indirect expansion for bash 3 compat

* ci: strip quotes from cast call output before version comparison

* ci: suppress shellcheck SC2034 for indirect expansion address vars

* fix(ci): add shellcheck disable SC2034 to each indirect-expansion variable

The directive only suppresses the next line, not a block.

* fix(ci): address PR review feedback

- Remove ProtocolPayment from gateway upgrade manifest (no hardhat
  upgrade task exists for it yet)
- Add existence check for contracts/${name}.sol in current code to
  fail fast if manifest is out of sync
- Fix misleading CHAIN_ID_GATEWAY comment in host workflow

* fix(ci): fail-fast on missing REINITIALIZER_VERSION, skip verify for new contracts

- Replace silent :-0 fallback with explicit error when
  REINITIALIZER_VERSION cannot be parsed from an existing .sol file
- Skip version verification for contracts with no deployment address
  (new contracts not present in previous release)
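
Taken together, the selection rules from these CI commits (compare REINITIALIZER_VERSION, skip contracts absent from the previous release, fail fast on a missing file or an unparsable version) can be sketched in Python. The real workflow is a shell loop; the function names, the dictionary inputs, and the assumed declaration shape `REINITIALIZER_VERSION = <n>;` are illustrative assumptions.

```python
import re

# Assumed declaration shape; the actual Solidity constant may be written differently.
VERSION_RE = re.compile(r"REINITIALIZER_VERSION\s*=\s*(\d+)\s*;")


def parse_reinitializer_version(source: str) -> int:
    match = VERSION_RE.search(source)
    if match is None:
        # Fail fast instead of silently defaulting to 0.
        raise ValueError("REINITIALIZER_VERSION not found")
    return int(match.group(1))


def contracts_to_upgrade(manifest, previous_sources, current_sources):
    """manifest: contract names; *_sources: name -> Solidity source text."""
    selected = []
    for name in manifest:
        if name not in current_sources:
            raise FileNotFoundError(f"contracts/{name}.sol missing from current code")
        if name not in previous_sources:
            continue  # new contract: nothing deployed to upgrade from
        old = parse_reinitializer_version(previous_sources[name])
        new = parse_reinitializer_version(current_sources[name])
        if old != new:
            selected.append(name)
    return selected


decl = "uint64 private constant REINITIALIZER_VERSION = {};"
previous = {"FHEVMExecutor": decl.format(2), "KMSVerifier": decl.format(2)}
current = {
    "FHEVMExecutor": decl.format(3),
    "KMSVerifier": decl.format(2),
    "NewContract": decl.format(1),  # hypothetical contract absent from the old release
}
print(contracts_to_upgrade(list(current), previous, current))  # ['FHEVMExecutor']
```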

* fix(ci): fail-hard when address mapping is missing for existing contracts

Only skip version verification for genuinely new contracts (not in
previous release). If a contract existed in the previous release but
has no address variable mapped, fail with a clear error instead of
silently skipping.

* ci: trigger fresh workflow run

* fix(ci): address PR review nits — consistent PascalCase naming, add comment on UPGRADE_FROM_TAG

* fix(ci): clean Hardhat cache and OZ manifest between sequential upgrades

The upgrade loop runs each contract upgrade as a separate npx hardhat
process. The .openzeppelin manifest and Hardhat cache/artifacts persist
on disk between invocations, causing flaky failures where the OZ plugin
reuses stale bytecode-hash entries or Hardhat resolves wrong artifacts
when previous-contracts/ and contracts/ share the same contract name.

Add `npx hardhat clean` and `rm -rf .openzeppelin` before each upgrade
to ensure a clean slate for compilation and deployment deduplication.

* fix: wait for upgradeToAndCall tx receipt before declaring upgrade success

The OZ hardhat-upgrades plugin's upgradeProxy() does NOT call .wait() on
the upgradeToAndCall transaction — it returns as soon as the tx is
submitted to the node. With Anvil's interval mining (--block-time 0.5),
the tx may not be mined when the plugin returns, and if it reverts during
mining, the revert goes completely undetected.

This caused flaky CI failures where the upgrade task printed "Proxy
contract successfully upgraded!" but getVersion() still returned the old
version — the upgradeToAndCall tx had silently reverted.

Fix: explicitly .wait() on the upgrade transaction, check the receipt
status, and read the EIP-1967 implementation slot to confirm the upgrade
took effect on-chain.

* Revert "fix: wait for upgradeToAndCall tx receipt before declaring upgrade success"

This reverts commit 847266f.

* fix(ci): force-mine and verify on-chain state after each contract upgrade

The OZ hardhat-upgrades plugin does NOT call .wait() on the
upgradeToAndCall transaction — it returns as soon as the tx is submitted.
With Anvil's interval mining (--block-time 0.5), the tx may still be
pending when the plugin returns, and if it reverts during mining the
revert goes undetected.

After each upgrade task:
1. Force-mine a block via `cast rpc evm_mine` to flush pending txs
2. Immediately verify getVersion() returns the expected value
3. Fail fast with a clear diagnostic if the upgrade was silently dropped

This catches the silent revert at the point of failure rather than later
in the separate verify step, making the error message actionable.

* fix(ci): simplify upgrade step — remove redundant per-upgrade verification

The inline per-upgrade version check duplicated the existing "Verify
contract versions" step.  Keep only the `cast rpc evm_mine` workaround
(OZ upgradeProxy does not wait for the upgradeToAndCall tx to be mined)
and let the dedicated verification step handle on-chain assertions.

* fix(ci): remove unnecessary hardhat clean/OZ manifest wipe between upgrades

The per-iteration `npx hardhat clean` + `rm -rf .openzeppelin` was a
speculative fix from before the real root cause was identified (missing
evm_mine).  Each contract has unique bytecode (no OZ manifest hash
collision) and tasks already use fully qualified artifact names, so
there is nothing to clean between iterations.

* feat(coprocessor): make multi-chain DB migration backwards compatible (#2043)

* feat(coprocessor): make multi-chain DB migration backwards compatible

Make sure both new and old versions can work with the same DB, should we
want to revert the new one to the old one.

* feat(coprocessor): add defaults needed for old code

* fix(coprocessor): fix zkproof-worker test DB field name

* fix(coprocessor): fix bad stress-test-generator DB field name

* ci(common): sandbox Claude Code behind Squid proxy + iptables (#2063)

* ci(common): sandbox Claude Code behind Squid proxy + iptables

Run the claude-code-action inside a network sandbox to prevent
data exfiltration to unauthorized hosts. Two layers of defense:
- Squid proxy: L7 domain allowlist (.anthropic.com, .github.com, etc.)
- iptables: blocks direct outbound TCP from the runner UID

All dependencies (Bun, action node_modules, Claude Code CLI, OIDC
token exchange) are pre-installed before lockdown because the action's
internal installers use fetch() which ignores HTTP_PROXY.

Also switches from --allowedTools to --dangerously-skip-permissions
since the network sandbox handles security at the infrastructure level.

update claude file with proper container setup

fix: shellchecks

fix zizmor warning

ci(claude): rewrite workflow from template, address PR #1995 security review

- Drop action wrapper, run claude CLI directly (avoids MCP stdin blocking)
- Remove dead pull_request trigger
- Separate GH_TOKEN from system prompt construction step
- Tighten iptables: resolve Squid IP dynamically, block UDP/ICMP
- Restrict squid allowlist to 3 domains (api.anthropic.com, platform.claude.com, github.com)
- Cache Squid Docker image, add iptables save/restore cleanup
- Add tracking comment for run visibility
- Fix token revocation to use HTTPS_PROXY

fix: replace A && B || C with proper if-then-else (SC2015)

fix: capture error details instead of silent suppression

OIDC exchange and token revocation now log the server response
on failure instead of swallowing it with -sf/--silent/2>/dev/null.

fix: shellcheck SC2001 and SC2015 in claude workflow

Replace sed prompt extraction with parameter expansion (SC2001).

chore: harden security practices

chore: update claude action from security

* chore: rename claude.yml to claude-review.yml

* chore: enforces changes in sandboxed claude-* workflow

---------

Co-authored-by: Roger Carhuatocto <chilcano@intix.info>

* Revert "ci(common): sandbox Claude Code behind Squid proxy + iptables" (#2080)

Revert "ci(common): sandbox Claude Code behind Squid proxy + iptables (#2063)"

This reverts commit 9587546.

* fix(coprocessor): remove tx-sender dependency on hostchain for multichain (#1826)

* fix(coprocessor): add block finalization in HL, remove hostchain from tx-sender

* fix(coprocessor): review fix

* fix(coprocessor): review fix

* fix(coprocessor): e2e tests

* test(coprocessor): debug e2e

* fix(coprocessor): e2e tests

* test(test-suite): add e2e block cap tests for HCU metering (#2081)

* test(test-suite): add e2e block cap tests for HCU metering (#1099)

Add 5 block-cap scenarios to the E2E test suite exercising HCULimit
through real EncryptedERC20 FHE operations on the deployed stack:
multi-user accumulation, cap exhaustion, block rollover, whitelist
removal, and non-owner rejection.

Wire into CI via `fhevm-cli test hcu-block-cap` and a new workflow step.

* fix(test-suite): address review feedback on HCU block cap tests

- Rework block rollover test to actually block a caller in block N,
  then verify that same caller succeeds after rollover in block N+1
- Add missing DEPLOYER_PRIVATE_KEY to .env.example

* fix(test-suite): fix HCU block cap tests for real stack

- Accumulation test: use greaterThan instead of exact equality
  (block meter vs receipt HCU have a small discrepancy on real infra)
- Cap exhaustion + rollover tests: pass explicit gasLimit to bypass
  estimateGas, which reverts against pending state when cap is filled

* fix(test-suite): tighten accumulation assertion with 2% tolerance

Replace loose greaterThan check with near-sum assertion allowing ~2%
drift between receipt-reported HCU and on-chain block meter.

* fix(test-suite): replace HCU tolerance with self-consistent accumulation assertion

The receipt parser reconstructs HCU from the @fhevm/solidity npm price
table while the block meter uses the deployed contract's hardcoded
prices. A version skew between the two causes a small discrepancy.
Instead of cross-comparing with tolerance, assert the block meter
exceeds each individual tx's HCU — proving accumulation without
depending on price table parity.

* fix(test-suite): use revertedWithCustomError for non-owner assertion

Add NotHostOwner error to HCU_LIMIT_ABI and assert the specific custom
error instead of generic revert.

* refactor(test-suite): simplify HCU block cap test structure

- Scope save/restore of HCU limits to only the 2 tests that lower them
  (nested describe with its own beforeEach/afterEach)
- Extract mintAndDistribute helper for repeated mint+transfer preamble
- Remove blanket whitelist cleanup from afterEach (test cleans up itself)
- Parallelize 3 sequential view calls with Promise.all

* refactor(test-suite): simplify accumulation test to use block meter only

Replace receipt-based HCU comparison with three block meter readings:
1. Single-tx block → baseline meter
2. Two-tx block → meter exceeds baseline (proves accumulation)
3. Single-tx block → meter resets and matches baseline

No cross-comparison of price tables, no getTxHCUFromTxReceipt needed.

* fix(test-suite): assert meter starts at 0 before first operation

* refactor(test-suite): tighten accumulation assertions

- Assert meter2 == 2 * meter1 (exact, same ops in both txs)
- Remove unnecessary mineNBlocks between blocks (meter resets
  automatically in each new block)

* ci: temporarily skip all tests except HCU block cap

DO NOT MERGE — revert before merge. Added `if: false` to all test
steps except HCU block cap to validate in isolation.

* fix(ci): build test-suite from source to include new HCU tests

The CI was pulling the pre-built test-suite Docker image (v0.11.0-1)
which doesn't contain the new block cap scenarios tests. Use --build
so the image is built from the current checkout.

* fix(test): fix NotHostOwner ABI signature and relax accumulation assertion

- NotHostOwner takes an address parameter: error NotHostOwner(address)
- Relax meter2 == meter1*2 to meter2 > meter1 since alice→bob and
  bob→alice can differ slightly in HCU due to balance init paths

* fix(test): relax meter3 assertion — same op can differ by ~18 HCU

The same alice→bob transfer produces slightly different HCU across
runs due to balance state changes from intermediate transfers.
Assert reset behavior (meter3 > 0 and meter3 < meter2) instead of
exact equality with meter1.
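
The relaxed checks could be collected in a small helper like the one below. This is only a sketch: `checkMeterReset` and its messages are hypothetical names, not code from the test suite, and it combines the `meter2 > meter1` check from the previous commit with the reset checks described here.

```typescript
// Hypothetical helper; the three arguments are block-meter readings taken by the caller:
// meter1 = single-tx block, meter2 = two-tx block, meter3 = later single-tx block.
function checkMeterReset(meter1: number, meter2: number, meter3: number): string[] {
  const failures: string[] = [];
  // Accumulation: two transfers in one block must cost more than one.
  if (!(meter2 > meter1)) {
    failures.push("two-tx block should exceed the single-tx baseline");
  }
  // Reset: the meter is charged again in the new block...
  if (!(meter3 > 0)) {
    failures.push("meter should be non-zero after a transfer");
  }
  // ...but starts from zero, so it stays below the two-tx reading.
  if (!(meter3 < meter2)) {
    failures.push("meter should reset below the two-tx reading");
  }
  return failures;
}
```

An empty result means all three conditions hold; no equality with `meter1` is required, so small per-run HCU differences do not fail the test.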

* fix(test): disable Anvil interval mining when batching txs in one block

Anvil runs with --block-time 1, so blocks keep getting mined even
with evm_setAutomine(false). Use evm_setIntervalMining(0) to fully
pause block production, then restore both after mining.
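
A minimal sketch of this batching pattern follows. It assumes an injected `send(method, params)` function (a hypothetical stand-in for a JSON-RPC provider's send call) and uses the Anvil RPC methods `evm_setIntervalMining`, `evm_setAutomine`, and `evm_mine`. The helper name `mineInOneBlock`, the restore order, and the 1-second restore value (mirroring `--block-time 1`) are assumptions, not the suite's actual code.

```typescript
// Hypothetical minimal RPC interface; a real test would pass its provider's send().
type RpcSend = (method: string, params: unknown[]) => Promise<unknown>;

// Pause all block production, submit the batch, mine one block, then restore mining.
async function mineInOneBlock<T>(
  send: RpcSend,
  submitBatch: () => Promise<T>,
  blockTimeSeconds = 1, // assumption: matches Anvil's --block-time 1
): Promise<T> {
  await send("evm_setIntervalMining", [0]); // stop timed block production
  await send("evm_setAutomine", [false]); // stop mining on each submitted tx
  try {
    const result = await submitBatch(); // txs stay pending until we mine
    await send("evm_mine", []); // include all pending txs in one block
    return result;
  } finally {
    // Restore both settings even if the batch throws.
    await send("evm_setAutomine", [true]);
    await send("evm_setIntervalMining", [blockTimeSeconds]);
  }
}
```

The `try`/`finally` keeps a failing batch from leaving the node with mining paused for later tests.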

* refactor(test): centralize interval mining control in beforeEach/afterEach

Disable interval mining once in beforeEach (deterministic blocks),
restore in afterEach. Tests only toggle automine for batching.

* revert: restore per-test interval mining control (beforeEach hangs)

Disabling interval mining in beforeEach hangs because Anvil's
evm_setIntervalMining(0) overrides automine. Revert to the per-test
pattern (disable interval+automine before batching, restore after)
which passed in CI run 22733231829.

* revert(ci): remove temporary test filters and --build flag

Restore workflow to match main, keeping only the new HCU block cap
test step addition.

* test(test-suite): always restore HCU state after block cap tests

* fix(test-suite): restore HCU whitelist state safely

* fix(test-suite): stabilize HCU meter assertions

* fix(test-suite): harden HCU e2e tests and add build dispatch

* ci(test-suite): avoid expression expansion in deploy step

* fix(test-suite): stabilize HCU whitelist removal test

* test(test-suite): instrument HCU whitelist tx waits

* fix(test-suite): use manual mining for HCU whitelist removal test

The automine=true + intervalMining=0 combo is unreliable in CI —
Anvil hangs for ~5min before mining the mint tx, causing Mocha timeout.

Switch to automine=false + explicit evm_mine after each tx, matching
the proven pattern used by the "with lowered limits" tests that pass
consistently. Also add gasLimit overrides to bypass estimateGas against
pending state.

* feat(test-suite): add --resume/--only to fhevm-cli and optimize CI deploy

Forward --resume STEP and --only STEP flags from fhevm-cli to the
underlying deploy-fhevm-stack.sh script, with step validation and
mutual exclusivity check.

Use --only test-suite in CI when deploy-build is set, so only the
test-suite image is rebuilt from the branch instead of the entire stack.

* fix(test-suite): remove --remove-orphans from selective cleanup

cleanup_single_step and cleanup_from_step used --remove-orphans with
a single compose file, causing Docker Compose to tear down every
container in the project not defined in that file. This destroyed the
entire stack when running e.g. --only test-suite.

* fix(ci): revert --only test-suite optimization in deploy step

The --only test-suite approach rebuilds only the test container but
uses pre-built host-sc images that lack the HCULimit contract. The
HCU block cap tests need host-sc built from the branch, so we must
use the full --build deploy for now.

The --resume/--only CLI flags and the --remove-orphans fix in the
deploy script are kept — they're useful for local development and
future CI optimizations.

* chore(test-suite): revert unrelated fhevm-cli and deploy script changes

Keep the PR scoped to the HCU whitelist test fix and the deploy-build
workflow input. The --resume/--only CLI flags and --remove-orphans fix
can be submitted in a separate PR.

* fix(test-suite): re-add hcu-block-cap test type to fhevm-cli

* ci(common): sandbox Claude Code behind Squid proxy + iptables (#2083)

* ci(common): sandbox Claude Code behind Squid proxy + iptables

Run the claude-code-action inside a network sandbox to prevent
data exfiltration to unauthorized hosts. Two layers of defense:
- Squid proxy: L7 domain allowlist (.anthropic.com, .github.com, etc.)
- iptables: blocks direct outbound TCP from the runner UID

All dependencies (Bun, action node_modules, Claude Code CLI, OIDC
token exchange) are pre-installed before lockdown because the action's
internal installers use fetch() which ignores HTTP_PROXY.

Also switches from --allowedTools to --dangerously-skip-permissions
since the network sandbox handles security at the infrastructure level.

update claude file with proper container setup

fix: shellchecks

fix zizmor warning

ci(claude): rewrite workflow from template, address PR #1995 security review

- Drop action wrapper, run claude CLI directly (avoids MCP stdin blocking)
- Remove dead pull_request trigger
- Separate GH_TOKEN from system prompt construction step
- Tighten iptables: resolve Squid IP dynamically, block UDP/ICMP
- Restrict squid allowlist to 3 domains (api.anthropic.com, platform.claude.com, github.com)
- Cache Squid Docker image, add iptables save/restore cleanup
- Add tracking comment for run visibility
- Fix token revocation to use HTTPS_PROXY

fix: replace A && B || C with proper if-then-else (SC2015)

fix: capture error details instead of silent suppression

OIDC exchange and token revocation now log the server response
on failure instead of swallowing it with -sf/--silent/2>/dev/null.

fix: shellcheck SC2001 and SC2015 in claude workflow

Replace sed prompt extraction with parameter expansion (SC2001).

chore: harden security practices

chore: update claude action from security

* chore: rename claude.yml to claude-review.yml

* chore: enforces changes in sandboxed claude-* workflow

* ci(common): fix zizmor issues

---------

Co-authored-by: enitrat <msaug@protonmail.com>
Co-authored-by: Roger Carhuatocto <chilcano@intix.info>

* fix(coprocessor): stop logging errors for unknown input verif events (#2077)

* fix(coprocessor): stop logging errors for unknown input verif events

* fix(coprocessor): update cargo dependence

* ci(test-suite): run e2e tests with 2-of-2 coprocessor consensus (#2052)

* ci(test-suite): run e2e tests with 2-of-2 coprocessor consensus

Deploy with --coprocessors 2 --coprocessor-threshold 2 so both
coprocessors must independently compute identical ciphertext digests
for on-chain consensus to be reached. All existing tests pass
unchanged — consensus enforcement is transparent.

Adds a consensus watchdog (Mocha root hook) that monitors gateway
chain events during tests:
- Detects ciphertext digest divergence immediately
- Detects consensus stalls within 3 minutes
- No-op when GATEWAY_RPC_URL is unset (single-coprocessor runs)
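
The divergence and stall checks listed above might look roughly like this. It is a sketch under stated assumptions: `DigestTracker`, `PendingDigest`, and their fields are hypothetical names, the real watchdog reads gateway chain events rather than direct calls, and the 3-minute window is taken from the bullet above.

```typescript
// Hypothetical bookkeeping for one pending handle; field names are illustrative.
interface PendingDigest {
  digest: string;
  firstSeenMs: number;
  submitters: Set<string>;
}

class DigestTracker {
  private readonly pending = new Map<string, PendingDigest>();

  constructor(
    private readonly threshold: number, // e.g. 2 for a 2-of-2 deployment
    private readonly stallMs: number = 3 * 60 * 1000, // assumption: 3-minute stall window
  ) {}

  // Record one coprocessor's digest; returns a divergence message, or null if consistent.
  record(handle: string, coprocessor: string, digest: string, nowMs: number): string | null {
    let entry = this.pending.get(handle);
    if (entry === undefined) {
      entry = { digest, firstSeenMs: nowMs, submitters: new Set<string>() };
      this.pending.set(handle, entry);
    } else if (entry.digest !== digest) {
      return `digest divergence for ${handle}: ${entry.digest} vs ${digest}`;
    }
    entry.submitters.add(coprocessor);
    if (entry.submitters.size >= this.threshold) {
      this.pending.delete(handle); // consensus reached: prune the entry
    }
    return null;
  }

  // Handles that have waited longer than the stall window without reaching consensus.
  stalled(nowMs: number): string[] {
    const result: string[] = [];
    this.pending.forEach((entry, handle) => {
      if (nowMs - entry.firstSeenMs > this.stallMs) result.push(handle);
    });
    return result;
  }
}
```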

Closes zama-ai/fhevm-internal#1132

* fix(test-suite): address code review findings in consensus watchdog

- Add public flush() method instead of casting to any to call private poll()
- Add polling guard to prevent overlapping poll cycles from setInterval
- Remove non-null assertion on INPUT_VERIFICATION_ADDRESS before null check
- Prune resolved entries from maps (delete on consensus + track count via integers)
- Remove consensusReached field from interfaces (no longer needed)
- Simplify summary() to use map.size and counters instead of 4 array copies
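
The polling guard and the public `flush()` entry point can be sketched as follows. `GuardedPoller`, `tick`, and `completedPolls` are hypothetical names used only for illustration; the actual `ConsensusWatchdog` may be structured differently.

```typescript
// Hypothetical skeleton; only the guard idea and a public flush() follow the commit message.
class GuardedPoller {
  private polling = false; // true while a poll cycle is in flight
  public completedPolls = 0;

  constructor(private readonly pollOnce: () => Promise<void>) {}

  // Intended to be called from setInterval. Returns false when the call was
  // skipped because the previous cycle had not finished yet.
  async tick(): Promise<boolean> {
    if (this.polling) return false;
    this.polling = true;
    try {
      await this.pollOnce();
      this.completedPolls += 1;
      return true;
    } finally {
      this.polling = false; // release the guard even if pollOnce throws
    }
  }

  // Public entry point for a final poll, so callers need no cast to reach private code.
  async flush(): Promise<boolean> {
    return this.tick();
  }
}
```

A caller would wire it up with something like `setInterval(() => void poller.tick(), intervalMs)`; a slow RPC round-trip then causes skipped ticks instead of overlapping cycles.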

* test(consensus-watchdog): add unit tests for watchdog logic

12 tests covering:
- Ciphertext digest divergence detection
- SNS digest divergence detection
- Input verification divergence detection
- Consensus stall timeout detection
- Map pruning on consensus resolution
- Polling guard preventing overlapping polls
- Summary output for resolved and pending entries
- Graceful no-op when env vars are not set

Also exports ConsensusWatchdog class for testability.

* fix(test): cleanup resource leaks in watchdog unit tests

- Destroy real ethers provider before replacing with stub in mockWatchdog()
- Wrap env var mutation in try/finally to guarantee cleanup on test failure

* fix(test-suite): skip proof monitoring when input verification is unset

* ci(test-suite): install foundry in e2e workflow

* fix(test-suite): avoid rerunning db migration for extra coprocessors

* revert(ci): drop validation-only e2e changes

* fix(test-suite): harden consensus watchdog

* ci(test-suite): enable build-based e2e validation

* fix(test-suite): avoid rerunning extra coprocessor migration

* test-suite: clarify consensus watchdog summary

* Revert "fix(test-suite): avoid rerunning extra coprocessor migration"

This reverts commit 3a73efb.

* Revert "ci(test-suite): enable build-based e2e validation"

This reverts commit 818e565.

* ci(test-suite): install foundry for 2-of-2 e2e deploys

* fix(test-suite): avoid rerunning extra coprocessor migration

* feat(coprocessor): re-randomise input ciphertexts before first compression (#2073)

* feat(coprocessor): add re-randomisation of input ciphertexts

* test(coprocessor): add regression tests for input re-randomisation

* feat(common): simple acl (#2072)

* refactor(coprocessor): remove ACL propagate ops (#1825)

* chore(gateway-contracts): remove MultichainACL contract (#1904)

* chore(gateway-contracts): remove MultichainACL from gateway-contracts

* chore(coprocessor): remove multichainACL contract from coprocessor

* chore(gateway-contracts): remove unused param from internal Decryption.sol function

* chore(gateway-contracts): remove multichainACL checks from Decryption.sol tests

* chore(gateway-contracts): update rust bindings

* chore(gateway-contracts): make conformance

* chore(gateway-contracts): remove multichainACL test from delegated user decrypt

* chore(gateway-contracts): update bindings with foundry v1.3.1 as in CI

* chore(gateway-contracts): bump Decryption.sol upgradeable version

* chore(gateway-contracts): fix lint

* refactor: remove arbitrum expiration date constraint from host ACL

* refactor: remove unused params from isUserDecryptionReady & isDelegatedUserDecryptionReady

* chore: fix comments

* fix: fix ci upgrade contract flag

* fix: remove test related to legacy expiry-too-soon constraint

* chore: make conformance

* chore(test-suite): use acl relayer (#2064)

* chore(test-suite): update copro params

* chore(test-suite): update contract addresses

* chore(gateway-contracts): pauser task minor fix

* chore(test-suite): update relayer

* chore(test-suite): update relayer-sdk v0.4.1

* chore(test-suite): draft add negative acl tests

* chore(test-suite): fix expired delegation, acl not allow tests (#2060)

* chore(test-suite): update acl failure test for delegated user decr

- Previously: negative delegated user decryption tests asserted on the raw
  Solidity selector 0x0190c506 and used relayer-sdk v0.4.1, which did
  not handle the 'not_allowed_on_host_acl' label from the relayer.

- Now: bump relayer-sdk to v0.4.2, which handles the label, and assert on
  the relayer-sdk error label not_allowed_on_host_acl, matching the
  structured error returned by the relayer on HTTP 400

* chore(test-suite): fix expired delegation test

- Issue: setting pastExpiration=1 reverted at the
  contract level (ExpirationDateBeforeOneHour) so
  the test never reached the decryption step

- Fix: delegate with a valid expiration (now + 1h1m),
  then use evm_increaseTime to fast-forward past it
  before attempting decryption
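
The fix can be sketched as two small steps. This is an illustration under assumptions: `JsonRpcSend`, `validExpiration`, and `fastForwardPast` are hypothetical names, and the extra `evm_mine` call (so that a block carries the new timestamp) is an assumption not stated in the message; `evm_increaseTime` is the RPC method the commit names.

```typescript
// Hypothetical minimal RPC interface; a real test would pass its provider's send().
type JsonRpcSend = (method: string, params: unknown[]) => Promise<unknown>;

const ONE_HOUR_ONE_MINUTE_SECONDS = 61 * 60;

// An expiration far enough in the future to pass the contract's one-hour check.
function validExpiration(nowSeconds: number): number {
  return nowSeconds + ONE_HOUR_ONE_MINUTE_SECONDS;
}

// Advance chain time to just past the expiration; returns the number of seconds skipped.
async function fastForwardPast(
  send: JsonRpcSend,
  nowSeconds: number,
  expiration: number,
): Promise<number> {
  const delta = expiration - nowSeconds + 1; // one second past expiry
  await send("evm_increaseTime", [delta]);
  await send("evm_mine", []); // assumption: mine a block so the new timestamp takes effect
  return delta;
}
```

The test would delegate with `validExpiration(now)`, call `fastForwardPast`, and only then attempt the decryption that is expected to fail.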

* chore(common): update package-lock.json

---------

Co-authored-by: Simon Eudeline <simon.eudeline@zama.ai>

* fix(test-suite): relayer and copro config update

* chore(test-suite): update test-suite versions

---------

Co-authored-by: Manoranjith <manoranjith.ponnuraj@zama.ai>

* chore(coprocessor): versions bump

* chore(gateway-contracts): remove acl from upgrade manifest

* tests(coprocessor): fix stop_retrying_verify_proof_on_gw_config_error()

---------

Co-authored-by: Petar Ivanov <29689712+dartdart26@users.noreply.github.com>
Co-authored-by: malatrax <71888134+zmalatrax@users.noreply.github.com>
Co-authored-by: Manoranjith <manoranjith.ponnuraj@zama.ai>

* feat(test-suite): introduce context-aware extraData changes

* feat(host-contracts): implement context-aware KMSVerifier (#2028)

* feat(kms-connector): context-aware extraData handling for decryption (#2032)

* chore(kms-connector): rename fhe module to handle

* chore(kms-connector): add and use helper function

* chore(kms-connector): add kms_context table

* chore(kms-connector): prepare ethereum listener

* feat(kms-connector): kms context validation

* chore(kms-connector): kms context tests

* chore(kms-connector): ethereum listener termination

* feat(gateway-contracts): implement context-aware KMS node configs and decryption

* feat: implement context-aware KMS node configs and decryption

* chore(gateway-contracts): apply a few arguments renaming

* fix(gateway-contracts): refresh rust bindings

* chore(gateway-contracts): reuse setter methods and adjust NatSpecs

* chore(gateway-contracts): refresh rust bindings

* refactor: apply suggested naming

* refactor(gateway-contracts): apply suggested renaming

* refactor: revert updateKmsContext naming

* refactor(gateway-contracts): enable decryption upgrade workflow

* chore(gateway-contracts): refresh bindings

* chore(test-suite): introduce getExtraData() method from SDK

* chore(test-suite): restore missed user decrypt ebool test case

* feat(kms-connector): propagate empty extra_data for 0x00

* feat(kms-connector): propagate empty extra_data for 0x00

* chore(kms-connector): add TODO comment for the workaround and upgrade quinn-proto

* chore(kms-connector): add TODO comment for the workaround and upgrade quinn-proto

* chore(kms-connector): use dedicated core config for tests

---------

Co-authored-by: Simon Eudeline <simon.eudeline@zama.ai>

* chore(test-suite): upgrade relayer-sdk version

* chore(test-suite): upgrade test-suite version in fhevm-cli

---------

Co-authored-by: Oba <obatirou@gmail.com>
Co-authored-by: Simon E. <simon.eudeline@zama.ai>

* docs: update integration guide to add details about wrapping/unwrapping (#2132)

* docs: Expand wallet guide to cover CEXs

* docs: Add details about wrapping/unwrapping process

* docs: Fixed menu links

* chore: fix typo

* fix: standardize BSD-clear license files (#2136)

* ci(kms-connector): fix check-changes of bindings (#2138)

* fix(gateway-contracts): overload `isUserDecryptionReady` with old signature (#2137)

fix(gateway-contracts): overload isUserDecryptionReady with old signature

* chore(gateway-contracts): refresh rust bindings

* chore(test-suite): replace component versions

* chore(gateway-contracts): refresh contract Charts

---------

Co-authored-by: Oba <obatirou@gmail.com>
Co-authored-by: Simon E. <simon.eudeline@zama.ai>
Co-authored-by: Elias Tazartes <66871571+Eikix@users.noreply.github.com>
Co-authored-by: Antoniu <90181190+antoniupop@users.noreply.github.com>
Co-authored-by: Petar Ivanov <29689712+dartdart26@users.noreply.github.com>
Co-authored-by: Mathieu <60658558+enitrat@users.noreply.github.com>
Co-authored-by: Roger Carhuatocto <chilcano@intix.info>
Co-authored-by: immortal tofu <clement@danjou.io>
Co-authored-by: rudy-6-4 <rudy.sicard@zama.ai>
Co-authored-by: enitrat <msaug@protonmail.com>
Co-authored-by: malatrax <71888134+zmalatrax@users.noreply.github.com>
Co-authored-by: Manoranjith <manoranjith.ponnuraj@zama.ai>
Co-authored-by: Ankur Banerjee <ankurdotb@users.noreply.github.com>