| 1 | +# ADR-001: Cryptographic Hash Function Selection |
| 2 | + |
| 3 | +**Status**: Accepted |
| 4 | +**Date**: 2026-02-04 |
| 5 | +**Authors**: ruv.io, RuVector Team |
| 6 | +**Deciders**: Architecture Review Board |
| 7 | +**SDK**: Claude-Flow |
| 8 | + |
| 9 | +## Version History |
| 10 | + |
| 11 | +| Version | Date | Author | Changes | |
| 12 | +|---------|------|--------|---------| |
| 13 | +| 0.1 | 2026-02-04 | ruv.io | Initial proposal based on CoSMeTIC paper analysis | |
| 14 | + |
| 15 | +--- |
| 16 | + |
| 17 | +## Context |
| 18 | + |
| 19 | +### The Hash Function Challenge in Computational Sparse Merkle Trees |
| 20 | + |
| 21 | +The CoSMeTIC framework (Computational Sparse Merkle Trees with Inclusion/exclusion Certificates) requires cryptographic hash functions in two fundamentally different operational contexts: |
| 22 | + |
| 23 | +1. **Standard tree operations**: Leaf hashing, internal node hashing, key derivation, domain separation, attestation binding. These run on conventional hardware (native x86_64, ARM64) and in WebAssembly runtimes (browser, edge). Performance is measured in wall-clock throughput. |
| 24 | + |
| 25 | +2. **Zero-knowledge circuit operations**: When a prover must demonstrate knowledge of a valid Merkle path (inclusion) or the absence of a leaf (exclusion) inside a ZK circuit, every hash invocation translates into arithmetic constraints over a prime field. The dominant cost metric shifts from CPU cycles to **constraint count** (R1CS gates or PLONKish rows). |
| 26 | + |
| 27 | +The CoSMeTIC paper (Ramanan et al., arXiv:2601.12136, January 2026) defines computational sparse Merkle trees (CSMTs) that embed reduction operations at each recursion level. Unlike conventional SMTs, which merely concatenate and hash child nodes, CSMTs also apply an aggregation function at each internal node. Hashing and aggregation therefore both occur at every level of the tree (up to 256 levels for a 256-bit address space), which makes hash performance critical in both contexts. |
| 28 | + |
| 29 | +### Constraint Cost Comparison |
| 30 | + |
| 31 | +The following table summarizes the constraint overhead of candidate hash functions when arithmetized inside a ZK-SNARK circuit: |
| 32 | + |
| 33 | +| Hash Function | R1CS Constraints (per invocation) | Relative Cost | ZK-Native Design | Maturity | |
| 34 | +|---------------|-----------------------------------|---------------|-------------------|----------| |
| 35 | +| **SHA-256** | ~25,000-30,000 | 1x (baseline) | No | Very High | |
| 36 | +| **SHA-3 / Keccak** | ~75,000-90,000 | 3x | No | High | |
| 37 | +| **Blake3** | ~15,000-20,000 | 0.6x | No | High | |
| 38 | +| **Poseidon** | ~250-350 | **0.01x** | Yes | Medium | |
| 39 | +| **MiMC** | ~300-700 | 0.02x | Yes | Medium | |
| 40 | +| **Rescue** | ~400-600 | 0.02x | Yes | Medium | |
| 41 | +| **Anemoi** | ~150-250 | 0.007x | Yes | Low | |
| 42 | +| **Griffin** | ~200-300 | 0.009x | Yes | Low | |
| 43 | + |
| 44 | +Traditional hash functions (SHA-256, Blake3) rely on bitwise operations (XOR, rotation, shifts) that are fundamentally expensive to arithmetize because arithmetic circuits operate over finite field elements using addition and multiplication. Each XOR gate must be decomposed into field operations, inflating constraint counts by orders of magnitude. |
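As a worked illustration of why bitwise operations are costly in a circuit (a standard identity, not taken from the CoSMeTIC paper): for two bits $a, b$ represented as field elements, XOR has to be written as a polynomial, and each bit also needs its own booleanity check.

$$
a \oplus b = a + b - 2ab, \qquad a(1 - a) = 0, \qquad b(1 - b) = 0
$$

Each product above costs one multiplication constraint, so a single 32-bit XOR needs about 32 multiplication constraints on top of the bit decomposition of its operands. A SHA-256 compression performs many such word-level operations, which is where counts in the tens of thousands come from.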
| 45 | + |
| 46 | +ZK-native hash functions (Poseidon, MiMC, Rescue) are designed around algebraic operations (field addition and exponentiation) that map directly to arithmetic circuit primitives, yielding 50-100x fewer constraints. |
| 47 | + |
| 48 | +### WASM Deployment Constraints |
| 49 | + |
| 50 | +The `cosmetic-wasm` crate targets `wasm32-unknown-unknown` with `crate-type = ["cdylib", "rlib"]`. This imposes: |
| 51 | + |
| 52 | +- **Binary size budget**: Compressed WASM should remain under 200KB for reasonable browser load times. Full ZK libraries (e.g., arkworks, halo2_proofs with gadgets) can exceed 5MB compiled to WASM. |
| 53 | +- **No native intrinsics**: WASM lacks SIMD instructions in the base spec (though WASM SIMD 128-bit is increasingly available). SHA-256 benefits from hardware acceleration (SHA-NI on x86_64) that is unavailable in WASM. |
| 54 | +- **Memory model**: 32-bit linear memory with a 4GB ceiling. Large circuit witness generation can pressure memory limits. |
| 55 | +- **Execution model**: Single-threaded unless Web Workers are used via `wasm-bindgen-rayon`. Proof generation benefits greatly from parallelism. |
| 56 | + |
| 57 | +### Existing Implementation State |
| 58 | + |
| 59 | +The current `hasher.rs` implements: |
| 60 | +- SHA-256 via the `sha2` crate for all standard operations |
| 61 | +- Domain-separated hashing: `0x00` prefix for leaves, `0x01` for internal nodes, `0x02` for attestations |
| 62 | +- A placeholder `poseidon_hash` behind the `poseidon` feature flag that wraps SHA-256 (for API shape testing) |
| 63 | +- A pre-computed `DEFAULT_EMPTY` constant for absent leaf positions |
| 64 | + |
| 65 | +--- |
| 66 | + |
| 67 | +## Decision |
| 68 | + |
| 69 | +### Dual-Hash Architecture: Poseidon for ZK Circuits, SHA-256/Blake3 for Non-ZK Operations |
| 70 | + |
| 71 | +We adopt a **stratified hash function architecture** with the following assignments: |
| 72 | + |
| 73 | +#### Layer 1: Standard Tree Operations (Non-ZK) |
| 74 | + |
| 75 | +**SHA-256** remains the default hash function for all non-ZK tree operations: |
| 76 | + |
| 77 | +- Leaf hashing: `H(0x00 || key || value)` |
| 78 | +- Internal node hashing: `H(0x01 || left || right)` |
| 79 | +- Key derivation: `H(data)` to compute 256-bit leaf addresses |
| 80 | +- Attestation binding: `H(0x02 || input_root || output_root || function_id || params)` |
| 81 | +- Default empty leaf: Pre-computed `SHA-256("cosmetic_empty_leaf")` |
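The domain-separated preimages listed above can be sketched as plain byte layouts. This is a minimal illustration, not the contents of `hasher.rs`: the function names and the `Hash` alias are assumptions, and the resulting buffers would then be fed to SHA-256 (for example `Sha256::digest` from the `sha2` crate).

```rust
/// Assumed digest type: 32 bytes, matching SHA-256 output.
pub type Hash = [u8; 32];

/// Domain-separation prefixes as described in this ADR.
pub const LEAF_PREFIX: u8 = 0x00;
pub const INTERNAL_PREFIX: u8 = 0x01;

/// Builds the leaf preimage `0x00 || key || value` (hypothetical helper).
pub fn leaf_preimage(key: &Hash, value: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(1 + key.len() + value.len());
    buf.push(LEAF_PREFIX);
    buf.extend_from_slice(key);
    buf.extend_from_slice(value);
    buf
}

/// Builds the internal-node preimage `0x01 || left || right` (hypothetical helper).
pub fn internal_preimage(left: &Hash, right: &Hash) -> Vec<u8> {
    let mut buf = Vec::with_capacity(1 + left.len() + right.len());
    buf.push(INTERNAL_PREFIX);
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    buf
}
```

The prefix byte is what keeps a leaf with a 32-byte value from sharing a preimage with an internal node whose children have the same bytes.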
| 82 | + |
| 83 | +**Rationale**: SHA-256 is battle-tested, widely available in WASM (`sha2` crate compiles cleanly to `wasm32`), and provides the 256-bit output that directly maps to the tree's address space. The `sha2` crate adds approximately 15KB to WASM binary size. |
| 84 | + |
| 85 | +**Future consideration**: Blake3 offers approximately 3-5x higher throughput than SHA-256 on conventional hardware and compiles well to WASM. If non-ZK hashing becomes a bottleneck (e.g., bulk tree construction with thousands of leaves), Blake3 can be introduced as an optional backend behind a feature flag without changing the tree architecture, since both produce 256-bit outputs. |
| 86 | + |
| 87 | +#### Layer 2: Zero-Knowledge Circuit Operations |
| 88 | + |
| 89 | +**Poseidon** is the designated hash function for all operations that must be verified inside a ZK proof circuit: |
| 90 | + |
| 91 | +- Leaf Transform Ratio (LTR) proofs: Proving correct leaf-level computation |
| 92 | +- Merkle Record Path (MRP) proofs: Proving valid paths from leaf to root |
| 93 | +- Inclusion certificate generation: Proving a key exists with its value |
| 94 | +- Exclusion certificate generation: Proving a key is absent (default-empty at its position) |
| 95 | + |
| 96 | +**Rationale**: The CoSMeTIC paper's use of the ezkl framework with Halo2 backend confirms algebraic-operation-native hashing is essential for practical proof generation. Poseidon requires approximately 250-350 R1CS constraints per hash invocation versus approximately 25,000-30,000 for SHA-256 -- a reduction of roughly 80-100x. For a tree of depth K, each inclusion/exclusion proof requires K hash evaluations inside the circuit. At K=256, this means: |
| 97 | + |
| 98 | +- **SHA-256 in circuit**: ~6.4-7.7 million constraints per proof path |
| 99 | +- **Poseidon in circuit**: ~64,000-89,600 constraints per proof path |
| 100 | + |
| 101 | +This difference determines whether proof generation completes in seconds (Poseidon) or minutes (SHA-256) on consumer hardware, and whether it is feasible at all in WASM with its memory constraints. |
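The per-path totals above follow from multiplying the per-hash estimate by the tree depth. A small sketch of that arithmetic, assuming exactly one hash per level and ignoring all non-hash constraints (the type and function names are illustrative only):

```rust
/// Inclusive range of estimated constraint counts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ConstraintRange {
    pub low: u64,
    pub high: u64,
}

/// Per-invocation estimates taken from the comparison table in this ADR.
pub const SHA256_PER_HASH: ConstraintRange = ConstraintRange { low: 25_000, high: 30_000 };
pub const POSEIDON_PER_HASH: ConstraintRange = ConstraintRange { low: 250, high: 350 };

/// Estimated constraints for one Merkle path of `depth` levels,
/// assuming one hash evaluation per level.
pub fn path_constraints(per_hash: ConstraintRange, depth: u64) -> ConstraintRange {
    ConstraintRange {
        low: per_hash.low * depth,
        high: per_hash.high * depth,
    }
}
```

With `depth = 256` this gives 6,400,000 to 7,680,000 constraints for SHA-256 and 64,000 to 89,600 for Poseidon, matching the figures quoted above.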
| 102 | + |
| 103 | +**Poseidon configuration**: |
| 104 | +- Field: BN254 scalar field (matching existing Rust ecosystem: `ark-bn254`, `halo2curves`) |
| 105 | +- Arity: 2 (binary tree, width-3 sponge: rate=2, capacity=1) |
| 106 | +- Security level: 128 bits |
| 107 | +- Full rounds: 8, partial rounds: 57 (Poseidon reference parameters for width 3 with the x^5 S-box over a ~254-bit field; 56 partial rounds applies to width 2) |
| 108 | + |
| 109 | +#### Layer 3: Bridge Layer (Hash Compatibility) |
| 110 | + |
| 111 | +A **bridge hash** mechanism links the two layers when a proof must attest to the same data that exists in the SHA-256-based tree: |
| 112 | + |
| 113 | +1. The prover holds the pre-image data (leaf value, key) |
| 114 | +2. Inside the ZK circuit, the prover re-hashes using Poseidon to produce a Poseidon-based root |
| 115 | +3. Outside the circuit, a signed attestation binds the Poseidon root to the SHA-256 root |
| 116 | +4. The verifier checks: (a) the ZK proof is valid against the Poseidon root, and (b) the attestation correctly binds the Poseidon root to the published SHA-256 root |
| 117 | + |
| 118 | +This avoids evaluating SHA-256 inside the ZK circuit while maintaining a verifiable link between the two hash domains. |
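The verifier's two checks in step 4 could look roughly like the sketch below. Everything here is hypothetical: the ADR does not specify the attestation format, the signature scheme, or the proof system API, so both verification routines are passed in as closures and `RootBinding`, `BridgeError`, and `verify_bridged` are illustrative names.

```rust
pub type Root = [u8; 32];

/// Hypothetical attestation that binds the two roots together.
pub struct RootBinding {
    pub poseidon_root: Root,
    pub sha256_root: Root,
    /// Signature over both roots; the scheme is not specified by this ADR.
    pub signature: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BridgeError {
    RootMismatch,
    BindingInvalid,
    ProofInvalid,
}

/// Checks (b) that the attestation binds the Poseidon root to the published
/// SHA-256 root, then (a) that the ZK proof verifies against that Poseidon root.
pub fn verify_bridged<P, S>(
    published_sha256_root: &Root,
    binding: &RootBinding,
    proof: &[u8],
    verify_zk_proof: P,
    verify_signature: S,
) -> Result<(), BridgeError>
where
    P: Fn(&[u8], &Root) -> bool,
    S: Fn(&RootBinding) -> bool,
{
    if binding.sha256_root != *published_sha256_root {
        return Err(BridgeError::RootMismatch);
    }
    if !verify_signature(binding) {
        return Err(BridgeError::BindingInvalid);
    }
    if !verify_zk_proof(proof, &binding.poseidon_root) {
        return Err(BridgeError::ProofInvalid);
    }
    Ok(())
}
```

Running the cheap root comparison and signature check before the proof verification is a design choice of this sketch, not something the ADR prescribes.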
| 119 | + |
| 120 | +### Feature Flag Design |
| 121 | + |
| 122 | +```toml |
| 123 | +[features] |
| 124 | +default = ["console_error_panic_hook"] |
| 125 | +# Enable Poseidon hash for ZK-friendly circuits |
| 126 | +poseidon = ["dep:poseidon-permutation"] |
| 127 | +# Enable Blake3 as an alternative non-ZK hash |
| 128 | +blake3 = ["dep:blake3"] |
| 129 | +# Enable computation attestation proofs |
| 130 | +attestation = [] |
| 131 | +# Enable all features |
| 132 | +full = ["poseidon", "attestation", "blake3"] |
| 133 | +``` |
| 134 | + |
| 135 | +### Hash Trait Abstraction |
| 136 | + |
| 137 | +```rust |
| 138 | +/// Trait abstracting over hash function implementations |
| 139 | +pub trait TreeHasher: Clone + Send + Sync { |
| 140 | + /// Hash output size in bytes |
| 141 | + const OUTPUT_SIZE: usize; |
| 142 | + |
| 143 | + /// Hash arbitrary data |
| 144 | + fn hash(data: &[u8]) -> Hash; |
| 145 | + |
| 146 | + /// Hash a leaf node with domain separation |
| 147 | + fn hash_leaf(key: &Hash, value: &[u8]) -> Hash; |
| 148 | + |
| 149 | + /// Hash an internal node with domain separation |
| 150 | + fn hash_internal(left: &Hash, right: &Hash) -> Hash; |
| 151 | + |
| 152 | + /// The default hash for empty/absent leaves |
| 153 | + fn default_empty() -> Hash; |
| 154 | +} |
| 155 | +``` |
| 156 | + |
| 157 | +This trait allows the tree implementation to be generic over the hash function, enabling testing with fast non-cryptographic hashes and production use with either SHA-256 or Poseidon depending on context. |
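To illustrate the testing use case, here is a sketch of a test-only implementation and a path-folding function that is generic over the trait. The trait is repeated so the block is self-contained. `Hash` is assumed to be a 32-byte array, and `FnvTestHasher` and `root_from_path` are hypothetical names. The hasher is built from FNV-1a and is not cryptographically secure.

```rust
pub type Hash = [u8; 32]; // assumed digest type

pub trait TreeHasher: Clone + Send + Sync {
    const OUTPUT_SIZE: usize;
    fn hash(data: &[u8]) -> Hash;
    fn hash_leaf(key: &Hash, value: &[u8]) -> Hash;
    fn hash_internal(left: &Hash, right: &Hash) -> Hash;
    fn default_empty() -> Hash;
}

/// Test-only hasher. NOT cryptographic; never use outside tests.
#[derive(Clone)]
pub struct FnvTestHasher;

/// 64-bit FNV-1a with the offset basis perturbed by `seed`.
fn fnv1a64(seed: u64, data: &[u8]) -> u64 {
    let mut h = 0xcbf2_9ce4_8422_2325u64 ^ seed;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

impl TreeHasher for FnvTestHasher {
    const OUTPUT_SIZE: usize = 32;

    fn hash(data: &[u8]) -> Hash {
        // Fill 32 bytes from four differently seeded 64-bit lanes.
        let mut out = [0u8; 32];
        for lane in 0..4 {
            let word = fnv1a64(lane as u64, data);
            out[lane * 8..lane * 8 + 8].copy_from_slice(&word.to_le_bytes());
        }
        out
    }

    fn hash_leaf(key: &Hash, value: &[u8]) -> Hash {
        let mut buf = vec![0x00]; // leaf domain prefix
        buf.extend_from_slice(key);
        buf.extend_from_slice(value);
        Self::hash(&buf)
    }

    fn hash_internal(left: &Hash, right: &Hash) -> Hash {
        let mut buf = vec![0x01]; // internal-node domain prefix
        buf.extend_from_slice(left);
        buf.extend_from_slice(right);
        Self::hash(&buf)
    }

    fn default_empty() -> Hash {
        Self::hash(b"cosmetic_empty_leaf")
    }
}

/// Folds a leaf hash up to the root. Each step carries the sibling hash and
/// a flag saying whether that sibling is the left child.
pub fn root_from_path<H: TreeHasher>(leaf: Hash, siblings: &[(Hash, bool)]) -> Hash {
    siblings.iter().fold(leaf, |acc, (sibling, sibling_is_left)| {
        if *sibling_is_left {
            H::hash_internal(sibling, &acc)
        } else {
            H::hash_internal(&acc, sibling)
        }
    })
}
```

Because `root_from_path` only names `H: TreeHasher`, the same function could be instantiated with a SHA-256 or Poseidon implementation without changes.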
| 158 | + |
| 159 | +--- |
| 160 | + |
| 161 | +## Consequences |
| 162 | + |
| 163 | +### Positive |
| 164 | + |
| 165 | +1. **Practical ZK proof generation**: Poseidon reduces circuit constraint count by ~80-100x compared to SHA-256, making in-browser WASM proof generation feasible for trees of practical depth. |
| 166 | + |
| 167 | +2. **Backward compatibility**: SHA-256 remains the default for non-ZK operations, preserving compatibility with the existing `hasher.rs` implementation and all current tests. |
| 168 | + |
| 169 | +3. **Minimal WASM binary impact**: The `sha2` crate adds ~15KB. Poseidon is opt-in behind a feature flag and only pulled in when ZK functionality is needed. |
| 170 | + |
| 171 | +4. **Domain separation preserved**: Both hash layers use domain-separated inputs (leaf vs. internal vs. attestation), preventing cross-domain collision attacks regardless of which hash function is active. |
| 172 | + |
| 173 | +5. **Ecosystem alignment**: Poseidon over BN254 aligns with the dominant ZK tooling (arkworks and halo2 in Rust, plus circom). Multiple production Rust crates exist: `poseidon-merkle`, `dusk-poseidon`, `light-poseidon`. |
| 174 | + |
| 175 | +### Negative |
| 176 | + |
| 177 | +1. **Dual-root complexity**: The bridge layer introduces a second root hash (Poseidon-based) that must be kept in sync with the SHA-256 root. This adds implementation complexity and a potential consistency failure mode. |
| 178 | + |
| 179 | +2. **Poseidon security maturity**: Poseidon has received significant cryptanalysis but has not been subjected to the decades of scrutiny that SHA-256 has. The security margin of ZK-friendly hash functions remains an active area of research. Newer candidates (Anemoi, Griffin) claim better efficiency but have even less cryptanalytic history. |
| 180 | + |
| 181 | +3. **Performance asymmetry**: Poseidon is optimized for arithmetic circuits, not for raw throughput on conventional hardware. On native or WASM targets, outside a ZK circuit, Poseidon is approximately 10-50x slower than SHA-256 for the same input size. It should never be used as a general-purpose hash outside of ZK contexts. |
| 182 | + |
| 183 | +4. **Library dependency surface**: Adding Poseidon introduces dependencies on finite field arithmetic libraries (e.g., `ark-ff`, `pasta_curves`, or `halo2curves`), which increase WASM binary size by 100-500KB depending on the implementation chosen. |
| 184 | + |
| 185 | +### Risks and Mitigations |
| 186 | + |
| 187 | +| Risk | Likelihood | Impact | Mitigation | |
| 188 | +|------|------------|--------|------------| |
| 189 | +| Poseidon algebraic attack discovered | Low | High | Monitor ePrint/IACR; trait abstraction allows swapping to MiMC/Rescue/Anemoi | |
| 190 | +| WASM binary exceeds budget with Poseidon deps | Medium | Medium | Feature-flag isolation; consider vendoring minimal Poseidon implementation | |
| 191 | +| Dual-root consistency bug | Medium | High | Extensive property-based testing; root binding attestation includes both roots | |
| 192 | +| Blake3 needed for bulk operations | Low | Low | Already designed as feature flag; drop-in compatible via `TreeHasher` trait | |
| 193 | + |
| 194 | +--- |
| 195 | + |
| 196 | +## References |
| 197 | + |
| 198 | +- Ramanan, P. et al. "CoSMeTIC: Zero-Knowledge Computational Sparse Merkle Trees with Inclusion-Exclusion Proofs for Clinical Research." arXiv:2601.12136, January 2026. |
| 199 | +- Grassi, L. et al. "Poseidon: A New Hash Function for Zero-Knowledge Proof Systems." USENIX Security 2021. |
| 200 | +- Bowe, S. et al. "Halo: Recursive Proof Composition without a Trusted Setup." IACR ePrint 2019/1021. |
| 201 | +- Dahlberg, R. et al. "Efficient Sparse Merkle Trees." Nordic Conference on Secure IT Systems, 2016. |
| 202 | +- ZK-Plus. "Benchmarks of Hashing Algorithms in ZoKrates." https://zk-plus.github.io/tutorials/basics/hashing-algorithms-benchmarks |
| 203 | +- Zellic Research. "ZK-Friendly Hash Functions." https://www.zellic.io/blog/zk-friendly-hash-functions/ |