
Commit 5502a53 (parent 5925746)

feat: Add CoSMeTIC WASM crate with Sparse Merkle Trees and computation attestations

Implement the CoSMeTIC (Computational Sparse Merkle Trees with Inclusion/exclusion Certificates) framework as a WASM crate in examples/experiments/. Includes:

- 256-bit Sparse Merkle Tree with lazy evaluation and per-level empty hash precomputation
- Inclusion and exclusion proof generation with compact encoding
- Computation attestation system linking input/output states through cryptographic commitments with hash-chained audit trails
- WASM bindings via wasm-bindgen for JavaScript interop
- Domain-separated SHA-256 hashing (leaf/internal/attestation)
- ADR and DDD documentation (hash function selection, SMT architecture, ZK proof system design, domain model)

All 35 tests pass.

https://claude.ai/code/session_01LcbbUBDm1oV2CdeAk3UtTB

13 files changed

Lines changed: 3664 additions & 1 deletion

Cargo.lock

Lines changed: 14 additions & 1 deletion
Generated file; diff not rendered.

Cargo.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -71,6 +71,7 @@ members = [
     "crates/ruvector-delta-index",
     "crates/ruvector-delta-graph",
     "crates/ruvector-delta-consensus",
+    "examples/experiments/cosmetic-wasm",
 ]
 resolver = "2"
```

Lines changed: 49 additions & 0 deletions
```toml
[package]
name = "cosmetic-wasm"
version = "0.1.0"
edition = "2021"
authors = ["RuVector Team"]
license = "MIT"
description = "CoSMeTIC: Computational Sparse Merkle Trees with Inclusion/exclusion Certificates - WASM"
repository = "https://github.com/ruvnet/ruvector"
keywords = ["wasm", "merkle-tree", "zero-knowledge", "cryptography", "sparse-merkle-tree"]
categories = ["wasm", "cryptography", "data-structures"]

[lib]
crate-type = ["cdylib", "rlib"]
path = "src/lib.rs"

[features]
default = ["console_error_panic_hook"]
# Enable Poseidon hash for ZK-friendly circuits
poseidon = []
# Enable computation attestation proofs
attestation = []
# Enable all features
full = ["poseidon", "attestation"]

[dependencies]
# WASM bindings
wasm-bindgen = "0.2"
js-sys = "0.3"

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

# Cryptographic hashing
sha2 = { version = "0.10", default-features = false }

# Error handling for WASM
console_error_panic_hook = { version = "0.1", optional = true }

[dev-dependencies]
wasm-bindgen-test = "0.3"

# Note: this crate is a workspace member, so Cargo ignores this section (with a
# warning); release profile settings only take effect in the workspace root Cargo.toml.
[profile.release]
lto = true
opt-level = "s"
codegen-units = 1

[package.metadata.wasm-pack.profile.release]
wasm-opt = ["-Os", "--enable-bulk-memory", "--enable-nontrapping-float-to-int"]
```
Lines changed: 203 additions & 0 deletions
# ADR-001: Cryptographic Hash Function Selection

**Status**: Accepted
**Date**: 2026-02-04
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow

## Version History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-04 | ruv.io | Initial proposal based on CoSMeTIC paper analysis |

---

## Context

### The Hash Function Challenge in Computational Sparse Merkle Trees

The CoSMeTIC framework (Computational Sparse Merkle Trees with Inclusion/exclusion Certificates) requires cryptographic hash functions in two fundamentally different operational contexts:

1. **Standard tree operations**: Leaf hashing, internal node hashing, key derivation, domain separation, attestation binding. These run on conventional hardware (native x86_64, ARM64) and in WebAssembly runtimes (browser, edge). Performance is measured in wall-clock throughput.

2. **Zero-knowledge circuit operations**: When a prover must demonstrate knowledge of a valid Merkle path (inclusion) or the absence of a leaf (exclusion) inside a ZK circuit, every hash invocation translates into arithmetic constraints over a prime field. The dominant cost metric shifts from CPU cycles to **constraint count** (R1CS constraints or PLONKish rows).

The CoSMeTIC paper (Ramanan et al., arXiv:2601.12136, January 2026) defines computational sparse Merkle trees (CSMTs) that embed reduction operations at each recursion level. Unlike conventional SMTs, which simply hash the concatenation of their child nodes, CSMTs also apply an aggregation function at each internal node. Because every internal node is both aggregated and hashed, hash functions are invoked at every level of the tree (up to 256 levels for a 256-bit address space), making hash performance critical in both contexts.
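
To make the difference from a plain SMT concrete, here is a toy sketch. It assumes, purely for illustration, that each node carries a `u64` aggregate (a sum), that a parent's aggregate is the sum of its children's, and that the aggregate is appended to the `0x01`-tagged internal-node preimage so the root commits to it. `toy_hash` is a non-cryptographic stand-in built on std's `DefaultHasher`, not SHA-256 or Poseidon, and the paper's actual reduction operations and encoding may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// 32-byte digest type used by this sketch.
type Hash = [u8; 32];

/// NON-cryptographic stand-in for SHA-256/Poseidon, for illustration only:
/// four SipHash lanes (via `DefaultHasher`) widened to 32 bytes.
fn toy_hash(data: &[u8]) -> Hash {
    let mut out = [0u8; 32];
    for (lane, chunk) in out.chunks_mut(8).enumerate() {
        let mut h = DefaultHasher::new();
        h.write_usize(lane);
        h.write(data);
        chunk.copy_from_slice(&h.finish().to_le_bytes());
    }
    out
}

/// A toy computational-SMT node: a hash plus an aggregated value.
#[derive(Clone, Copy, Debug)]
struct Node {
    hash: Hash,
    sum: u64, // assumed aggregation: sum of the leaf values below this node
}

/// Leaf: H(0x00 || key || value), with the value as little-endian bytes.
fn leaf(key: &Hash, value: u64) -> Node {
    let mut pre = vec![0x00];
    pre.extend_from_slice(key);
    pre.extend_from_slice(&value.to_le_bytes());
    Node { hash: toy_hash(&pre), sum: value }
}

/// Internal node: a plain SMT would hash only 0x01 || left || right; here the
/// reduced value is appended so that the root also commits to the aggregate.
fn internal(left: &Node, right: &Node) -> Node {
    let sum = left.sum + right.sum;
    let mut pre = vec![0x01];
    pre.extend_from_slice(&left.hash);
    pre.extend_from_slice(&right.hash);
    pre.extend_from_slice(&sum.to_le_bytes());
    Node { hash: toy_hash(&pre), sum }
}

fn main() {
    let a = leaf(&[1u8; 32], 40);
    let b = leaf(&[2u8; 32], 2);
    let parent = internal(&a, &b);
    println!("aggregate = {}", parent.sum); // prints "aggregate = 42"
}
```

Changing any leaf value changes both the aggregate and every hash on the path to the root, which is what lets a path proof attest to the computed value as well as to membership.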

### Constraint Cost Comparison

The following table summarizes the constraint overhead of candidate hash functions when arithmetized inside a ZK-SNARK circuit:

| Hash Function | R1CS Constraints (per invocation) | Relative Cost | ZK-Native Design | Maturity |
|---------------|-----------------------------------|---------------|-------------------|----------|
| **SHA-256** | ~25,000-30,000 | 1x (baseline) | No | Very High |
| **SHA-3 / Keccak** | ~75,000-90,000 | 3x | No | High |
| **Blake3** | ~15,000-20,000 | 0.6x | No | High |
| **Poseidon** | ~250-350 | **0.01x** | Yes | Medium |
| **MiMC** | ~300-700 | 0.02x | Yes | Medium |
| **Rescue** | ~400-600 | 0.02x | Yes | Medium |
| **Anemoi** | ~150-250 | 0.007x | Yes | Low |
| **Griffin** | ~200-300 | 0.009x | Yes | Low |

Traditional hash functions (SHA-256, Blake3) rely on bitwise operations (XOR, rotation, shifts) that are fundamentally expensive to arithmetize because arithmetic circuits operate over finite field elements using addition and multiplication. Each XOR gate must be decomposed into field operations, inflating constraint counts by orders of magnitude.

ZK-native hash functions (Poseidon, MiMC, Rescue) are designed around algebraic operations (field addition and exponentiation) that map directly to arithmetic circuit primitives, yielding 50-100x fewer constraints.
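
As a worked instance of that decomposition: over a prime field, XOR of two bits uses the standard identity below, with booleanity constraints forcing the inputs into {0, 1} and a single multiplication constraint for the product `ab`. XOR-ing two 32-bit SHA-256 words therefore costs one multiplication constraint per bit on top of the constraints that decompose each word into bits, whereas adding two field elements is a linear combination that needs no multiplication constraint in R1CS.

$$
\begin{aligned}
a(a-1) &= 0,\quad b(b-1) = 0 &&\text{(booleanity: } a, b \in \{0, 1\}\text{)}\\
c &= a + b - 2ab &&\text{(XOR, one multiplication constraint for } ab\text{)}
\end{aligned}
$$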

### WASM Deployment Constraints

The `cosmetic-wasm` crate targets `wasm32-unknown-unknown` with `crate-type = ["cdylib", "rlib"]`. This imposes:

- **Binary size budget**: Compressed WASM should remain under 200KB for reasonable browser load times. Full ZK libraries (e.g., arkworks, halo2_proofs with gadgets) can exceed 5MB compiled to WASM.
- **No native intrinsics**: The original WebAssembly MVP had no SIMD. 128-bit fixed-width SIMD is now standardized and supported by current major browsers, but it offers no SHA-specific instructions, so the hardware acceleration SHA-256 enjoys on native targets (SHA-NI on x86_64) is unavailable in WASM.
- **Memory model**: 32-bit linear memory with a 4GB ceiling. Large circuit witness generation can pressure memory limits.
- **Execution model**: Single-threaded unless Web Workers are used via `wasm-bindgen-rayon`. Proof generation benefits greatly from parallelism.

### Existing Implementation State

The current `hasher.rs` implements:
- SHA-256 via the `sha2` crate for all standard operations
- Domain-separated hashing: `0x00` prefix for leaves, `0x01` for internal nodes, `0x02` for attestations
- A placeholder `poseidon_hash` behind the `poseidon` feature flag that wraps SHA-256 (for API shape testing)
- A pre-computed `DEFAULT_EMPTY` constant for absent leaf positions

---

## Decision

### Dual-Hash Architecture: Poseidon for ZK Circuits, SHA-256/Blake3 for Non-ZK Operations

We adopt a **stratified hash function architecture** with the following assignments:

#### Layer 1: Standard Tree Operations (Non-ZK)

**SHA-256** remains the default hash function for all non-ZK tree operations:

- Leaf hashing: `H(0x00 || key || value)`
- Internal node hashing: `H(0x01 || left || right)`
- Key derivation: `H(data)` to compute 256-bit leaf addresses
- Attestation binding: `H(0x02 || input_root || output_root || function_id || params)`
- Default empty leaf: Pre-computed `SHA-256("cosmetic_empty_leaf")`
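
The preimage layouts above can be sketched as plain byte concatenations. This is a minimal std-only sketch: the function names are hypothetical, keys and roots are assumed to be 32-byte arrays, and `function_id`/`params` are assumed to be opaque byte strings. The hashing step itself (SHA-256 via `sha2` in `hasher.rs`) is left out so the block runs without dependencies.

```rust
/// Domain tags from the ADR: leaves, internal nodes, attestations.
const LEAF_TAG: u8 = 0x00;
const INTERNAL_TAG: u8 = 0x01;
const ATTESTATION_TAG: u8 = 0x02;

type Hash = [u8; 32]; // assumed 256-bit key/root type

/// Preimage for H(0x00 || key || value).
fn leaf_preimage(key: &Hash, value: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(1 + 32 + value.len());
    buf.push(LEAF_TAG);
    buf.extend_from_slice(key);
    buf.extend_from_slice(value);
    buf
}

/// Preimage for H(0x01 || left || right).
fn internal_preimage(left: &Hash, right: &Hash) -> Vec<u8> {
    let mut buf = Vec::with_capacity(1 + 64);
    buf.push(INTERNAL_TAG);
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    buf
}

/// Preimage for H(0x02 || input_root || output_root || function_id || params).
/// Hypothetical encoding: function_id and params are raw byte strings.
fn attestation_preimage(input_root: &Hash, output_root: &Hash, function_id: &[u8], params: &[u8]) -> Vec<u8> {
    let mut buf = vec![ATTESTATION_TAG];
    buf.extend_from_slice(input_root);
    buf.extend_from_slice(output_root);
    buf.extend_from_slice(function_id);
    buf.extend_from_slice(params);
    buf
}

fn main() {
    let key = [0xAAu8; 32];
    let leaf = leaf_preimage(&key, b"v");
    let node = internal_preimage(&key, &key);
    let att = attestation_preimage(&key, &key, b"f", b"");
    // The leading tag byte differs, so a leaf preimage can never equal an internal one.
    assert_ne!(leaf[0], node[0]);
    println!("{} {} {}", leaf.len(), node.len(), att.len()); // prints "34 65 66"
}
```

Because `function_id` and `params` are variable-length here, plain concatenation is ambiguous (`"ab" + "c"` and `"a" + "bc"` produce the same bytes); a real encoding would likely need fixed-size fields or length prefixes.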

**Rationale**: SHA-256 is battle-tested, widely available in WASM (`sha2` crate compiles cleanly to `wasm32`), and provides the 256-bit output that directly maps to the tree's address space. The `sha2` crate adds approximately 15KB to WASM binary size.

**Future consideration**: Blake3 offers approximately 3-5x higher throughput than SHA-256 on conventional hardware and compiles well to WASM. If non-ZK hashing becomes a bottleneck (e.g., bulk tree construction with thousands of leaves), Blake3 can be introduced as an optional backend behind a feature flag without changing the tree architecture, since both produce 256-bit outputs.

#### Layer 2: Zero-Knowledge Circuit Operations

**Poseidon** is the designated hash function for all operations that must be verified inside a ZK proof circuit:

- Leaf Transform Ratio (LTR) proofs: Proving correct leaf-level computation
- Merkle Record Path (MRP) proofs: Proving valid paths from leaf to root
- Inclusion certificate generation: Proving a key exists with its value
- Exclusion certificate generation: Proving a key is absent (default-empty at its position)
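
Both certificate types reduce to recomputing a root from a leaf hash and its sibling path; they differ only in which leaf hash is folded up. The sketch below is not the crate's proof format. It assumes, hypothetically, a toy depth of 8 with `u8` keys (the crate uses 256-bit keys), that bit `i` of the key selects left or right at level `i` counting up from the leaves, and non-cryptographic `DefaultHasher` stand-ins for the domain-separated hashes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::Hasher;

type Hash = u64; // toy 64-bit digest; the crate uses 32-byte hashes
const DEPTH: usize = 8; // toy depth with u8 keys; the crate uses 256
const EMPTY: Hash = 0; // stand-in for DEFAULT_EMPTY

/// NON-cryptographic stand-ins for the domain-separated leaf/internal hashes.
fn hash_leaf(key: u8, value: &[u8]) -> Hash {
    let mut s = DefaultHasher::new();
    s.write_u8(0x00);
    s.write_u8(key);
    s.write(value);
    s.finish()
}

fn hash_internal(left: Hash, right: Hash) -> Hash {
    let mut s = DefaultHasher::new();
    s.write_u8(0x01);
    s.write_u64(left);
    s.write_u64(right);
    s.finish()
}

/// Hash of the subtree at `level` (0 = leaves) with index `idx` at that level.
/// Naive recursion over every position: only viable at toy depths.
fn subtree(leaves: &BTreeMap<u8, Hash>, level: usize, idx: usize) -> Hash {
    if level == 0 {
        return *leaves.get(&(idx as u8)).unwrap_or(&EMPTY);
    }
    hash_internal(subtree(leaves, level - 1, 2 * idx), subtree(leaves, level - 1, 2 * idx + 1))
}

/// Prover side: sibling hashes along the path of `key`, leaf level first.
fn siblings(leaves: &BTreeMap<u8, Hash>, key: u8) -> [Hash; DEPTH] {
    std::array::from_fn(|i| subtree(leaves, i, ((key as usize) >> i) ^ 1))
}

/// Verifier side: fold a leaf hash up to the root. Bit i of the key says
/// whether the running node is a left (0) or right (1) child at level i.
fn root_from_path(leaf: Hash, key: u8, sibs: &[Hash; DEPTH]) -> Hash {
    let mut acc = leaf;
    for (i, sib) in sibs.iter().enumerate() {
        acc = if (key >> i) & 1 == 1 { hash_internal(*sib, acc) } else { hash_internal(acc, *sib) };
    }
    acc
}

/// Inclusion: the key's position holds hash_leaf(key, value).
fn verify_inclusion(root: Hash, key: u8, value: &[u8], sibs: &[Hash; DEPTH]) -> bool {
    root_from_path(hash_leaf(key, value), key, sibs) == root
}

/// Exclusion: the key's position holds the default empty leaf.
fn verify_exclusion(root: Hash, key: u8, sibs: &[Hash; DEPTH]) -> bool {
    root_from_path(EMPTY, key, sibs) == root
}

fn main() {
    let mut leaves = BTreeMap::new();
    leaves.insert(5u8, hash_leaf(5, b"alice"));
    let root = subtree(&leaves, DEPTH, 0);

    assert!(verify_inclusion(root, 5, b"alice", &siblings(&leaves, 5)));
    assert!(verify_exclusion(root, 9, &siblings(&leaves, 9)));
    assert!(!verify_exclusion(root, 5, &siblings(&leaves, 5)));
    println!("inclusion and exclusion checks passed");
}
```

The naive `subtree` recursion is only for the toy prover; a 256-level tree would instead reuse precomputed per-level empty-subtree hashes for the (almost always empty) siblings.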

**Rationale**: The CoSMeTIC paper's use of the ezkl framework with a Halo2 backend indicates that hashing built from native algebraic operations is essential for practical proof generation. Poseidon requires approximately 250-350 R1CS constraints per hash invocation versus approximately 25,000-30,000 for SHA-256, a reduction of roughly 80-100x. For a tree of depth K, each inclusion/exclusion proof requires K hash evaluations inside the circuit. At K=256, this means:

- **SHA-256 in circuit**: ~6.4-7.7 million constraints per proof path
- **Poseidon in circuit**: ~64,000-89,600 constraints per proof path

This difference determines whether proof generation completes in seconds (Poseidon) or minutes (SHA-256) on consumer hardware, and whether it is feasible at all in WASM with its memory constraints.
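
The per-path figures are the table's per-invocation estimates multiplied by the depth K. The sketch below simply reproduces that arithmetic, using the table's estimated ranges (not measured constraint counts) and, like the text, counting one hash per level.

```rust
/// Per-invocation R1CS constraint estimates (low, high), taken from the table.
const SHA256: (u64, u64) = (25_000, 30_000);
const POSEIDON: (u64, u64) = (250, 350);

/// Constraints for one proof path: one hash invocation per tree level.
fn path_constraints(per_hash: (u64, u64), depth: u64) -> (u64, u64) {
    (per_hash.0 * depth, per_hash.1 * depth)
}

fn main() {
    let k = 256;
    let sha = path_constraints(SHA256, k);
    let pos = path_constraints(POSEIDON, k);
    println!("SHA-256 path:  {}..{} constraints", sha.0, sha.1); // 6400000..7680000
    println!("Poseidon path: {}..{} constraints", pos.0, pos.1); // 64000..89600
    // Ratio of the range midpoints.
    let ratio = (sha.0 + sha.1) as f64 / (pos.0 + pos.1) as f64;
    println!("ratio ≈ {:.1}x", ratio); // prints "ratio ≈ 91.7x"
}
```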

**Poseidon configuration**:
- Field: BN254 scalar field (matching the existing Rust ecosystem: `ark-bn254`, `halo2curves`)
- Arity: 2 (binary tree, width-3 sponge: rate=2, capacity=1)
- Security level: 128 bits
- Full rounds: 8, partial rounds: 57 (the Poseidon reference parameters for a width-3 instance over a ~254-bit field at 128-bit security)
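
The ~250-constraint estimate for Poseidon can be cross-checked from the round structure. This sketch assumes the usual R1CS accounting: the x^5 S-box costs three multiplications (x², x⁴, x⁵), full rounds apply it to all `t` state elements and partial rounds to one, and the MDS layer and round constants are free linear combinations. `poseidon_r1cs_estimate` is a hypothetical helper, not part of any Poseidon crate. With 57 partial rounds the result is 243; with 56 it would be 240, either way near the low end of the table's range.

```rust
/// Estimated R1CS constraints for one Poseidon permutation with an x^5 S-box.
/// t = state width, rf = full rounds, rp = partial rounds.
fn poseidon_r1cs_estimate(t: u64, rf: u64, rp: u64) -> u64 {
    const MULS_PER_SBOX: u64 = 3; // x2 = x*x, x4 = x2*x2, x5 = x4*x
    // Full rounds: t S-boxes each; partial rounds: a single S-box each.
    MULS_PER_SBOX * (rf * t + rp)
}

fn main() {
    // Width-3 sponge (rate 2, capacity 1), 8 full rounds, 57 partial rounds.
    println!("{}", poseidon_r1cs_estimate(3, 8, 57)); // prints "243"
}
```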

#### Layer 3: Bridge Layer (Hash Compatibility)

A **bridge hash** mechanism links the two layers when a proof must attest to the same data that exists in the SHA-256-based tree:

1. The prover holds the pre-image data (leaf value, key)
2. Inside the ZK circuit, the prover re-hashes using Poseidon to produce a Poseidon-based root
3. A separate commitment, in the form of a signed attestation produced outside the circuit, binds the Poseidon root to the SHA-256 root
4. The verifier checks: (a) the ZK proof is valid against the Poseidon root, and (b) the attestation correctly binds the Poseidon root to the published SHA-256 root

This avoids evaluating SHA-256 inside the ZK circuit while maintaining a verifiable link between the two hash domains.
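
The verifier's two-part check in step 4 can be sketched as follows. `BridgeAttestation`, `verify_bridged`, and the field names are hypothetical; because the ADR does not fix a proof system or signature scheme, both verifiers are passed in as closures.

```rust
type Root = [u8; 32];

/// Hypothetical signed statement: "poseidon_root corresponds to sha256_root".
struct BridgeAttestation {
    sha256_root: Root,
    poseidon_root: Root,
    signature: Vec<u8>,
}

/// Step 4: accept only if (a) the ZK proof verifies against the Poseidon root
/// and (b) a validly signed attestation binds that Poseidon root to the
/// published SHA-256 root.
fn verify_bridged<Z, S>(
    published_sha256_root: &Root,
    att: &BridgeAttestation,
    proof: &[u8],
    zk_verify: Z,
    sig_verify: S,
) -> bool
where
    Z: Fn(&Root, &[u8]) -> bool,
    S: Fn(&BridgeAttestation) -> bool,
{
    // (b) first: the binding checks are cheap compared with proof verification.
    att.sha256_root == *published_sha256_root
        && sig_verify(att)
        && zk_verify(&att.poseidon_root, proof) // (a)
}

fn main() {
    let att = BridgeAttestation { sha256_root: [1; 32], poseidon_root: [2; 32], signature: vec![0xAB] };
    // Dummy checkers for illustration only.
    let zk_ok = |root: &Root, _proof: &[u8]| *root == [2; 32];
    let sig_ok = |a: &BridgeAttestation| !a.signature.is_empty();
    println!("{}", verify_bridged(&[1; 32], &att, b"proof", zk_ok, sig_ok)); // prints "true"
    println!("{}", verify_bridged(&[9; 32], &att, b"proof", zk_ok, sig_ok)); // prints "false"
}
```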

### Feature Flag Design

```toml
[features]
default = ["console_error_panic_hook"]
# Enable Poseidon hash for ZK-friendly circuits
poseidon = ["dep:poseidon-permutation"]
# Enable Blake3 as an alternative non-ZK hash
blake3 = ["dep:blake3"]
# Enable computation attestation proofs
attestation = []
# Enable all features
full = ["poseidon", "attestation", "blake3"]
# Each `dep:` entry requires a matching `optional = true` dependency
# (here `poseidon-permutation` and `blake3`) under [dependencies].
```

### Hash Trait Abstraction

```rust
/// 256-bit digest (assumed here to be a plain 32-byte array).
pub type Hash = [u8; 32];

/// Trait abstracting over hash function implementations
pub trait TreeHasher: Clone + Send + Sync {
    /// Hash output size in bytes
    const OUTPUT_SIZE: usize;

    /// Hash arbitrary data
    fn hash(data: &[u8]) -> Hash;

    /// Hash a leaf node with domain separation
    fn hash_leaf(key: &Hash, value: &[u8]) -> Hash;

    /// Hash an internal node with domain separation
    fn hash_internal(left: &Hash, right: &Hash) -> Hash;

    /// The default hash for empty/absent leaves
    fn default_empty() -> Hash;
}
```

This trait allows the tree implementation to be generic over the hash function, enabling testing with fast non-cryptographic hashes and production use with either SHA-256 or Poseidon depending on context.
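
A minimal sketch of that generic use, assuming `Hash = [u8; 32]`: a test-only backend built on std's non-cryptographic `DefaultHasher` (SipHash), plus a function generic over `TreeHasher` that precomputes per-level empty-subtree hashes (E[0] = `default_empty()`, E[i+1] = `hash_internal(E[i], E[i])`). `ToyHasher` and `empty_hashes` are hypothetical names, and hashing the `cosmetic_empty_leaf` label with the toy hash only mimics the SHA-256-based default.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Assumed 256-bit digest type.
pub type Hash = [u8; 32];

/// The trait from the ADR, repeated so the sketch is self-contained.
pub trait TreeHasher: Clone + Send + Sync {
    const OUTPUT_SIZE: usize;
    fn hash(data: &[u8]) -> Hash;
    fn hash_leaf(key: &Hash, value: &[u8]) -> Hash;
    fn hash_internal(left: &Hash, right: &Hash) -> Hash;
    fn default_empty() -> Hash;
}

/// Test-only, NON-cryptographic backend: SipHash widened to 32 bytes.
#[derive(Clone)]
pub struct ToyHasher;

impl TreeHasher for ToyHasher {
    const OUTPUT_SIZE: usize = 32;

    fn hash(data: &[u8]) -> Hash {
        let mut out = [0u8; 32];
        for (lane, chunk) in out.chunks_mut(8).enumerate() {
            let mut h = DefaultHasher::new();
            h.write_usize(lane); // distinct lane index per 8-byte chunk
            h.write(data);
            chunk.copy_from_slice(&h.finish().to_le_bytes());
        }
        out
    }

    fn hash_leaf(key: &Hash, value: &[u8]) -> Hash {
        let mut buf = vec![0x00]; // leaf domain tag
        buf.extend_from_slice(key);
        buf.extend_from_slice(value);
        Self::hash(&buf)
    }

    fn hash_internal(left: &Hash, right: &Hash) -> Hash {
        let mut buf = vec![0x01]; // internal-node domain tag
        buf.extend_from_slice(left);
        buf.extend_from_slice(right);
        Self::hash(&buf)
    }

    fn default_empty() -> Hash {
        Self::hash(b"cosmetic_empty_leaf")
    }
}

/// Generic over the backend: empty-subtree hashes for levels 0..=depth.
pub fn empty_hashes<H: TreeHasher>(depth: usize) -> Vec<Hash> {
    let mut levels = Vec::with_capacity(depth + 1);
    levels.push(H::default_empty());
    for i in 0..depth {
        let e = levels[i];
        levels.push(H::hash_internal(&e, &e));
    }
    levels
}

fn main() {
    let levels = empty_hashes::<ToyHasher>(256);
    // prints "257 levels of 32-byte hashes"
    println!("{} levels of {}-byte hashes", levels.len(), ToyHasher::OUTPUT_SIZE);
}
```

Swapping `ToyHasher` for a SHA-256 or Poseidon backend would change only the `impl` block; `empty_hashes` and any tree code written against `TreeHasher` stay the same.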

---

## Consequences

### Positive

1. **Practical ZK proof generation**: Poseidon reduces circuit constraint count by ~80-100x compared to SHA-256, making in-browser WASM proof generation feasible for trees of practical depth.

2. **Backward compatibility**: SHA-256 remains the default for non-ZK operations, preserving compatibility with the existing `hasher.rs` implementation and all current tests.

3. **Minimal WASM binary impact**: The `sha2` crate adds ~15KB. Poseidon is opt-in behind a feature flag and only pulled in when ZK functionality is needed.

4. **Domain separation preserved**: Both hash layers use domain-separated inputs (leaf vs. internal vs. attestation), preventing cross-domain collision attacks regardless of which hash function is active.

5. **Ecosystem alignment**: Poseidon over BN254 aligns with the dominant ZK ecosystem (arkworks, halo2, circom). Multiple production Rust crates exist (`poseidon-merkle`, `dusk-poseidon`, `light-poseidon`), though not all target BN254 (`dusk-poseidon`, for example, is built on BLS12-381), so the crate choice must match the chosen field.

### Negative

1. **Dual-root complexity**: The bridge layer introduces a second root hash (Poseidon-based) that must be kept in sync with the SHA-256 root. This adds implementation complexity and a potential consistency failure mode.

2. **Poseidon security maturity**: Poseidon has received significant cryptanalysis but has not been subjected to the decades of scrutiny that SHA-256 has. The security margin of ZK-friendly hash functions remains an active area of research. Newer candidates (Anemoi, Griffin) claim better efficiency but have even less cryptanalytic history.

3. **Performance asymmetry**: Poseidon is optimized for arithmetic circuits, not for raw throughput on conventional hardware. On native/WASM without ZK circuits, Poseidon is approximately 10-50x slower than SHA-256 for the same input size. It should never be used as a general-purpose hash outside of ZK contexts.

4. **Library dependency surface**: Adding Poseidon introduces dependencies on finite field arithmetic libraries (e.g., `ark-ff`, `pasta_curves`, or `halo2curves`), which increase WASM binary size by 100-500KB depending on the implementation chosen.

### Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Poseidon algebraic attack discovered | Low | High | Monitor ePrint/IACR; trait abstraction allows swapping to MiMC/Rescue/Anemoi |
| WASM binary exceeds budget with Poseidon deps | Medium | Medium | Feature-flag isolation; consider vendoring minimal Poseidon implementation |
| Dual-root consistency bug | Medium | High | Extensive property-based testing; root binding attestation includes both roots |
| Blake3 needed for bulk operations | Low | Low | Already designed as feature flag; drop-in compatible via `TreeHasher` trait |

---

## References

- Ramanan, P. et al. "CoSMeTIC: Zero-Knowledge Computational Sparse Merkle Trees with Inclusion-Exclusion Proofs for Clinical Research." arXiv:2601.12136, January 2026.
- Grassi, L. et al. "Poseidon: A New Hash Function for Zero-Knowledge Proof Systems." USENIX Security 2021.
- Bowe, S. et al. "Halo: Recursive Proof Composition without a Trusted Setup." IACR ePrint 2019/1021.
- Dahlberg, R. et al. "Efficient Sparse Merkle Trees." Nordic Conference on Secure IT Systems, 2016.
- ZK-Plus. "Benchmarks of Hashing Algorithms in ZoKrates." https://zk-plus.github.io/tutorials/basics/hashing-algorithms-benchmarks
- Zellic Research. "ZK-Friendly Hash Functions." https://www.zellic.io/blog/zk-friendly-hash-functions/
