
Commit eaf493e
docs: clarify compares, limitations, boundaries
1 parent cbafa38

File tree: 1 file changed
README.md: 87 additions & 40 deletions
@@ -2,7 +2,7 @@
 
 Extension of [zkml](https://github.com/uiuc-kang-lab/zkml) for distributed proving using Ray, layer-wise partitioning, and Merkle trees.
 
-> **⚠️ Status Note:** This is an experimental research project. For production zkml, consider [zk-torch](https://github.com/uiuc-kang-lab/zk-torch) which uses proof folding for parallelization. See [Status and Limitations](#status-and-limitations) for details.
+> **⚠️ Status Note:** This is an experimental research project. Also consider [zk-torch](https://github.com/uiuc-kang-lab/zk-torch).
 
 ## Completed Milestones
 
@@ -11,15 +11,16 @@ Extension of [zkml](https://github.com/uiuc-kang-lab/zkml) for distributed provi
 3. ~~**Ray-Rust integration**: Connect Python Ray workers to Rust proof generation ([#9](https://github.com/ray-project/distributed-zkml/issues/9))~~ Done
 4. ~~**GPU acceleration**: ICICLE GPU backend for MSM operations ([#10](https://github.com/ray-project/distributed-zkml/issues/10))~~ Done - see [GPU Acceleration](#gpu-acceleration)
 
-**Note**: For production zkML, see [zk-torch](https://github.com/uiuc-kang-lab/zk-torch) or [Status and Limitations](#status-and-limitations).
-
 ---
 
 ## Table of Contents
 
 - [Status and Limitations](#status-and-limitations)
 - [Overview](#overview)
 - [Implementation](#implementation)
+- [How Distributed Proving Works](#how-distributed-proving-works)
+- [Security Model and Trust Boundaries](#security-model-and-trust-boundaries)
+- [Structure](#structure)
 - [Requirements](#requirements)
 - [Quick Start](#quick-start)
 - [GPU Acceleration](#gpu-acceleration)
@@ -38,7 +39,10 @@ This project implements a **Ray-based distributed proving approach** for zkml. I
 
 **Proof Composition**: This implementation generates separate proofs per chunk. It does not implement recursive proof composition or aggregation. Verifiers must check O(n) proofs rather than O(1), limiting succinctness.
 
-**Security Assumptions**: The distributed trust model (Ray workers) is not formally analyzed. It does not address malicious worker resistance, collusion resistance, and Byzantine fault tolerance.
+**Trust Domain**:
+- **Merkle trees provide privacy for proof readers, not compute providers**: The prover must know all weights and activations to generate a valid ZK proof. Merkle trees hide intermediate values from people *reading the published proof*, not from the compute provider *during execution*.
+- **Multi-party security requires different trust domains**: Security only applies when chunks are distributed across different trust domains (e.g., your servers + AWS), not just different AWS regions.
+- **Comparison to TEE/FHE/MPC**: Trusted Execution Environments (TEEs), Fully Homomorphic Encryption (FHE), or Multi-Party Computation (MPC) provide stronger privacy guarantees, but at costs that currently make them impractical for scalable AI workloads.
 
 ### When to Use This
 
@@ -47,6 +51,11 @@ This project implements a **Ray-based distributed proving approach** for zkml. I
 - Need examples of Ray integration for cryptographic workloads
 - Studying Merkle-based privacy for intermediate computations
 - Building distributed halo2 proving (not zkML-specific)
+- **Use case**: You trust compute providers but want to limit public proof exposure, or the model is partitioned across multiple non-colluding organizations
+
+**Use alternatives if:**
+- You need to hide data from the compute providers themselves → requires TEEs/FHE/MPC
+- You need a single aggregated proof → consider [zk-torch](https://github.com/uiuc-kang-lab/zk-torch)
 
 ---
 
@@ -55,17 +64,17 @@ This project implements a **Ray-based distributed proving approach** for zkml. I
 This repository extends zkml (see [ZKML paper](https://ddkang.github.io/papers/2024/zkml-eurosys.pdf)) with distributed proving capabilities. zkml provides an optimizing compiler from TensorFlow to halo2 ZK-SNARK circuits.
 
 distributed-zkml adds:
-- **Layer-wise partitioning**: Split ML models into chunks for parallel proving
-- **Merkle trees**: Privacy-preserving commitments to intermediate values using Poseidon hashing
-- **Ray integration**: Distributed execution across GPU workers
+- **Layer-wise partitioning**: Split ML models into chunks for parallel proving across GPUs via Ray
+- **Merkle tree commitments**: Hash intermediate activations with Poseidon; only the root is published in the proof
+- **ICICLE GPU acceleration**: Hardware-accelerated MSM operations
 
 ### Comparison to zkml
 
 | Feature | zkml | distributed-zkml |
 |---------|------|------------------|
 | Architecture | Single-machine | Distributed across GPUs |
 | Scalability | Single GPU memory | Horizontal scaling |
-| Privacy | Outputs public | Intermediate values private via Merkle trees |
+| Privacy | Outputs public | Intermediate values hidden from proof readers via Merkle trees |
 
 ## Implementation
 
@@ -76,32 +85,70 @@ distributed-zkml adds:
 3. **Merkle Commitments**: Hash intermediate outputs with Poseidon, only root is public
 4. **On-Chain**: Publish only the Merkle root (O(1) public values vs O(n) without)
 
-**Note**: Each chunk produces a separate proof. This implementation does not aggregate proofs into a single succinct proof. Verifiers must check all chunk proofs individually (O(n) verification time). For single-proof aggregation, see [zk-orch](hhttps://github.com/uiuc-kang-lab/zk-torch)'s accumulation-based approach.
+**Note**: Each chunk produces a separate proof. This implementation does not aggregate proofs into a single succinct proof. Verifiers must check all chunk proofs individually (O(n) verification time). For single-proof aggregation, see [zk-torch](https://github.com/uiuc-kang-lab/zk-torch)'s accumulation-based approach.
 
-\`\`\`
+```
 Model: 9 layers -> 3 chunks
 Chunk 1: Layers 0-2 -> GPU 1 -> Hash A
 Chunk 2: Layers 3-5 -> GPU 2 -> Hash B
 Chunk 3: Layers 6-8 -> GPU 3 -> Hash C
 
 Merkle Tree:
           Root (public)
-          /      \\
+          /      \
    Hash(AB)    Hash C
-    /    \\
+    /    \
 Hash A    Hash B
-\`\`\`
+```
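The chunking-and-commitment scheme diagrammed above can be sketched in a few lines of Python. This is a hedged illustration, not the project's code: the helper names are hypothetical, and SHA-256 stands in for the Poseidon hash that zkml actually uses over the halo2 field.

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 stands in for Poseidon (zkml hashes field elements, not bytes).
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise; an odd node is promoted, like Hash C above."""
    level = leaves
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd node carried up unchanged
        level = nxt
    return level[0]

# 9 layers split into 3 chunks of 3 layers each, one per GPU worker.
layers = list(range(9))
chunks = [layers[i:i + 3] for i in range(0, len(layers), 3)]
assert chunks == [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# Each worker hashes its chunk's output activations; only the root is public.
leaf_hashes = [h(bytes(chunk)) for chunk in chunks]
root = merkle_root(leaf_hashes)  # Root = H(H(A || B) || C), as in the diagram
assert root == h(h(leaf_hashes[0] + leaf_hashes[1]) + leaf_hashes[2])
```

With real chunk outputs the leaves would be commitments to activation tensors; the tree shape matches the three-leaf diagram above.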
+
+### Trust Boundaries
+
+#### What Merkle Trees Provide
+
+| Scenario | Hidden? | Explanation |
+|----------|---------|-------------|
+| Proof readers reconstructing weights via model inversion | Yes | Intermediate activations are hashed, not exposed in the proof |
+| Compute provider seeing weights during execution | No | The provider must have the weights to generate a ZK proof |
+| Compute provider seeing intermediate activations during execution | No | The provider computes them |
+
+**Key insight:** Merkle trees hide intermediate values from people *reading the published proof*, not from the compute provider *during execution*. The prover must know all values to generate a valid ZK proof.
+
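One concrete consequence of the first table row: a proof reader can check that a particular chunk commitment is included under the published root without ever seeing the other chunks' values, only their hashes. A minimal sketch (hypothetical helper names; SHA-256 again stands in for Poseidon):

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 stands in for Poseidon here.
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf: bytes, path: list[tuple[bytes, str]], root: bytes) -> bool:
    """Recompute the root from one leaf plus its sibling hashes.

    `path` lists (sibling_hash, side) pairs from leaf to root. The verifier
    never sees the other leaves' preimages, only their hashes.
    """
    acc = leaf
    for sibling, side in path:
        acc = h(sibling + acc) if side == "left" else h(acc + sibling)
    return acc == root

# Tree from the diagram above: Root = H(H(A || B) || C)
A, B, C = h(b"chunk-1"), h(b"chunk-2"), h(b"chunk-3")
root = h(h(A + B) + C)

# Proving A is committed reveals only the sibling hashes B and H-level C.
assert verify_inclusion(A, [(B, "right"), (C, "right")], root)
assert not verify_inclusion(h(b"tampered"), [(B, "right"), (C, "right")], root)
```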
+#### Multi-Party Proving and Trust Domains
+
+Security depends on **trust domains**, not physical location:
+
+| Setup | Trust Domains | What's Private |
+|-------|---------------|----------------|
+| Single AWS account (any region) | 1 | Nothing from AWS — they control all regions |
+| Your servers + AWS | 2 | Your portion's weights are never sent to AWS |
+| AWS + Google + Azure | 3 | Each provider sees only its own chunk (assuming non-collusion) |
+
+**Multi-party benefit:** If the model is partitioned across different trust domains (e.g., your servers + AWS), no single party holds the full model. Combined with Merkle trees, this provides layered privacy:
+- **Partitioning** → limits what any single provider can access
+- **Merkle trees** → limits what proof readers can observe
+
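The table's argument can be restated as a tiny invariant check. A toy sketch only: the chunk-to-provider assignment below is hypothetical, and nothing here is part of the project's API.

```python
# Chunk -> trust domain, as in the "Your servers + AWS" row above.
assignment = {0: "your-servers", 1: "aws", 2: "aws"}

all_chunks = set(assignment)
for domain in set(assignment.values()):
    visible = {c for c, d in assignment.items() if d == domain}
    # Privacy invariant: no single trust domain holds every chunk.
    assert visible != all_chunks

# A single-account setup (any number of regions) violates the invariant:
single = {0: "aws", 1: "aws", 2: "aws"}
assert {c for c, d in single.items() if d == "aws"} == set(single)
```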
+#### Comparison with ZKTorch
+
+| Aspect | distributed-zkml | ZKTorch |
+|--------|------------------|---------|
+| Scaling strategy | Horizontal (more machines via Ray) | Vertical (proof compression via Mira) |
+| Final output | N separate proofs | 1 accumulated proof |
+| Verification cost | O(N) proofs to verify | O(1) single proof |
+| Intermediate privacy | Merkle trees hide intermediates from proof readers | Exposed in proof |
+| Base system | halo2 (~30M param limit) | Custom pairing-based (6B params tested) |
+
+**These approaches are orthogonal** — Ray parallelism could in principle be combined with Mira accumulation.
 
 ### Structure
 
-\`\`\`
+```
 distributed-zkml/
 ├── python/   # Python wrappers for Rust prover
 ├── tests/    # Distributed proving tests
 └── zkml/     # zkml (modified for Merkle + chunking)
     ├── src/bin/prove_chunk.rs
     └── testing/
-\`\`\`
+```
 
 ## Requirements
 
@@ -115,12 +162,12 @@ Just Docker and Docker Compose. Everything else is in the container.
 |------------|-------|
 | Rust (nightly) | Install via [rustup](https://rustup.rs/) |
 | Python >=3.10 | |
-| pip | \`pip install -e .\` |
-| Build tools | Linux: \`build-essential pkg-config libssl-dev\`; macOS: Xcode CLI |
+| pip | `pip install -e .` |
+| Build tools | Linux: `build-essential pkg-config libssl-dev`; macOS: Xcode CLI |
 
-**Python deps** (installed via \`pip install -e .\`):
-- \`ray[default]>=2.31.0\`
-- \`msgpack\`, \`numpy\`
+**Python deps** (installed via `pip install -e .`):
+- `ray[default]>=2.31.0`
+- `msgpack`, `numpy`
 
 **Optional**: NVIDIA GPU + CUDA 12.x + ICICLE backend for GPU acceleration
 
@@ -130,16 +177,16 @@ Just Docker and Docker Compose. Everything else is in the container.
 
 ### Docker
 
-\`\`\`bash
+```bash
 docker compose build dev
 docker compose run --rm dev
 # Inside container:
 cd zkml && cargo test --test merkle_tree_test -- --nocapture
-\`\`\`
+```
 
 ### Native
 
-\`\`\`bash
+```bash
 # Install Rust
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 
@@ -148,7 +195,7 @@ cd zkml && rustup override set nightly && cargo build --release && cd ..
 
 # Python deps
 pip install -e .
-\`\`\`
+```
 
 ---
 
@@ -164,9 +211,9 @@ Uses [ICICLE](https://github.com/ingonyama-zk/icicle) for GPU-accelerated MSM (M
 
 ### Setup
 
-\`\`\`bash
+```bash
 # 1. Download ICICLE backend (Ubuntu 22.04 - use ubuntu20 for 20.04)
-curl -L -o /tmp/icicle.tar.gz \\
+curl -L -o /tmp/icicle.tar.gz \
   https://github.com/ingonyama-zk/icicle/releases/download/v3.1.0/icicle_3_1_0-ubuntu22-cuda122.tar.gz
 
 # 2. Install
@@ -180,13 +227,13 @@ cd zkml && cargo build --release --features gpu
 
 # 5. Verify
 cargo test --test gpu_benchmark_test --release --features gpu -- --nocapture
-\`\`\`
+```
 
 Expected output:
-\`\`\`
+```
 Registered devices: ["CUDA", "CPU"]
 Successfully set CUDA device 0
-\`\`\`
+```
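For context on what ICICLE accelerates: a multi-scalar multiplication (MSM) combines n scalars and n group elements into one sum. A toy sketch with modular integers standing in for elliptic-curve points (the modulus and values are hypothetical; real MSMs operate on ~255-bit curve points):

```python
# Toy modulus standing in for a curve group (halo2 uses a ~255-bit field).
P = 2**61 - 1

def msm(scalars: list[int], points: list[int]) -> int:
    # sum_i s_i * P_i : O(n) group operations, the batch the GPU parallelizes.
    acc = 0
    for s, pt in zip(scalars, points):
        acc = (acc + s * pt) % P
    return acc

assert msm([1, 2, 3], [10, 20, 30]) == 140  # 1*10 + 2*20 + 3*30
```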
 
 ### Benchmarks (T4)
 
@@ -198,41 +245,41 @@ Successfully set CUDA device 0
 
 ### FFT/NTT Notes
 
-- **Measure FFT time**: \`HALO2_FFT_STATS=1\`
-- **GPU NTT (experimental)**: \`HALO2_USE_GPU_NTT=1\` - currently slower due to conversion overhead
+- **Measure FFT time**: `HALO2_FFT_STATS=1`
+- **GPU NTT (experimental)**: `HALO2_USE_GPU_NTT=1` - currently slower due to conversion overhead
 
 ---
 
 ## Testing
 
 ### Distributed Proving
 
-\`\`\`bash
+```bash
 # Simulation (fast)
-python tests/simple_distributed.py \\
-  --model zkml/examples/mnist/model.msgpack \\
-  --input zkml/examples/mnist/inp.msgpack \\
+python tests/simple_distributed.py \
+  --model zkml/examples/mnist/model.msgpack \
+  --input zkml/examples/mnist/inp.msgpack \
   --layers 4 --workers 2
 
 # Real proofs
 python tests/simple_distributed.py ... --real
-\`\`\`
+```
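The flow that the simulation above exercises can be sketched in plain Python. A hedged stand-in only: `concurrent.futures` replaces `ray.remote`/`ray.get` so the snippet is self-contained, and `prove_chunk` here is a hypothetical placeholder, not zkml's real Rust prover.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

def prove_chunk(chunk_id: int, layers: list[int]) -> dict:
    # Placeholder for a worker invoking the Rust prover on its chunk;
    # with Ray this would be a @ray.remote task.
    commitment = hashlib.sha256(bytes(layers)).hexdigest()
    return {"chunk": chunk_id, "commitment": commitment}

layers = list(range(4))        # --layers 4
num_workers = 2                # --workers 2
size = len(layers) // num_workers
chunks = [layers[i:i + size] for i in range(0, len(layers), size)]
assert chunks == [[0, 1], [2, 3]]

# With Ray: ray.get([prove_chunk.remote(i, c) for i, c in enumerate(chunks)])
with ThreadPoolExecutor(max_workers=num_workers) as pool:
    proofs = list(pool.map(prove_chunk, range(len(chunks)), chunks))

# One proof per chunk: the verifier must check all of them (O(n)).
assert [p["chunk"] for p in proofs] == [0, 1]
```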

 ### Rust Tests
 
-\`\`\`bash
+```bash
 cd zkml
 cargo test --test merkle_tree_test --test chunk_execution_test -- --nocapture
-\`\`\`
+```
 
 ### CI
 
-Runs on PRs to \`main\`/\`dev\`: builds zkml, runs tests (~3-4 min). GPU tests excluded to save costs.
+Runs on PRs to `main`/`dev`: builds zkml, runs tests (~3-4 min). GPU tests excluded to save costs.
 
 ---

 ## References
 
 - [ZKML Paper](https://ddkang.github.io/papers/2024/zkml-eurosys.pdf) (EuroSys '24) - Original zkml framework
 - [zkml Repository](https://github.com/uiuc-kang-lab/zkml) - Base framework this project extends
-- [zk-torch](https://github.com/uiuc-kang-lab/zk-torch) - Alternative approach using proof accumulation/folding (from same research group)
+- [zk-torch](https://github.com/uiuc-kang-lab/zk-torch) - Alternative approach using proof accumulation/folding.
