Extension of [zkml](https://github.com/uiuc-kang-lab/zkml) for distributed proving using Ray, layer-wise partitioning, and Merkle trees.

> **⚠️ Status Note:** This is an experimental research project. For production zkML, consider [zk-torch](https://github.com/uiuc-kang-lab/zk-torch), which uses proof folding for parallelization; see [Status and Limitations](#status-and-limitations) for details.

## Completed Milestones
3. ~~**Ray-Rust integration**: Connect Python Ray workers to Rust proof generation ([#9](https://github.com/ray-project/distributed-zkml/issues/9))~~ Done
4. ~~**GPU acceleration**: ICICLE GPU backend for MSM operations ([#10](https://github.com/ray-project/distributed-zkml/issues/10))~~ Done - see [GPU Acceleration](#gpu-acceleration)

---
## Table of Contents

- [Status and Limitations](#status-and-limitations)
- [Security Model and Trust Boundaries](#security-model-and-trust-boundaries)
- [Structure](#structure)
- [Requirements](#requirements)
- [Quick Start](#quick-start)
- [GPU Acceleration](#gpu-acceleration)

## Status and Limitations

This project implements a **Ray-based distributed proving approach** for zkml.

**Proof Composition**: This implementation generates separate proofs per chunk. It does not implement recursive proof composition or aggregation. Verifiers must check O(n) proofs rather than O(1), limiting succinctness.

**Security Assumptions**: The distributed trust model (Ray workers) has not been formally analyzed; malicious worker resistance, collusion resistance, and Byzantine fault tolerance are not addressed.

**Trust Domain**:

- **Merkle trees provide privacy for proof readers, not compute providers**: The prover must know all weights and activations to generate a valid ZK proof. Merkle trees hide intermediate values from people *reading the published proof*, not from the compute provider *during execution*.
- **Multi-party security requires different trust domains**: Security only applies when chunks are distributed across different trust domains (e.g., your servers + AWS), not just different AWS regions.
- **Comparison to TEE/FHE/MPC**: Trusted Execution Environments (TEEs), Fully Homomorphic Encryption (FHE), and Multi-Party Computation (MPC) provide stronger privacy guarantees, but at costs that are currently prohibitive for scalable AI applications.
### When to Use This
- Need examples of Ray integration for cryptographic workloads
- Studying Merkle-based privacy for intermediate computations
- Building distributed halo2 proving (not zkML-specific)
- **Use case**: You trust compute providers but want to limit public proof exposure, or the model is partitioned across multiple non-colluding organizations

**Use alternatives if:**
- Need to hide data from compute providers themselves → Requires TEEs/FHE/MPC
- Need single aggregated proof → Consider [zk-torch](https://github.com/uiuc-kang-lab/zk-torch)
---
This repository extends zkml (see [ZKML paper](https://ddkang.github.io/papers/2024/zkml-eurosys.pdf)) with distributed proving capabilities. zkml provides an optimizing compiler from TensorFlow to halo2 ZK-SNARK circuits.

distributed-zkml adds:
- **Layer-wise partitioning**: Split ML models into chunks for parallel proving across GPUs via Ray
- **Merkle tree commitments**: Hash intermediate activations with Poseidon; only publish the root in the proof

| Feature | zkml | distributed-zkml |
|--------------|----------------|-------------------------|
| Architecture | Single-machine | Distributed across GPUs |
| Scalability | Single GPU memory | Horizontal scaling |
| Privacy | Outputs public | Intermediate values hidden from proof readers via Merkle trees |
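
The "horizontal scaling" row above is the core of the design: each chunk is proven independently, so proving fans out one task per chunk. Below is a minimal sketch of that fan-out pattern using only the standard library in place of Ray; `prove_chunk` is a hypothetical stand-in that merely fingerprints its inputs, where the repo's real path dispatches Ray tasks to the Rust prover.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def prove_chunk(chunk_id: int, layers: list) -> str:
    # Hypothetical stand-in for invoking the Rust prover
    # (zkml/src/bin/prove_chunk.rs); returns a stable fake "proof" id.
    payload = f"chunk-{chunk_id}:{layers}".encode()
    return hashlib.sha256(payload).hexdigest()

# One proving task per chunk, gathered independently -- the same
# fan-out shape the repo expresses with Ray remote tasks.
chunks = {1: [0, 1, 2], 2: [3, 4, 5], 3: [6, 7, 8]}
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {cid: pool.submit(prove_chunk, cid, layers)
               for cid, layers in chunks.items()}
    proofs = {cid: fut.result() for cid, fut in futures.items()}
```

Each entry in `proofs` is one chunk proof that a verifier must check individually, which is exactly the O(n) verification cost noted under Status and Limitations.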
## Implementation
3. **Merkle Commitments**: Hash intermediate outputs with Poseidon, only root is public
4. **On-Chain**: Publish only the Merkle root (O(1) public values vs O(n) without)

**Note**: Each chunk produces a separate proof. This implementation does not aggregate proofs into a single succinct proof. Verifiers must check all chunk proofs individually (O(n) verification time). For single-proof aggregation, see [zk-torch](https://github.com/uiuc-kang-lab/zk-torch)'s accumulation-based approach.

```
Model: 9 layers -> 3 chunks
Chunk 1: Layers 0-2 -> GPU 1 -> Hash A
Chunk 2: Layers 3-5 -> GPU 2 -> Hash B
Chunk 3: Layers 6-8 -> GPU 3 -> Hash C

Merkle Tree:
         Root (public)
         /          \
    Hash(AB)      Hash C
    /      \
Hash A   Hash B
```
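
The chunk assignment in the diagram can be sketched in a few lines of layer-wise partitioning logic (a simplified illustration, not the repo's actual splitter):

```python
def partition_layers(num_layers: int, num_chunks: int) -> list[list[int]]:
    """Split layer indices into contiguous chunks, one per worker."""
    base, extra = divmod(num_layers, num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first chunks
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks

# Matches the diagram: 9 layers -> 3 chunks of 3 contiguous layers.
print(partition_layers(9, 3))  # -> [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```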
### Trust Boundaries
#### What Merkle Trees Provide

| Scenario | Hidden? | Explanation |
|----------|---------|-------------|
| Proof readers reconstructing weights via model inversion | Yes | Intermediate activations are hashed, not exposed in proof |
| Compute provider seeing weights during execution | No | Provider must have weights to generate ZK proof |
| Compute provider seeing intermediate activations during execution | No | Provider computes them |

**Key insight:** Merkle trees hide intermediate values from people *reading the published proof*, not from the compute provider *during execution*. The prover must know all values to generate a valid ZK proof.
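
To make this concrete, here is the tree from the diagram built over three chunk hashes. SHA-256 stands in for the circuit-friendly Poseidon hash the project actually uses, and the activation bytes are placeholders:

```python
import hashlib

def h(*parts: bytes) -> bytes:
    # Stand-in for Poseidon: concatenate children and hash.
    return hashlib.sha256(b"".join(parts)).digest()

# Leaves: one hash per chunk's intermediate activations (placeholder data).
hash_a = h(b"chunk 1 activations")
hash_b = h(b"chunk 2 activations")
hash_c = h(b"chunk 3 activations")

# Same shape as the diagram: Root = H(H(A || B) || C).
root = h(h(hash_a, hash_b), hash_c)

# A proof reader sees only `root`. To open chunk 1's commitment, the prover
# reveals hash_a plus its sibling path (hash_b, then hash_c); the reader
# recomputes the root without ever seeing chunk 2's or 3's activations.
assert h(h(hash_a, hash_b), hash_c) == root
```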
#### Multi-Party Proving and Trust Domains
Security depends on **trust domains**, not physical location:

| Setup | Trust Domains | What's Private |
|-------|---------------|----------------|
| Single AWS account (any region) | 1 | Nothing from AWS — they control all regions |
| Your servers + AWS | 2 | Your portion's weights never sent to AWS |
| AWS + Google + Azure | 3 | Each provider sees only their chunk (assuming non-collusion) |

**Multi-party benefit:** If the model is partitioned across different trust domains (e.g., your servers + AWS), no single party has the full model. Combined with Merkle trees, this provides layered privacy:
- **Partitioning** → limits what any single provider can access
- **Merkle trees** → limits what proof readers can observe
#### Comparison with ZKTorch

| Aspect | distributed-zkml | ZKTorch |
|--------|------------------|---------|
| Scaling strategy | Horizontal (more machines via Ray) | Vertical (proof compression via Mira) |
| Final output | N separate proofs | 1 accumulated proof |
| Verification cost | O(N) proofs to verify | O(1) single proof |
| Intermediate privacy | Merkle trees hide from proof readers | Exposed in proof |
| Base system | halo2 (~30M param limit) | Custom pairing-based (6B params tested) |

**These approaches are orthogonal** — Ray parallelism could theoretically be combined with Mira accumulation.
### Structure

```
distributed-zkml/
├── python/              # Python wrappers for Rust prover
├── tests/               # Distributed proving tests
└── zkml/                # zkml (modified for Merkle + chunking)
    ├── src/bin/prove_chunk.rs
    └── testing/
```

## Requirements

Just Docker and Docker Compose. Everything else is in the container.

| Dependency | Notes |
|------------|-------|
| Rust (nightly) | Install via [rustup](https://rustup.rs/) |