Commit 122da2b

Author: Masoud
feat: Ray-Rust integration for distributed proving (#13)
* readme cleanup; completed todo items 1 and 2, starting on 3.
* feat: Ray-Rust integration for distributed proving
  - Add prove_chunk CLI binary (zkml/src/bin/prove_chunk.rs)
  - Add Python wrapper (python/rust_prover.py)
  - Update simple_distributed.py with real prover support
  - Add clap and hex dependencies

Refs #9, #12
1 parent f2f08dc commit 122da2b
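
For orientation, the Python-to-Rust hand-off described in the commit message could look roughly like the sketch below. This is not the content of `python/rust_prover.py`, which is not shown in this view. The flag names (`--chunk-start`, `--chunk-end`), the helper names, and the assumption that the prover prints a hex-encoded result on stdout are all hypothetical and chosen only for illustration.

```python
import subprocess
import sys
from typing import List, Sequence


def build_prove_command(binary: str, start_layer: int, end_layer: int) -> List[str]:
    """Assemble an argv list for a chunk prover.

    The flag names are hypothetical; the real prove_chunk CLI may differ.
    """
    if start_layer < 0 or end_layer < start_layer:
        raise ValueError("expected 0 <= start_layer <= end_layer")
    return [binary, "--chunk-start", str(start_layer), "--chunk-end", str(end_layer)]


def run_prover(argv: Sequence[str], timeout_s: float = 600.0) -> bytes:
    """Run a prover process and decode its stdout as hex (assumed output format)."""
    result = subprocess.run(
        list(argv),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return bytes.fromhex(result.stdout.strip())


if __name__ == "__main__":
    # Stand-in process that prints hex, so the sketch runs without the Rust binary.
    fake_prover = [sys.executable, "-c", "print('deadbeef')"]
    print(run_prover(fake_prover).hex())
```

Separating command construction from process execution keeps the argument handling testable without a compiled binary. In a Ray deployment, a function like `run_prover` would typically be called from inside a remote task, one per chunk.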

6 files changed: +620 -88 lines changed


README.md

Lines changed: 44 additions & 38 deletions
@@ -6,8 +6,8 @@ Extension of [zkml](https://github.com/uiuc-kang-lab/zkml) for distributed provi

 ## Next Steps

-1. ~~**Make Merkle root public**: Add root to public values so next chunk can verify it~~ Done
-2. **Complete proof generation**: Connect chunk execution to actual proof generation ([#8](https://github.com/ray-project/distributed-zkml/issues/8))
+1. ~~**Make Merkle root public**: Add root to public values so next chunk can verify it~~ Done
+2. ~~**Complete proof generation**: Connect chunk execution to actual proof generation ([#8](https://github.com/ray-project/distributed-zkml/issues/8))~~ Done
 3. **Ray-Rust integration**: Connect Python Ray workers to Rust proof generation ([#9](https://github.com/ray-project/distributed-zkml/issues/9))
 4. **GPU acceleration**: Current implementation is CPU-based. GPU acceleration for proof generation requires additional work ([#10](https://github.com/ray-project/distributed-zkml/issues/10))

@@ -16,13 +16,15 @@ Extension of [zkml](https://github.com/uiuc-kang-lab/zkml) for distributed provi
 ## Table of Contents

 - [Overview](#overview)
-- [How Distributed Proving Works](#how-distributed-proving-works)
-- [Structure](#structure)
-- [Implementation Status](#implementation-status)
+- [Implementation](#implementation)
+- [How Distributed Proving Works](#how-distributed-proving-works)
+- [Structure](#structure)
+- [Implementation Status](#implementation-status)
 - [Quick Start](#quick-start)
-- [Testing](#testing)
-- [Testing on AWS GPUs](#testing-on-aws-gpus-a100h100)
-- [CI](#ci)
+- [Testing and CI](#testing-and-ci)
+- [Testing](#testing)
+- [Testing on AWS GPU Instances](#testing-on-aws-gpu-instances)
+- [CI](#ci)
 - [References](#references)
 - [Requirements](#requirements)

@@ -49,9 +51,11 @@ distributed-zkml adds:

 The key difference: zkml optimizes circuit layout for a single proving instance, while distributed-zkml enables parallel proving of model chunks with privacy-preserving commitments to intermediate values.

-## How Distributed Proving Works
+## Implementation

-### Architecture
+### How Distributed Proving Works
+
+#### Architecture

 1. Model Layer Partitioning: Partition the ML model into chunks at the layer level (e.g., layers 0-2, 3-5, 6-8). Each chunk can execute on a separate GPU.

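The layer-level partitioning in step 1 can be illustrated with a small sketch. The function name `partition_layers` and the fixed `chunk_size` parameter are assumptions made for this example; the repository may choose chunk boundaries differently.

```python
from typing import List, Tuple


def partition_layers(num_layers: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split layer indices 0..num_layers-1 into contiguous (start, end) chunks.

    Both bounds are inclusive, matching the "layers 0-2, 3-5, 6-8" notation.
    The last chunk is shorter when num_layers is not a multiple of chunk_size.
    """
    if num_layers <= 0 or chunk_size <= 0:
        raise ValueError("num_layers and chunk_size must be positive")
    return [
        (start, min(start + chunk_size, num_layers) - 1)
        for start in range(0, num_layers, chunk_size)
    ]


if __name__ == "__main__":
    print(partition_layers(9, 3))
```

With 9 layers and a chunk size of 3 this yields the three chunks used in the README's example, each of which could then be assigned to its own worker.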
@@ -61,7 +65,7 @@ The key difference: zkml optimizes circuit layout for a single proving instance,

 4. On-Chain Commitment: Publish only the Merkle root (a single hash) on-chain. This proves intermediate values were computed correctly without revealing their actual values.

-### Example Flow
+#### Example Flow

 ```
 Model: 9 layers total
@@ -81,7 +85,7 @@ On-chain: Only the Root hash
 Private: Outputs A, B, C (never revealed)
 ```

-### Why Merkle Trees?
+#### Why Merkle Trees?

 Without Merkle trees, all intermediate values must be public for the next chunk to verify them—**O(n) public values**, which is expensive in ZK circuits.

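To make the trade-off concrete, here is a minimal binary Merkle tree sketch: one root is published, and each value is checked against it with a path of about log2(n) sibling hashes. It is not the code in `zkml/src/commitments/merkle.rs`. SHA-256 is a stand-in (the README does not say which hash the Rust implementation uses), duplicating the last node on odd levels is a simplification, and all function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    # SHA-256 is an assumption for illustration only.
    return hashlib.sha256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return all tree levels, hashed leaves first and the single root last."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels by repeating the last node
            levels[-1] = level
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) pairs from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1  # neighbour within the same pair
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(root: bytes, leaf: bytes, path: List[Tuple[bytes, bool]]) -> bool:
    """Recompute the root from one leaf and its sibling path."""
    node = _h(leaf)
    for sibling, sibling_is_left in path:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


if __name__ == "__main__":
    outputs = [b"output A", b"output B", b"output C"]
    levels = build_levels(outputs)
    root = levels[-1][0]  # the only value that would be made public
    print(verify(root, outputs[1], prove(levels, 1)))
```

A production tree would normally also separate leaf and internal-node hashing domains; that is omitted here to keep the sketch short.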
@@ -92,7 +96,7 @@ With Merkle trees, only the root is public—**O(1) public values**. The next ch
 | No Merkle | O(n) | O(1) per value | All intermediate values exposed |
 | Merkle | O(1) | O(log n) per value | Only root exposed |

-## Structure
+### Structure

 ```
 distributed-zkml/
@@ -107,9 +111,9 @@ distributed-zkml/

 This is a separate Rust crate that extends zkml. The `zkml/` directory is a git submodule containing a modified version of zkml with Merkle tree support for intermediate value commitments.

-## Implementation Status
+### Implementation Status

-### Merkle Tree Integration
+#### Merkle Tree Integration

 - Binary Merkle tree implementation (`zkml/src/commitments/merkle.rs`)
 - Builds binary tree from intermediate values
@@ -151,24 +155,26 @@ python3 tests/simple_distributed.py \
 --workers 2
 ```

-## Testing
+## Testing and CI
+
+### Testing

-### Python Tests (pytest)
+#### Python Tests (pytest)

-#### Run All Tests
+##### Run All Tests
 ```bash
 pytest tests/
 ```

-#### Run specific GPU and AWS tests
+##### Run specific GPU and AWS tests
 ```bash
 pytest tests/aws/gpu_test.py
 pytest tests/aws/gpu_test.py::test_aws_credentials
 ```

-### Rust Tests (Cargo)
+#### Rust Tests (Cargo)

-#### Run All Tests in zkml
+##### Run All Tests in zkml
 ```bash
 cd zkml
 # Run only the test files (recommended)
@@ -178,42 +184,42 @@ cargo test --test merkle_tree_test --test chunk_execution_test
 # some of which may have errors. Use --test flags to run specific tests.
 ```

-#### Run Specific Test File
+##### Run Specific Test File
 ```bash
 cd zkml
 cargo test --test merkle_tree_test
 cargo test --test chunk_execution_test
 ```

-#### Run Tests with Output
+##### Run Tests with Output
 ```bash
 cd zkml
 cargo test --test merkle_tree_test --test chunk_execution_test -- --nocapture
 ```

-#### Run Tests for distributed-zkml Crate
+##### Run Tests for distributed-zkml Crate
 ```bash
 # From distributed-zkml root
 cargo test
 ```

-#### Check Compilation Only
+##### Check Compilation Only
 ```bash
 cd zkml
 cargo check --lib
 ```

 Broken example files are moved to `zkml/examples/broken/` to prevent compilation errors. Use `--test` flags when running tests.

-### Test Files
+#### Test Files

-#### Python Tests
+##### Python Tests
 - `tests/aws/gpu_test.py` - AWS GPU tests
 - `test_aws_credentials()` - Check AWS credentials
 - `test_gpu_availability()` - Check GPU availability
 - `test_ray_setup()` - Test Ray cluster setup

-#### Rust Tests
+##### Rust Tests
 - `zkml/testing/merkle_tree_test.rs` - Merkle tree tests
 - `test_merkle_single_value()` - Single value Merkle tree
 - `test_merkle_multiple_values()` - Multiple values Merkle tree
@@ -224,11 +230,11 @@ Broken example files are moved to `zkml/examples/broken/` to prevent compilation
 - `test_chunk_execution_with_merkle()` - Chunk execution with Merkle
 - `test_multiple_chunks_consistency()` - Multiple chunks consistency

-## Testing on AWS GPUs (A100/H100)
+### Testing on AWS GPU Instances

-### Prerequisites
+#### Prerequisites

-#### AWS Credentials
+##### AWS Credentials

 Set the following environment variables:

@@ -238,7 +244,7 @@ export AWS_SECRET_ACCESS_KEY=your_secret_key
 export AWS_SESSION_TOKEN=your_session_token
 ```

-#### AWS Resource Configuration
+##### AWS Resource Configuration

 **Option 1: Automated Setup (Recommended)**

@@ -277,13 +283,13 @@ export INSTANCE_TYPE=g5.xlarge # Default: g5.xlarge
 export AMI_ID=ami-0076e7fffffc9251d # Default: Ubuntu 20.04, PyTorch 2.3.1
 ```

-#### GPU Instance
+##### GPU Instance

 Launch an AWS instance with GPU support:
 - A10G: `g5.xlarge` or larger (1x A10G; for A100, use `p4d.24xlarge` with 8x A100)
 - H100: `p5.48xlarge` (8x H100)

-#### Dependencies
+##### Dependencies

 ```bash
 # Install Rust (nightly)
@@ -297,7 +303,7 @@ pip install ray torch
 nvidia-smi # Verify GPU is available
 ```

-### Test Suite
+#### Test Suite

 The test suite includes:

@@ -307,7 +313,7 @@ The test suite includes:
 4. Basic GPU Distribution: Tests task distribution across GPU workers
 5. Distributed Proving Simulation: Runs distributed proving with Merkle trees

-### Expected Output
+#### Expected Output

 ```
 ============================================================
@@ -336,14 +342,14 @@ Basic GPU Distribution: PASS
 Distributed Proving Simulation: PASS
 ```

-### Performance Notes
+#### Performance Notes

 - A100: ~40GB VRAM, suitable for large models
 - H100: ~80GB VRAM, suitable for very large models
 - Ray automatically distributes tasks across available GPUs
 - Monitor GPU usage: `watch -n 1 nvidia-smi`

-## CI
+### CI

 Lightweight CI runs on every PR to `main` and `dev`:
 - Builds zkml library (nightly Rust)

python/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+# Python utilities for distributed-zkml
+
