
Commit b09d497

Masoud authored
feat: GPU-accelerated MSM via ICICLE for KZG proving (#17)
* feat: Add Docker, uv, rust-toolchain for reproducible builds
  - Add Dockerfile with multi-stage build (CUDA + Rust + Python)
  - Add docker-compose.yml for dev/test/gpu services
  - Add rust-toolchain.toml pinning nightly-2024-12-01
  - Add pyproject.toml with uv for Python dependency management
  - Commit Cargo.lock and uv.lock for reproducible builds
  - Update CI to verify Docker build works
  - Update README with Docker quickstart

  Relates to #10 (GPU acceleration)

* docs: improve requirements section, constrain Ray version for macOS
* feat: add ICICLE GPU acceleration dependencies
* fix: use ICICLE git dependencies instead of crates.io
* feat: add GPU benchmark test using ICICLE
* fix: correct ICICLE API call for get_device_properties
* fix: load ICICLE CUDA backend before use
* feat: integrate ICICLE GPU acceleration and dependencies
  - Added ICICLE packages (icicle-bn254, icicle-core, icicle-runtime) for GPU-accelerated multi-exponentiation.
  - Updated Cargo.toml to include optional GPU features.
  - Enhanced best_multiexp function to utilize GPU when available and enabled.
  - Introduced new dependencies in Cargo.lock for improved performance.
* feat: enhance ICICLE GPU support and performance
  - Improved GPU multi-exponentiation capabilities in best_multiexp function.
  - Updated dependencies in Cargo.toml and Cargo.lock for better performance.
  - Ensured compatibility with optional GPU features for enhanced acceleration.
* feat: wire ICICLE GPU MSM into KZG proving
  - Dispatch BN256 MSMs in halo2_proofs best_multiexp to ICICLE (CUDA) when built with --features gpu.
  - Add FFT/NTT notes + env toggles in README, and print MSM/NTT/FFT stats in proof test output.
  - Add an opt-in ICICLE NTT path and a benchmark (currently slower than CPU due to conversion overhead).
* chore: remove backup file

---------

Signed-off-by: Masoud <masoud@anyscale.com>
1 parent e7c50fe commit b09d497

11 files changed

+1014
-28
lines changed

README.md

Lines changed: 145 additions & 22 deletions
@@ -9,7 +9,7 @@ Extension of [zkml](https://github.com/uiuc-kang-lab/zkml) for distributed provi
99
1. ~~**Make Merkle root public**: Add root to public values so next chunk can verify it~~ Done
1010
2. ~~**Complete proof generation**: Connect chunk execution to actual proof generation ([#8](https://github.com/ray-project/distributed-zkml/issues/8))~~ Done
1111
3. ~~**Ray-Rust integration**: Connect Python Ray workers to Rust proof generation ([#9](https://github.com/ray-project/distributed-zkml/issues/9))~~ Done
12-
4. **GPU acceleration**: Current implementation is CPU-based. GPU acceleration for proof generation requires additional work ([#10](https://github.com/ray-project/distributed-zkml/issues/10))
12+
4. **GPU acceleration**: ICICLE GPU backend integrated for MSM operations. See [GPU Acceleration](#gpu-acceleration) for setup. ([#10](https://github.com/ray-project/distributed-zkml/issues/10))
1313

1414
---
1515

@@ -19,6 +19,8 @@ Extension of [zkml](https://github.com/uiuc-kang-lab/zkml) for distributed provi
1919
- [Implementation](#implementation)
2020
- [How Distributed Proving Works](#how-distributed-proving-works)
2121
- [Structure](#structure)
22+
- [Requirements](#requirements)
23+
- [GPU Acceleration](#gpu-acceleration)
2224
- [Quick Start](#quick-start)
2325
- [Testing and CI](#testing-and-ci)
2426
- [CI](#ci)
@@ -106,7 +108,7 @@ distributed-zkml/
106108
└── testing/ # Rust test suites
107109
```
108110

109-
## Quick Start
111+
## Requirements
110112

111113
### Option 1: Docker (Recommended)
112114

@@ -123,34 +125,155 @@ docker compose run --rm test
123125

124126
### Option 2: Native Build
125127

126-
```bash
127-
# Ensure zkml is built first
128-
cd zkml
129-
rustup override set nightly
130-
cargo build --release
131-
cd ..
128+
- **Docker** and **Docker Compose** only
129+
- All other dependencies are included in the container image
132130

133131
# Install Python dependencies
134132
uv sync # or: pip install -e .
135133
```
136134
137-
### Run Distributed Proving
135+
**Required:**
136+
- **Rust** (nightly toolchain) - Install via [rustup](https://rustup.rs/)
137+
- **Python** (>=3.10, recommended 3.11-3.12)
138+
- macOS x86_64: Use Python 3.11 for Ray compatibility
139+
- **uv** (recommended) or **pip** - Python package manager
140+
- **System build tools**:
141+
- Linux: `build-essential`, `pkg-config`, `libssl-dev`
142+
- macOS: Xcode Command Line Tools (`xcode-select --install`)
143+
144+
**Python dependencies** (auto-installed via `uv sync` or `pip install -e .`):
145+
- `ray[default]>=2.9.0,<2.11.0` - Constrained for macOS x86_64 compatibility
146+
- `msgpack`, `numpy`
147+
148+
**Optional:**
149+
- `pytest` - For running tests (dev dependencies)
150+
- NVIDIA GPU + CUDA 12.x - For GPU-accelerated proving ops
151+
- ICICLE backend - GPU MSM/NTT acceleration (see [GPU Acceleration](#gpu-acceleration))
152+
153+
### Quick Reference
154+
155+
| Tool | Docker | Native | Notes |
156+
|------|--------|--------|-------|
157+
| Docker | Required | - | Only for containerized workflow |
158+
| Rust (nightly) | Included | Required | Builds zkml |
159+
| Python (>=3.10) | Included | Required | 3.11 recommended on macOS x86_64 |
160+
| uv/pip | Included | Required | Python package manager |
161+
| Ray | Included | Required | <2.11.0 for macOS x86_64 |
162+
| Build tools | Included | Required | System-specific |
163+
164+
---
165+
166+
---
167+
168+
## GPU Acceleration
169+
170+
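GPU acceleration uses [ICICLE](https://github.com/ingonyama-zk/icicle) to run MSM (multi-scalar multiplication) on the GPU. When zkml is built with `--features gpu`, BN256 MSMs in the `halo2_proofs` `best_multiexp` function are dispatched to ICICLE's CUDA backend.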
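For readers new to the term, an MSM computes the sum of `s_i * G_i` over many scalar/point pairs. The sketch below is a toy illustration only: it replaces elliptic-curve points with integers modulo a prime (so "point addition" is modular addition), and the modulus `P`, the function names, and the window size are all assumptions made for the example, not anything taken from ICICLE or halo2. It compares the term-by-term sum with a Pippenger-style bucket method, the general family of algorithm that MSM implementations tend to use.

```python
# Toy MSM: integers modulo a prime under addition stand in for curve points.
# P is an arbitrary toy modulus (assumption), not a BN254 parameter.
P = 2**61 - 1


def msm_naive(scalars, points):
    """Compute sum(s_i * G_i) one term at a time."""
    acc = 0
    for s, g in zip(scalars, points):
        acc = (acc + s * g) % P
    return acc


def msm_bucket(scalars, points, window_bits=4, scalar_bits=64):
    """Pippenger-style bucket method for scalars below 2**scalar_bits."""
    mask = (1 << window_bits) - 1
    num_windows = (scalar_bits + window_bits - 1) // window_bits
    acc = 0
    for w in reversed(range(num_windows)):
        # Shift the accumulator left by one window using doublings only.
        for _ in range(window_bits):
            acc = (acc + acc) % P
        # Group points by the value of the current scalar window.
        buckets = [0] * (1 << window_bits)
        for s, g in zip(scalars, points):
            digit = (s >> (w * window_bits)) & mask
            if digit:
                buckets[digit] = (buckets[digit] + g) % P
        # Running-sum trick: yields sum(d * buckets[d]) with additions only.
        running = 0
        window_sum = 0
        for d in range(len(buckets) - 1, 0, -1):
            running = (running + buckets[d]) % P
            window_sum = (window_sum + running) % P
        acc = (acc + window_sum) % P
    return acc


if __name__ == "__main__":
    print(msm_naive([2, 3], [5, 7]), msm_bucket([2, 3], [5, 7]))
```

The bucket method uses only group additions, which is why it maps well to curve arithmetic and to parallel hardware; the toy group here merely makes the result easy to check.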
171+
172+
### GPU Requirements
173+
174+
- NVIDIA GPU (tested on A10G, compatible with A100/H100)
175+
- CUDA 12.x drivers
176+
- Ubuntu 20.04+ (Ubuntu 22.04 recommended)
177+
178+
### GPU Setup
179+
180+
1. **Download ICICLE backend** (match your Ubuntu version):
138181
139182
```bash
140-
# Simulation mode (fast, no actual proofs)
141-
python3 tests/simple_distributed.py \
142-
--model zkml/examples/mnist/model.msgpack \
143-
--input zkml/examples/mnist/inp.msgpack \
144-
--layers 4 \
145-
--workers 2
183+
# Ubuntu 22.04
184+
curl -L -o /tmp/icicle.tar.gz \
185+
https://github.com/ingonyama-zk/icicle/releases/download/v3.1.0/icicle_3_1_0-ubuntu22-cuda122.tar.gz
146186
147-
# Real mode (generates actual ZK proofs, ~2-3s per chunk)
148-
python3 tests/simple_distributed.py \
149-
--model zkml/examples/mnist/model.msgpack \
150-
--input zkml/examples/mnist/inp.msgpack \
151-
--layers 4 \
152-
--workers 2 \
153-
--real
187+
# Ubuntu 20.04
188+
curl -L -o /tmp/icicle.tar.gz \
189+
https://github.com/ingonyama-zk/icicle/releases/download/v3.1.0/icicle_3_1_0-ubuntu20-cuda122.tar.gz
190+
```
191+
192+
2. **Install backend**:
193+
194+
```bash
195+
mkdir -p ~/.icicle
196+
tar -xzf /tmp/icicle.tar.gz -C /tmp
197+
cp -r /tmp/icicle/lib/backend ~/.icicle/
198+
```
199+
200+
3. **Set environment variable** (add to ~/.bashrc):
201+
202+
```bash
203+
export ICICLE_BACKEND_INSTALL_DIR=~/.icicle/backend
204+
```
205+
206+
4. **Build with GPU support**:
207+
208+
```bash
209+
cd zkml
210+
cargo build --release --features gpu
211+
```
212+
213+
5. **Verify GPU detection**:
214+
215+
```bash
216+
ICICLE_BACKEND_INSTALL_DIR=~/.icicle/backend \
217+
cargo test --test gpu_benchmark_test --release --features gpu -- --nocapture
218+
```
219+
220+
Expected output:
221+
```
222+
Registered devices: ["CUDA", "CPU"]
223+
Successfully set CUDA device 0
224+
```
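If the expected output does not appear, a quick preflight check of the environment variable from step 3 can rule out the most basic misconfiguration. The helper below is a hypothetical convenience script, not part of this repo or of ICICLE; it only checks that `ICICLE_BACKEND_INSTALL_DIR` is set and points at an existing directory.

```python
import os
from pathlib import Path


def check_icicle_backend(environ=None):
    """Return (ok, message) for the ICICLE backend directory setting.

    Hypothetical helper: it inspects the environment only and does not
    load or validate the backend libraries themselves.
    """
    env = os.environ if environ is None else environ
    raw = env.get("ICICLE_BACKEND_INSTALL_DIR")
    if not raw:
        return False, "ICICLE_BACKEND_INSTALL_DIR is not set"
    path = Path(raw).expanduser()
    if not path.is_dir():
        return False, f"{path} is not a directory"
    return True, f"backend directory found at {path}"


if __name__ == "__main__":
    ok, message = check_icicle_backend()
    print("OK" if ok else "FAIL", "-", message)
```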
225+
226+
### Benchmark Results
227+
228+
Tested on 4x NVIDIA A10G (23GB each):
229+
230+
| Operation | Size | Time | Throughput |
231+
|-----------|------|------|------------|
232+
| GPU MSM | 2^12 (4K points) | 15ms | 260K pts/sec |
233+
| GPU MSM | 2^14 (16K points) | 6.5ms | 2.5M pts/sec |
234+
| GPU MSM | 2^16 (65K points) | 7.9ms | 8.3M pts/sec |
235+
| GPU MSM | 2^18 (262K points) | 13ms | 19.5M pts/sec |
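The throughput column is the number of points divided by the elapsed time. The short sketch below recomputes it from the sizes and rounded times in the table; the function name is illustrative. Recomputed values for the 2^12 and 2^18 rows come out a few percent higher than the table's figures, presumably because the table was derived from unrounded timings.

```python
# (log2 of MSM size, reported time in milliseconds), copied from the table.
ROWS = [(12, 15.0), (14, 6.5), (16, 7.9), (18, 13.0)]


def throughput_pts_per_sec(log2_size, time_ms):
    """Points processed per second for an MSM of 2**log2_size points."""
    return (1 << log2_size) / (time_ms / 1000.0)


for log2_size, time_ms in ROWS:
    rate = throughput_pts_per_sec(log2_size, time_ms)
    print(f"2^{log2_size}: {rate / 1e6:.2f}M pts/sec")
```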
236+
237+
238+
### FFT / NTT (how it’s used here)
239+
240+
Halo2 proving does a lot of polynomial arithmetic, which relies on FFTs. Over a finite field the transform is usually called an NTT (number-theoretic transform), but it is the same fast polynomial transform. In this repo, a large share of proving time is spent in these FFT/NTT calls.
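As a rough illustration of what an NTT does, the sketch below multiplies two polynomials by transforming them, multiplying pointwise, and transforming back. It is a minimal recursive radix-2 version over the well-known NTT-friendly prime 998244353 with primitive root 3; that field is an assumption chosen for readability and is not the BN256 `Fr` field used by Halo2, and none of the function names come from this repo.

```python
MOD = 998244353  # 119 * 2**23 + 1, an NTT-friendly prime (not BN256 Fr)
GEN = 3          # a primitive root modulo MOD


def _ntt_rec(values, root):
    """Recursive radix-2 transform; `root` is a primitive len(values)-th root."""
    n = len(values)
    if n == 1:
        return list(values)
    root_sq = root * root % MOD
    even = _ntt_rec(values[0::2], root_sq)
    odd = _ntt_rec(values[1::2], root_sq)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % MOD
        out[k] = (even[k] + t) % MOD
        out[k + n // 2] = (even[k] - t) % MOD
        w = w * root % MOD
    return out


def ntt(values):
    """Forward transform of a power-of-two-length list."""
    n = len(values)
    assert n & (n - 1) == 0, "length must be a power of two"
    return _ntt_rec(values, pow(GEN, (MOD - 1) // n, MOD))


def intt(values):
    """Inverse transform: use the inverse root, then divide by n."""
    n = len(values)
    assert n & (n - 1) == 0, "length must be a power of two"
    inv_root = pow(pow(GEN, (MOD - 1) // n, MOD), MOD - 2, MOD)
    n_inv = pow(n, MOD - 2, MOD)
    return [v * n_inv % MOD for v in _ntt_rec(values, inv_root)]


def poly_mul(a, b):
    """Multiply coefficient lists a and b modulo MOD via the NTT."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:
        size *= 2
    fa = ntt(a + [0] * (size - len(a)))
    fb = ntt(b + [0] * (size - len(b)))
    return intt([x * y % MOD for x, y in zip(fa, fb)])[:out_len]


if __name__ == "__main__":
    # (1 + 2x + 3x^2) * (4 + 5x)
    print(poly_mul([1, 2, 3], [4, 5]))
```

Production provers use iterative, in-place variants over much larger fields, but the structure (split into even and odd halves, combine with powers of a root of unity) is the same.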
241+
242+
- **Measure it**: set `HALO2_FFT_STATS=1` (our proof test prints totals + call counts).
243+
- **GPU NTT (experimental)**: `HALO2_USE_GPU_NTT=1` turns on an ICICLE NTT path for BN256 `Fr`. It is currently slower than the CPU path because of conversion overhead, so it stays opt-in.
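The opt-in pattern described in the bullets above can be sketched as follows. This is only an illustration of the idea (GPU path when explicitly requested and available, CPU path otherwise); the real toggle lives in the Rust code, and how that code parses the variables is not shown here, so treating exactly `"1"` as enabled and the function names used are assumptions.

```python
import os


def env_flag(name, environ=None):
    """True when the variable is set to "1" (assumed convention)."""
    env = os.environ if environ is None else environ
    return env.get(name, "") == "1"


def pick_ntt_backend(gpu_available, environ=None):
    """Choose the NTT path: GPU only when requested and available."""
    if gpu_available and env_flag("HALO2_USE_GPU_NTT", environ):
        return "gpu"
    return "cpu"


if __name__ == "__main__":
    print("NTT backend:", pick_ntt_backend(gpu_available=False))
    print("FFT stats enabled:", env_flag("HALO2_FFT_STATS"))
```

Keeping the CPU path as the default means a missing GPU or an unset variable never changes behavior, which fits an experimental path that is not yet faster.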
244+
245+
---
246+
247+
## Quick Start
248+
249+
### Option 1: Docker (Recommended)
250+
251+
```bash
252+
# Build the development image
253+
docker compose build dev
254+
255+
# Run interactive shell
256+
docker compose run --rm dev
257+
258+
# Inside container: run tests
259+
cd zkml && cargo test --test merkle_tree_test --test chunk_execution_test -- --nocapture
260+
```
261+
262+
### Option 2: Native Build
263+
264+
```bash
265+
# 1. Install Rust (if not already installed)
266+
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
267+
source ~/.cargo/env
268+
269+
# 2. Build zkml
270+
cd zkml
271+
rustup override set nightly
272+
cargo build --release
273+
cd ..
274+
275+
# 3. Install Python dependencies
276+
uv sync # or: pip install -e .
154277
```
155278

156279
## Testing and CI

pyproject.toml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,8 @@ requires-python = ">=3.10"
77
license = {text = "MIT"}
88

99
dependencies = [
10-
"ray[default]>=2.9.0,<3.0.0",
10+
# Ray <2.11.0 required for macOS x86_64 compatibility
11+
"ray[default]>=2.9.0,<2.11.0",
1112
"msgpack>=1.0.0,<2.0.0",
1213
"numpy>=1.24.0,<2.0.0",
1314
]

zkml/Cargo.lock

Lines changed: 79 additions & 0 deletions
Some generated files are not rendered by default.

zkml/Cargo.toml

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,8 +35,15 @@ serde_json = "1.0.85"
3535
wav = "1.0.0"
3636
rayon = "1.5.1"
3737

38+
# GPU acceleration (optional) - ICICLE for GPU-accelerated MSM/NTT
39+
icicle-runtime = { git = "https://github.com/ingonyama-zk/icicle.git", tag = "v3.1.0", optional = true }
40+
icicle-bn254 = { git = "https://github.com/ingonyama-zk/icicle.git", tag = "v3.1.0", optional = true }
41+
icicle-core = { git = "https://github.com/ingonyama-zk/icicle.git", tag = "v3.1.0", optional = true }
42+
3843
[features]
44+
default = []
3945
ray = []
46+
gpu = ["halo2_proofs/gpu", "icicle-runtime", "icicle-bn254", "icicle-core"]
4047

4148
[dev-dependencies]
4249
rand_core = "0.6.4"
@@ -98,4 +105,8 @@ path = "testing/test_merkle_root_public.rs"
98105

99106
[[test]]
100107
name = "chunk_proof_test"
101-
path = "testing/chunk_proof_test.rs"
108+
path = "testing/chunk_proof_test.rs"
109+
110+
[[test]]
111+
name = "gpu_benchmark_test"
112+
path = "testing/gpu_benchmark_test.rs"

zkml/halo2/halo2_proofs/Cargo.toml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,11 @@ ark-std = { version = "0.3", features = ["print-trace"] }
6565
plotters = { version = "0.3.0", optional = true }
6666
tabbycat = { version = "0.1", features = ["attributes"], optional = true }
6767

68+
# GPU acceleration (optional) - ICICLE for GPU-accelerated MSM
69+
icicle-runtime = { git = "https://github.com/ingonyama-zk/icicle.git", tag = "v3.1.0", optional = true }
70+
icicle-bn254 = { git = "https://github.com/ingonyama-zk/icicle.git", tag = "v3.1.0", optional = true }
71+
icicle-core = { git = "https://github.com/ingonyama-zk/icicle.git", tag = "v3.1.0", optional = true }
72+
6873
[dev-dependencies]
6974
assert_matches = "1.5"
7075
criterion = "0.3"
@@ -82,6 +87,7 @@ gadget-traces = ["backtrace"]
8287
sanity-checks = []
8388
batch = ["rand_core/getrandom"]
8489
circuit-params = []
90+
gpu = ["icicle-runtime", "icicle-bn254", "icicle-core"]
8591

8692
[lib]
8793
bench = false
