feat: GPU-accelerated MSM via ICICLE for KZG proving #17

Merged
mascharkh merged 12 commits into main from feat/gpu-msm-kzg on Jan 2, 2026

@mascharkh
Member

Summary

  • Integrates ICICLE CUDA backend for GPU-accelerated Multi-Scalar Multiplication (MSM) in Halo2's KZG proving
  • Uses Rust trait specialization to safely dispatch BN256 G1 MSMs to GPU while keeping generic curve support on CPU
  • Adds an experimental, opt-in GPU NTT path (currently slower than CPU due to conversion overhead)
  • Includes GPU benchmarks and instrumentation for profiling
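
To illustrate the dispatch idea in the summary, here is a minimal sketch. It is not the code from this PR: trait specialization needs a nightly compiler, so the sketch swaps in a stable-Rust stand-in, a trait default method that one type overrides. All names (`Curve`, `Bn256G1`, `OtherCurve`, `Backend`, `gpu_msm_stub`, `best_multiexp_sketch`) are hypothetical, the "points" are plain integers, and the GPU branch is a CPU placeholder rather than an ICICLE call.

```rust
/// Which path handled the MSM (for illustration only).
#[derive(Debug, PartialEq)]
pub enum Backend {
    Cpu,
    Gpu,
}

/// Toy stand-in for a curve: an additive group over wrapping u64 arithmetic.
pub trait Curve: Copy {
    fn identity() -> Self;
    fn add_point(self, other: Self) -> Self;
    fn mul_scalar(self, scalar: u64) -> Self;

    /// Default path: every curve gets the generic CPU MSM.
    fn msm(scalars: &[u64], bases: &[Self]) -> (Self, Backend) {
        (cpu_msm(scalars, bases), Backend::Cpu)
    }
}

/// Generic CPU MSM: sum of scalars[i] * bases[i].
pub fn cpu_msm<C: Curve>(scalars: &[u64], bases: &[C]) -> C {
    assert_eq!(scalars.len(), bases.len());
    scalars
        .iter()
        .zip(bases)
        .fold(C::identity(), |acc, (&s, &b)| acc.add_point(b.mul_scalar(s)))
}

// Hypothetical point types; not the halo2 or ICICLE types.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Bn256G1(pub u64);
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct OtherCurve(pub u64);

impl Curve for OtherCurve {
    fn identity() -> Self { OtherCurve(0) }
    fn add_point(self, o: Self) -> Self { OtherCurve(self.0.wrapping_add(o.0)) }
    fn mul_scalar(self, s: u64) -> Self { OtherCurve(self.0.wrapping_mul(s)) }
    // No `msm` override: falls through to the CPU default.
}

/// Placeholder for a GPU call. It computes the same sum on the CPU so the
/// sketch runs anywhere; a real build would hand the data to the GPU here.
fn gpu_msm_stub(scalars: &[u64], bases: &[Bn256G1]) -> Bn256G1 {
    cpu_msm(scalars, bases)
}

impl Curve for Bn256G1 {
    fn identity() -> Self { Bn256G1(0) }
    fn add_point(self, o: Self) -> Self { Bn256G1(self.0.wrapping_add(o.0)) }
    fn mul_scalar(self, s: u64) -> Self { Bn256G1(self.0.wrapping_mul(s)) }

    /// Override for this one type: route to the (stubbed) GPU path.
    fn msm(scalars: &[u64], bases: &[Self]) -> (Self, Backend) {
        (gpu_msm_stub(scalars, bases), Backend::Gpu)
    }
}

/// Generic entry point; the concrete type decides which path runs.
pub fn best_multiexp_sketch<C: Curve>(scalars: &[u64], bases: &[C]) -> (C, Backend) {
    C::msm(scalars, bases)
}
```

Calling `best_multiexp_sketch(&[2, 3], &[Bn256G1(5), Bn256G1(7)])` takes the overridden path and reports `Backend::Gpu`, while the same call with `OtherCurve` values falls back to `Backend::Cpu`. Both return the same sum, which is the property the dispatch has to preserve.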

Changes

  • halo2_proofs: Added gpu_msm.rs, gpu_ntt.rs modules with ICICLE integration
  • arithmetic.rs: Specialization-based dispatch for best_multiexp and best_fft
  • gpu_benchmark_test.rs: Standalone GPU MSM/NTT benchmarks
  • chunk_proof_test.rs: Added GPU call counters and FFT stats output
  • README.md: GPU setup instructions and benchmark results
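
The call counters mentioned for chunk_proof_test.rs could be built along the following lines. This is a sketch under assumptions, not the PR's code: the names `CallCounters` and `MSM_COUNTERS` and the output format are hypothetical. It uses only `std::sync::atomic`, so counts stay correct when MSMs run on several threads.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counter pair tracking how many MSM calls took each path.
pub struct CallCounters {
    gpu: AtomicUsize,
    cpu: AtomicUsize,
}

impl CallCounters {
    /// `const fn` so an instance can live in a `static`.
    pub const fn new() -> Self {
        CallCounters {
            gpu: AtomicUsize::new(0),
            cpu: AtomicUsize::new(0),
        }
    }

    /// Record one call; `used_gpu` says which path served it.
    pub fn record(&self, used_gpu: bool) {
        let counter = if used_gpu { &self.gpu } else { &self.cpu };
        // Relaxed is enough: only the totals matter, not ordering
        // relative to other memory operations.
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// One-line summary suitable for printing at the end of a test.
    pub fn summary(&self) -> String {
        format!(
            "MSM calls: gpu={} cpu={}",
            self.gpu.load(Ordering::Relaxed),
            self.cpu.load(Ordering::Relaxed)
        )
    }
}

/// Process-wide instance a dispatch function could update.
pub static MSM_COUNTERS: CallCounters = CallCounters::new();
```

A dispatch function would call `MSM_COUNTERS.record(true)` on the GPU path and `MSM_COUNTERS.record(false)` otherwise, and the test would print `MSM_COUNTERS.summary()` once proving finishes.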

Test plan

  • `cargo test --test gpu_benchmark_test --release --features gpu`: all 4 tests pass
  • GPU MSM correctness verified (matches CPU output)
  • End-to-end proof generation with GPU MSM
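
A "matches CPU output" check can be sketched as a differential test: compute the MSM with a simple reference implementation and with the fast path, then compare. The sketch below is illustrative only and makes several assumptions: the group is the integers modulo a prime `P` under addition (not BN256 G1), the "fast" path is a CPU bucket (Pippenger-style) MSM standing in for the GPU, and the function names are hypothetical.

```rust
/// Toy group modulus (a prime); points are integers mod P under addition.
const P: u64 = 1_000_000_007;

fn add(a: u64, b: u64) -> u64 {
    (a % P + b % P) % P
}

fn mul(base: u64, scalar: u64) -> u64 {
    // Widen to u128 so the product cannot overflow before reduction.
    ((base as u128 * scalar as u128) % P as u128) as u64
}

/// Reference path: sum of scalars[i] * bases[i], one term at a time.
pub fn naive_msm(scalars: &[u64], bases: &[u64]) -> u64 {
    assert_eq!(scalars.len(), bases.len());
    scalars
        .iter()
        .zip(bases)
        .fold(0u64, |acc, (&s, &b)| add(acc, mul(b, s)))
}

/// Stand-in for the accelerated path: windowed bucket MSM.
pub fn bucket_msm(scalars: &[u64], bases: &[u64], window: u32) -> u64 {
    assert_eq!(scalars.len(), bases.len());
    assert!((1..=16).contains(&window));
    let num_windows = (64 + window - 1) / window;
    let mask = (1u64 << window) - 1;
    let mut total = 0u64;
    // Process windows from most to least significant.
    for w in (0..num_windows).rev() {
        // Multiply the running total by 2^window via repeated doubling.
        for _ in 0..window {
            total = add(total, total);
        }
        // buckets[d - 1] accumulates every base whose digit in this window is d.
        let mut buckets = vec![0u64; mask as usize];
        for (&s, &b) in scalars.iter().zip(bases) {
            let digit = (s >> (w * window)) & mask;
            if digit != 0 {
                let slot = (digit - 1) as usize;
                buckets[slot] = add(buckets[slot], b);
            }
        }
        // Running-sum trick: yields the sum over d of d * buckets[d - 1]
        // using additions only.
        let (mut running, mut window_sum) = (0u64, 0u64);
        for &bucket in buckets.iter().rev() {
            running = add(running, bucket);
            window_sum = add(window_sum, running);
        }
        total = add(total, window_sum);
    }
    total
}

/// Differential check: true when both paths agree on the given input.
pub fn msm_paths_agree(scalars: &[u64], bases: &[u64], window: u32) -> bool {
    naive_msm(scalars, bases) == bucket_msm(scalars, bases, window)
}
```

In a real test the second implementation would be the GPU call and the inputs would be random scalars and curve points; the comparison step is the same.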

Closes #10

masoud@anyscale.com added 10 commits January 1, 2026 15:58
- Add Dockerfile with multi-stage build (CUDA + Rust + Python)
- Add docker-compose.yml for dev/test/gpu services
- Add rust-toolchain.toml pinning nightly-2024-12-01
- Add pyproject.toml with uv for Python dependency management
- Commit Cargo.lock and uv.lock for reproducible builds
- Update CI to verify Docker build works
- Update README with Docker quickstart

Relates to #10 (GPU acceleration)
- Added ICICLE packages (icicle-bn254, icicle-core, icicle-runtime) for GPU-accelerated multi-exponentiation.
- Updated Cargo.toml to include optional GPU features.
- Enhanced best_multiexp function to utilize GPU when available and enabled.
- Introduced new dependencies in Cargo.lock for improved performance.
- Improved GPU multi-exponentiation capabilities in best_multiexp function.
- Updated dependencies in Cargo.toml and Cargo.lock for better performance.
- Ensured compatibility with optional GPU features for enhanced acceleration.
- Dispatch BN256 MSMs in halo2_proofs best_multiexp to ICICLE (CUDA) when built with --features gpu.
- Add FFT/NTT notes + env toggles in README, and print MSM/NTT/FFT stats in proof test output.
- Add an opt-in ICICLE NTT path and a benchmark (currently slower than CPU due to conversion overhead).
Signed-off-by: Masoud <masoud@anyscale.com>
@mascharkh mascharkh self-assigned this Jan 2, 2026
@mascharkh mascharkh merged commit b09d497 into main Jan 2, 2026
2 checks passed

Development

Successfully merging this pull request may close these issues.

feat: GPU acceleration for proof generation