feat: GPU-accelerated MSM via ICICLE for KZG proving #17
Merged
added 10 commits
January 1, 2026 15:58
- Add Dockerfile with multi-stage build (CUDA + Rust + Python)
- Add docker-compose.yml for dev/test/gpu services
- Add rust-toolchain.toml pinning nightly-2024-12-01
- Add pyproject.toml with uv for Python dependency management
- Commit Cargo.lock and uv.lock for reproducible builds
- Update CI to verify Docker build works
- Update README with Docker quickstart

Relates to #10 (GPU acceleration)
- Added ICICLE packages (icicle-bn254, icicle-core, icicle-runtime) for GPU-accelerated multi-exponentiation.
- Updated Cargo.toml to include optional GPU features.
- Enhanced best_multiexp function to utilize GPU when available and enabled.
- Introduced new dependencies in Cargo.lock for improved performance.
- Improved GPU multi-exponentiation capabilities in best_multiexp function.
- Updated dependencies in Cargo.toml and Cargo.lock for better performance.
- Ensured compatibility with optional GPU features for enhanced acceleration.
- Dispatch BN256 MSMs in halo2_proofs best_multiexp to ICICLE (CUDA) when built with --features gpu.
- Add FFT/NTT notes + env toggles in README, and print MSM/NTT/FFT stats in proof test output.
- Add an opt-in ICICLE NTT path and a benchmark (currently slower than CPU due to conversion overhead).
Signed-off-by: Masoud <masoud@anyscale.com>
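The commits above describe two gates on the GPU path: a compile-time `gpu` feature and runtime env toggles. The following is a minimal sketch of how such a two-level gate could be combined, not the PR's actual code. The `Backend` enum, the `choose_backend` function, and the accepted toggle values are all hypothetical, and the name of the env variable is deliberately left to the caller because the commits do not state it.

```rust
/// Backend that a multi-scalar multiplication call would be routed to.
/// Hypothetical type, introduced only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cpu,
    Gpu,
}

/// Pick a backend from a compile-time flag and a runtime toggle.
///
/// `compiled_with_gpu` stands in for `cfg!(feature = "gpu")`.
/// `toggle` is the raw value of an env variable, if one is set. The set of
/// "off" values below is an assumption, not taken from the PR.
fn choose_backend(compiled_with_gpu: bool, toggle: Option<&str>) -> Backend {
    // Without the feature, GPU code is not compiled in, so the toggle is moot.
    if !compiled_with_gpu {
        return Backend::Cpu;
    }
    // With the feature on, the GPU path is the default unless switched off.
    let switched_off = matches!(toggle, Some("0") | Some("false") | Some("off"));
    if switched_off {
        Backend::Cpu
    } else {
        Backend::Gpu
    }
}
```

A caller would typically pass `cfg!(feature = "gpu")` as the first argument and `std::env::var(name).ok().as_deref()` as the second, with `name` being whichever variable the README documents. Keeping the decision in a pure function makes it testable without touching the process environment.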
Summary
Changes
- halo2_proofs: Added gpu_msm.rs, gpu_ntt.rs modules with ICICLE integration
- arithmetic.rs: Specialization-based dispatch for best_multiexp and best_fft
- gpu_benchmark_test.rs: Standalone GPU MSM/NTT benchmarks
- chunk_proof_test.rs: Added GPU call counters and FFT stats output
- README.md: GPU setup instructions and benchmark results

Test plan
- cargo test --test gpu_benchmark_test --release --features gpu - all 4 tests pass

Closes #10
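To illustrate the type-based dispatch and call counters listed under Changes, here is a toy sketch in stable Rust. It is not the PR's implementation: it uses a `TypeId` comparison as a stand-in for specialization, replaces curve points with integers modulo a small prime, and the "accelerated" branch only bumps a counter before reusing the CPU code. Every name in it (`ToyBn256`, `ToyOther`, `ToyPoint`, `naive_msm`, `best_multiexp_sketch`, `ACCEL_CALLS`) is hypothetical.

```rust
use std::any::TypeId;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts calls routed to the mock accelerated path.
static ACCEL_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Modulus of the toy additive group. This is not a real curve.
const P: u64 = 1_000_003;

/// Hypothetical stand-in for the one point type that has a GPU path.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ToyBn256(u64);

/// Hypothetical stand-in for any other point type (CPU only).
#[derive(Clone, Copy, Debug, PartialEq)]
struct ToyOther(u64);

trait ToyPoint: Copy + 'static {
    fn value(self) -> u64;
    fn from_value(v: u64) -> Self;
}

impl ToyPoint for ToyBn256 {
    fn value(self) -> u64 { self.0 }
    fn from_value(v: u64) -> Self { ToyBn256(v) }
}

impl ToyPoint for ToyOther {
    fn value(self) -> u64 { self.0 }
    fn from_value(v: u64) -> Self { ToyOther(v) }
}

/// Reference MSM in the toy group: sum of scalars[i] * bases[i] modulo P.
fn naive_msm<C: ToyPoint>(scalars: &[u64], bases: &[C]) -> C {
    assert_eq!(scalars.len(), bases.len());
    let mut acc = 0u64;
    for (s, b) in scalars.iter().zip(bases.iter()) {
        // Both factors are reduced below P first, so the product fits in u64.
        acc = (acc + (*s % P) * (b.value() % P)) % P;
    }
    C::from_value(acc)
}

/// Generic entry point that records when the "accelerated" type is used.
fn best_multiexp_sketch<C: ToyPoint>(scalars: &[u64], bases: &[C]) -> C {
    if TypeId::of::<C>() == TypeId::of::<ToyBn256>() {
        // A real build would hand the inputs to the GPU here.
        ACCEL_CALLS.fetch_add(1, Ordering::Relaxed);
    }
    naive_msm(scalars, bases)
}
```

The generic signature stays the same for every point type, so callers need no changes. Only the type check decides whether the accelerated branch runs, and the counter gives tests a cheap way to confirm which path was taken.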