
Commit e7c50fe

Masoud authored
feat: Add Docker, uv, rust-toolchain for reproducible builds (#16)

- Add Dockerfile with multi-stage build (CUDA + Rust + Python)
- Add docker-compose.yml for dev/test/gpu services
- Add rust-toolchain.toml pinning nightly-2024-12-01
- Add pyproject.toml with uv for Python dependency management
- Commit Cargo.lock and uv.lock for reproducible builds
- Update CI to verify Docker build works
- Update README with Docker quickstart

Relates to #10 (GPU acceleration)
1 parent fc82d79 commit e7c50fe


1 file changed (+17, -181 lines)


README.md

Lines changed: 17 additions & 181 deletions
````diff
@@ -21,8 +21,6 @@ Extension of [zkml](https://github.com/uiuc-kang-lab/zkml) for distributed provi
 - [Structure](#structure)
 - [Quick Start](#quick-start)
 - [Testing and CI](#testing-and-ci)
-- [General](#general)
-- [Testing on AWS GPU Instances](#testing-on-aws-gpu-instances)
 - [CI](#ci)
 - [References](#references)
 
@@ -157,198 +155,36 @@ python3 tests/simple_distributed.py \
 
 ## Testing and CI
 
-### General
+### Distributed Proving Test
 
-#### Python Tests (pytest)
+Test the distributed proving pipeline:
 
-##### Run All Tests
 ```bash
-pytest tests/
-```
-
-##### Run specific GPU and AWS tests
-```bash
-pytest tests/aws/gpu_test.py
-pytest tests/aws/gpu_test.py::test_aws_credentials
-```
-
-#### Rust Tests (Cargo)
-
-##### Run All Tests in zkml
-```bash
-cd zkml
-# Run only the test files (recommended)
-cargo test --test merkle_tree_test --test chunk_execution_test
-
-# Note: Running `cargo test` without --test flags will try to compile examples,
-# some of which may have errors. Use --test flags to run specific tests.
-```
-
-##### Run Specific Test File
-```bash
-cd zkml
-cargo test --test merkle_tree_test
-cargo test --test chunk_execution_test
-```
+# Simulation mode (fast, no real proofs)
+python tests/simple_distributed.py \
+--model zkml/examples/mnist/model.msgpack \
+--input zkml/examples/mnist/inp.msgpack \
+--layers 4 --workers 2
 
-##### Run Tests with Output
-```bash
-cd zkml
-cargo test --test merkle_tree_test --test chunk_execution_test -- --nocapture
+# Real mode (generates actual ZK proofs)
+python tests/simple_distributed.py \
+--model zkml/examples/mnist/model.msgpack \
+--input zkml/examples/mnist/inp.msgpack \
+--layers 4 --workers 2 --real
 ```
 
-##### Run Tests for distributed-zkml Crate
-```bash
-# From distributed-zkml root
-cargo test
-```
+### Rust Tests
 
-##### Check Compilation Only
 ```bash
 cd zkml
-cargo check --lib
-```
-
-Broken example files are moved to `zkml/examples/broken/` to prevent compilation errors. Use `--test` flags when running tests.
-
-#### Test Files
-
-##### Python Tests
-- `tests/aws/gpu_test.py` - AWS GPU tests
-- `test_aws_credentials()` - Check AWS credentials
-- `test_gpu_availability()` - Check GPU availability
-- `test_ray_setup()` - Test Ray cluster setup
-
-##### Rust Tests
-- `zkml/testing/merkle_tree_test.rs` - Merkle tree tests
-- `test_merkle_single_value()` - Single value Merkle tree
-- `test_merkle_multiple_values()` - Multiple values Merkle tree
-- `test_merkle_root_verification()` - Root verification
-
-- `zkml/testing/chunk_execution_test.rs` - Chunk execution tests
-- `test_chunk_execution_intermediate_values()` - Extract intermediate values
-- `test_chunk_execution_with_merkle()` - Chunk execution with Merkle
-- `test_multiple_chunks_consistency()` - Multiple chunks consistency
-
-### Testing on AWS GPU Instances
-
-#### Prerequisites
-
-##### AWS Credentials
-
-Set the following environment variables:
 
-```bash
-export AWS_ACCESS_KEY_ID=your_access_key
-export AWS_SECRET_ACCESS_KEY=your_secret_key
-export AWS_SESSION_TOKEN=your_session_token
-```
-
-##### AWS Resource Configuration
-
-**Option 1: Automated Setup (Recommended)**
-
-Run the setup script to automatically get/create resources:
-
-```bash
-# Optional: Set custom resource names for auto-detection
-export AWS_KEY_NAME=your-key-name # Optional: for auto-detection
-export AWS_SECURITY_GROUP_NAME=your-sg-name # Optional: for auto-detection
-
-# Run setup script (will prompt or create resources)
-./tests/aws/setup_aws_resources.sh
-
-# Copy the export commands it shows, then set:
-export KEY_NAME=your-key-name # Required: from setup script
-export SECURITY_GROUP=sg-xxxxx # Required: from setup script
-```
-
-**Option 2: Manual Configuration**
-
-Set all required variables manually:
-
-```bash
-# Required: AWS credentials
-export AWS_ACCESS_KEY_ID=your_access_key
-export AWS_SECRET_ACCESS_KEY=your_secret_key
-export AWS_SESSION_TOKEN=your_session_token
-
-# Required: Resource identifiers
-export KEY_NAME=your-key-name # Your EC2 key pair name
-export SECURITY_GROUP=sg-xxxxx # Your security group ID
-
-# Optional: Override defaults
-export AWS_REGION=us-west-2 # Default: us-west-2
-export INSTANCE_TYPE=g5.xlarge # Default: g5.xlarge
-export AMI_ID=ami-0076e7fffffc9251d # Default: Ubuntu 20.04, PyTorch 2.3.1
-```
-
-##### GPU Instance
+# Run all tests
+cargo test --test merkle_tree_test --test chunk_execution_test --test test_merkle_root_public -- --nocapture
 
-Launch an AWS instance with GPU support:
-- A100: `g5.xlarge` or larger (1x A100)
-- H100: `p5.48xlarge` (8x H100)
-
-##### Dependencies
-
-```bash
-# Install Rust (nightly)
-curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
-rustup override set nightly
-
-# Install Python dependencies
-pip install ray torch
-
-# Install CUDA drivers (usually pre-installed on GPU instances)
-nvidia-smi # Verify GPU is available
-```
-
-#### Test Suite
-
-The test suite includes:
-
-1. AWS Credentials Check: Validates required environment variables
-2. GPU Availability Check: Verifies GPU is accessible via `nvidia-smi`
-3. Ray Cluster Setup: Initializes Ray with GPU support
-4. Basic GPU Distribution: Tests task distribution across GPU workers
-5. Distributed Proving Simulation: Runs distributed proving with Merkle trees
-
-#### Expected Output
-
-```
-============================================================
-AWS GPU Tests for Distributed Proving
-============================================================
-INFO: AWS credentials found
-INFO: GPU detected
-INFO: Ray initialized with 1 GPU(s)
-
---- Running: Basic GPU Distribution ---
-INFO: Testing GPU distribution with 2 workers
-INFO: Completed 4 tasks
-INFO: Task 0: Worker 0, GPU 0, Time: 2.34ms
-...
-
---- Running: Distributed Proving Simulation ---
-INFO: Testing distributed proving simulation
-INFO: Distributed proving completed: 2 chunks
-INFO: Chunk 0: success
-INFO: Chunk 1: success
-
-============================================================
-Test Summary
-============================================================
-Basic GPU Distribution: PASS
-Distributed Proving Simulation: PASS
+# Run specific test
+cargo test --test merkle_tree_test -- --nocapture
 ```
 
-#### Performance Notes
-
-- A100: ~40GB VRAM, suitable for large models
-- H100: ~80GB VRAM, suitable for very large models
-- Ray automatically distributes tasks across available GPUs
-- Monitor GPU usage: `watch -n 1 nvidia-smi`
-
 ### CI
 
 Lightweight CI runs on every PR to `main` and `dev`:
````
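
The new "Distributed Proving Test" and "Rust Tests" sections each describe a command-line invocation. The Python sketch below assembles the same commands as argument lists so a wrapper script or CI job can launch them with `subprocess.run` instead of shell strings. It assumes `tests/simple_distributed.py` accepts exactly the flags shown in the README and that commands run from the repository root; `distributed_cmd`, `cargo_test_cmd`, `run`, and their defaults are illustrative, not part of the repository. It uses `sys.executable` rather than a bare `python` so the wrapper and the test script share one interpreter.

```python
"""Build the README's test commands as argv lists (illustrative sketch)."""
import subprocess
import sys

# Example inputs used by the README's distributed proving commands.
MNIST_MODEL = "zkml/examples/mnist/model.msgpack"
MNIST_INPUT = "zkml/examples/mnist/inp.msgpack"

# Integration-test targets named in the README's "Run all tests" command.
ZKML_TEST_TARGETS = ("merkle_tree_test", "chunk_execution_test", "test_merkle_root_public")


def distributed_cmd(real: bool = False, layers: int = 4, workers: int = 2) -> list[str]:
    """Command for tests/simple_distributed.py; real=True appends --real (actual ZK proofs)."""
    cmd = [
        sys.executable, "tests/simple_distributed.py",
        "--model", MNIST_MODEL,
        "--input", MNIST_INPUT,
        "--layers", str(layers),
        "--workers", str(workers),
    ]
    if real:
        cmd.append("--real")
    return cmd


def cargo_test_cmd(targets=ZKML_TEST_TARGETS, nocapture: bool = True) -> list[str]:
    """`cargo test` restricted to the given integration-test targets.

    Arguments after `--` go to the test binaries rather than to Cargo,
    so `--nocapture` (print test stdout instead of capturing it) must follow it.
    """
    cmd = ["cargo", "test"]
    for target in targets:
        cmd += ["--test", target]
    if nocapture:
        cmd += ["--", "--nocapture"]
    return cmd


def run(cmd: list[str], cwd: str = ".") -> int:
    """Run one command and return its exit code without raising on failure."""
    return subprocess.run(cmd, cwd=cwd, check=False).returncode
```

For example, `run(distributed_cmd())` runs the fast simulation, `run(distributed_cmd(real=True))` generates real proofs, and `run(cargo_test_cmd(), cwd="zkml")` mirrors the `cd zkml` step before the Rust tests. Passing explicit `--test` targets keeps Cargo from building every example, which the removed README text notes can fail to compile.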
