Commit 8f3eaf6

Add a README.md for the folder benchmarks
1 parent 05c0c84 commit 8f3eaf6

1 file changed: +66 -0 lines changed

benchmarks/README.md

@@ -0,0 +1,66 @@
## Benchmarks

This directory contains all the scripts and configuration files needed to reproduce the numerical results presented in the paper **Recovering Sparse DFT from Missing Signals via Interior Point Method on GPU**.

## Overview

Each script in this folder benchmarks a different aspect of our GPU-accelerated interior-point solver or FFT implementation, comparing it against a CPU-based reference.

## Requirements

Ensure that the appropriate GPU driver stack is installed: CUDA for NVIDIA GPUs or ROCm for AMD GPUs.
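
As a quick sanity check before launching Julia, the sketch below reports which vendor tool is on the `PATH`. It assumes that `nvidia-smi` (NVIDIA) or `rocm-smi` (AMD) was installed along with the driver stack; the function name `detect_gpu_backend` is hypothetical and not part of this repository. Finding the tool does not prove that the driver itself works.

```shell
# Hypothetical helper (not part of this repository).
# Prints "cuda", "rocm", or "none" depending on which vendor CLI is on PATH.
# This only checks that the tool exists, not that a GPU is usable.
detect_gpu_backend() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "cuda"
    elif command -v rocm-smi >/dev/null 2>&1; then
        echo "rocm"
    else
        echo "none"
    fi
}
```

Calling `detect_gpu_backend` prints one of the three words; running `nvidia-smi` or `rocm-smi` directly afterwards shows whether the device is actually visible.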

## Installation

1. Launch Julia with the project environment:
```shell
julia --project=.
```
2. Instantiate the environment:
```julia
using Pkg
Pkg.instantiate()
```

## Usage

To run a benchmark script, use one of the following commands:
```shell
julia --project=. -e 'include("benchmarks_cufft.jl")'
julia --project=. -e 'include("benchmarks_rocfft.jl")'
julia --project=. -e 'include("cpu_vs_gpu.jl")'
julia --project=. -e 'include("crystal.jl")'
```
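
To run several benchmarks in one go and keep their output, a small wrapper along these lines may help. It is only a sketch: the function name `run_benchmarks`, the `logs/` directory, and the `JULIA` override variable are assumptions made for illustration and are not part of this repository.

```shell
# Hypothetical wrapper (not part of this repository).
# Runs each script passed as an argument with the same command line as above
# and stores stdout and stderr in logs/<script name without .jl>.log.
# Set JULIA to use a Julia binary other than the one on PATH.
run_benchmarks() {
    julia_bin="${JULIA:-julia}"
    mkdir -p logs
    for script in "$@"; do
        log="logs/${script%.jl}.log"
        echo "Running $script (log: $log)"
        "$julia_bin" --project=. -e "include(\"$script\")" > "$log" 2>&1 \
            || echo "FAILED: $script" >&2
    done
}
```

For example, `run_benchmarks benchmarks_cufft.jl cpu_vs_gpu.jl crystal.jl` runs the three NVIDIA-side scripts in sequence; a failing script is reported on stderr and the loop continues with the next one.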

## Scripts

- **benchmarks_cufft.jl**

  Compares **cuFFT** (via CUDA.jl) against **FFTW** (via FFTW.jl) on problems of various sizes.
  Measures execution time for `fft` and `ifft` operations on random data.

- **benchmarks_rocfft.jl**

  Compares **rocFFT** (via AMDGPU.jl) against **FFTW** (via FFTW.jl).
  Mirrors the cuFFT benchmarks; these results were not included in the final paper.

- **cpu_vs_gpu.jl**

  Benchmarks our compressed-sensing solver on CPU versus GPU across a range of problem sizes (artificial test cases).

- **crystal.jl**

  Applies the same solver to a real-world problem with **104 million variables**, comparing CPU and GPU performance on a crystallographic dataset.

## Preferences

To enable unified memory by default on the GH200, create a file named `LocalPreferences.toml` in this directory with the following content:

```toml
[CUDA]
default_memory = "unified"
```

## Acknowledgments

We thank [JLSE](https://www.jlse.anl.gov/) for providing access to the [NVIDIA GH200](https://www.jlse.anl.gov/nvidia-gh200) used in our experiments.
