|
| 1 | +## Benchmarks |
| 2 | + |
| 3 | +This directory contains all the scripts and configuration files needed to reproduce the numerical results presented in the paper **Recovering Sparse DFT from Missing Signals via Interior Point Method on GPU**. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +Each script in this folder benchmarks a different aspect of our GPU-accelerated interior-point solver and FFT implementations, comparing them against CPU-based references. |
| 8 | + |
| 9 | +## Requirements |
| 10 | + |
| 11 | +Ensure that the GPU software stack matching your hardware is installed (CUDA for NVIDIA GPUs, ROCm for AMD GPUs). |
| 12 | + |
| 13 | +## Installation |
| 14 | + |
| 15 | +1. Launch Julia with the project environment: |
| 16 | +```shell |
| 17 | +julia --project=. |
| 18 | +``` |
| 19 | +2. Instantiate the environment: |
| 20 | +```julia |
| 21 | +using Pkg |
| 22 | +Pkg.instantiate() |
| 23 | +``` |
| 24 | + |
| 25 | +## Usage |
| 26 | + |
| 27 | +To run a benchmark script, use one of the following commands: |
| 28 | +```shell |
| 29 | +julia --project=. -e 'include("benchmarks_cufft.jl")' |
| 30 | +julia --project=. -e 'include("benchmarks_rocfft.jl")' |
| 31 | +julia --project=. -e 'include("cpu_vs_gpu.jl")' |
| 32 | +julia --project=. -e 'include("crystal.jl")' |
| 33 | +``` |
| 34 | + |
| 35 | +## Scripts |
| 36 | + |
| 37 | +- **benchmarks_cufft.jl** |
| 38 | + |
| 39 | +Compares **cuFFT** (via CUDA.jl) against **FFTW** (via FFTW.jl) on problems of various sizes. |
| 40 | +Measures execution time for `fft` and `ifft` operations on random data. |
| 41 | + |
| 42 | +- **benchmarks_rocfft.jl** |
| 43 | + |
| 44 | +Compares **rocFFT** (via AMDGPU.jl) against **FFTW** (via FFTW.jl). |
| 45 | +Similar to the cuFFT benchmarks; results were not included in the final paper. |
| 46 | + |
| 47 | +- **cpu_vs_gpu.jl** |
| 48 | + |
| 49 | +Benchmarks our compressed sensing solver on CPU and on GPU across a range of problem sizes, using artificial test cases. |
| 50 | + |
| 51 | +- **crystal.jl** |
| 52 | + |
| 53 | +Applies the same solver to a real-world problem with **104 million variables**, comparing CPU and GPU performance on a crystallographic dataset. |
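
The FFT comparisons described for `benchmarks_cufft.jl` can be illustrated with a minimal sketch. It is not taken from the repository's scripts: the helper name `time_cpu_fft` and the chosen sizes are hypothetical, and only the CPU (FFTW.jl) side is executed. The GPU analogue is shown in comments and assumes CUDA.jl with a functional NVIDIA GPU.

```julia
using FFTW           # CPU reference FFT (FFTW.jl)
using LinearAlgebra  # provides `norm`

# Hypothetical helper, not part of the repository's scripts.
# Times one forward and one inverse FFT on random complex data of length n.
function time_cpu_fft(n::Integer)
    x = rand(ComplexF64, n)
    y = fft(x)   # first calls also act as warm-up, so compilation is not timed
    z = ifft(y)
    t_fft = @elapsed fft(x)
    t_ifft = @elapsed ifft(y)
    # Relative round-trip error as a sanity check on the transforms.
    err = norm(z - x) / norm(x)
    return (; n, t_fft, t_ifft, err)
end

# GPU analogue (assumption: CUDA.jl is installed and a GPU is available):
#   using CUDA
#   xd = CuArray(rand(ComplexF64, n))
#   t_gpu = @elapsed CUDA.@sync fft(xd)   # synchronize so the kernel is fully timed

for n in (2^10, 2^14, 2^18)
    println(time_cpu_fft(n))
end
```

A single `@elapsed` measurement is noisy. A real benchmark would typically repeat each call and report a minimum or median time.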
| 54 | + |
| 55 | +## Preferences |
| 56 | + |
| 57 | +To enable unified memory by default on the GH200, create a file named `LocalPreferences.toml` in this directory with the following content: |
| 58 | + |
| 59 | +```toml |
| 60 | +[CUDA] |
| 61 | +default_memory = "unified" |
| 62 | +``` |
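
As an alternative to writing the file by hand, the same preference can probably be set from Julia. This is a sketch under assumptions: it relies on Preferences.jl being available in the active environment, which this README does not state, and the new value is expected to take effect only in a fresh Julia session.

```julia
using Preferences  # assumption: Preferences.jl is installed in the active environment
using CUDA

# Records `default_memory = "unified"` under the [CUDA] table of the
# LocalPreferences.toml associated with the active project.
set_preferences!(CUDA, "default_memory" => "unified"; force = true)
```

After running this, the resulting `LocalPreferences.toml` should contain the `[CUDA]` entry shown above.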
| 63 | + |
| 64 | +## Acknowledgments |
| 65 | + |
| 66 | +We thank [JLSE](https://www.jlse.anl.gov/) for providing access to the [NVIDIA GH200](https://www.jlse.anl.gov/nvidia-gh200) used in our experiments. |