Version: 11.0
Contact: hsharma@anl.gov
MIDAS supports GPU-accelerated computation across all major analysis pipelines using NVIDIA CUDA. This guide covers building with GPU support, available GPU-accelerated executables, and usage.
GPU support requires the NVIDIA CUDA Toolkit (version 11.0 or later recommended).
```bash
mkdir build && cd build
cmake .. -DUSE_CUDA=ON
make -j$(nproc)
```

To target specific GPU architectures:

```bash
cmake .. -DUSE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="80;86;89;90"
```

Common architecture values:
| Architecture | GPUs |
|---|---|
| 70 | V100 |
| 80 | A100, A30 |
| 86 | RTX 3090, A40 |
| 89 | RTX 4090, L40 |
| 90 | H100 |
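The table above maps directly onto the `-DCMAKE_CUDA_ARCHITECTURES` value. As a minimal sketch of assembling that flag for a mixed set of GPUs (the `cmake_arch_flag` helper is illustrative, not part of MIDAS):

```python
# Sketch: build the -DCMAKE_CUDA_ARCHITECTURES value for a set of target GPUs.
# The mapping mirrors the architecture table documented above.
CUDA_ARCHS = {
    "V100": 70,
    "A100": 80, "A30": 80,
    "RTX 3090": 86, "A40": 86,
    "RTX 4090": 89, "L40": 89,
    "H100": 90,
}

def cmake_arch_flag(gpus):
    """Return a -DCMAKE_CUDA_ARCHITECTURES flag covering the given GPUs."""
    archs = sorted({CUDA_ARCHS[g] for g in gpus})
    return '-DCMAKE_CUDA_ARCHITECTURES="%s"' % ";".join(str(a) for a in archs)

print(cmake_arch_flag(["A100", "H100"]))
```

Listing several values, as in the example above, produces a fat binary that runs on all of the listed architectures.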
The build system compiles the following CUDA targets when `USE_CUDA=ON`:
| Target | Module | Description |
|---|---|---|
| `IndexerGPU` | FF-HEDM | GPU-accelerated indexer |
| `FitPosOrStrainsGPU` | FF-HEDM | GPU-accelerated strain fitting |
| `IndexerScanningGPU` | PF-HEDM | GPU scanning-mode indexer |
| `FitOrStrainsScanningGPU` | PF-HEDM | GPU scanning-mode strain fitter |
| `FitOrientationGPU` | NF-HEDM | GPU orientation fitting |
| `IntegratorFitPeaksGPUStream` | Integration | GPU-accelerated radial integration with peak fitting |
| `MIDAS_TOMO_GPU` | Tomography | Separate GPU executable for gridrec reconstruction (not linked into MIDAS_TOMO) |
All CUDA targets are compiled with `-Xcompiler=-fopenmp` for hybrid GPU+OpenMP parallelism.
Enable GPU acceleration in the FF-HEDM pipeline:
```bash
python FF_HEDM/workflows/ff_MIDAS.py -paramFN params.txt -useGPU 1
```

The `-useGPU 1` flag routes indexing through `IndexerGPU` and strain fitting through `FitPosOrStrainsGPU`.
`IndexerGPU` implements a two-pass funnel screening approach:
- Pass 1 (coarse): Single-layer bitfield prefilter using a 32×32 tile occupancy grid (~1.5 MB, fits in L2 cache). Uses `__restrict__` pointers, `__ldg` texture loads, break-on-miss early termination, and loop unrolling.
- Pass 2 (fine): Full multi-layer verification of Pass 1 candidates with a post-filter diagonal approach.
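The funnel idea above can be sketched in a few lines: a cheap coarse pass rejects most candidates before the expensive fine pass runs. Everything here is a simplified stand-in for the actual IndexerGPU data structures (the tile mapping, record layout, and function names are illustrative):

```python
# Sketch of two-pass funnel screening: a bitfield occupancy grid over 32x32
# tiles prefilters candidates, with break-on-miss early termination.
TILE = 32

def tile_index(eta_bin, ring_bin):
    # Map a predicted spot position to a bit in the occupancy grid.
    return (eta_bin % TILE) * TILE + (ring_bin % TILE)

def build_occupancy(observed_spots):
    # Mark every tile that contains at least one observed spot.
    grid = 0
    for eta, ring in observed_spots:
        grid |= 1 << tile_index(eta, ring)
    return grid

def coarse_pass(candidate, grid, min_hits):
    # Pass 1: count tile hits; exit as soon as min_hits is unreachable.
    misses_allowed = len(candidate) - min_hits
    hits = 0
    for eta, ring in candidate:
        if grid >> tile_index(eta, ring) & 1:
            hits += 1
        else:
            misses_allowed -= 1
            if misses_allowed < 0:
                return False   # break-on-miss early termination
    return True

observed = [(1, 2), (3, 4), (5, 6)]
grid = build_occupancy(observed)
print(coarse_pass([(1, 2), (3, 4)], grid, min_hits=2))   # survives to Pass 2
print(coarse_pass([(9, 9), (8, 8)], grid, min_hits=1))   # rejected early
```

Only candidates that survive this coarse test would proceed to the full multi-layer verification of Pass 2.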
`FitPosOrStrainsGPU` ports the NLOPT Nelder-Mead simplex algorithm to GPU, running per-grain refinement in parallel with device-side spot computation. Features dynamic spot reassignment and full strain tensor fitting.
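For readers unfamiliar with the algorithm being ported, here is a stdlib-only sketch of a simplified Nelder-Mead simplex loop. The toy quadratic objective stands in for the actual per-grain strain residual, and this condensed variant omits the outside-contraction step of the full NLOPT implementation:

```python
# Minimal Nelder-Mead: reflection, expansion, inside contraction, shrink.
def nelder_mead(f, x0, step=0.5, iters=200):
    n = len(x0)
    # Initial simplex: x0 plus one perturbed vertex per dimension.
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of all vertices except the worst.
        cen = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        refl = [cen[j] + (cen[j] - worst[j]) for j in range(n)]
        if f(refl) < f(best):
            exp = [cen[j] + 2.0 * (cen[j] - worst[j]) for j in range(n)]
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            con = [cen[j] + 0.5 * (worst[j] - cen[j]) for j in range(n)]
            if f(con) < f(worst):
                simplex[-1] = con
            else:  # shrink all vertices toward the best one
                simplex = [best] + [
                    [best[j] + 0.5 * (p[j] - best[j]) for j in range(n)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)

# Toy "refinement": recover a known optimum at (1, -2).
opt = nelder_mead(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [0.0, 0.0])
```

On the GPU, one such simplex loop runs per grain, with the objective evaluated from device-side spot computations rather than a closed-form function.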
To run only the screening pass (Phase 1) without refinement:
```bash
export MIDAS_SCREEN_ONLY=1
python FF_HEDM/workflows/ff_MIDAS.py -paramFN params.txt -useGPU 1
```

Setting `MIDAS_VERBOSE=1` enables per-voxel diagnostic output for debugging:

```bash
export MIDAS_VERBOSE=1
```
Enable GPU acceleration for scanning HEDM:
```bash
python FF_HEDM/workflows/pf_MIDAS.py -paramFN params.txt -useGPU 1
```

`IndexerScanningGPU` supports three indexing modes:
- Spot-driven — with beam proximity filter for spatial awareness
- MicFile-seeded — seeded from previous reconstruction
- GrainsFile-seeded — seeded from Grains.csv
`FitOrStrainsScanningGPU` reads consolidated indexer output (`IndexBest_all.bin`, `IndexKey_all.bin`) and performs per-voxel Nelder-Mead refinement on GPU.
Both GPU executables use the consolidated binary I/O format, reducing filesystem overhead from roughly 30,000 small files to 3 binary files per scan.
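The consolidation pattern is easy to picture: append fixed-size records to one data blob and keep a compact key of (offset, count) per voxel. The record fields and layout below are illustrative only, not the actual `IndexBest_all.bin` format:

```python
import struct, io

# One data blob plus one key file instead of one small file per voxel.
REC = struct.Struct("<i d d")   # example fields: spotID, omega, confidence

def write_consolidated(per_voxel_records):
    data, key = io.BytesIO(), io.BytesIO()
    for records in per_voxel_records:
        # Key entry: byte offset into the blob and record count for this voxel.
        key.write(struct.pack("<q i", data.tell(), len(records)))
        for rec in records:
            data.write(REC.pack(*rec))
    return data.getvalue(), key.getvalue()

def read_voxel(data, key, voxel):
    # Each key entry is 12 bytes: int64 offset + int32 count.
    off, cnt = struct.unpack_from("<q i", key, voxel * 12)
    return [REC.unpack_from(data, off + i * REC.size) for i in range(cnt)]

data, key = write_consolidated([[(1, 0.5, 0.9)], [(2, 1.5, 0.8), (3, 2.5, 0.7)]])
print(read_voxel(data, key, 1))
```

Readers can then seek directly to any voxel's records, which is what makes the three-file layout cheap for both producers and consumers.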
Enable GPU-accelerated NF-HEDM orientation fitting:
```bash
python NF_HEDM/workflows/nf_MIDAS.py -paramFN params.txt -gpuFit 1
```

`FitOrientationGPU` accelerates both screening (Phase 1: discrete orientation search) and fitting (Phase 2: Nelder-Mead continuous refinement).
Features:
- Shared GPU math library (`nf_gpu.h`) with device functions for orientation matrix operations, diffraction spot calculation, and fractional overlap computation
- Port of the NLOPT Nelder-Mead algorithm to GPU for exact CPU/GPU parity
- Batch processing of multiple voxels and orientations
- Constant memory for HKL tables, global memory for large arrays
- Optional double-precision mode for exact numerical parity
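As a flavor of the orientation-matrix primitives such a device math library provides, here is a host-side sketch of building a rotation matrix from Z-X-Z (Bunge) Euler angles. The convention is illustrative; `nf_gpu.h` may use a different parameterization:

```python
import math

# Orientation matrix from Bunge Euler angles (phi1, Phi, phi2), Z-X-Z.
def euler_to_matrix(phi1, Phi, phi2):
    c1, s1 = math.cos(phi1), math.sin(phi1)
    c, s = math.cos(Phi), math.sin(Phi)
    c2, s2 = math.cos(phi2), math.sin(phi2)
    return [
        [c1 * c2 - s1 * s2 * c,  s1 * c2 + c1 * s2 * c,  s2 * s],
        [-c1 * s2 - s1 * c2 * c, -s1 * s2 + c1 * c2 * c, c2 * s],
        [s1 * s,                 -c1 * s,                c],
    ]

m = euler_to_matrix(0.3, 0.7, 1.1)
# A valid orientation matrix is orthonormal: each row has unit length.
for row in m:
    assert abs(sum(v * v for v in row) - 1.0) < 1e-12
```

On the GPU these operations run per candidate orientation, so keeping them branch-free and register-resident matters far more than it does in this host-side sketch.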
The `-gpuFit` flag works with both the single-resolution (`nf_MIDAS.py`) and multi-resolution (`nf_MIDAS_Multiple_Resolutions.py`) workflows.
The GPU integrator provides real-time radial integration with peak fitting:
```bash
python FF_HEDM/workflows/integrator_batch_process.py -paramFN params.txt
```

`IntegratorFitPeaksGPUStream` features:
- Socket-based architecture for continuous data streaming
- 4 CUDA streams for overlapped computation
- Warp shuffle reductions for efficient summation
- GSAS-II area-normalized pseudo-Voigt peak fitting
- Integration with `live_viewer.py` for real-time visualization
Supports both folder-based file input and PVA (Process Variable Access) streaming from EPICS.
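The peak model is an area-normalized pseudo-Voigt: a mixing-parameter blend of a unit-area Gaussian and a unit-area Lorentzian sharing one FWHM. The sketch below mirrors that idea, not the exact GSAS-II parameterization:

```python
import math

def pseudo_voigt(x, center, fwhm, eta):
    # eta blends a unit-area Lorentzian (eta=1) with a unit-area Gaussian
    # (eta=0); because both components integrate to 1, so does the blend.
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    gamma = fwhm / 2.0
    g = math.exp(-((x - center) ** 2) / (2.0 * sigma * sigma)) \
        / (sigma * math.sqrt(2.0 * math.pi))
    l = gamma / (math.pi * ((x - center) ** 2 + gamma * gamma))
    return eta * l + (1.0 - eta) * g

# Crude numerical check that the profile has (approximately) unit area;
# the Lorentzian tails lose a little mass outside the finite window.
xs = [i * 0.01 - 50.0 for i in range(10001)]
area = sum(pseudo_voigt(x, 0.0, 2.0, 0.5) for x in xs) * 0.01
```

Area normalization is what lets the fitted amplitude be read directly as integrated intensity, which is the quantity of interest downstream.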
See FF_Radial_Integration.md for full documentation.
GPU-accelerated gridrec tomographic reconstruction is available as a separate executable, `MIDAS_TOMO_GPU`.
From the command line:
```bash
MIDAS_TOMO_GPU configFN numCPUs --gpu [--fftw-bridge]
```

- `--gpu` — enables GPU reconstruction.
- `--fftw-bridge` — forces CPU FFTW for FFTs (with GPU-CPU data transfers around each call), producing byte-identical output to the CPU-only path at the cost of slower execution.
From Python:
```python
from TOMO.midas_tomo_python import reconstruct

reconstruct(..., useGPU=True, fftwBridge=False)
```

If `MIDAS_TOMO_GPU` is not found, the workflow falls back to `MIDAS_TOMO` (CPU) automatically.
- Multi-pair batched reconstruction with dynamic batch sizing (capped at 50 pairs to limit pinned memory)
- Double-buffered pipeline with pthread overlap for compute/transfer
- 3-stream CUDA overlap for kernel execution
- Pinned memory for efficient host-device transfers
- OMP-parallel sinogram reads for GPU batch dispatch
- Pre-allocated per-thread scratch buffers
- mmap-based sinogram input for zero-copy parallel reads (both CPU and GPU paths)
- GPU-side Pad + reconCentering + getRecons kernels
- Stripe artifact removal on GPU path (Vo et al. 2018 algorithms)
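The mmap-based input from the list above avoids copying sinogram data through `read()` buffers: the file is mapped once and rows are sliced in place. A small self-contained sketch with a fake float32 sinogram (the rows × cols layout is illustrative):

```python
import mmap, os, struct, tempfile

# Write a tiny fake float32 "sinogram": 4 rows x 8 cols, values 0..31.
rows, cols = 4, 8
path = os.path.join(tempfile.mkdtemp(), "sino.bin")
with open(path, "wb") as f:
    for i in range(rows * cols):
        f.write(struct.pack("<f", float(i)))

# Map the file and unpack one row directly from the mapping, with no
# intermediate read() copy; parallel readers can share the same mapping.
with open(path, "rb") as f, \
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    row = 2
    off = row * cols * 4                       # byte offset of the row
    vals = struct.unpack_from("<%df" % cols, mm, off)
    print(vals[0], vals[-1])
```

Because the mapping is read-only and page-backed, several OpenMP threads can dispatch different rows to GPU batches without coordinating a shared file cursor.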
See Tomography_Reconstruction.md for full documentation.
By default, GPU computations use single precision (float32) for performance. For applications requiring higher precision:
```bash
export MIDAS_GPU_DOUBLE=1
```

This enables double-precision computation in the GPU kernels. The performance impact depends on the GPU architecture — consumer GPUs (RTX series) have significantly reduced double-precision throughput compared to data-center GPUs (A100, H100).
Double precision has been verified to achieve exact parity with CPU results across all GPU-accelerated modules.
| Variable | Description |
|---|---|
| `MIDAS_GPU_DOUBLE=1` | Enable double-precision GPU computation |
| `MIDAS_GPU_FIT=1` | Enable GPU Phase 2 (fitting) — used internally |
| `MIDAS_SCREEN_ONLY=1` | Run only Phase 1 screening, skip fitting |
| `MIDAS_VERBOSE=1` | Enable per-voxel diagnostic output |
| Flag | Pipeline | Description |
|---|---|---|
| `-useGPU 1` | FF-HEDM, PF-HEDM | Route indexing and fitting through GPU executables |
| `-gpuFit 1` | NF-HEDM | Enable GPU orientation fitting (screening + refinement) |
| `--gpu` | Tomography | Enable GPU reconstruction in MIDAS_TOMO_GPU |
| `--fftw-bridge` | Tomography | Use CPU FFTW for byte-identical output to CPU path (requires `--gpu`) |
- GPU acceleration provides the largest speedup for NF-HEDM (thousands of voxels × thousands of orientations) and PF/scanning HEDM (many scan positions)
- FF-HEDM GPU indexing benefits from large grid sizes and many diffraction rings
- The GPU integrator is optimized for real-time streaming use cases
- Tomography GPU acceleration scales with the number of sinogram pairs and reconstruction size
- Memory usage: GPU executables pre-allocate scratch buffers and use pinned memory for efficient transfers
- All GPU modules maintain full CPU/GPU parity — results are identical (within floating-point precision for float32 mode, exact for float64 mode)
MIDAS includes benchmark and parity tests for GPU modules:
```bash
# NF-HEDM GPU parity
python tests/test_nf_hedm.py -nCPUs 4 --gpu-fit

# Tomography GPU vs CPU parity
python tests/test_tomo_parity.py --phantom-size 256 --plot

# PF-HEDM GPU
python tests/test_pf_hedm.py -nCPUs 4 -useGPU
```

Analysis scripts for parity debugging are in `NF_HEDM/Example/`:
- `analyze_mismatches.py` — per-voxel misorientation comparison with `--all` flag
- `parity_maps.py` — spatial confidence diff and misorientation maps
- FF_Analysis.md — FF-HEDM analysis pipeline
- NF_Analysis.md — NF-HEDM reconstruction
- PF_Analysis.md — PF/scanning HEDM analysis
- FF_Radial_Integration.md — Radial integration with GPU streaming
- Tomography_Reconstruction.md — Tomographic reconstruction
- README.md — MIDAS manual index
If you encounter any issues or have questions, please open an issue on this repository.