Rearchitect and Reinstate GPU #114

onurulgen · 2024-08-29T13:05:59Z

Reinstated GPU into the project by fixing or implementing missing functions.
Rearchitected the project to combine GPU and CPU implementations by using virtual functions.
Increased the performance up to 11 times by using GPU.
Implemented unit and regression tests for GPU and CPU by using Catch2 framework.
Developed GitHub Actions for tests, coverage, static code analysis, and producing CPU and GPU executables for Windows, Linux and macOS.

…ta in bytes

…ient() of *Compute classes

…dient()

…Gradient() to handle optimise* parameters

…pe()

- Ditch old texture objects and use up-to-date ones - Make texture objects managed - Ditch CUDA symbols and pass them as kernel function parameters - Extend reg_updateControlPointPosition_gpu() to handle optimise* parameters

- Rename CudaContextSingleton as CudaContext, and move it into NiftyReg namespace - Rename NiftyReg_CudaBlock100 as BlockSize, and move it into NiftyReg namespace - Move BlockSize implementation into the header - Change the type of BlockSize members as unsigned - Move BlockSize instance into CudaContext - Use unsigned instead of size_t in CUDA kernels - Initialise the CUDA or OpenCL device in Platform's constructor - Rename `unsigned int`s as `unsigned`

…est() for numeric limits

… magnitudes for complex numbers

… double> instead of using output parameters

… NiftiImageData::ConcreteTypeHandler

…y using std::minmax_element

…ieldFromFlowField

…emove deprecated reg_tools_getMinValue/MaxValue functions

…mFlowField()

…tion

.github/workflows/analysis.yml

+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+
+      - name: Install Cppcheck
+        run: |
+          sudo apt-get update && sudo apt-get install -y libpcre3-dev
+          git clone -b 2.13.x https://github.com/danmar/cppcheck.git
+          cd cppcheck
+          # Disable color output of cppcheck
+          sed -i 's/ *bool *gDisableColors *= *false;/bool gDisableColors = true;/' lib/color.cpp
+          sudo make -j4 MATCHCOMPILER=yes FILESDIR=/usr/share/cppcheck HAVE_RULES=yes CXXFLAGS="-O2 -DNDEBUG -Wall -Wno-sign-compare -Wno-unused-function" install
+
+      - name: Install Python dependencies
+        run: pip3 install --upgrade setuptools urllib3 chardet pyOpenSSL pygithub
+
+      - name: Install CUDA Toolkit
+        uses: Jimver/cuda-toolkit@master
+        with:
+          cuda: '11.8.0'
+          method: network
+          use-github-cache: false
+          use-local-cache: false
+
+      - name: Configure NiftyReg
+        run: |
+          mkdir build && cd build
+          cmake -DCMAKE_C_COMPILER=gcc \
+                -DCMAKE_CXX_COMPILER=g++ \
+                -DCMAKE_BUILD_TYPE=Debug \
+                -DBUILD_ALL_DEP=ON \
+                -DCHECK_GPU=OFF \
+                -DUSE_CUDA=ON \
+                -DUSE_OPENCL=ON \
+                -DUSE_SSE=ON \
+                -DUSE_OPENMP=ON \
+                -DBUILD_TESTING=OFF \
+                -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
+                ..
+
+      - name: Code Analysis
+        env:
+            COMMENT_TITLE: Code Analysis Results
+            GITHUB_TOKEN: ${{ github.token }}
+            PR_NUMBER: ${{ github.event.pull_request.number }}
+            REPO: ${{ github.repository }}
+            REPORT_PR_CHANGES_ONLY: false
+        run: |
+            analysis_file="analysis.txt"
+            cppcheck_params="--enable=warning --check-level=exhaustive --inline-suppr --suppress=internalError --suppress=internalAstError --suppress=*:*third-party/Eigen/*"
+            cppcheck -j4 $cppcheck_params --project=$(pwd)/build/compile_commands.json --output-file=$analysis_file
+            # Since cppcheck does not support OpenCL and CUDA, we need to check these files separately
+            find $(pwd)/reg-lib/cl/. -name "*.cl" -print0 | while IFS= read -r -d '' file; do cppcheck "$file" $cppcheck_params --language=c++ 2>> $analysis_file; done
+            find $(pwd)/reg-lib/cuda/. -name "*.cu" -print0 | while IFS= read -r -d '' file; do cppcheck "$file" $cppcheck_params --language=c++ 2>> $analysis_file; done
+            python3 .github/code_analysis.py -cc $analysis_file -o ${{ github.event_name == 'push' }} -fk false


To fix the issue, we will add a permissions block at the root level of the workflow. This block will restrict permissions to contents: read, which is sufficient for most workflows that do not require write access. If the workflow requires additional permissions in the future, they can be added explicitly and scoped to specific jobs.

The permissions block will be added immediately after the name field (line 1) to apply to all jobs in the workflow.

.github/workflows/coverage.yml

+    runs-on: ubuntu-22.04-gpu
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+
+      - name: Install dependencies
+        run: sudo apt-get update && sudo apt-get install -y cmake git lcov
+
+      - name: Install Catch2
+        run:  |
+          git clone https://github.com/catchorg/Catch2.git
+          cd Catch2
+          cmake -Bbuild -H. -DBUILD_TESTING=OFF
+          sudo cmake --build build/ --target install --config Debug
+
+      - name: Configure NiftyReg
+        run: |
+          mkdir build && cd build
+          cmake -DCMAKE_C_COMPILER=gcc \
+                -DCMAKE_CXX_COMPILER=g++ \
+                -DCMAKE_BUILD_TYPE=Debug \
+                -DBUILD_ALL_DEP=ON \
+                -DUSE_CUDA=ON \
+                -DUSE_OPENCL=OFF \
+                -DUSE_SSE=OFF \
+                -DUSE_OPENMP=OFF \
+                -DBUILD_TESTING=ON \
+                -DWITH_COVERAGE=ON \
+                ..
+
+      - name: Build NiftyReg
+        run: cmake --build build --config Debug
+
+      - name: Run tests
+        working-directory: build
+        run: ctest -V
+
+      - name: Coverage
+        working-directory: build
+        run: make coverage
+
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v3
+        with:
+          directory: build
+        env:
+          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}


To fix the issue, we will add a permissions block at the root level of the workflow file. This block will specify the minimum permissions required for the workflow to function correctly. Based on the current workflow steps, the workflow only needs to read repository contents and upload coverage reports to Codecov. Therefore, we will set contents: read and id-token: write (required for the Codecov action). This ensures that no unnecessary write permissions are granted.

.github/workflows/release.yml

.github/workflows/tests.yml

+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-latest]
+        include:
+          - sudo: "sudo"
+            c-compiler: "gcc"
+            cxx-compiler: "g++"
+            cmake-extra: ""
+            build-all-dep: "ON"
+            build-type: "Debug"
+            use-sse: "ON"
+          - os: macos-latest
+            build-all-dep: "OFF"
+            use-sse: "OFF"
+          - os: windows-latest
+            sudo: ""
+            c-compiler: "clang-cl"
+            cxx-compiler: "clang-cl"
+            cmake-extra: "-G Ninja"
+
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+
+      - name: Install Catch2
+        shell: bash
+        run:  |
+          git clone https://github.com/catchorg/Catch2.git
+          cd Catch2
+          cmake -Bbuild -H. -DBUILD_TESTING=OFF
+          ${{ matrix.sudo }} cmake --build build/ --target install --config ${{ matrix.build-type }}
+
+      - name: Configure NiftyReg
+        shell: bash
+        run: |
+          mkdir build && cd build
+          cmake -DCMAKE_C_COMPILER=${{ matrix.c-compiler }} \
+                -DCMAKE_CXX_COMPILER=${{ matrix.cxx-compiler }} \
+                -DCMAKE_BUILD_TYPE=${{ matrix.build-type }} \
+                -DBUILD_ALL_DEP=${{ matrix.build-all-dep }} \
+                -DUSE_CUDA=OFF \
+                -DUSE_OPENCL=OFF \
+                -DUSE_SSE=${{ matrix.use-sse }} \
+                -DUSE_OPENMP=ON \
+                -DBUILD_TESTING=ON \
+                ${{ matrix.cmake-extra }} \
+                ..
+
+      - name: Build NiftyReg
+        shell: bash
+        run: cmake --build build --config ${{ matrix.build-type }}
+
+      - name: Run tests
+        shell: bash
+        working-directory: build
+        run: ctest -V


To fix the issue, we will add a permissions block at the root of the workflow file. This block will specify the minimal permissions required for the workflow to function. Since the workflow only checks out the repository and runs tests, it likely only needs contents: read permissions. This ensures that the workflow cannot perform any unintended write operations.

The permissions block will be added at the top level of the workflow, just below the name field, so it applies to all jobs in the workflow.

kaczmarj · 2025-10-30T00:47:18Z

hi @onurulgen - thank you for your tremendous effort in getting CUDA support back into niftyreg for a variety of realignment methods.

is this branch OK to try out? i am currently using version 1.3.9 from 2012 for CUDA-enabled reg_f3d.

onurulgen · 2025-10-30T12:12:51Z

hi @onurulgen - thank you for your tremendous effort in getting CUDA support back into niftyreg for a variety of realignment methods.

is this branch OK to try out? i am currently using version 1.3.9 from 2012 for CUDA-enabled reg_f3d.

Yes, it's stable, you may go ahead and test it. I'd like to hear how it goes as well. Could you please give some feedback if you can?

Switches deprecated Windows runner from 2019 to 2022. Updates CMake flags for Windows and macOS builds to fix compilation errors.

.github/workflows/release.yml

+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-22.04, macos-13, macos-14, windows-2022]
+        platform: [cpu, cuda]
+        include:
+          - sudo: "sudo"
+            build-all-dep: "ON"
+            build-type: "Release"
+            cmake-extra: ""
+            use-sse: "ON"
+          - platform: cpu
+            platform-name: ""
+            use-cuda: "OFF"
+          - platform: cuda
+            platform-name: "-CUDA"
+            use-cuda: "ON"
+          - os: ubuntu-22.04
+            os-name: "Ubuntu"
+            c-compiler: "gcc-12"
+            cxx-compiler: "g++-12"
+          - os: macos-13
+            os-name: "macOS"
+            platform-name: "-Intel"
+            c-compiler: "gcc-14"
+            cxx-compiler: "g++-14"
+            build-all-dep: "OFF"
+          - os: macos-14
+            os-name: "macOS"
+            c-compiler: "gcc-14"
+            cxx-compiler: "g++-14"
+            cmake-extra: "-DCMAKE_C_FLAGS=-D__FLT_EVAL_METHOD__=0"
+            build-all-dep: "OFF"
+            use-sse: "OFF"
+          - os: windows-2022
+            os-name: "Windows"
+            sudo: ""
+            c-compiler: "clang-cl"
+            cxx-compiler: "clang-cl"
+            cmake-extra: "-DCMAKE_CUDA_FLAGS=\"-allow-unsupported-compiler -D_ALLOW_COMPILER_AND_STL_VERSION_MISMATCH\" -G Ninja"
+        exclude:
+          - os: macos-13
+            platform: cuda
+          - os: macos-14
+            platform: cuda
+
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+
+      - name: Setup MSVC environment
+        uses: ilammy/msvc-dev-cmd@v1
+        if: matrix.os-name == 'Windows'
+
+      - name: Install CUDA Toolkit
+        uses: Jimver/cuda-toolkit@master
+        if: matrix.platform == 'cuda'
+        with:
+          cuda: '11.8.0'
+          method: network
+          use-github-cache: false
+          use-local-cache: false
+
+      - name: Configure NiftyReg
+        shell: bash
+        run: |
+          mkdir build && cd build
+          cmake -DCMAKE_C_COMPILER=${{ matrix.c-compiler }} \
+                -DCMAKE_CXX_COMPILER=${{ matrix.cxx-compiler }} \
+                -DCMAKE_BUILD_TYPE=${{ matrix.build-type }} \
+                -DBUILD_ALL_DEP=${{ matrix.build-all-dep }} \
+                -DCHECK_GPU=OFF \
+                -DUSE_CUDA=${{ matrix.use-cuda }} \
+                -DUSE_OPENCL=OFF \
+                -DUSE_SSE=${{ matrix.use-sse }} \
+                -DUSE_OPENMP=ON \
+                -DBUILD_TESTING=OFF \
+                ${{ matrix.cmake-extra }} \
+                ..
+
+      - name: Build NiftyReg
+        shell: bash
+        run: cmake --build build --config ${{ matrix.build-type }}
+
+      - name: Prepare the variables
+        id: vars
+        shell: bash
+        run: echo "output-folder=NiftyReg-${{ matrix.os-name }}${{ matrix.platform-name }}-${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
+
+      - name: Prepare the package
+        if: matrix.os-name == 'Windows'
+        shell: powershell
+        working-directory: build/reg-apps
+        run: |
+          New-Item -ItemType Directory -Force -Path ${{ steps.vars.outputs.output-folder }}
+          Move-Item -Path *.exe -Destination ${{ steps.vars.outputs.output-folder }}
+          Copy-Item -Path "C:/Program Files/LLVM/bin/libomp.dll" -Destination ${{ steps.vars.outputs.output-folder }}/libomp140.x86_64.dll
+          Compress-Archive -Path ${{ steps.vars.outputs.output-folder }} -DestinationPath ../NiftyReg.zip
+
+      - name: Prepare the package
+        if: matrix.os-name == 'Ubuntu'
+        working-directory: build/reg-apps
+        run: |
+          mkdir -p ${{ steps.vars.outputs.output-folder }}
+          find . -maxdepth 1 -type f -executable -exec mv {} ${{ steps.vars.outputs.output-folder }} \;
+          zip -r ../NiftyReg.zip ${{ steps.vars.outputs.output-folder }}
+
+      - name: Prepare the package
+        if: matrix.os-name == 'macOS'
+        working-directory: build/reg-apps
+        run: |
+          mkdir -p ${{ steps.vars.outputs.output-folder }}
+          find . -maxdepth 1 -type f -perm +111 -exec mv {} ${{ steps.vars.outputs.output-folder }} \;
+          zip -r ../NiftyReg.zip ${{ steps.vars.outputs.output-folder }}
+
+      - name: Upload the package
+        uses: svenstaro/upload-release-action@v2
+        with:
+          repo_token: ${{ github.token }}
+          file: build/NiftyReg.zip
+          asset_name: ${{ steps.vars.outputs.output-folder }}.zip


To fix the problem, add a permissions section to the workflow. This can be done either globally at the root level, or per job (here, just the build job). For maximal clarity and future-proofing, it's best to add at the root, unless different jobs require different permissions. The workflow uploads release assets but does not modify repository contents; thus, the minimal required permissions should be contents: read, which allows reading release info but not writing to the repository. The upload-release-action uses repo_token—this is only for uploading assets to a release, which needs contents: read for release info, and requires contents: write to upload assets. To function correctly, contents: write may be needed; but to adhere to least privilege, we can set contents: write. So, add a block:

permissions: contents: write

directly under the name: Release line (before on:).

kaczmarj · 2025-11-22T22:01:55Z

@onurulgen - niftyreg v2.0.0 is working great so far. thanks again for all of your work.

i compiled it in a docker container. i copied the dockerfile below if anyone is interested. i got errors with CUDA 12.x so i stuck with 11.8

FROM nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04 AS builder

ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
        build-essential \
        cmake \
        git \
        ninja-build \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /work
RUN git clone https://github.com/KCL-BMEIS/niftyreg . \
    && git checkout v2.0.0

RUN cmake -S . -B build -G "Ninja" \
        -DCMAKE_BUILD_TYPE=Release \
        -DBUILD_ALL_DEP=ON \
        -DUSE_CUDA=ON \
        -DCHECK_GPU=OFF \
        -DUSE_OPENMP=ON \
        -DUSE_SSE=ON \
        -DCMAKE_INSTALL_PREFIX=/opt/niftyreg \
        -DCMAKE_INSTALL_RPATH="\$ORIGIN/../lib" \
        # Support many different NVIDIA GPUs.
        -DCMAKE_CUDA_ARCHITECTURES="75-real;80-real;86-real;89-real;90" \
    && cmake --build build --parallel \
    && cmake --install build

FROM nvcr.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04
COPY --from=builder /opt/niftyreg /opt/niftyreg
ENV PATH="/opt/niftyreg/bin:${PATH}"
WORKDIR /work

i built with this (i'm on an arm-based macbook)

docker buildx build --platform=linux/amd64 --load -t kaczmarj/niftyreg .

then I got the binaries out with this:

docker run --rm \
  -v "$(pwd)":/out \
  kaczmarj/niftyreg \
  bash -c "cd /opt && tar -czf /out/niftyreg.tar.gz niftyreg"

onurulgen · 2025-11-24T12:49:36Z

Thank you! You may use precompiled binaries; and if you'd like to compile yourself, you may follow release.yml file. The code is compatible up to CUDA 12.4.x. It might've failed if you tried later versions.

kaczmarj · 2025-11-24T13:51:04Z

fantastic, i will use the precompiled binaries in that case. thanks kindly

onurulgen added 30 commits March 14, 2023 21:26

Add NiftiImage::realloc() to reallocate the image data

751f447

Add NiftiImage::setDim() to set a dimension of the image

19883fb

Add ability to NiftiImage for copying only image info

d13cf2d

Update tests to leverage new abilities of NiftiImage

5f92c68

Add an enum for NiftiImage dimensions

4947c2e

Small fixes

d087265

Add ability to NiftiImageData for extracting volume data

379c8f9

Handle optimise* variables in Compute::NormaliseGradient()

4061036

Refactorisations

b226687

Add NiftiImage::copyData() to copy the pixel data from another image

e9f5eaf

Extend NiftiImage constructor to optionally allocate image data

ade981c

Add NiftiImage::totalBytes() to return the total size of the image da…

961700a

…ta in bytes

Add move constructor and assignment operator to NiftiImageData

53ad3a4

Remove nodeNumber parameter from GetMaximalLength() and NormaliseGrad…

f3ba1f5

…ient() of *Compute classes

Fix possible bugs causing accessing freed memory

3592c61

Add tests for *Compute::GetMaximalLength() and *Compute::NormaliseGra…

21ed730

…dient()

Fix a bug in tests causing wrong voxel count calculation

1d50702

Fix HAVE_ZLIB definition

c597115

Refactor CMakeLists

c5dbb42

Refactorisations

7deef0b

Add NormaliseGradient() for CUDA and refactor GetMaximalLength()

26d195b

Extend tests for *Compute::GetMaximalLength() and *Compute::Normalise…

516b5ab

…Gradient() to handle optimise* parameters

Move platform and measure initialisation into reg_base::SetPlatformTy…

72b1874

…pe()

Refactor NR_CUDA_SAFE_CALL() and NR_CUDA_CHECK_KERNEL()

b58b3c8

Refactorisations

84afbe4

Modernise CUDA conjugate gradient functions

a6401b9

- Ditch old texture objects and use up-to-date ones - Make texture objects managed - Ditch CUDA symbols and pass them as kernel function parameters - Extend reg_updateControlPointPosition_gpu() to handle optimise* parameters

Remove unnecessary testing functions

66e48c8

Add tests for conjugate gradient

a4ae032

Generate test data only once for NormaliseGradientTest

9db2187

onurulgen added 17 commits May 13, 2025 14:51

Fix min calculation in NiftiImageData::ConcreteTypeHandler to use low…

50a85b9

…est() for numeric limits

Fix min/max calculation in NiftiImageData::ConcreteTypeHandler to use…

614bb4b

… magnitudes for complex numbers

Refactor minmax methods in NiftiImageData to return std::pair<double,…

7ebf3c5

… double> instead of using output parameters

Remove internal lessThan function and simplify min/max comparisons in…

6de3625

… NiftiImageData::ConcreteTypeHandler

Remove unused string variable in main function of reg_f3d

eb623dd

Handle scaling in NiftiImageData::minmax()

5c03616

Simplify minmax calculations in NiftiImageData::ConcreteTypeHandler b…

a9b1a70

…y using std::minmax_element

Refactor squaring number calculations in reg_defField_getDeformationF…

7bfdff9

…ieldFromFlowField

Refactor min/max value retrieval to use NiftiImageData.minmax() and r…

a3496a8

…emove deprecated reg_tools_getMinValue/MaxValue functions

Add .cache to .gitignore

155fe0a

Refactor squaring number calculations in Cuda::GetDeformationFieldFro…

d13d9ae

…mFlowField()

Refactor min/max value retrieval to use a unified GetMinMaxValue func…

414baed

…tion

Add VSCode Dev Containers to ease development

369f409

Add .clang-format configuration for consistent code styling

f5405b4

Update versioning to use Git tags

8b0e117

Update GitHub Actions workflows

f3da971

Fix Cppcheck installation in Code Analysis workflow

1b5069e

github-advanced-security bot found potential problems May 23, 2025

View reviewed changes

Fix runner label in Coverage workflow

dbce3ca

onurulgen force-pushed the rearchitect-and-reinstate-gpu branch from 7c3d2a2 to dbce3ca Compare May 23, 2025 15:16

onurulgen added 2 commits November 20, 2025 14:15

Fix compilation errors

61993a3

Update release workflow for Windows and CMake flags

290e41d

Switches deprecated Windows runner from 2019 to 2022. Updates CMake flags for Windows and macOS builds to fix compilation errors.

github-advanced-security bot found potential problems Nov 20, 2025

View reviewed changes

onurulgen merged commit ee6a92d into master Nov 21, 2025
14 of 15 checks passed

onurulgen deleted the rearchitect-and-reinstate-gpu branch November 21, 2025 09:59

@@ -1,2 +1,4 @@
             name: Tests
+            permissions:
+              contents: read
             on: [push, pull_request]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rearchitect and Reinstate GPU #114

Rearchitect and Reinstate GPU #114

Uh oh!

onurulgen commented Aug 29, 2024

Uh oh!

Check warning

Copilot Autofix

Check warning

Copilot Autofix

Uh oh!

Check warning

Copilot Autofix

kaczmarj commented Oct 30, 2025 •

edited

Loading

Uh oh!

onurulgen commented Oct 30, 2025

Uh oh!

Check warning

Copilot Autofix

Uh oh!

kaczmarj commented Nov 22, 2025 •

edited

Loading

Uh oh!

onurulgen commented Nov 24, 2025

Uh oh!

kaczmarj commented Nov 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

@@ -1,4 +1,6 @@
             name: Release
+            permissions:
+              contents: write
             on:
               release:

Rearchitect and Reinstate GPU #114

Rearchitect and Reinstate GPU #114

Uh oh!

Conversation

onurulgen commented Aug 29, 2024

Uh oh!

Check warning

Copilot Autofix

Check warning

Copilot Autofix

Uh oh!

Check warning

Copilot Autofix

kaczmarj commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

onurulgen commented Oct 30, 2025

Uh oh!

Check warning

Copilot Autofix

Uh oh!

kaczmarj commented Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

onurulgen commented Nov 24, 2025

Uh oh!

kaczmarj commented Nov 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

kaczmarj commented Oct 30, 2025 •

edited

Loading

kaczmarj commented Nov 22, 2025 •

edited

Loading