This code demonstrates a usage of cuBLAS Cherk3mEx function to perform a Hermitian rank-k update
A = | 1.1 + 1.2j | 2.3 + 2.4j |
| 3.5 + 3.6j | 4.7 + 4.8j |
C = | 1.1 + 1.2j | 2.3 + 2.4j |
| 3.5 + 3.6j | 4.7 + 4.8j |
This function is an extension of cublasCherk where the input matrix and output matrix can have a lower precision but the computation is still done in the type cuComplex. This routine is implemented using the Gauss complexity reduction algorithm which can lead to an increase in performance up to 25%
See documentation for further details.
All GPUs supported by CUDA Toolkit (https://developer.nvidia.com/cuda-gpus)
Linux
Windows
x86_64
ppc64le
arm64-sbsa
- A Linux/Windows system with recent NVIDIA drivers.
- CMake version 3.18 minimum
$ mkdir build
$ cd build
$ cmake ..
$ make
Make sure that CMake finds expected CUDA Toolkit. If that is not the case you can add argument -DCMAKE_CUDA_COMPILER=/path/to/cuda/bin/nvcc to cmake command.
$ mkdir build
$ cd build
$ cmake -DCMAKE_GENERATOR_PLATFORM=x64 ..
$ Open cublas_examples.sln project in Visual Studio and build
$ ./cublas_Cherk3mEx_example
Sample example output:
A
1.10 + 1.20j 2.30 + 2.40j
3.50 + 3.60j 4.70 + 4.80j
=====
C
13.70 + 0.00j 30.50 + 0.48j
0.00 + 0.00j 70.34 + 0.00j
=====