cuBLASDx Library - API Examples

All examples, including the more advanced ones, are shipped within the cuBLASDx package.

Description

This folder demonstrates how to use the cuBLASDx APIs.

Requirements

cuBLASDx/MathDx package
See cuBLASDx requirements
CMake 3.18 or newer
Linux system with NVIDIA drivers installed
NVIDIA GPU of Volta (SM70) or newer architecture

Build

You may set CUBLASDX_CUDA_ARCHITECTURES to limit the CUDA architectures used for compilation (see CMake:CUDA_ARCHITECTURES)
mathdx_ROOT - path to the mathDx package (XX.Y is the version of the package)

mkdir build && cd build
cmake -DCUBLASDX_CUDA_ARCHITECTURES=70-real -Dmathdx_ROOT=/opt/nvidia/mathdx/XX.Y ..
make
# Run
ctest

Examples

For detailed descriptions of the examples, see the Examples section of the cuBLASDx documentation.

Group	Subgroup	Example	Description
Introduction Examples		01_introduction_example	cuBLASDx API introduction example
		01_introduction_pipeline	cuBLASDx Pipeline API introduction example
Simple GEMM Examples	Basic Example	02_simple_gemm_fp32	Performs fp32 GEMM
		02_simple_gemm_int8_int8_int32	Performs integral GEMM using Tensor Cores
		02_simple_gemm_fp8	Performs fp8 GEMM
		02_simple_gemm_mixed_precision	Performs a mixed precision GEMM
		03_simple_gemm_cfp16	Performs complex fp16 GEMM
		03_simple_gemm_std_complex_fp32	Performs GEMM with cuda::std::complex as data type
		04_blockdim_gemm_fp16	BLAS execution with different block dimensions
	Other	05_batched_gemm_fp64	Manual batching in a single CUDA block
		06_simple_gemm_leading_dimensions	Performs GEMM with non-default leading dimensions
		07_simple_gemm_transform	Performs GEMM with custom load and store operators
		08_simple_gemm_fp32_decoupled	Performs fp32 GEMM using 16-bit input type to save on storage and transfers
		09_simple_gemm_custom_layout	Performs GEMM with a custom user provided CuTe layout
		09_simple_gemm_aat	Performs GEMM where C = A * A^T
GEMM Performance		10_single_gemm_performance	Benchmark for single GEMM
		11_device_gemm_performance	Benchmark for entire device GEMMs that use cuBLASDx to compute a single tile
Advanced Examples		12_gemm_device_partial_sums	Improves GEMM accuracy by accumulating partial sums in higher precision
		14_fused_gemm_performance	Benchmark for 2 GEMMs fused into a single kernel
Advanced Examples	Fusion	14_gemm_fusion	Performs 2 GEMMs in a single kernel
		13_gemm_fft	Performs GEMM and FFT in a single kernel
		13_gemm_fft_fp16	Performs GEMM and FFT in a single kernel (half-precision complex type)
		13_gemm_fft_performance	Benchmark for GEMM and FFT fused into a single kernel
	Emulation	16_dgemm_emulation	Emulate double precision GEMM using lower precision operations (Ozaki scheme)
NVRTC Examples		15_nvrtc_gemm	Performs GEMM, kernel is compiled using NVRTC
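
Several of the examples listed above, 06_simple_gemm_leading_dimensions in particular, rely on the notion of a leading dimension. As a rough illustration only, the host-side sketch below computes C = alpha * A * B + beta * C with explicit leading dimensions. It does not use cuBLASDx: `reference_gemm` is a hypothetical helper written for this README, and column-major storage is an assumption of the sketch, not a statement about the layouts the examples use.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cuBLASDx): naive reference GEMM,
// C = alpha * A * B + beta * C, assuming column-major storage.
// A is m x k, B is k x n, C is m x n.
// lda/ldb/ldc are leading dimensions: the distance in elements between
// the starts of two consecutive columns. A leading dimension larger than
// the row count means each column is followed by padding.
void reference_gemm(std::size_t m, std::size_t n, std::size_t k,
                    float alpha,
                    const std::vector<float>& a, std::size_t lda,
                    const std::vector<float>& b, std::size_t ldb,
                    float beta,
                    std::vector<float>& c, std::size_t ldc) {
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < m; ++row) {
            float acc = 0.0f;
            for (std::size_t i = 0; i < k; ++i) {
                // Element (row, i) of A times element (i, col) of B.
                acc += a[row + i * lda] * b[i + col * ldb];
            }
            c[row + col * ldc] = alpha * acc + beta * c[row + col * ldc];
        }
    }
}
```

With a 2 x 2 matrix A stored with `lda = 3`, every third element of the buffer is padding that the loop never reads. A reference of this kind may also be handy for checking the output of a GPU kernel on small inputs.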
