The CUDA Introduction exercises are designed to get you accustomed to CUDA programming using simple operations:
- SAXPY (`z[] = a * x[] + y[]`)
- Matrix Transpose
- Matrix Multiplication
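As a point of comparison for your GPU results, the three operations can be sketched as plain CPU loops. This is a minimal sketch, not code from the repository: the function names `saxpyRef`, `transposeRef`, and `matmulRef` are hypothetical, and row-major storage is an assumption that may differ from the layout used in the exercises.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for SAXPY: z[i] = a * x[i] + y[i].
std::vector<float> saxpyRef(float a, const std::vector<float>& x,
                            const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> z(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        z[i] = a * x[i] + y[i];
    }
    return z;
}

// Hypothetical CPU reference for transposing a rows x cols matrix.
// Assumes row-major storage; the result is a cols x rows matrix.
std::vector<float> transposeRef(const std::vector<float>& in,
                                std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out[c * rows + r] = in[r * cols + c];
        }
    }
    return out;
}

// Hypothetical CPU reference for C = A * B, with A (m x k), B (k x n),
// and C (m x n), all assumed row-major.
std::vector<float> matmulRef(const std::vector<float>& A,
                             const std::vector<float>& B,
                             std::size_t m, std::size_t k, std::size_t n) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                sum += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = sum;
        }
    }
    return C;
}
```

Comparing a kernel's output against references like these, element by element, is one way to catch indexing mistakes early.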
Clone this repository from GitHub and then follow the instructions below.
- From the `cuda-introduction` directory, run:
  - Windows: `cmake -B build -S . -G "Visual Studio 17 2022"` to generate the Visual Studio project. You may choose a different Visual Studio version.
  - Linux: `cmake -B build -S . -G "Unix Makefiles"` to generate the makefiles.
- Windows:
  - Open the generated `CUDAIntroduction` project from the `build` directory.
  - Build the project in Visual Studio. (Note that there are Debug and Release configuration options.)
  - Run. Make sure you run the actual project as target (not `ALL_BUILD`) by right-clicking it and selecting "Set as StartUp Project".
Please ask me or the TAs ahead of time if you have trouble compiling the code. We want to be ready to go at the start of the lab.
Start with SAXPY. Follow the TODOs, including taking a look at common.h and common.cpp. The TODOs are numbered in order to help guide you.
The LOOK comments are designed to show you best practices for CUDA Programming. You can copy these snippets into future projects if needed.
Once you have completed each of the exercises, follow the TODO Optionals to test different size configurations.
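When trying different sizes, it helps to work out on the host how many blocks a given problem size needs. The sketch below is only an illustration under assumptions: the helper names `divUp` and `idleThreads` are hypothetical and are not taken from `common.h`.

```cpp
#include <cassert>
#include <cstddef>

// Ceiling division: the smallest number of blocks of blockSize threads
// that covers n elements.
std::size_t divUp(std::size_t n, std::size_t blockSize) {
    assert(blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// Number of launched threads whose index falls outside [0, n).
// A kernel needs a bounds check so these threads do no work.
std::size_t idleThreads(std::size_t n, std::size_t blockSize) {
    return divUp(n, blockSize) * blockSize - n;
}
```

For example, `divUp(1000, 256)` gives 4 blocks, and `idleThreads(1000, 256)` gives 24 threads that must be masked out by the kernel's bounds check. Note that a size of 0 gives 0 blocks, which CUDA rejects as an invalid launch configuration.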
Repeat the same for Matrix Transpose, then Matrix Multiplication. In each file, follow the TODOs, which are numbered in order.
With each implementation, run the Nsight debugger. Walk through the steps, switch between threads, warps, and blocks, and inspect the variables. Try to thoroughly understand your code as well as the debugging tools available to you.
If you get stuck at any point, follow this order:
- Use the Nsight debugger to understand the problem. Use paper and pencil to write down the equations, especially those for indexing.
- Search on the internet. Try to understand the code others have written and use that to solve your problem.
- Use the cuda-introduction-solutions branch. The solutions for all the exercises are provided. For best learning, try to solve the problems on your own and only use this as reference to compare your implementation.
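The paper-and-pencil indexing check can also be done in code by emulating a 2D launch on the CPU. The sketch below is an assumption-laden illustration, not the repository's approach: the variables mirror CUDA's `blockIdx`, `blockDim`, and `threadIdx`, the function names are hypothetical, and row-major storage is assumed. It counts how many times each output element of a transpose would be written; with correct indexing and a bounds check, every count is exactly 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates a 2D launch over a rows x cols row-major matrix and counts
// the writes to each element of the cols x rows transposed output.
std::vector<int> countTransposeWrites(int rows, int cols,
                                      int blockDimX, int blockDimY) {
    assert(rows > 0 && cols > 0 && blockDimX > 0 && blockDimY > 0);
    const int gridDimX = (cols + blockDimX - 1) / blockDimX;
    const int gridDimY = (rows + blockDimY - 1) / blockDimY;
    std::vector<int> writes(static_cast<std::size_t>(rows) * cols, 0);

    for (int blockIdxY = 0; blockIdxY < gridDimY; ++blockIdxY)
    for (int blockIdxX = 0; blockIdxX < gridDimX; ++blockIdxX)
    for (int threadIdxY = 0; threadIdxY < blockDimY; ++threadIdxY)
    for (int threadIdxX = 0; threadIdxX < blockDimX; ++threadIdxX) {
        // Global coordinates, computed as a kernel would compute them.
        const int col = blockIdxX * blockDimX + threadIdxX;
        const int row = blockIdxY * blockDimY + threadIdxY;
        // Bounds check: threads past the matrix edge do nothing.
        if (row >= rows || col >= cols) continue;
        // Input element (row, col) lands at (col, row) in the output.
        ++writes[static_cast<std::size_t>(col) * rows + row];
    }
    return writes;
}

// True if every output element was written exactly once.
bool allWrittenOnce(const std::vector<int>& writes) {
    for (int w : writes) {
        if (w != 1) return false;
    }
    return true;
}
```

Swapping `rows` and `cols` in the output index, or dropping the bounds check, makes `allWrittenOnce` fail for non-square sizes such as 5 x 7, which is the kind of mistake that is easier to spot here than on the device.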
Following that, try to break your own implementations to familiarize yourself with common CUDA errors. Some examples include:
- Pass invalid pointers: either null pointers, or host pointers where device pointers are expected.
- Out of bounds access in CUDA functions like `cudaMemcpy` as well as in kernels.
- Use incorrect sizes in CUDA APIs, for example, set the size parameter to 0.
- Launch kernels with bad configurations, including exceeding device limits.
- Flip indices to force bad access patterns.
While doing the above, also use Nsight to debug.
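To predict which launch configurations should fail before trying them, the limits can be checked on the host. The following is a rough sketch under stated assumptions: `Dim3` is a stand-in for CUDA's `dim3`, `checkLaunch` is a hypothetical helper, and the limit values are typical of recent GPUs rather than read from your device. Real code would query them with `cudaGetDeviceProperties`.

```cpp
#include <cassert>
#include <string>

// Stand-in for CUDA's dim3, so this sketch compiles without the toolkit.
struct Dim3 { unsigned x, y, z; };

// Assumed limits, typical of recent GPUs. These are NOT queried from a
// device; real code should read them via cudaGetDeviceProperties.
struct DeviceLimits {
    unsigned maxThreadsPerBlock = 1024;
    Dim3 maxBlockDim = {1024, 1024, 64};
    Dim3 maxGridDim = {2147483647u, 65535u, 65535u};
};

// Returns an empty string if the configuration is within the limits,
// otherwise a short description of the first problem found.
std::string checkLaunch(Dim3 grid, Dim3 block,
                        const DeviceLimits& lim = DeviceLimits{}) {
    if (grid.x == 0 || grid.y == 0 || grid.z == 0 ||
        block.x == 0 || block.y == 0 || block.z == 0) {
        return "zero-sized dimension";
    }
    if (block.x > lim.maxBlockDim.x || block.y > lim.maxBlockDim.y ||
        block.z > lim.maxBlockDim.z) {
        return "block dimension exceeds limit";
    }
    // Use a 64-bit product so large dimensions cannot overflow.
    const unsigned long long threads =
        1ull * block.x * block.y * block.z;
    if (threads > lim.maxThreadsPerBlock) {
        return "too many threads per block";
    }
    if (grid.x > lim.maxGridDim.x || grid.y > lim.maxGridDim.y ||
        grid.z > lim.maxGridDim.z) {
        return "grid dimension exceeds limit";
    }
    return "";
}
```

Under these assumed limits, a block of 32 x 32 x 2 threads is rejected because it holds 2048 threads, even though each dimension is individually legal. On a real device, such a launch does not run the kernel and the failure is reported through `cudaGetLastError`.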
The goal is to not just understand how to correctly implement CUDA programs, but also identify when you are doing incorrect actions. This way, when you see similar errors in your subsequent projects, you'll know where to look.
This repository includes code from termcolor, licensed under the BSD 3-Clause license.