A basic "Hello world" or "Hello CUDA" example that performs a number of operations on NVIDIA GPUs using CUDA.


Note: You can just copy main.sh to your system and run it. For the code, refer to main.cu.


$ bash main.sh

# Cloning into 'hello-cuda'...
# remote: Enumerating objects: 33, done.
# remote: Counting objects: 100% (12/12), done.
# remote: Compressing objects: 100% (11/11), done.
# remote: Total 33 (delta 2), reused 6 (delta 1), pack-reused 21
# Receiving objects: 100% (33/33), 24.58 KiB | 719.00 KiB/s, done.
# Resolving deltas: 100% (9/9), done.
# HELLO WORLD:
# GPU[B1.T0]: Hello CUDA
# GPU[B1.T1]: Hello CUDA
# GPU[B1.T2]: Hello CUDA
# GPU[B1.T3]: Hello CUDA
# GPU[B1.T4]: Hello CUDA
# GPU[B1.T5]: Hello CUDA
# GPU[B1.T6]: Hello CUDA
# GPU[B1.T7]: Hello CUDA
# GPU[B3.T0]: Hello CUDA
# GPU[B3.T1]: Hello CUDA
# GPU[B3.T2]: Hello CUDA
# GPU[B3.T3]: Hello CUDA
# GPU[B3.T4]: Hello CUDA
# GPU[B3.T5]: Hello CUDA
# GPU[B3.T6]: Hello CUDA
# GPU[B3.T7]: Hello CUDA
# GPU[B2.T0]: Hello CUDA
# GPU[B2.T1]: Hello CUDA
# GPU[B2.T2]: Hello CUDA
# GPU[B2.T3]: Hello CUDA
# GPU[B2.T4]: Hello CUDA
# GPU[B2.T5]: Hello CUDA
# GPU[B2.T6]: Hello CUDA
# GPU[B2.T7]: Hello CUDA
# GPU[B0.T0]: Hello CUDA
# GPU[B0.T1]: Hello CUDA
# GPU[B0.T2]: Hello CUDA
# GPU[B0.T3]: Hello CUDA
# GPU[B0.T4]: Hello CUDA
# GPU[B0.T5]: Hello CUDA
# GPU[B0.T6]: Hello CUDA
# GPU[B0.T7]: Hello CUDA
# CPU: Hello world!
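
The blocks (B0–B3) finish in arbitrary order, which is why their messages are interleaved above. A minimal sketch of the kernel behind this output, assuming a launch of 4 blocks with 8 threads each (the actual code is in main.cu):

```cuda
#include <cstdio>

// Each GPU thread prints its block and thread index.
__global__ void helloCuda() {
  printf("GPU[B%d.T%d]: Hello CUDA\n", blockIdx.x, threadIdx.x);
}

int main() {
  helloCuda<<<4, 8>>>();    // 4 blocks x 8 threads, matching the output above
  cudaDeviceSynchronize();  // wait for the kernel so its printf output flushes
  printf("CPU: Hello world!\n");
  return 0;
}
```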

# DEVICE PROPERTIES:
# COMPUTE DEVICE 0:
# Name: Tesla V100-PCIE-16GB
# Compute capability: 7.0
# Multiprocessors: 80
# Clock rate: 1380 MHz
# Global memory: 16151 MB
# Constant memory: 64 KB
# Shared memory per block: 48 KB
# Registers per block: 65536
# Threads per block: 1024 (max)
# Threads per warp: 32
# Block dimension: 1024x1024x64 (max)
# Grid dimension: 2147483647x65535x65535 (max)
# Device copy overlap: yes
# Kernel execution timeout: no
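
These properties come from cudaGetDeviceProperties. A sketch that queries and prints a few of them (field names are from the CUDA runtime's cudaDeviceProp; the selection here is illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int count;
  cudaGetDeviceCount(&count);
  for (int d = 0; d < count; ++d) {
    cudaDeviceProp p;
    cudaGetDeviceProperties(&p, d);
    printf("COMPUTE DEVICE %d:\n", d);
    printf("Name: %s\n", p.name);
    printf("Compute capability: %d.%d\n", p.major, p.minor);
    printf("Multiprocessors: %d\n", p.multiProcessorCount);
    printf("Clock rate: %d MHz\n", p.clockRate / 1000);  // clockRate is in kHz
    printf("Global memory: %zu MB\n", p.totalGlobalMem / (1024 * 1024));
    printf("Threads per block: %d (max)\n", p.maxThreadsPerBlock);
    printf("Threads per warp: %d\n", p.warpSize);
  }
  return 0;
}
```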

# CHOOSE DEVICE:
# Current CUDA device: 0
# CUDA device with at least compute capability 1.3: 0
# Cards that have compute capability 1.3 or higher
# support double-precision floating-point math.
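
Device selection can be done with cudaChooseDevice, which picks the device best matching a requested property set. A sketch, assuming we only require compute capability 1.3:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int d;
  cudaGetDevice(&d);
  printf("Current CUDA device: %d\n", d);
  cudaDeviceProp req = {};       // zero-filled request: only set what we need
  req.major = 1; req.minor = 3;  // >= 1.3 for double-precision support
  cudaChooseDevice(&d, &req);    // pick the closest matching device
  cudaSetDevice(d);
  printf("CUDA device with at least compute capability 1.3: %d\n", d);
  return 0;
}
```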

# MALLOC PERFORMANCE:
# Host malloc (1 GB): 0.00 ms
# CUDA malloc (1 GB): 1.35 ms
# Host free (1 GB): 0.00 ms
# CUDA free (1 GB): 1.51 ms
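
Host malloc of 1 GB is near-instant because the OS typically reserves virtual pages lazily, while cudaMalloc must set up device memory eagerly. A timing sketch using std::chrono (the repository may time this differently):

```cuda
#include <cstdio>
#include <cstdlib>
#include <chrono>
#include <cuda_runtime.h>
using namespace std::chrono;

int main() {
  const size_t N = 1024ull * 1024 * 1024;  // 1 GB

  auto t0 = high_resolution_clock::now();
  void *x = malloc(N);   // virtual allocation only; pages not yet touched
  auto t1 = high_resolution_clock::now();
  void *xD;
  cudaMalloc(&xD, N);    // device memory is allocated eagerly
  auto t2 = high_resolution_clock::now();
  free(x);
  auto t3 = high_resolution_clock::now();
  cudaFree(xD);
  auto t4 = high_resolution_clock::now();

  printf("Host malloc (1 GB): %.2f ms\n", duration<double, std::milli>(t1 - t0).count());
  printf("CUDA malloc (1 GB): %.2f ms\n", duration<double, std::milli>(t2 - t1).count());
  printf("Host free (1 GB): %.2f ms\n",   duration<double, std::milli>(t3 - t2).count());
  printf("CUDA free (1 GB): %.2f ms\n",   duration<double, std::milli>(t4 - t3).count());
  return 0;
}
```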

# MEMCPY PERFORMANCE:
# Host to host (1 GB): 412.59 ms
# Host to device (1 GB): 225.32 ms
# Device to host (1 GB): 246.87 ms
# Device to device (1 GB): 0.04 ms
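
The host↔device copies (~4.5 GB/s here) go over PCIe from pageable memory; device-to-device copies stay in GPU memory and are also asynchronous with respect to the host, which is likely why that figure is so small. A sketch that times one copy per direction with CUDA events:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Time a single copy of n bytes in the given direction using CUDA events.
float timedCopy(void *dst, const void *src, size_t n, cudaMemcpyKind kind) {
  cudaEvent_t a, b; float ms;
  cudaEventCreate(&a); cudaEventCreate(&b);
  cudaEventRecord(a);
  cudaMemcpy(dst, src, n, kind);
  cudaEventRecord(b);
  cudaEventSynchronize(b);
  cudaEventElapsedTime(&ms, a, b);
  cudaEventDestroy(a); cudaEventDestroy(b);
  return ms;
}

int main() {
  const size_t N = 1024ull * 1024 * 1024;  // 1 GB
  void *x = malloc(N), *y = malloc(N), *xD, *yD;
  cudaMalloc(&xD, N); cudaMalloc(&yD, N);
  printf("Host to host (1 GB): %.2f ms\n",     timedCopy(y,  x,  N, cudaMemcpyHostToHost));
  printf("Host to device (1 GB): %.2f ms\n",   timedCopy(xD, x,  N, cudaMemcpyHostToDevice));
  printf("Device to host (1 GB): %.2f ms\n",   timedCopy(x,  xD, N, cudaMemcpyDeviceToHost));
  printf("Device to device (1 GB): %.2f ms\n", timedCopy(yD, xD, N, cudaMemcpyDeviceToDevice));
  cudaFree(xD); cudaFree(yD); free(x); free(y);
  return 0;
}
```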

# ADDITION:
# a = 1, b = 2
# a + b = 3 (GPU)
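
This is the classic single-thread kernel: one thread computes the sum and writes it to device memory, which the host then copies back. A minimal sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread writes a + b into *c on the device.
__global__ void add(int a, int b, int *c) {
  *c = a + b;
}

int main() {
  int c, *cD;
  cudaMalloc(&cD, sizeof(int));
  add<<<1, 1>>>(1, 2, cD);  // single block, single thread
  cudaMemcpy(&c, cD, sizeof(int), cudaMemcpyDeviceToHost);
  printf("a = 1, b = 2\n");
  printf("a + b = %d (GPU)\n", c);
  cudaFree(cD);
  return 0;
}
```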

# VECTOR ADDITION:
# x = vector of size 1 GB
# y = vector of size 1 GB
# Vector addition on host (a = x + y): 438.02 ms
# Vector addition on device <<<32768, 32>>> (a = x + y): 4.33 ms
# Vector addition on device <<<16384, 64>>> (a = x + y): 3.98 ms
# Vector addition on device <<<8192, 128>>> (a = x + y): 4.01 ms
# Vector addition on device <<<4096, 256>>> (a = x + y): 3.97 ms
# Vector addition on device <<<2048, 512>>> (a = x + y): 4.00 ms
# Vector addition on device <<<1024, 1024>>> (a = x + y): 3.97 ms
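
Every configuration above launches the same total of 2^20 threads (e.g. 32768 × 32 = 1024 × 1024), so each thread must handle many of the 2^28 floats; a grid-stride loop makes the kernel correct for any launch shape. A sketch of such a kernel (initialization elided):

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: each thread handles elements i, i+stride, i+2*stride, ...
__global__ void vectorAdd(float *a, const float *x, const float *y, size_t N) {
  size_t i      = blockIdx.x * (size_t) blockDim.x + threadIdx.x;
  size_t stride = gridDim.x  * (size_t) blockDim.x;
  for (; i < N; i += stride)
    a[i] = x[i] + y[i];
}

int main() {
  const size_t N = (1024ull * 1024 * 1024) / sizeof(float);  // 1 GB of floats
  float *xD, *yD, *aD;
  cudaMalloc(&xD, N * sizeof(float));
  cudaMalloc(&yD, N * sizeof(float));
  cudaMalloc(&aD, N * sizeof(float));
  // ... fill xD, yD (e.g. cudaMemcpy from host arrays) ...
  vectorAdd<<<4096, 256>>>(aD, xD, yD, N);  // one of the configurations timed above
  cudaDeviceSynchronize();
  cudaFree(xD); cudaFree(yD); cudaFree(aD);
  return 0;
}
```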

# DOT PRODUCT:
# x = vector of size 1 GB
# y = vector of size 1 GB
# Dot product on host (a = x . y): 207.39 ms [2.154769e+05]
# Dot product on device (a = x . y): 2.69 ms [2.154769e+05] (memcpy approach)
# Dot product on device (a = x . y): 2.50 ms [2.154769e+05] (inplace approach)
# Dot product on device (a = x . y): 2.50 ms [2.154769e+05] (atomic-add approach)
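
The three variants differ only in how per-block partial sums are combined: copying partials back to the host, a second reduction pass on the device, or a single atomicAdd per block. A sketch of the atomic-add variant, assuming 256 threads per block:

```cuda
#include <cuda_runtime.h>

// Each block reduces its partial sums in shared memory; thread 0 then
// atomically adds the block's result to the global accumulator *a.
__global__ void dotProduct(float *a, const float *x, const float *y, size_t N) {
  __shared__ float cache[256];  // must match blockDim.x
  size_t i      = blockIdx.x * (size_t) blockDim.x + threadIdx.x;
  size_t stride = gridDim.x  * (size_t) blockDim.x;
  float sum = 0;
  for (; i < N; i += stride)
    sum += x[i] * y[i];
  cache[threadIdx.x] = sum;
  __syncthreads();
  // Tree reduction within the block (blockDim.x must be a power of two).
  for (int s = blockDim.x / 2; s > 0; s /= 2) {
    if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
    __syncthreads();
  }
  if (threadIdx.x == 0) atomicAdd(a, cache[0]);
}
// Launch as e.g. dotProduct<<<4096, 256>>>(aD, xD, yD, N) with *aD zeroed first.
```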

# HISTOGRAM:
# buf = vector of size 1 GB
# Finding histogram of buf on host: 747.00 ms
# Finding histogram of buf on device (basic approach): 401.06 ms
# Finding histogram of buf on device (shared approach): 6.85 ms
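
The basic approach has every thread do atomicAdd directly on the global histogram, serializing on hot bins; the shared approach first builds a per-block histogram in fast shared memory and merges it once at the end, hence the ~60× speedup here. A sketch of the shared variant, assuming byte-valued input and 256 bins:

```cuda
#include <cuda_runtime.h>

#define BINS 256

// Each block accumulates a private histogram in shared memory, then
// merges it into the global histogram with one atomicAdd per bin.
__global__ void histogramShared(unsigned *hist, const unsigned char *buf, size_t N) {
  __shared__ unsigned temp[BINS];
  for (int b = threadIdx.x; b < BINS; b += blockDim.x) temp[b] = 0;
  __syncthreads();
  size_t i      = blockIdx.x * (size_t) blockDim.x + threadIdx.x;
  size_t stride = gridDim.x  * (size_t) blockDim.x;
  for (; i < N; i += stride)
    atomicAdd(&temp[buf[i]], 1u);
  __syncthreads();
  for (int b = threadIdx.x; b < BINS; b += blockDim.x)
    atomicAdd(&hist[b], temp[b]);
}
```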

# MATRIX MULTIPLICATION:
# x = matrix of size 16 MB
# y = matrix of size 16 MB
# Matrix multiplication on host (a = x * y): 33307.13 ms [3.287916e+00]
# Matrix multiplication on device (a = x * y): 18.93 ms (basic approach) [3.287916e+00]
# Matrix multiplication on device (a = x * y): 12.20 ms (tiled approach) [3.287916e+00]
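
The tiled approach stages TILE × TILE sub-blocks of x and y through shared memory so each loaded element is reused TILE times, cutting global-memory traffic relative to the basic one-thread-per-output kernel. A sketch, assuming square n × n matrices with n divisible by TILE:

```cuda
#include <cuda_runtime.h>

#define TILE 16

// Each block computes one TILE x TILE tile of a = x * y, marching
// matching tiles of x and y through shared memory.
__global__ void matMulTiled(float *a, const float *x, const float *y, int n) {
  __shared__ float xs[TILE][TILE], ys[TILE][TILE];
  int r = blockIdx.y * TILE + threadIdx.y;  // row of the output element
  int c = blockIdx.x * TILE + threadIdx.x;  // column of the output element
  float sum = 0;
  for (int t = 0; t < n / TILE; ++t) {
    xs[threadIdx.y][threadIdx.x] = x[r * n + (t * TILE + threadIdx.x)];
    ys[threadIdx.y][threadIdx.x] = y[(t * TILE + threadIdx.y) * n + c];
    __syncthreads();
    for (int k = 0; k < TILE; ++k)
      sum += xs[threadIdx.y][k] * ys[k][threadIdx.x];
    __syncthreads();
  }
  a[r * n + c] = sum;
}
// Launch as matMulTiled<<<dim3(n/TILE, n/TILE), dim3(TILE, TILE)>>>(aD, xD, yD, n);
// a 16 MB float matrix corresponds to n = 2048.
```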

