szarejkodariusz/reduce_optim

Reduction optimisation

This repository fully implements the reduction optimisation steps from Mark Harris's presentation:
Optimizing Parallel Reduction in CUDA - Mark Harris
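
For orientation, here is a minimal sketch of one step from that presentation: a shared-memory tree reduction with sequential addressing. It is not taken from this repository. The names `reduce_sequential` and `reduce_on_device` are hypothetical, the block size of 256 is an assumed value, and how this step maps onto the `reduce_0` to `reduce_4` kernels timed below is not stated here. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (not the repository's code): each block sums its slice
// of `in` in shared memory and writes one partial sum to `out`.
// Assumes blockDim.x is a power of two.
__global__ void reduce_sequential(const int* in, int* out, unsigned int n) {
    extern __shared__ int sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Load one element per thread; pad out-of-range threads with 0.
    sdata[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Sequential addressing: the active threads are always the first `s`
    // threads of the block, so there is no divergence within full warps
    // and no shared-memory bank conflicts.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads();
    }

    if (tid == 0) {
        out[blockIdx.x] = sdata[0];
    }
}

// Hypothetical host helper: runs the kernel once and adds the per-block
// partial sums on the CPU. Assumes a non-empty input.
long long reduce_on_device(const std::vector<int>& h_in) {
    const unsigned int n = static_cast<unsigned int>(h_in.size());
    const unsigned int threads = 256;  // assumed block size, power of two
    const unsigned int blocks = (n + threads - 1) / threads;

    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reduce_sequential<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, n);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    long long sum = 0;
    for (int p : partial) {
        sum += p;
    }
    return sum;
}

int main() {
    std::vector<int> data(1 << 20, -1);  // illustrative input, not the repository's
    printf("sum = %lld\n", reduce_on_device(data));
    return 0;
}
```

Later steps in the presentation build on this pattern, for example by adding two elements per thread during the load and by unrolling the final warp.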

How to run

make -j

./run

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 1650"
Compute capability = 7.5
Total global memory = 3896 MB
Multi processor Count = 16
Warp size = 32
Max grid size = [2147483647, 65535, 65535]
Max threads per block = 1024
Shared memory per block = 48 kB
Shared memory per multiprocessor = 64 kB

correct_sum: -2097152

reduce_0: 0.638 ms
reduce_1: 0.529 ms
reduce_2: 0.417 ms
reduce_3: 0.249 ms
reduce_4: 0.177 ms
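
Per-kernel times like those above are commonly measured with CUDA events. The sketch below shows that approach under stated assumptions: `dummy_kernel` and `time_kernel_ms` are hypothetical names, the kernel is a stand-in rather than one of the reduction kernels, and nothing here confirms that the repository measures its timings this way.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in kernel (hypothetical); replace with the kernel to be measured.
__global__ void dummy_kernel(int* data, unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1;
    }
}

// Hypothetical helper: returns the elapsed GPU time of one launch in ms.
float time_kernel_ms(int* d_data, unsigned int n) {
    const unsigned int threads = 256;  // assumed block size
    const unsigned int blocks = (n + threads - 1) / threads;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    dummy_kernel<<<blocks, threads>>>(d_data, n);
    cudaEventRecord(stop);

    // Wait until the stop event has been reached on the GPU.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const unsigned int n = 1u << 22;  // illustrative size
    int* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));

    time_kernel_ms(d_data, n);  // warm-up launch, result discarded
    printf("dummy_kernel: %.3f ms\n", time_kernel_ms(d_data, n));

    cudaFree(d_data);
    return 0;
}
```

A warm-up launch before the timed one keeps one-time initialisation cost out of the measurement; averaging over several launches gives more stable numbers.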
