Description
mul_op (int) in the instructions.cu example is actually doing an addition!
I'm new to CUDA, but I presume the 'add.s32' should be 'mul.lo.s32'.
The output in the README appears to reflect this error too.
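
To make the difference concrete, here is a minimal sketch of the two inline PTX instructions side by side. The names `add_s32`, `mul_lo_s32`, `ops_kernel`, and `run_ops` are made up for illustration and are not taken from instructions.cu; the actual `mul_op` in the repository may be written differently.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from instructions.cu): 32-bit signed add via inline PTX.
__device__ int add_s32(int a, int b) {
    int r;
    asm volatile("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
    return r;
}

// Hypothetical helper: 32-bit signed multiply, keeping the low 32 bits of the product.
__device__ int mul_lo_s32(int a, int b) {
    int r;
    asm volatile("mul.lo.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
    return r;
}

// Writes the sum to out[0] and the product to out[1].
__global__ void ops_kernel(int a, int b, int* out) {
    out[0] = add_s32(a, b);
    out[1] = mul_lo_s32(a, b);
}

// Host-side wrapper: runs the kernel on one thread and copies both results back.
// Returns false if any CUDA call fails.
bool run_ops(int a, int b, int* sum, int* prod) {
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(int)) != cudaSuccess) return false;
    ops_kernel<<<1, 1>>>(a, b, d_out);
    int h_out[2] = {0, 0};
    // cudaMemcpy synchronizes with the kernel and reports launch/execution errors.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    *sum = h_out[0];
    *prod = h_out[1];
    return true;
}
```

Calling `run_ops(6, 7, &sum, &prod)` on a machine with a working GPU should give `sum == 13` and `prod == 42`, which makes it easy to spot a multiply routine that is really emitting `add.s32`.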
I tested this change on a Tesla T4, and the output went from:
int add 1.89 3 87.044762 3200 (3276800)
...
int mul 1.89 3 87.348724 3200 (3276800)
float mul 3.14 5 62.641941 3200 (3276800)
to:
int mul 3.14 5 62.652721 3200 (3276800)
float mul 3.14 5 62.641941 3200 (3276800)
(So int and float mul now take roughly equal amounts of time.)