https://github.com/BBuf/how-to-optim-algorithm-in-cuda/blob/3b076c6ef8f3204d4cb1a4c9029e433edd40edc0/reduce/reduce_v8_shfl_down_sync_pack.cu#L98 In v8, it looks like the sync here isn't needed?