Fix crashes appearing on CUDA CC 7 cards #67
(branch updated from ba98ac6 to f93abbe)
The checks are failing during an apt-get update. What can I do about that?
[travis] add travis_retry on apt commands
@simogasp may have observed a slowdown. Although the changes were tiny, I noticed a little overhead in s_desc_loop.cu. popsift::any() implies a __syncwarp(), so the __syncthreads() inside the loop isn't actually required, and I moved it out of the loop. I also revised the indentation, which makes it much easier to see how small the change is with respect to develop.
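A minimal sketch of the loop pattern described above. It is a hypothetical kernel, not the actual s_desc_loop.cu code (which is not shown here): `warp_any()` is a stand-in for popsift::any() built on `__any_sync`, whose full-mask vote already makes the 32 lanes of a warp wait for each other. Because the loop condition is the same for every lane of a warp, no block-wide barrier is needed inside the loop; a single `__syncthreads()` after it is enough before thread 0 reads what the other warps wrote to shared memory. `countdown_kernel` and `run_countdown` are hypothetical names, and the sketch assumes `blockDim.x` is a multiple of 32 and at most 1024.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for popsift::any(): a full-mask warp vote.
// __any_sync makes all 32 lanes wait for each other, so the vote itself
// acts as a warp-level barrier; no separate __syncwarp() is needed.
__device__ inline bool warp_any(bool pred)
{
    return __any_sync(0xffffffffu, pred) != 0;
}

// Hypothetical kernel: each warp keeps looping while any of its lanes has
// work left. The loop condition is warp-uniform, so there is no
// __syncthreads() inside the loop; a single one follows it.
__global__ void countdown_kernel(const int* work, int* out)
{
    __shared__ int steps[32];              // one slot per warp (<= 1024 threads)
    const int tid    = threadIdx.x;
    const int warp   = tid / 32;
    const int lane   = tid % 32;
    const int nwarps = blockDim.x / 32;

    int remaining  = work[blockIdx.x * blockDim.x + tid];
    int iterations = 0;
    while (warp_any(remaining > 0)) {      // every lane sees the same vote
        if (remaining > 0) --remaining;
        ++iterations;                      // identical across the warp
    }
    if (lane == 0) steps[warp] = iterations;

    __syncthreads();                       // block-wide barrier, outside the loop

    if (tid == 0) {
        int total = 0;
        for (int w = 0; w < nwarps; ++w) total += steps[w];
        out[blockIdx.x] = total;
    }
}

// Host helper: runs one block of n threads (n a multiple of 32, at most
// 1024) and returns the sum of the per-warp iteration counts.
int run_countdown(const int* host_work, int n)
{
    assert(n > 0 && n % 32 == 0 && n <= 1024);
    int* d_work = nullptr;
    int* d_out  = nullptr;
    cudaMalloc(&d_work, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_work, host_work, n * sizeof(int), cudaMemcpyHostToDevice);
    countdown_kernel<<<1, n>>>(d_work, d_out);
    int result = -1;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_work);
    cudaFree(d_out);
    return result;
}
```

For example, with 64 threads where lane 3 starts with 5 units of work and lane 40 with 2, warp 0 loops 5 times and warp 1 loops twice, so `run_countdown` returns 7.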
@griwodz, shall we also update the CMake setup to choose the architectures to build according to the CUDA version? As of now, CCs above 6.5 are not built.
It seems to work safely on everything up to CC 7.5 now, so it's probably a good idea.
simogasp left a comment:
LGTM
As for CMake, what about changing those lines to:
```cmake
set(PopSift_CUDA_CC_LIST_BASIC 30 35 50 52)
# CUDA versions greater than 7 support at least up to CC 6.x
if(CUDA_VERSION_MAJOR GREATER 7)
  list(APPEND PopSift_CUDA_CC_LIST_BASIC 60 61 62)
endif()
# CUDA versions greater than 8 support at least up to CC 7.2
if(CUDA_VERSION_MAJOR GREATER 8)
  list(APPEND PopSift_CUDA_CC_LIST_BASIC 70 72)
endif()
# CUDA versions greater than 9 support at least up to CC 7.5
if(CUDA_VERSION_MAJOR GREATER 9)
  list(APPEND PopSift_CUDA_CC_LIST_BASIC 75)
endif()
```

(info from here: https://en.wikipedia.org/wiki/CUDA#GPUs_supported)
```diff
- return ( must_swap ? popsift::shuffle_xor( my_index, 1 << shift )
-                    : my_index );
+ int lane = must_swap ? ( 1 << shift ) : 0;
```
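The hunk above replaces a ternary that calls popsift::shuffle_xor() only on lanes with must_swap set. A minimal sketch of why that matters, assuming popsift::shuffle_xor() wraps a full-mask `__shfl_xor_sync` (its definition is not shown here) and that the new `lane` value is then passed to a shuffle that every lane executes: from Volta (CC 7.0) on, the behavior is undefined unless every lane named in the mask executes the same `__shfl_xor_sync`, so the divergent form is unsafe there. XOR-ing a lane ID with 0 selects the lane itself, so lanes that must not swap simply get their own value back. `swap_with_partner`, `swap_kernel` and `run_swap` are hypothetical names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Divergent form (the pattern the hunk removes): only lanes with must_swap
// reach the full-mask shuffle, which is undefined behavior from CC 7.0 on:
//   return must_swap ? __shfl_xor_sync(0xffffffffu, v, 1 << shift) : v;

// Convergent form: all 32 lanes execute the shuffle. Lanes that must not
// swap use a lane mask of 0, and laneId ^ 0 == laneId returns their own value.
__device__ int swap_with_partner(int v, bool must_swap, int shift)
{
    const int lane_mask = must_swap ? (1 << shift) : 0;
    return __shfl_xor_sync(0xffffffffu, v, lane_mask);
}

// Hypothetical test kernel for one warp: lanes 0-3, 8-11, 16-19 and 24-27
// swap with their partner lane (lane ^ (1 << shift)); the other lanes keep
// their value. For shift < 2 the partner lies in the same aligned group of
// four lanes, so both partners agree on must_swap.
__global__ void swap_kernel(int* out, int shift)
{
    const int lane = threadIdx.x % 32;
    const bool must_swap = ((lane >> 2) & 1) == 0;
    out[threadIdx.x] = swap_with_partner(lane, must_swap, shift);
}

// Host helper: runs one warp and copies its 32 results into result[].
void run_swap(int shift, int result[32])
{
    assert(shift >= 0 && shift < 2);
    int* d_out = nullptr;
    cudaMalloc(&d_out, 32 * sizeof(int));
    swap_kernel<<<1, 32>>>(d_out, shift);
    cudaMemcpy(result, d_out, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```

With shift = 1, lanes 0 to 3 receive 2, 3, 0 and 1, while lanes 4 to 7 keep their own values 4 to 7.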
Can we make a new PR for the CMake change and the const?
I'm abroad and cannot test anything, and I believe getting the RTX fix merged is a good idea.
The APPEND needs a bit of testing because some of the CCs won't work: 32, 53, 62 and 72 are CCs specific to Tegra, an ARM-based system-on-a-chip that cannot do everything I've used in PopSift. CC 52, on the other hand, should work.
Oh yes, I just listed the CCs based on CUDA support; I had no idea which ones should or should not be included. It was more to propose a different way to add the relevant CCs.
Just out of curiosity: the tests I ran were on an RTX 2080 without its proper CC (7.5) set. Do you think there is a significant difference when running code built for a CC other than the card's own? (I.e., should I expect it to run even faster? :-) )
I'm not in a hurry to merge it; we can keep it here and merge it later.
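On the RTX 2080 question above, one way to see what the CUDA runtime actually reports on a given card is a small probe like the following sketch. `cudaGetDeviceProperties()` gives the card's CC, and `cudaFuncGetAttributes()` reports a kernel's `binaryVersion` (the binary architecture it was compiled for) and `ptxVersion` (the PTX virtual architecture), both encoded as major*10+minor. `probe_kernel`, `query_versions` and `print_versions` are hypothetical names; the probe only shows what was loaded, and says nothing by itself about whether a build targeting 7.5 would be faster.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel whose attributes are queried.
__global__ void probe_kernel() {}

// Returns the compute capability of `device` as major*10+minor (or -1 on
// error) and stores the binary and PTX versions reported for probe_kernel.
int query_versions(int device, int* binary_version, int* ptx_version)
{
    assert(binary_version != nullptr && ptx_version != nullptr);
    if (cudaSetDevice(device) != cudaSuccess) return -1;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;

    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, probe_kernel) != cudaSuccess) return -1;

    *binary_version = attr.binaryVersion;
    *ptx_version    = attr.ptxVersion;
    return prop.major * 10 + prop.minor;
}

// Prints what the runtime reports for device 0.
void print_versions()
{
    int bin = 0, ptx = 0;
    const int cc = query_versions(0, &bin, &ptx);
    if (cc < 0) {
        std::printf("CUDA query failed\n");
        return;
    }
    std::printf("device CC %d, kernel binaryVersion %d, ptxVersion %d\n",
                cc, bin, ptx);
}
```

Building the probe with and without a 7.5 target and comparing the printed versions would show which architecture the loaded kernel image targets in each case.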
Fixes the two issues with the RTX 2080 that were reported and discussed in issue #64.