Performance looks good:

I have absolutely no idea why
This implements the `iota` algorithm for the CUDA backend.

* `std::iota`, see https://en.cppreference.com/w/cpp/algorithm/iota.html

It provides tests and benchmarks similar to Thrust and some boilerplate for libcu++.

The functionality is not publicly available yet and is implemented in a private internal header.

Fixes NVIDIA#7927
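For context, a minimal host-only sketch of why `iota` parallelizes well: each element depends only on the initial value and its index. The helper names `iota_sequential` and `iota_indexed` are hypothetical and exist only for this illustration; this is not the code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential reference: std::iota writes init, init + 1, init + 2, ...
std::vector<int> iota_sequential(std::size_t n, int init)
{
  std::vector<int> out(n);
  std::iota(out.begin(), out.end(), init);
  return out;
}

// Index-based formulation (hypothetical helper): element i is computed from
// init and i alone, so every iteration is independent of the others and could
// be assigned to its own thread.
std::vector<int> iota_indexed(std::size_t n, int init)
{
  std::vector<int> out(n);
  for (std::size_t i = 0; i < n; ++i)
  {
    out[i] = init + static_cast<int>(i);
  }
  return out;
}
```

For integer types both formulations produce the same sequence. For floating-point types, repeated increments and `init + i` may round differently, so the two are not guaranteed to match bit for bit.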
🥳 CI Workflow Results
🟩 Finished in 1h 19m: Pass: 100%/99 | Total: 1d 01h | Max: 53m 50s | Hits: 98%/255735
    # if _LIBCUDACXX_HAS_NVFP16()
        // We cannot rely on operator+ and constructors from integers to be available for the extended fp types
        if constexpr (is_same_v<_Tp, __half>)
        {
          return ::__hadd(__init_, ::__ull2half_rn(__index));
        }
        else
    # endif // _LIBCUDACXX_HAS_NVFP16()
    # if _LIBCUDACXX_HAS_NVBF16()
          if constexpr (is_same_v<_Tp, __nv_bfloat16>)
        {
          return ::__hadd(__init_, ::__ull2bfloat16_rn(__index));
        }
        else
    # endif // _LIBCUDACXX_HAS_NVBF16()
          if constexpr (is_arithmetic_v<_Tp>)
        { // avoid warnings about integer conversions
          return static_cast<_Tp>(__init_ + static_cast<_Tp>(__index));
        }
        else if constexpr (__can_operator_plus_integral<_Tp>)
        {
          return __init_ + __index;
        }
        else if constexpr (__can_operator_plus_conversion<_Tp>)
        {
          return __init_ + static_cast<_Tp>(__index);
question: Shouldn't cuda::std::plus already handle all these details? Can we just use that here?
Unfortunately, it does not; it just does a plain `return __x + __y;`.
That means we can get integer promotion / sign conversion warnings.
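To illustrate the point about promotion, here is a small host-only sketch. It assumes standard C++17 and uses a hypothetical helper `add_index` and a hypothetical type `Meters`; it mirrors the shape of the dispatch in the diff but is not the libcu++ implementation. Adding two `short` values yields an `int`, so returning the sum as `short` without a cast is an implicit narrowing that compilers may flag under options such as `-Wconversion`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical user-defined type that can be advanced by an integral offset.
struct Meters
{
  double value;

  friend constexpr Meters operator+(Meters m, std::size_t i)
  {
    return Meters{m.value + static_cast<double>(i)};
  }
};

// Hypothetical helper: computes init + index without implicit conversions
// for arithmetic types.
template <class T>
constexpr T add_index(const T& init, std::size_t index)
{
  if constexpr (std::is_arithmetic_v<T>)
  {
    // The inner cast converts the index to T explicitly. The outer cast
    // converts the (possibly promoted) sum back to T explicitly, e.g.
    // short + short is an int.
    return static_cast<T>(init + static_cast<T>(index));
  }
  else
  {
    // Non-arithmetic types rely on their own operator+ taking an integral.
    return init + index;
  }
}
```

A call such as `add_index<short>(short{10}, 5)` goes through the arithmetic branch, while `add_index(Meters{1.0}, 3)` uses the type's own `operator+`.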
    _InputIterator __first,
    _InputIterator __last,
    const _Tp& __init,
    const _Tp& __step)
comment: Adding a parallel iota is one thing, but adding an iota with a step feels too extreme to still be called cuda::std::iota.
I'm leaning more towards a cuda::sequence algorithm here, otherwise we're starting to twist our promise that everything in cuda/std is conforming.
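As a rough sketch of what a stepped variant computes, the following host-only function fills a range with `init + i * step`. The name `sequence_sketch` is hypothetical; the parameter order follows the signature in the diff above, which is also the shape of the `thrust::sequence` overload taking an initial value and a step. It is an illustration only, not a proposed `cuda::sequence` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential sketch: element i receives init + i * step.
// With step == 1 this reduces to the behavior of std::iota for integers.
template <class ForwardIt, class T>
void sequence_sketch(ForwardIt first, ForwardIt last, const T& init, const T& step)
{
  std::size_t index = 0;
  for (; first != last; ++first, ++index)
  {
    // Explicit casts keep the arithmetic in T and avoid implicit conversions.
    *first = static_cast<T>(init + static_cast<T>(index) * step);
  }
}
```

Since `std::iota` only takes a range and an initial value, the extra `step` parameter is exactly the extension the comment above is concerned about.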