Implement the new tuning API for DispatchThreeWayPartitionIf #7900
bernhardmgruber merged 13 commits into NVIDIA:main
Force-pushed from d39b2c0 to cb5b342.
large_segments_selector,
small_segments_selector,
max_num_segments_per_invocation,
static_cast<ChooseOffsetT::type>(max_num_segments_per_invocation),
Why is it still a ::type? Shouldn't that be an alias?
Because choose_signed_offset<Integral> is not just a type function: it provides ::is_exceeding_offset_type in addition to ::type.
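To illustrate the point above, here is a minimal sketch of a trait that exposes both a ::type and an extra check. The name choose_signed_offset_sketch, the selection rule, and the bool return type are assumptions made for illustration; this is not the actual choose_signed_offset from the code base.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for a trait like choose_signed_offset<Integral>.
// The selection rule and the bool return type are assumptions.
template <typename NumItemsT>
struct choose_signed_offset_sketch
{
  static_assert(std::is_integral<NumItemsT>::value, "NumItemsT must be integral");

  // Type-function part: a 32-bit signed offset for small inputs, 64-bit otherwise.
  using type = std::conditional_t<(sizeof(NumItemsT) <= 4), std::int32_t, std::int64_t>;

  // Extra member beyond ::type: does num_items exceed what `type` can represent?
  static constexpr bool is_exceeding_offset_type(NumItemsT num_items)
  {
    using wide_t = unsigned long long;
    return num_items > NumItemsT{0}
        && static_cast<wide_t>(num_items) > static_cast<wide_t>(std::numeric_limits<type>::max());
  }
};

// A plain alias can only forward ::type; the check stays reachable via the struct.
template <typename NumItemsT>
using choose_signed_offset_sketch_t = typename choose_signed_offset_sketch<NumItemsT>::type;
```

In this sketch, a call site that needs both the offset type and the range check has to name the struct itself, which is one reason a ::type can remain visible even when an alias exists.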
#if _CCCL_HAS_CONCEPTS()
  requires three_way_partition_policy_selector<PolicySelector>
#endif // _CCCL_HAS_CONCEPTS()
CUB_RUNTIME_FUNCTION _CCCL_FORCEINLINE auto dispatch(
Question: in the other new API PRs, we moved the dispatch logic from the Dispatch* struct to the new dispatch function. In this PR, we reuse the struct. Is this intentional?
Yes, it keeps the diff smaller for the initial rewrite so we can merge faster. However, we will then need to do the host-side rewrite when we deprecate or drop the dispatchers.
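The approach described above can be sketched roughly as follows: a new free function, constrained by a concept only where concepts are available, forwards to a legacy dispatcher struct that still owns the host-side logic. Every name here (SKETCH_HAS_CONCEPTS, policy_selector_sketch, LegacyDispatchSketch, dispatch_sketch) and the toy grid-size computation are hypothetical and only stand in for the real code.

```cpp
#include <cstddef>

// All names below are hypothetical; this is not the CUB implementation.

// Feature-test guard standing in for a macro like _CCCL_HAS_CONCEPTS().
#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
#  define SKETCH_HAS_CONCEPTS 1
#else
#  define SKETCH_HAS_CONCEPTS 0
#endif

#if SKETCH_HAS_CONCEPTS
// A policy selector is anything callable with an architecture id.
template <typename T>
concept policy_selector_sketch = requires(const T& s, int arch) { s(arch); };
#endif

// Toy selector: picks a block size from an architecture id.
struct default_policy_selector_sketch
{
  constexpr int operator()(int arch) const
  {
    return arch >= 90 ? 512 : 256;
  }
};

// Legacy-style dispatcher struct that still owns the host-side logic.
template <typename PolicySelector>
struct LegacyDispatchSketch
{
  // Returns the number of blocks needed to cover num_items (ceiling division).
  static int Dispatch(std::size_t num_items, PolicySelector selector, int arch)
  {
    const std::size_t block_threads = static_cast<std::size_t>(selector(arch));
    return static_cast<int>((num_items + block_threads - 1) / block_threads);
  }
};

// New-style entry point: a thin wrapper that reuses the legacy struct.
template <typename PolicySelector = default_policy_selector_sketch>
#if SKETCH_HAS_CONCEPTS
  requires policy_selector_sketch<PolicySelector>
#endif
auto dispatch_sketch(std::size_t num_items, int arch, PolicySelector selector = {})
{
  return LegacyDispatchSketch<PolicySelector>::Dispatch(num_items, selector, arch);
}
```

With this shape, the wrapper is only a few lines, which keeps the diff small. The trade-off is the one noted in the reply: the body of Dispatch has to move into the free function once the struct is deprecated or dropped.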
Force-pushed from f95a0a1 to 14628c6.
🥳 CI Workflow Results
🟩 Finished in 22h 19m: Pass: 100%/255 | Total: 8d 16h | Max: 1h 52m | Hits: 56%/158062
DispatchSegmentedSort #7874
cub.bench.partition.three_way.base on SM75;80;86;90;100
Fixes: #7646