Implement the new tuning API for DispatchThreeWayPartitionIf #7900

Merged
bernhardmgruber merged 13 commits into NVIDIA:main from bernhardmgruber:tuning_three_way on Mar 14, 2026

bernhardmgruber commented Mar 5, 2026

Fixes: #7646

copy-pr-bot bot commented Mar 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Mar 5, 2026
@bernhardmgruber bernhardmgruber marked this pull request as ready for review March 9, 2026 17:20
@bernhardmgruber bernhardmgruber requested review from a team as code owners March 9, 2026 17:20
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Mar 9, 2026
large_segments_selector,
small_segments_selector,
max_num_segments_per_invocation,
static_cast<ChooseOffsetT::type>(max_num_segments_per_invocation),
Contributor

Why is it still a ::type? Shouldn't that be an alias?

Contributor Author

Because choose_signed_offset<Integral> is not just a type function. It provides a ::is_exceeding_offset_type in addition to a ::type.
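
To illustrate the point above, here is a minimal sketch of a trait that carries more than a nested type. It is not the CUB definition: the name choose_signed_offset_sketch, the width rule, and the bool-returning check are assumptions made up for this example. It only shows why call sites may keep spelling out ::type when the trait also exposes a helper such as ::is_exceeding_offset_type.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for illustration only; NOT the CUB implementation.
template <typename Integral>
struct choose_signed_offset_sketch
{
  // Assumed rule: 32-bit signed offsets for inputs up to 4 bytes, 64-bit otherwise.
  using type = std::conditional_t<(sizeof(Integral) <= 4), std::int32_t, std::int64_t>;

  // Second member besides ::type: reports whether n does not fit into type.
  // Only unsigned inputs at least as wide as type can exceed its maximum.
  static constexpr bool is_exceeding_offset_type(Integral n)
  {
    constexpr bool can_overflow = std::is_unsigned_v<Integral> && sizeof(Integral) >= sizeof(type);
    return can_overflow
        && static_cast<std::uint64_t>(n) > static_cast<std::uint64_t>(std::numeric_limits<type>::max());
  }
};

// An alias template can shorten ::type, but it only names the resulting type;
// the helper above stays reachable only through the struct itself.
template <typename Integral>
using choose_signed_offset_sketch_t = typename choose_signed_offset_sketch<Integral>::type;

static_assert(std::is_same_v<choose_signed_offset_sketch_t<std::uint32_t>, std::int32_t>, "");
static_assert(choose_signed_offset_sketch<std::uint32_t>::is_exceeding_offset_type(3000000000u), "");
static_assert(!choose_signed_offset_sketch<std::int32_t>::is_exceeding_offset_type(42), "");
```

Under these assumptions, a caller that needs both the offset type and the range check has to keep the struct name around anyway, so an additional _t alias would save little.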

#if _CCCL_HAS_CONCEPTS()
requires three_way_partition_policy_selector<PolicySelector>
#endif // _CCCL_HAS_CONCEPTS()
CUB_RUNTIME_FUNCTION _CCCL_FORCEINLINE auto dispatch(
Contributor

Question: in the other new API PRs, we moved the dispatch logic from the Dispatch* struct to the new dispatch function. In this PR, we reuse it. Is this intentional?

Contributor Author

Yes, it keeps the diff smaller for the initial rewrite so we can merge faster. However, we will then need to do the host-side rewrite when we deprecate or drop the dispatchers.
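
The trade-off described above can be sketched as a thin entry point that forwards to an existing dispatcher struct instead of absorbing its logic. Everything in this example is hypothetical: the type names, the selector's call signature, and the policy numbers are invented for illustration and do not reflect the actual CUB code in this PR.

```cpp
#include <cstddef>

// Hypothetical tuning policy; the fields and values are made up.
struct partition_policy_sketch
{
  int block_threads;
  int items_per_thread;
};

// Hypothetical policy selector: maps an architecture number to a policy.
struct default_policy_selector_sketch
{
  constexpr partition_policy_sketch operator()(int arch) const
  {
    return arch >= 900 ? partition_policy_sketch{256, 12} : partition_policy_sketch{128, 8};
  }
};

// Stand-in for an existing Dispatch* struct that already owns the host-side logic.
template <typename PolicySelector>
struct legacy_dispatch_sketch
{
  static constexpr std::size_t tile_items(int arch, PolicySelector selector = {})
  {
    const partition_policy_sketch p = selector(arch);
    return static_cast<std::size_t>(p.block_threads) * static_cast<std::size_t>(p.items_per_thread);
  }
};

// New-style entry point: reuses the legacy struct rather than duplicating it.
// Moving the body of tile_items into this function would be the later
// host-side rewrite once the legacy struct is deprecated or dropped.
template <typename PolicySelector = default_policy_selector_sketch>
constexpr std::size_t dispatch_sketch(int arch, PolicySelector selector = {})
{
  return legacy_dispatch_sketch<PolicySelector>::tile_items(arch, selector);
}

static_assert(dispatch_sketch(900) == 256 * 12, "");
static_assert(dispatch_sketch(800) == 128 * 8, "");
```

In this shape the new function adds only a forwarding layer, which is what keeps the initial diff small.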

@bernhardmgruber bernhardmgruber enabled auto-merge (squash) March 14, 2026 19:50
@github-actions
🥳 CI Workflow Results

🟩 Finished in 22h 19m: Pass: 100%/255 | Total: 8d 16h | Max: 1h 52m | Hits: 56%/158062

@bernhardmgruber bernhardmgruber merged commit 3ed7d1b into NVIDIA:main Mar 14, 2026
535 of 542 checks passed
@bernhardmgruber bernhardmgruber deleted the tuning_three_way branch March 14, 2026 20:07