[DRAFT] Optimize offset based DeviceSegmentedReduce for small and medium segment sizes #6942

srinivasyadav18 · 2025-12-11T01:45:48Z

Description

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.
Merge the guarantees API commit separately in "Add cuda::execution::guarantee and cuda::execution::segment_size::max_segment_size" #6682
Check whether the static segment size is really useful, or reduce the kernel size for DeviceSegmentedReduceKernel

Status

The current version of this PR shows good speedups for I32/F32 with Sum (reaching up to 70%), but only modest improvements (up to 10% SOL, from < 1% SOL) with more complex operators like ArgMax or with larger input types (> 4 B).
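
For context, here is a minimal CPU reference of what an offset-based segmented sum computes, assuming segment `i` spans the half-open range `[offsets[i], offsets[i + 1])`. The function name `segmented_sum` and its signature are illustrative only; they are not part of CUB or of this PR, and the sketch says nothing about how the optimized kernel is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not the CUB API): sums each segment of `input`.
// Assumption: segment i covers input[offsets[i]] .. input[offsets[i + 1] - 1],
// so `offsets` holds num_segments + 1 non-decreasing entries.
std::vector<int> segmented_sum(const std::vector<int>& input,
                               const std::vector<std::size_t>& offsets)
{
  assert(!offsets.empty());
  const std::size_t num_segments = offsets.size() - 1;
  std::vector<int> out(num_segments, 0); // empty segments yield 0
  for (std::size_t seg = 0; seg < num_segments; ++seg)
  {
    assert(offsets[seg] <= offsets[seg + 1] && offsets[seg + 1] <= input.size());
    for (std::size_t i = offsets[seg]; i < offsets[seg + 1]; ++i)
    {
      out[seg] += input[i];
    }
  }
  return out;
}
```

With small and medium segments, each inner loop is short, which is the regime this PR targets on the device side.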

Some initial benchmarks:

Sum T{ct}=F32

ArgMax T{ct}=F64

github-actions · 2025-12-11T05:40:59Z

😬 CI Workflow Results

🟥 Finished in 3h 53m: Pass: 80%/136 | Total: 5d 12h | Max: 3h 25m | Hits: 84%/212452

See results here.

srinivasyadav18 added 6 commits December 10, 2025 17:04

Add cuda::execution::guarantee's API and `cuda::execution::segment_size::max_segment_size`

5bf0477

Replace template parameter _N with _Size

4da3347

remove usage of sub-namespace segment_size

321f183

Extend Gurantee's API and max_segment_size to support stateful/dynamic guarantees

890eb10

add support for max seg size and optimize small,med seg size

61d0ce0

add benchmarks

7bea7fa

srinivasyadav18 requested review from a team as code owners December 11, 2025 01:45

srinivasyadav18 requested review from alliepiper and miscco December 11, 2025 01:45

github-project-automation bot added this to CCCL Dec 11, 2025

github-project-automation bot moved this to Todo in CCCL Dec 11, 2025

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Dec 11, 2025
