[INFRA]: Run CUB tests in parallel

### Is this a duplicate?

- [x] I confirmed there appear to be no [duplicate issues](https://github.com/NVIDIA/cccl/issues) for this request and that I agree to the [Code of Conduct](CODE_OF_CONDUCT.md)

### Overview

Today, CUB tests run sequentially because some of them test large problem sizes that require all of the VRAM. This limits our coverage of concurrency-related issues. @pauleonix found cases where compute sanitizer and sequential test runs are fine, but parallel ctest runs lead to time-sharing of the GPU and expose a data race on the CUB side.

By running CUB tests in parallel we'll get better coverage and faster CI. The current plan to achieve that is the following:

1. Wait for https://github.com/NVIDIA/cccl/issues/9310 to be merged. It will provide a common component that improves compilation time of the split targets in step (2).
2. Extract `*_large` tests into standalone TUs that require the entire GPU and assign them appropriate `RESOURCE_GROUPS`.
3. Identify an appropriate concurrency level.
4. Use the concurrency level from (3) as an opt-in for CI. Some runners are RAM-limited (Orin etc.), so we should not run tests concurrently by default, to avoid OOM.
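
To make steps (2) to (4) concrete, here is a rough sketch of how ctest's resource allocation could model this. It is an illustration under assumptions, not the agreed design: the resource name `gpus`, the slot count of 4, and the helper names `make_resource_spec` and `resource_groups` are all hypothetical. The idea is that each GPU is declared with N slots in a resource spec file, regular tests request one slot, and `*_large` tests request all N slots so they get the GPU to themselves.

```python
import json


def make_resource_spec(gpu_ids, slots_per_gpu):
    """Build a ctest resource spec (format version 1.0) as a dict.

    Each GPU is declared with `slots_per_gpu` slots. The slot count is a
    placeholder for the concurrency level identified in step (3).
    """
    if slots_per_gpu < 1:
        raise ValueError("slots_per_gpu must be at least 1")
    return {
        "version": {"major": 1, "minor": 0},
        "local": [
            {"gpus": [{"id": str(g), "slots": slots_per_gpu} for g in gpu_ids]}
        ],
    }


def resource_groups(slots_per_gpu, exclusive):
    """Return a RESOURCE_GROUPS property value for a test.

    Exclusive tests (e.g. the *_large TUs) claim every slot of one GPU;
    all other tests claim a single slot and can share the GPU.
    """
    return f"gpus:{slots_per_gpu if exclusive else 1}"


# Hypothetical setup: a single GPU with id 0 and a concurrency level of 4.
spec = make_resource_spec(gpu_ids=[0], slots_per_gpu=4)
print(json.dumps(spec, indent=2))
print(resource_groups(4, exclusive=True))   # value for a *_large test
print(resource_groups(4, exclusive=False))  # value for a regular test
```

The JSON would be written to a file and passed to ctest together with a parallel level, e.g. `ctest --resource-spec-file spec.json -j 4`. Leaving both options out keeps today's sequential behavior, which matches the opt-in requirement for RAM-limited runners.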

### Details

_No response_


[INFRA]: Run CUB tests in parallel #9550
