Conversation

Contributor

@dnmokhov dnmokhov commented Nov 24, 2025

Add an RFC describing setting multiple core types in task arena constraints.

Reference implementation: dev/dnmokhov/core-types

@dnmokhov
Contributor Author

@wangleis @sunxiaoxia2022, this is an RFC for adding multiple core type selection to the master branch. Feel free to provide feedback. Thanks!

| **P-cores only** | Maximum single-threaded performance | Leaves E-cores idle; limited parallelism; higher power |
| **E-cores only** | Good for parallel workloads | Doesn't utilize P-core performance; excludes LP E-cores |
| **LP E-cores only** | Minimal power consumption | Severe performance impact for most workloads |
| **No constraint** | Maximum flexibility | May schedule on inappropriate cores (e.g., LP E-cores for compute) |

P-cores + E-cores are required, but for latency mode with a shared L3 cache, LP E-cores should be avoided.

Contributor Author

Thanks for confirming. As it says in the following paragraph,

None of these options provide the desired behavior: **"Use P-cores or E-cores, but avoid LP E-cores"** or **"Use any

- Pros: Simpler logic, easier to extend
- Cons: Increases struct size, breaks ABI compatibility

6. **Info API**: Should `info::core_types()` be extended to return a count instead of/in addition to a vector?
Contributor

`info::core_types()` returns `std::vector<core_type_id>`. So, if I understand the sixth question correctly, the count can be retrieved via `ct = info::core_types(); ct.size()`.

Contributor Author

I guess I only meant "instead", the main idea being that

info::core_types() → {0, 1, ..., n-1}

is no more useful than something like

info::num_core_types() → n
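
To make the comparison concrete, here is a minimal sketch, assuming (as this comment does) that core type IDs form the dense range 0..n-1. The names `core_type_id_sketch`, `num_core_types_sketch`, and `core_types_sketch` are hypothetical stand-ins invented for illustration and are not oneTBB APIs; the count of 3 is an arbitrary example value.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for oneTBB's core_type_id; not the real type.
using core_type_id_sketch = int;

// Hypothetical count-only query, mirroring the info::num_core_types() idea.
// The value 3 is arbitrary (for example P-cores, E-cores, and LP E-cores).
inline std::size_t num_core_types_sketch() { return 3; }

// Stand-in for a vector-returning query. Under the assumption that IDs are
// exactly 0..n-1, the vector can be rebuilt from the count alone.
inline std::vector<core_type_id_sketch> core_types_sketch() {
    std::vector<core_type_id_sketch> ids(num_core_types_sketch());
    std::iota(ids.begin(), ids.end(), 0);  // fills 0, 1, ..., n-1
    return ids;
}
```

If the IDs were ever allowed to be sparse or opaque, the vector would carry information that the count does not, so the equivalence holds only under the stated assumption.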

Comment on lines +66 to +68
### New API

Add the following methods to `tbb::task_arena::constraints`:
Contributor

@aleksei-fedotov aleksei-fedotov Dec 5, 2025

Another possibility is to leave the `constraints` struct unchanged and instead introduce, for example, a new constructor for the `task_arena` class that accepts a vector or array of `constraints` instances, each specifying its own NUMA node, core type, and threads-per-core settings. The task arena constructor would take the union of the masks resulting from each `constraints` instance and use that union as its constraint.
At first glance, this design looks more flexible to me because it scales better: users could specify not only more than one core type but also more than one NUMA node. Essentially, users can specify multiple portions of the platform whose combined constraint is applied to a single `task_arena` instance.
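
The union-of-masks idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `cpu_info`, `constraints_sketch`, `mask_for`, and `union_mask` are hypothetical names, the topology is passed in as a plain vector rather than queried from the system, and the threads-per-core setting is omitted for brevity. None of this reflects the actual oneTBB implementation.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxCpus = 64;   // arbitrary bound for this sketch
using cpu_mask = std::bitset<kMaxCpus>;
constexpr int kAny = -1;               // hypothetical "no constraint" marker

// Hypothetical description of one logical CPU.
struct cpu_info { int numa_id; int core_type; };

// Hypothetical, simplified stand-in for task_arena::constraints.
struct constraints_sketch { int numa_id = kAny; int core_type = kAny; };

// Mask of the CPUs that satisfy a single constraints instance.
inline cpu_mask mask_for(const std::vector<cpu_info>& topo,
                         const constraints_sketch& c) {
    assert(topo.size() <= kMaxCpus);
    cpu_mask m;
    for (std::size_t i = 0; i < topo.size(); ++i) {
        const bool numa_ok = c.numa_id == kAny || c.numa_id == topo[i].numa_id;
        const bool type_ok = c.core_type == kAny || c.core_type == topo[i].core_type;
        if (numa_ok && type_ok) m.set(i);
    }
    return m;
}

// Union of the per-instance masks; this would become the arena's constraint.
inline cpu_mask union_mask(const std::vector<cpu_info>& topo,
                           const std::vector<constraints_sketch>& cs) {
    cpu_mask result;
    for (const auto& c : cs) result |= mask_for(topo, c);
    return result;
}
```

With this shape, "core type 1 or core type 2" and "NUMA node 0 or NUMA node 1" are both expressed the same way, as a list of instances, which is the scaling property the comment points to.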

Contributor Author

Sounds like TCM. 😉 I will try to add this as an alternative.

Contributor

Along this same line of thinking, if there is a `set_core_types`, there should likely, eventually, be a `set_num_ids` function. We should consider which is easier to reason about: a combination created from a vector of constraints, or these specific functions.


### Motivation

The current oneTBB API allows users to constrain task execution to a single core type using
Contributor

@vossmjp vossmjp Dec 5, 2025

Suggested change
The current oneTBB API allows users to constrain task execution to a single core type using
By default, oneTBB includes all available core types in a task arena unless explicitly constrained.
The current oneTBB API allows users to constrain task execution to a single core type using


#### 1. **Flexibility and Resource Utilization**

Many parallel workloads can execute efficiently on multiple core types. For example:
Contributor

@vossmjp vossmjp Dec 5, 2025

Suggested change
Many parallel workloads can execute efficiently on multiple core types. For example:
While it is often best to allow the OS to use all core types and flexibly schedule threads, some advanced users may find it necessary to constrain scheduling.
When there are more than two core types, it may be desired to constrain execution to not just a single core type.
Many parallel workloads can execute efficiently on multiple core types that make up a subset of the available core types. For example:


#### 3. **Avoiding Inappropriate Core Selection**

Without the ability to specify "P-cores OR E-cores (but not LP E-cores)", applications face a dilemma:
Contributor

Suggested change
Without the ability to specify "P-cores OR E-cores (but not LP E-cores)", applications face a dilemma:
Without the ability to specify "P-cores OR E-cores (but not LP E-cores)" or
"LP E-cores and E-cores but not P-cores" applications face dilemmas.
For example, without being able to specify "P-cores OR E-cores (but not LP E-cores)":

|----------|------|------|
| **P-cores only** | Maximum single-threaded performance | Leaves E-cores idle; limited parallelism; higher power |
| **E-cores only** | Good for parallel workloads | Doesn't utilize P-core performance; excludes LP E-cores |
| **LP E-cores only** | Minimal power consumption | Severe performance impact for most workloads |
Contributor

@vossmjp vossmjp Dec 5, 2025

Suggested change
| **LP E-cores only** | Minimal power consumption | Severe performance impact for most workloads |
| **LP E-cores only** | Minimal power consumption | Severe performance impact for some workloads that require large, shared caches. |

Contributor

@vossmjp vossmjp left a comment

In general, this RFC should not read like guidance on how to select which core type(s) to use based on application characteristics. The relative capabilities of core types may differ based on the HW platform, and benefits will be highly application dependent. Instead, it should describe use cases with appropriate caveats that it's generally better to let the OS decide and that constraints should be applied carefully.
