[RFC] Advanced Core Type Selection #1917
@wangleis @sunxiaoxia2022, this is an RFC for adding multiple core type selection to the master branch. Feel free to provide feedback. Thanks!
| **P-cores only** | Maximum single-threaded performance | Leaves E-cores idle; limited parallelism; higher power |
| **E-cores only** | Good for parallel workloads | Doesn't utilize P-core performance; excludes LP E-cores |
| **LP E-cores only** | Minimal power consumption | Severe performance impact for most workloads |
| **No constraint** | Maximum flexibility | May schedule on inappropriate cores (e.g., LP E-cores for compute) |
P-cores + E-cores are required, but for latency mode with a shared L3 cache, LP E-cores should be avoided.
Thanks for confirming. As the following paragraph says (oneTBB/rfcs/proposed/core_types/README.md, line 57 in d9af5f4):
None of these options provide the desired behavior: **"Use P-cores or E-cores, but avoid LP E-cores"** or **"Use any
- Pros: Simpler logic, easier to extend
- Cons: Increases struct size, breaks ABI compatibility
6. **Info API**: Should `info::core_types()` be extended to return a count instead of, or in addition to, a vector?
`info::core_types()` returns `std::vector<core_type_id>`. So, if I understand the sixth question correctly, the count can already be retrieved via `auto ct = info::core_types(); ct.size();`.
I guess I only meant "instead", the main idea being that `info::core_types()` → `{0, 1, ..., n-1}` is no more useful than something like `info::num_core_types()` → `n`.
### New API
Add the following methods to `tbb::task_arena::constraints`:
Another possibility would be to leave the `constraints` struct unchanged and instead introduce, for example, a new `task_arena` constructor that accepts a vector/array of `constraints` instances, each bound to a certain NUMA node, core type, and threads-per-core constraint. The constructor would compute the union of the masks resulting from each `constraints` instance and use that union as the arena's constraint.
At first glance, this design looks more flexible to me because it scales better: users can specify not only more than one core type but also more than one NUMA node. Essentially, users can specify multiple portions of the platform whose combined constraint should apply to a single `task_arena` instance.
Sounds like TCM. 😉 I will try to add this as an alternative.
Along the same line of thinking, if there is a `set_core_types`, there should likely eventually be a `set_numa_ids` function as well. We should consider which is easier to reason about: a combination created from a vector of constraints, or these specific functions.
### Motivation
The current oneTBB API allows users to constrain task execution to a single core type using
Suggested change:
By default, oneTBB includes all available core types in a task arena unless explicitly constrained.
The current oneTBB API allows users to constrain task execution to a single core type using
#### 1. **Flexibility and Resource Utilization**
Many parallel workloads can execute efficiently on multiple core types. For example:
Suggested change:
While it is often best to allow the OS to use all core types and flexibly schedule threads, some advanced users may find it necessary to constrain scheduling.
When there are more than two core types, it may be desirable to constrain execution to a subset of core types rather than to a single one.
Many parallel workloads can execute efficiently on several core types that together form a subset of the available core types. For example:
#### 3. **Avoiding Inappropriate Core Selection**
Without the ability to specify "P-cores OR E-cores (but not LP E-cores)", applications face a dilemma:
Suggested change:
Without the ability to specify "P-cores OR E-cores (but not LP E-cores)" or
"LP E-cores and E-cores but not P-cores", applications face dilemmas.
For example, without being able to specify "P-cores OR E-cores (but not LP E-cores)":
|----------|------|------|
| **P-cores only** | Maximum single-threaded performance | Leaves E-cores idle; limited parallelism; higher power |
| **E-cores only** | Good for parallel workloads | Doesn't utilize P-core performance; excludes LP E-cores |
| **LP E-cores only** | Minimal power consumption | Severe performance impact for most workloads |
Suggested change:
| **LP E-cores only** | Minimal power consumption | Severe performance impact for some workloads that require large, shared caches |
vossmjp left a comment:
In general, this RFC should not read like guidance on how to select core types based on application characteristics. The relative capabilities of core types may differ across hardware platforms, and the benefits will be highly application dependent. Instead, it should describe use cases, with appropriate caveats that it is generally better to let the OS decide and that constraints should be applied carefully.
Add an RFC describing setting multiple core types in task arena constraints.
Reference implementation: dev/dnmokhov/core-types