
tbb::task_group thread scaling #313

Closed
@Dr15Jones


As part of transitioning from the deprecated tbb::task API to tbb::task_group, I have been measuring the performance of our applications. I have found that when using a single tbb::task_group, thread scaling is severely diminished. To illustrate the problem, I created four highly simplified versions of the main processing loop of our applications. The code for these simple applications can be found here: https://github.com/Dr15Jones/tbb_group_scaling. Each application does the same processing but uses TBB in a different way. The differences are:

  • using tbb::tasks directly, all created with allocate_root (this is how our application typically works)
  • using 1 tbb::task_group to launch all the needed work
  • using N tbb::task_groups, with one task_group per requested thread
  • using tbb::tasks directly but created with allocate_additional_child_of (this variant was written after studying the performance of the other three cases)

When testing on either an Intel or an AMD CPU, the single tbb::task_group variant either did not scale at all as the number of threads increased or scaled extremely weakly compared to the other options. The tbb::task variant using allocate_additional_child_of had the best performance, followed closely by the N tbb::task_groups case.

My question is: are there plans to improve the performance when using a single tbb::task_group? If not, is using multiple tbb::task_groups that work together to share the load of creating tasks a supported use case? Alternatively, could a new API for creating a performant hierarchy of task_groups be developed, in order to avoid doing a 'spin' loop over the task_group::wait calls?
