Description
Summary
task_group::wait() may return before the tasks queued with task_group::run() have finished.
Version
2022.0.0 up to the latest master: affected
2021.13.0: works
Might be related to commit 1f52f50?
Environment
Intel Alder Lake CPU (16 performance-core threads, 8 efficiency-core threads)
Windows 11 (currently 24H2, although it also happens on older versions)
ClangCL 19.1.1 from Visual Studio 2022 17.14 (earlier versions are affected too)
C++20 application (game engine); all tasking and threading is done exclusively through oneTBB
Observed Behavior
Consider the following simple example:
// serial code
parallel_invoke(
    []{ render_task(); },
    []{ service_task(); }
);
// serial code that depends on what render_task() and service_task() do
This works as expected on every version. When parallel_invoke() returns, all the required data has been processed and is available.
Now we change this to:
// serial code
task_group tg;
tg.run([]{ render_task(); });
tg.run([]{ service_task(); });
tg.wait();
// serial code that depends on what render_task() and service_task() do
According to the documentation, the two should behave essentially the same way.
But from time to time, on versions newer than 2021.13.0, I get crashes in the code after tg.wait(), and the crash dumps show that the data render_task() and service_task() should have processed is not ready yet.
However, a workaround like this:
// serial code
std::barrier wa(3);
task_group tg;
tg.run([&wa]{ render_task(); (void)wa.arrive(); });
tg.run([&wa]{ service_task(); (void)wa.arrive(); });
tg.wait();
wa.arrive_and_wait();
// serial code that depends on what render_task() and service_task() do
works just fine. This led me to think that tg.wait() returns prematurely.
Where possible, I use parallel_invoke(), parallel_for(), etc., but sometimes I just need to fire off an asynchronous task and wait for its results in a different part of the code, so I have to use task_group anyway (at least until task_arena implements per-task waiters).
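The "start here, wait somewhere else" pattern described above can be instrumented so that a premature return from wait() is detected directly instead of being masked by a barrier. The sketch below is only an illustration under stated assumptions: DeferredJobs is a hypothetical wrapper, and ThreadTaskGroup is a hypothetical std::thread-based stand-in with the same run()/wait() shape as tbb::task_group, included only so the example compiles without oneTBB.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in with the same run()/wait() shape as tbb::task_group,
// built on std::thread so the sketch compiles without oneTBB.
class ThreadTaskGroup {
public:
    template <class F>
    void run(F&& f) { threads_.emplace_back(std::forward<F>(f)); }
    void wait() {
        for (auto& t : threads_) t.join();
        threads_.clear();
    }
    ~ThreadTaskGroup() { wait(); }

private:
    std::vector<std::thread> threads_;
};

// Hypothetical wrapper: tasks are started in one place and awaited in another.
// start() and finish() are assumed to be called from the same owning thread.
template <class TaskGroup>
class DeferredJobs {
public:
    template <class F>
    void start(F f) {
        ++started_;
        group_.run([this, f = std::move(f)] {
            f();
            // Counted only after the task body has fully run.
            finished_.fetch_add(1, std::memory_order_release);
        });
    }

    // Returns true if every started task had completed when wait() returned.
    bool finish() {
        group_.wait();
        return finished_.load(std::memory_order_acquire) == started_;
    }

private:
    int started_ = 0;
    std::atomic<int> finished_{0};
    // Declared last so it is destroyed first, while the counters are still alive.
    TaskGroup group_;
};
```

In the application, DeferredJobs<tbb::task_group> would be the instantiation of interest; if wait() really returns early, finish() should return false at least occasionally, which points at the problem without hiding it.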
Expected Behavior
The code after tg.wait() must run only after the task group has finished processing all of its tasks.
Steps To Reproduce
The examples above should be enough, provided that the tasks launched via tg.run() take some time to complete.
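Because the failure is intermittent, a stress loop around the tg.run(); tg.run(); tg.wait(); pattern may make it easier to reproduce outside the engine. This is a minimal sketch, assuming that sleeping tasks are an adequate stand-in for render_task() and service_task(); count_premature_returns is a hypothetical helper, and ThreadTaskGroup is a hypothetical std::thread-based stand-in with the same run()/wait() shape as tbb::task_group, present only so the harness compiles without oneTBB.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in with the same run()/wait() shape as tbb::task_group,
// built on std::thread so the harness compiles without oneTBB.
class ThreadTaskGroup {
public:
    template <class F>
    void run(F&& f) { threads_.emplace_back(std::forward<F>(f)); }
    void wait() {
        for (auto& t : threads_) t.join();
        threads_.clear();
    }
    ~ThreadTaskGroup() { wait(); }

private:
    std::vector<std::thread> threads_;
};

// Repeats the two-task run()/wait() pattern from the report `rounds` times and
// counts the rounds in which wait() returned before both tasks had finished.
template <class TaskGroup>
int count_premature_returns(int rounds, std::chrono::microseconds work) {
    assert(rounds > 0);
    int premature = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> finished{0};
        {
            TaskGroup tg;  // destroyed before `finished` goes out of scope
            auto task = [&finished, work] {
                std::this_thread::sleep_for(work);  // simulated render/service work
                finished.fetch_add(1, std::memory_order_release);
            };
            tg.run(task);
            tg.run(task);
            tg.wait();
            // Both tasks must be complete once wait() has returned.
            if (finished.load(std::memory_order_acquire) != 2) ++premature;
        }
    }
    return premature;
}
```

With oneTBB linked, calling count_premature_returns<tbb::task_group>(10000, std::chrono::microseconds(500)) on 2022.0.0 or later versus 2021.13.0 would be one way to compare versions; a non-zero count would indicate the early return, while a zero count on one machine would not rule it out.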