Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak for static_partitioner #1404

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 6 additions & 3 deletions include/oneapi/tbb/partitioner.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (c) 2005-2023 Intel Corporation
Copyright (c) 2005-2024 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -46,6 +46,7 @@
#include "cache_aligned_allocator.h"
#include "task_group.h" // task_group_context
#include "task_arena.h"
#include "global_control.h"

#include <algorithm>
#include <atomic>
Expand All @@ -70,7 +71,8 @@ class affinity_partitioner_base;

inline std::size_t get_initial_auto_partitioner_divisor() {
const std::size_t factor = 4;
return factor * static_cast<std::size_t>(max_concurrency());
return factor * std::min(static_cast<std::size_t>(max_concurrency()),
tbb::global_control::active_value(tbb::global_control::max_allowed_parallelism));
}

//! Defines entry point for affinity partitioner into oneTBB run-time library.
Expand All @@ -90,7 +92,8 @@ class affinity_partitioner_base: no_copy {
/** Retains values if resulting size is the same. */
void resize(unsigned factor) {
// Check factor to avoid asking for number of workers while there might be no arena.
unsigned max_threads_in_arena = static_cast<unsigned>(max_concurrency());
unsigned max_threads_in_arena = unsigned(std::min(static_cast<std::size_t>(max_concurrency()),
tbb::global_control::active_value(tbb::global_control::max_allowed_parallelism)));
Comment on lines +95 to +96
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a little bit clumsy, don't you think? Does it make sense to move calculation of maximum_arena_concurrency into some dedicated function that could be reused in get_initial_auto_partitioner_divisor as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. if need to introduce new method for these 2 lines. (we check not arena concurrency min(arena_concurrency, allowed_concurrency) but rather available amount of workers).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to move it into something like get_num_possible_workers() to highlight that the solution is based on the immediate value of workers as std::min(...) does not mean much to reader.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove from affinity

std::size_t new_size = factor ? factor * max_threads_in_arena : 0;
if (new_size != my_size) {
if (my_array) {
Expand Down
28 changes: 27 additions & 1 deletion test/tbb/test_task.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (c) 2005-2023 Intel Corporation
Copyright (c) 2005-2024 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Expand All @@ -20,6 +20,7 @@
#include "common/spin_barrier.h"
#include "common/utils_concurrency_limit.h"
#include "common/cpu_usertime.h"
#include "common/memory_usage.h"

#include "tbb/task.h"
#include "tbb/task_group.h"
Expand Down Expand Up @@ -840,3 +841,28 @@ TEST_CASE("Check correct arena destruction with enqueue") {
tbb::finalize(handle, std::nothrow_t{});
}
}

//! \brief \ref regression
TEST_CASE("Check that memory does not leak with static_partitioner + global_control") {
tbb::global_control gbl_ctrl{ tbb::global_control::max_allowed_parallelism, std::size_t(tbb::this_task_arena::max_concurrency() / 2) };

size_t current_memory_usage = 0, previous_memory_usage = 0, stability_counter = 0;
bool no_memory_leak = false;
std::size_t num_iterations = 100;
for (std::size_t i = 0; i < num_iterations; ++i) {
for (std::size_t j = 0; j < 100; ++j) {
tbb::parallel_for(0, 1000, [] (int) {}, tbb::static_partitioner{});
}

current_memory_usage = utils::GetMemoryUsage();
stability_counter = current_memory_usage==previous_memory_usage ? stability_counter + 1 : 0;
// If the amount of used memory has not changed during 5% of executions,
// then we can assume that the check was successful
if (stability_counter > num_iterations / 20) {
no_memory_leak = true;
break;
}
previous_memory_usage = current_memory_usage;
}
REQUIRE_MESSAGE(no_memory_leak, "Seems we get memory leak here.");
Comment on lines +847 to +867
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap it into the loop with gradual decrease of the global_control's limit? E.g.,

std::size_t current_limit = std::size_t(tbb::this_task_arena::max_concurrency());
while (current_limit /= 2) {
    tbb::global_control gc{ tbb::global_control::max_allowed_parallelism, current_limit };

    // iterations loop goes here {
        // repetitions loop goes here {
        // }
    // }
}

}
Loading