Commit a00cc3b

Update documentation for oneTBB 2021.9 (#1060)
1 parent 3eb1ff7 commit a00cc3b

9 files changed, +194 -5 lines changed

doc/GSG/get_started.rst

+2
@@ -9,6 +9,8 @@ Get Started with |short_name|
 
 .. include:: before_beginning_and_example.rst
 
+.. include:: hybrid_cpu_support.rst
+
 Find more
 *********
 

doc/GSG/hybrid_cpu_support.rst

+40
@@ -0,0 +1,40 @@
.. _hybrid_cpu_support:

Hybrid CPU and NUMA Support
***************************

To use NUMA or hybrid CPU support in oneTBB, make sure that HWLOC* is installed on your system.

HWLOC* (Hardware Locality) is a library that provides a portable abstraction of the hierarchical topology of modern architectures (NUMA, hybrid CPU systems, etc.).
oneTBB relies on HWLOC* to identify the underlying topology of the system and to optimize thread scheduling and memory allocation accordingly.

Without HWLOC*, oneTBB may not be able to take advantage of NUMA or hybrid CPU systems. Therefore, make sure that HWLOC* is installed before using oneTBB on such systems.

Check HWLOC* on the System
^^^^^^^^^^^^^^^^^^^^^^^^^^

To check whether HWLOC* is already installed on your system, run ``hwloc-ls``:

* On Linux* OS, in the command line.
* On Windows* OS, in the command prompt.

If HWLOC* is installed, the command displays information about the hardware topology of your system.
If it is not installed, you receive an error message saying that the command ``hwloc-ls`` could not be found.

.. note:: For hybrid CPU support, make sure that the HWLOC* version is 2.5 or higher.
   For NUMA support, install HWLOC* version 1.11 or higher.

Install HWLOC*
^^^^^^^^^^^^^^

To install HWLOC*, visit the official Portable Hardware Locality website (https://www-lb.open-mpi.org/projects/hwloc/).

* For Windows* OS, binaries are available for download.
* For Linux* OS, only the source code is provided, so you need to build the binaries yourself.

On Linux* OS, HWLOC* can also be installed with package managers, such as APT*, YUM*, etc.
For example, on systems that use APT*, run ``sudo apt install hwloc``.

doc/conf.py

+1-1
@@ -29,7 +29,7 @@
     project = u'Intel® oneAPI Threading Building Blocks (oneTBB)'
 else:
     project = u'oneTBB'
-copyright = u'2022, Intel Corporation'
+copyright = u'2023, Intel Corporation'
 author = u'Intel'
 
 # The short X.Y version

doc/main/reference/reference.rst

+1
@@ -19,6 +19,7 @@ It also describes features that are not included in the oneTBB specification.
    info_namespace
    parallel_for_each_semantics
    parallel_sort_ranges_extension
+   scalable_memory_pools/malloc_replacement_log
 
 Preview features
 ****************

doc/main/reference/scalable_memory_pools.rst

+1
@@ -41,3 +41,4 @@ Here, ``P`` represents an instance of the memory pool class.
    scalable_memory_pools/memory_pool_cls
    scalable_memory_pools/fixed_pool_cls
    scalable_memory_pools/memory_pool_allocator_cls
+
@@ -0,0 +1,84 @@
.. _malloc_replacement_log:

TBB_malloc_replacement_log Function
===================================

.. note:: This function is for Windows* OS only.

Summary
*******

Provides information about the status of dynamic memory allocation replacement.

Syntax
******

::

    extern "C" int TBB_malloc_replacement_log(char *** log_ptr);


Header
******

::

    #include "oneapi/tbb/tbbmalloc_proxy.h"


Description
***********

Dynamic replacement of memory allocation functions on Windows* OS uses in-memory binary instrumentation techniques.
To make sure that such instrumentation is safe, oneTBB first searches for a subset of the replaced functions in the Visual C++* runtime DLLs
and checks whether each one has a known bytecode pattern. If any required function is not found or its bytecode pattern is unknown, the replacement is skipped,
and the program continues to use the standard memory allocation functions.

The ``TBB_malloc_replacement_log`` function lets the program check whether dynamic memory replacement took place and retrieve a log of the performed checks.

**Returns:**

* 0, if all necessary functions are successfully found and the replacement takes place.
* 1, otherwise.

The ``log_ptr`` parameter must be the address of a ``char**`` variable or ``NULL``. If it is not ``NULL``, the function stores in that variable the address of an array of strings
terminated by a ``NULL`` pointer. Each string contains detailed information about one searched function in the following format:

::

    search_status: function_name (dll_name), byte pattern: <bytecodes>


For more information about the replacement of dynamic memory allocation functions, see :ref:`Windows_C_Dynamic_Memory_Interface_Replacement`.


Example
*******

::

    #include "oneapi/tbb/tbbmalloc_proxy.h"
    #include <stdio.h>

    int main() {
        char **func_replacement_log;
        // Returns 0 if the replacement took place, 1 otherwise.
        int func_replacement_status = TBB_malloc_replacement_log(&func_replacement_log);

        if (func_replacement_status != 0) {
            printf("tbbmalloc_proxy cannot replace memory allocation routines\n");
            // The log is an array of strings terminated by a NULL pointer.
            for (char** log_string = func_replacement_log; *log_string != 0; log_string++) {
                printf("%s\n", *log_string);
            }
        }

        return 0;
    }


Example output:

::

    tbbmalloc_proxy cannot replace memory allocation routines
    Success: free (ucrtbase.dll), byte pattern: <C7442410000000008B4424>
    Fail: _msize (ucrtbase.dll), byte pattern: <E90B000000CCCCCCCCCCCC>
@@ -0,0 +1,60 @@
.. _Floating_Point_Settings:

Floating-point Settings
=======================

To propagate CPU-specific settings for floating-point computations to tasks executed by the task scheduler, you can use one of the following two methods:

* When a ``task_arena`` or the task scheduler for a given application thread is initialized, it captures the current floating-point settings of the thread.
* The ``task_group_context`` class has a method to capture the current floating-point settings.

By default, worker threads use the floating-point settings captured during the initialization of a ``task_arena`` or of the implicit arena of the application thread. These settings are applied to all computations within that ``task_arena`` or started by that application thread.


For finer control over floating-point behavior, a thread may capture the current settings in a task group context. Do it at context creation by passing a special flag to the constructor:

::

    task_group_context ctx( task_group_context::isolated,
                            task_group_context::default_traits | task_group_context::fp_settings );


Or call the ``capture_fp_settings`` method:

::

    task_group_context ctx;
    ctx.capture_fp_settings();


You can then pass the task group context to most parallel algorithms, including ``flow::graph``, to ensure that all tasks related to this algorithm use the specified floating-point settings.
You can also run parallel algorithms with different floating-point settings captured in separate contexts, even at the same time.

Floating-point settings captured in a task group context prevail over the settings captured during task scheduler initialization. This means that if a context is passed to a parallel algorithm, the floating-point settings captured in that context are used.
Otherwise, if floating-point settings are not captured in the context, or a context is not explicitly specified, the settings captured during the task arena initialization are used.

In a nested call to a parallel algorithm that does not use the context of a task group with explicitly captured floating-point settings, the outer-level settings are used.
If none of the outer-level contexts capture floating-point settings, the settings captured during task arena initialization are used.


These rules guarantee that:

* Floating-point settings are applied to all tasks executed within a task arena, if they are captured:

  * In a task group context.
  * During the arena initialization.

* A call to a oneTBB parallel algorithm does not change the floating-point settings of the calling thread, even if the algorithm uses different settings.

.. note::
   The guarantees above apply only under the following conditions:

   * User code inside a task either does not change the floating-point settings, or reverts any modifications by restoring the previous settings before the end of the task.
   * oneTBB task scheduler observers are not used to set or modify floating-point settings.

   Otherwise, the stated guarantees are not valid, and the behavior related to floating-point settings is undefined.


doc/main/tbb_userguide/Working_on_the_Assembly_Line_pipeline.rst

+4-4
@@ -172,13 +172,13 @@ equivalent version of the previous example that does this follows:
 
 
 void RunPipeline( int ntoken, FILE* input_file, FILE* output_file ) {
-    oneapi::tbb::filter_mode<void,TextSlice*> f1( oneapi::tbb::filter_mode::serial_in_order,
+    oneapi::tbb::filter<void,TextSlice*> f1( oneapi::tbb::filter_mode::serial_in_order,
         MyInputFunc(input_file) );
-    oneapi::tbb::filter_mode<TextSlice*,TextSlice*> f2(oneapi::tbb::filter_mode::parallel,
+    oneapi::tbb::filter<TextSlice*,TextSlice*> f2(oneapi::tbb::filter_mode::parallel,
         MyTransformFunc() );
-    oneapi::tbb::filter_mode<TextSlice*,void> f3(oneapi::tbb::filter_mode::serial_in_order,
+    oneapi::tbb::filter<TextSlice*,void> f3(oneapi::tbb::filter_mode::serial_in_order,
         MyOutputFunc(output_file) );
-    oneapi::tbb::filter_mode<void,void> f = f1 & f2 & f3;
+    oneapi::tbb::filter<void,void> f = f1 & f2 & f3;
     oneapi::tbb::parallel_pipeline(ntoken,f);
 }
 
doc/main/tbb_userguide/title.rst

+1
@@ -14,6 +14,7 @@
    ../tbb_userguide/Flow_Graph
    ../tbb_userguide/work_isolation
    ../tbb_userguide/Exceptions_and_Cancellation
+   ../tbb_userguide/Floating_Point_Settings
    ../tbb_userguide/Containers
    ../tbb_userguide/Mutual_Exclusion
    ../tbb_userguide/Timing
