Instantiating multiple engines for multi-thread use #20359
-
I would like to confirm that it is possible to instantiate several engines in the same process and use each one in a separate thread. Each thread/engine would be used for both training and inference. Is this possible in version 1.7? I only found #12239, but there they seem to be using fully fledged processes (see the referenced tests). Assuming I can use one engine per thread, will I need to tweak the thread/device usage as described here? Is this correct? Any suggestions for initial tests on a 6-core machine? TIA
Replies: 3 comments
-
@samskalicky @ptrendx Maybe you two can help answer this question.
-
I think this is not possible, at least in version 1.7: I get errors (results with NaN).
-
@hmf you are correct. In general, with large enough data sizes, your 6-core machine would be best off running a single engine and using all 6 cores as OpenMP threads (the default setting). This will give you the lowest inference latency. However, to maximize throughput (at the expense of lowest latency) you may want to try using 1 process per core (6 processes). But be sure to set OMP_NUM_THREADS to 1 so you don't overwhelm the processor. By default, MXNet sets OMP_NUM_THREADS to the number of physical cores (not vCPUs or hyper-threads). Not exactly the same topic, but similar ideas are discussed in this blog: https://aws.amazon.com/blogs/machine-learning/model-serving-with-amazon-elastic-inference/
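The one-process-per-core recipe above can be sketched without MXNet itself. In this sketch, `launch_workers` is a hypothetical helper (not part of any MXNet API), and the inline `python -c` child command is a stand-in for a real per-core inference script; the key point is simply that each child is launched with OMP_NUM_THREADS=1 in its environment before it starts:

```python
import os
import subprocess
import sys

def launch_workers(n_cores):
    """Launch one worker process per core, each pinned to a single
    OpenMP thread via its environment, and collect their output."""
    # Copy the parent environment and force one OpenMP thread per
    # process, so n_cores processes don't oversubscribe the machine.
    env = dict(os.environ, OMP_NUM_THREADS="1")
    procs = [
        subprocess.Popen(
            # Placeholder child: a real setup would run an inference
            # script here; this one just echoes its OMP setting.
            [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"],
            env=env, stdout=subprocess.PIPE, text=True)
        for _ in range(n_cores)
    ]
    return [p.communicate()[0].strip() for p in procs]

if __name__ == "__main__":
    # On a 6-core machine, each of the 6 workers reports "1".
    print(launch_workers(6))
```

The environment must be set before the child process starts; setting OMP_NUM_THREADS inside the worker after the OpenMP runtime (or MXNet) has initialized may have no effect.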