Instantiating multiple engines for multi-thread use #20359

hmf · 2021-06-17T08:36:11Z

hmf
Jun 17, 2021

I would like to confirm that it is possible to instantiate several engines in the same process and use each one in a separate thread. Each thread/engine is to be used for both training and inference. Is this possible for version 1.7?

I only found #12239, but there they seem to be using fully fledged processes (see referenced tests).

Assuming I can use one engine per thread, I may need to tweak the thread/device usage as described here. Is this correct? Any suggestions for initial tests on a 6-core machine?

TIA

Answered by samskalicky

TristonC · 2021-06-19T00:36:13Z

TristonC
Jun 19, 2021

@samskalicky @ptrendx Maybe you two can help answer this question.

hmf · 2021-06-21T08:47:42Z

hmf
Jun 21, 2021
Author

I think this is not possible, at least in version 1.7: I get errors (results with NaN).
So I will assume I need to use processes.

samskalicky · 2021-06-21T16:20:48Z

samskalicky
Jun 21, 2021
Collaborator

@hmf you are correct. In general, with large enough data sizes, your 6-core machine would be best run with a single engine using all 6 cores as OpenMP threads (the default setting). This will give you the lowest inference latency.

However, to maximize throughput (at the expense of lowest latency) you may want to try using 1 process per core (6 processes). Be sure to set OMP_NUM_THREADS to 1 in each process so you don't overwhelm the processor. By default, MXNet sets OMP_NUM_THREADS to the number of real cores (not vCPUs or hyperthreads).

Not exactly the same topic, but similar ideas are discussed in this blog: https://aws.amazon.com/blogs/machine-learning/model-serving-with-amazon-elastic-inference/
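
As a rough illustration of the one-process-per-core setup, here is a minimal sketch using only the Python standard library. It is not MXNet-specific: `CHILD_CODE` and `launch_workers` are hypothetical names, and the child process is a stand-in that only reports the `OMP_NUM_THREADS` value it inherited. In a real setup the child would presumably be your own training or inference script, which imports MXNet after the environment variable is already in place.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a real worker script. It only prints the OpenMP
# setting it inherited; a real worker would import mxnet and run the model here.
CHILD_CODE = "import os; print(os.environ['OMP_NUM_THREADS'])"


def launch_workers(num_workers=6, omp_threads=1):
    """Start `num_workers` child processes, each with OMP_NUM_THREADS set."""
    # Copy the parent environment and override only the OpenMP thread count,
    # so the value is present before the child loads any OpenMP-backed library.
    env = dict(os.environ, OMP_NUM_THREADS=str(omp_threads))
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            env=env,
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(num_workers)
    ]
    # Wait for every child to exit and collect what it printed.
    return [p.communicate()[0].strip() for p in procs]


if __name__ == "__main__":
    # One process per core on the 6-core machine discussed above.
    print(launch_workers(num_workers=6, omp_threads=1))
```

Setting the variable in the child's environment at launch, rather than inside the worker after its imports, avoids depending on when the OpenMP runtime reads it.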
