This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Instantiating multiple engines for multi-thread use #20359

Answered by samskalicky
hmf asked this question in Q&A

@hmf you are correct. In general, with large enough data sizes, your 6-core machine would be best served by running a single engine and using all 6 cores as OpenMP threads (the default setting). This will give you the lowest inference latency.

However, to maximize throughput (at the expense of latency), you may want to try running 1 process per core (6 processes). But be sure to set OMP_NUM_THREADS to 1 in each process so you don't oversubscribe the processor. By default, MXNet sets OMP_NUM_THREADS to the number of real cores (not vCPUs or hyperthreads).
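
A minimal sketch of the one-process-per-core setup, assuming the workers are started from a Python launcher. The helper names `worker_env` and `launch_workers` are hypothetical, and the worker command here is a placeholder that only reports the thread setting it sees; in practice it would be your own inference script that imports MXNet. The variable is set in the child's environment before it starts, so it is already in place when the OpenMP runtime initializes.

```python
import os
import subprocess
import sys


def worker_env(omp_threads=1):
    """Copy the current environment and pin OpenMP to `omp_threads` threads."""
    env = os.environ.copy()
    env["OMP_NUM_THREADS"] = str(omp_threads)
    return env


def launch_workers(command, num_workers):
    """Start `num_workers` copies of `command`, each with OMP_NUM_THREADS=1."""
    env = worker_env(1)
    return [
        subprocess.Popen(command, env=env, stdout=subprocess.PIPE, text=True)
        for _ in range(num_workers)
    ]


if __name__ == "__main__":
    # Placeholder worker (assumption): replace with your own inference script.
    command = [sys.executable, "-c",
               "import os; print(os.environ['OMP_NUM_THREADS'])"]
    # 6 workers to match the 6-core machine discussed above.
    for proc in launch_workers(command, num_workers=6):
        out, _ = proc.communicate()
        print(out.strip())
```

For the single-engine, lowest-latency case no launcher is needed: leaving OMP_NUM_THREADS unset keeps the default described above.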

Not exactly the same topic, but similar ideas are discussed in this blog: https://aws.amazon.com/blogs/machine-learning/model-serving-with-amazon-elastic-infere…

Replies: 3 comments

Answer selected by hmf