-
Notifications
You must be signed in to change notification settings - Fork 164
Perf[MQB]: optimize inDispatcherThread
#1061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
586dff0 to
829bded
Compare
Signed-off-by: Evgeny Malygin <[email protected]>
829bded to
4bf33e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build 3174 of commit 4bf33e5 has completed with FAILURE
| // the first operand: | ||
| return (dispatcherClientData().threadId() == | ||
| bslmt::ThreadUtil::selfId()) || | ||
| (dispatcherClientData().threadId() == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure I verified in my benchmark that inDispatcherThread() takes the same time with or without the second part of the expression when the first part is true (short-circuit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also dispatcherClientData() is a virtual function and makes inDispatcherThread() slower than it can be. If we just keep threadId as a field in mqbi::DispatcherClient, we can address it directly without virtual function call and save time on this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we just keep
threadIdas a field inmqbi::DispatcherClient, we can address it directly without virtual function call and save time on this check.
I applied this in the second commit and updated the benchmarks
Note that amd64 Linux time improved much more:
TEST /blazingmq/src/groups/mqb/mqba/mqba_dispatcher.t.cpp CASE -1
inDispatcherThread():
total: 6.11 ms (10000000 iterations)
per call: 0 ns
Signed-off-by: Evgeny Malygin <[email protected]>
a4307af to
675493f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build 3178 of commit 675493f has completed with FAILURE
pniedzielski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update yesterday, looks good.
Commit 1:
bslmt::ThreadUtil::Handleusage withbslmt::ThreadUtil::Id. According to docs,Handledoesn't allow usage from a thread different from the one it corresponds to.Idallows this.mqbi::DispatcherClientData. This allows to simplify lookups forinDispatcherThreadchecks.mqbi::Dispatcher::inDispatcherThread(int)->mqbi::DispatcherClient::inDispatcherThread(void). The previous declaration was abstract and it was not possible to inlineinDispatcherThread, now it is possible to inline it.inDispatcherThreadcalls. Before:BSLS_ASSERT_SAFE(dispatcher()->inDispatcherThread(this));, after:BSLS_ASSERT_SAFE(inDispatcherThread());Commit 2:
d_threadId:mqbi::DispatcherClientData::d_threadId->mqbi::DispatcherClient::d_threadId. This allows to simplify calls ininDispatcherThread()and make it even faster.Benchmarks
Darwin M2 - RelWithDebInfo
Before: 4.1 ns per iteration
BSLS_ASSERT_OPT(dispatcher_p->inDispatcherThread(client_p));After (commit 1): 2.7 ns per iteration
BSLS_ASSERT_OPT(client_p->inDispatcherThread());After (commit 2): 2.0 ns per iteration
BSLS_ASSERT_OPT(client_p->inDispatcherThread());Linux amd64 - RelWithDebInfo
Before: 4.6 ns per iteration
BSLS_ASSERT_OPT(dispatcher_p->inDispatcherThread(client_p));After (commit 1): 3.4 ns per iteration
BSLS_ASSERT_OPT(client_p->inDispatcherThread());After (commit 2): 0.6 ns per iteration
BSLS_ASSERT_OPT(client_p->inDispatcherThread());Note that the speedup on M2 Darwin is 2x, on amd64 Linux is 7.5x.