Fix performance bug in MultiThreadedExecutor (hopefully) #1547
base: rolling
Conversation
Force-pushed from 1e061ef to 6b15da7
Commit: …g, this helped a lot with ros2#1452 Signed-off-by: Michael Tandy <[email protected]>
Force-pushed from 6b15da7 to 0fb9834
From issue triage meeting: I put this on the agenda for the next Client Library Working Group meeting.
We had to use 0.01 instead of 0 for this fix to work. @wjwwood Not sure what the underlying issue is, but the CPU performance boost is incredible.
@LSchi-nexat Interesting, glad it's worked for you! For me, the sleep of zero worked OK. Can you tell me how many topics you're subscribed to, the message rates of the various topics, and whether you're doing any CPU-intensive processing of any of the messages? Using 0.01 (i.e. 10 milliseconds) of course means any messages arriving faster than 100 Hz might get dropped. So if you've got 1000 Hz topics, some of your CPU saving might be coming from simply not processing a bunch of messages.
Discussion from the Client Library Working Group: based on research during the meeting, we are fairly sure that @michaeltandy's analysis is correct. Since this PR was opened, we have also landed some updates to the logic in the Executor base class (#1469) which could potentially have an impact on the behavior. It may be that this thread yield is still needed, but before we land it, we are going to check whether #1469 has any impact. Actions:
I've tested, and @mjcarroll is right to say the current rolling branch has resolved the yield-the-same-task-repeatedly bug in 9695271e, providing a big boost in performance. However, there's still a further (modest) performance improvement available with the addition of the sleep(0).
[Chart comparing CPU usage: as of 10.0.2; mostly fixed by 9695271e; and slightly better again with the addition of the sleep(0).]
Still an improvement, but not the three orders of magnitude I claimed in the pull request.
fujitatomoya left a comment:
@michaeltandy thanks a lot for sharing the information. I believe your analysis is correct, and this PR fixes that problematic situation in the MultiThreadedExecutor.
future.result()  # raise any exceptions

# Yield GIL so executor threads have a chance to run.
time.sleep(0)
How about this? It can be more efficient: it directly tells the OS scheduler to yield the CPU to another thread without any sleep/timer overhead, and it clearly communicates the intent (yield the CPU) rather than sleeping.
Suggested change:
- time.sleep(0)
+ os.sched_yield() if hasattr(os, 'sched_yield') else time.sleep(0)
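If that suggestion were adopted, a minimal self-contained sketch might look like the following; the helper name `_yield_to_other_threads` is hypothetical and not part of rclpy. `os.sched_yield` is only available on POSIX platforms, which is why the `hasattr` guard keeps `time.sleep(0)` as the portable fallback.

```python
import os
import time


def _yield_to_other_threads() -> None:
    """Give other threads a chance to run (hypothetical helper, not rclpy API)."""
    if hasattr(os, 'sched_yield'):
        # POSIX only: per the review suggestion, ask the OS scheduler to hand
        # the CPU to another runnable thread without any sleep/timer overhead.
        os.sched_yield()
    else:
        # Portable fallback: sleep(0) briefly releases the GIL so the thread
        # pool workers can pick up the tasks that were just submitted.
        time.sleep(0)
```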
Description
Fixes #1452
This one-line change should improve the performance of MultiThreadedExecutor.
- `_wait_for_ready_callbacks` yields, among other things, tasks that are in progress, `if (not task.executing() and not task.done())`.
- In the `SingleThreadedExecutor` this logic works fine, because yielded tasks are always completed before the next call to `_wait_for_ready_callbacks`.
- In the `MultiThreadedExecutor`, yielded tasks are merely submitted to a thread pool before the next call to `_wait_for_ready_callbacks`; there's no guarantee they'll have started executing. A task that's waiting in the thread pool queue can be returned and enqueued again.
- So `_wait_for_ready_callbacks` returns quickly, rather than waiting.
- `task.py`'s `__call__` method includes a check, `if (not self._pending() or self._executing): return`, so the task doesn't actually run beyond that check, despite being enqueued repeatedly.
- While the `spin()` thread holds the GIL, the thread pool executor's workers can't even begin to make progress.

Running Zachary's example from #1452 yields the following difference (a simplified, standalone sketch of this failure mode follows the measurements below):
Before
22.503s/30.127s = 74.7% of a CPU core used, on average.
After
0.842s/30.127s = 2.8% of a CPU core used, on average.
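As a rough illustration of the mechanism described in the list above, here is a standalone sketch that does not use rclpy at all; every name in it (`FakeTask`, `dispatch`, `YIELD_GIL`) is hypothetical. A dispatcher keeps re-submitting a not-yet-started task to a `ThreadPoolExecutor`, much like `_wait_for_ready_callbacks` keeps yielding a pending task; flipping `YIELD_GIL` to `False` shows how many wasted submissions pile up when the dispatching thread never gives the workers a chance to run.

```python
import time
from concurrent.futures import ThreadPoolExecutor

YIELD_GIL = True  # set to False to see the busy-resubmit behaviour


class FakeTask:
    """Stand-in for a task that runs once and ignores repeated submissions."""

    def __init__(self):
        self._started = False
        self._done = False

    def __call__(self):
        if self._started:       # like task.py's guard, repeat calls are no-ops
            return
        self._started = True
        time.sleep(0.1)         # pretend the callback does some work
        self._done = True

    def done(self):
        return self._done


def dispatch(task, pool):
    """Re-submit the task until it finishes, counting wasted submissions."""
    submissions = 0
    while not task.done():
        # Like _wait_for_ready_callbacks, this hands back a task that may not
        # have started yet, so it gets pushed into the pool's queue again.
        pool.submit(task)
        submissions += 1
        if YIELD_GIL:
            time.sleep(0)       # yield the GIL so a worker can actually start
    return submissions


if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=2) as pool:
        print('submissions before the task finished:', dispatch(FakeTask(), pool))
```

The counter is only there to make the re-enqueue pattern visible; the actual change in this PR is just the `time.sleep(0)` after the executor submits work.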
Is this a user-facing behavior change?
No change.
Did you use Generative AI?
No generative AI was used.
Additional Information
Here's the diff, if you want the logging that revealed this issue to me:
As a test I ran the basic publisher and subscriber from the documentation, but with the publisher using `timer_period = 0.02` and the subscriber spun by a `MultiThreadedExecutor` (a minimal sketch of that subscriber setup follows).
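This is a minimal sketch of the subscriber side of that test, assuming the tutorial's usual `String` messages on the `'topic'` topic; the only difference from the docs example is spinning with an explicit `MultiThreadedExecutor` instead of `rclpy.spin(node)`.

```python
import rclpy
from rclpy.executors import MultiThreadedExecutor
from rclpy.node import Node
from std_msgs.msg import String


class MinimalSubscriber(Node):

    def __init__(self):
        super().__init__('minimal_subscriber')
        # Same subscription as the tutorial; the publisher side uses
        # timer_period = 0.02 (50 Hz) instead of the tutorial's 0.5 s.
        self.subscription = self.create_subscription(
            String, 'topic', self.listener_callback, 10)

    def listener_callback(self, msg):
        self.get_logger().info(f'I heard: "{msg.data}"')


def main():
    rclpy.init()
    node = MinimalSubscriber()
    # Spin with a MultiThreadedExecutor instead of the default rclpy.spin().
    executor = MultiThreadedExecutor()
    executor.add_node(node)
    try:
        executor.spin()
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()
```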
Here's an excerpt of the logs, before the fix is applied: