
Make PostUpdate run serially #3512

Open
luca-della-vedova wants to merge 1 commit into gazebosim:main from luca-della-vedova:luca/serial_post_update

Conversation

@luca-della-vedova (Member)

🎉 New feature

Remove parallel PostUpdate runs!

Summary

While profiling Gazebo worlds I noticed a major bottleneck in synchronization primitives. Digging deeper, I found this was due to the parallel PostUpdates and the inter-thread synchronization in the Barrier implementation.
I investigated and benchmarked three different approaches:

  • Status quo.
  • A new implementation based on gz::common::WorkerPool in this branch.
  • This implementation, which is ridiculously simple: we just run the PostUpdates in sequence and do no multithreading at all.
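The serial approach amounts to a plain loop over the systems. A minimal sketch of the idea, using hypothetical stand-in types rather than the actual gz-sim `SimulationRunner` / `ISystemPostUpdate` code:

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a system's PostUpdate callback; the real
// gz-sim interface takes an UpdateInfo and an EntityComponentManager.
struct System
{
  std::function<void()> postUpdate;
};

// Serial PostUpdate: call each system in order on the current thread,
// with no worker threads, barriers, or other synchronization.
void RunPostUpdates(std::vector<System> &systems)
{
  for (auto &system : systems)
    system.postUpdate();
}
```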

Test it

To be honest, I struggled a bit to come up with a scenario where parallel PostUpdates would help. The ideal case would be a world with many PostUpdates that each do heavy work; however, looking at the codebase, most systems do fairly trivial work in PostUpdate.
The benchmark I came up with instantiates many TouchPlugin systems on static entities (200). The plugin does a decent amount of work in its PostUpdate function (which calls this), but if anyone can come up with a better benchmark I'm happy to change it.
The script I used to create the world is here.

I ran both worlds with bullet featherstone and didn't cap the RTF, so they would run as fast as possible:

$ gz sim benchmark.sdf -r --physics-engine gz-physics-bullet-featherstone-plugin -s
$ gz sim shapes.sdf -r --physics-engine gz-physics-bullet-featherstone-plugin -s

Results

Tests were run on a cloud machine with many CPU cores but no GPU. Results are in % of RTF (real-time factor).

| Test world    | Current main | Worker rewrite | Serial PostUpdate (this PR) |
|---------------|--------------|----------------|-----------------------------|
| shapes.sdf    | 600-1000%    | 2900%          | 3000%                       |
| benchmark.sdf | 12-20%       | 25%            | 150%                        |

Discussion

Apart from the clear result that the current barrier-based implementation is too inefficient, the performance difference on a real world between the WorkerPool-based approach and a naive serial PostUpdate is fairly small and almost within noise.
For a stress test, however, with a large number of PostUpdate systems, the overhead of thread scheduling is just too high, and a naive serial implementation (this PR) is a lot faster, as well as being a lot simpler to reason about.

Of course, this result is partly a product of the benchmark's design, which is why I'm happy to hear about any real-world case with numerous expensive, parallelizable PostUpdates that I should test.

Checklist

  • Signed all commits for DCO
  • Added a screen capture or video to the PR description that demonstrates the feature
  • Added tests
  • Added example and/or tutorial
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • Consider updating Python bindings (if the library has them)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • Updated Bazel files (if adding new files). Created an issue otherwise.
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers
  • Was GenAI used to generate this PR? If so, make sure to add "Generated-by" to your commits. (See this policy for more info.)

Generated-by: This PR itself wasn't, but the benchmark and the alternative implementations were generated by Gemini 3.0.

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by and Generated-by messages.

Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>