[SPH] improve new SPH setup performance#1501
Conversation
|
Thanks @tdavidcl for opening this PR! You can do multiple things directly here: Once the workflow completes a message will appear displaying informations related to the run. Also the PR gets automatically reviewed by gemini, you can: |
Summary of ChangesHello @tdavidcl, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the performance of the SPH (Smoothed Particle Hydrodynamics) setup process by optimizing memory management during particle distribution. It introduces a mechanism to reuse device buffers for intermediate calculations, thereby reducing the overhead of repeated memory allocations in critical loops. These changes are expected to yield noticeable performance improvements in the SPH simulation setup phase. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request aims to improve performance in the SPH setup by refactoring code and introducing buffer recycling to avoid reallocations in a loop. The changes are generally good, but I've found a high-severity issue where the performance optimization is not correctly implemented. Additionally, I've provided a couple of medium-severity suggestions to improve template argument forwarding for better consistency and correctness within the codebase.
I am having trouble creating individual review comments. Click here to see my feedback.
src/shammodels/sph/src/modules/SPHSetup.cpp (422-429)
To achieve the performance benefit of recycling the mask_get_ids_where buffer, its declaration should be moved outside the while (has_been_limited) loop. Currently, it's being re-created in every iteration, which negates the optimization.
auto dev_sched = shamsys::instance::get_compute_scheduler_ptr();
sham::DeviceBuffer<u32> mask_get_ids_where(0, dev_sched);
bool has_been_limited = true;
while (has_been_limited) {
has_been_limited = false;
using namespace shamrock::patch;src/shamrock/include/shamrock/patch/PatchDataField.hpp (346)
To support perfect forwarding and for consistency with other template functions in this file (e.g., get_ids_vec_where), variadic template arguments should be taken as forwarding references (Args&&... args).
sham::DeviceBuffer<u32> &mask, Lambdacd &&cd_true, Args&&... args) const {src/shamrock/include/shamrock/patch/PatchDataField.hpp (394)
To ensure correct move/copy semantics and enable perfect forwarding of the lambda, std::forward should be used here. This is consistent with other functions in this file like get_ids_vec_where.
return get_ids_where_recycle_buffer(mask, std::forward<Lambdacd>(cd_true), args...);|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors the table utility class by moving its definition from src/shammodels/common/src/timestep_report.cpp to a new, centralized header src/shambase/include/shambase/tabulate.hpp, making it reusable across the codebase. The timestep_report.cpp file is updated to include the new header and use the shambase::table namespace. Additionally, the SPH setup process (SPHSetup.hpp and SPHSetup.cpp) is enhanced with a new max_msg_size parameter, providing more granular control over message sizes during particle injection. Default values for insert_step, gen_step, msg_limit, and data_count_limit are adjusted, and performance logging for memory allocation and MPI timings is introduced during the injection phase. An optimization is also implemented to reuse device buffers (get_ids_where_recycle_buffer) during particle indexing. Review comments highlight the need to simplify the rulled_data rendering logic in the new tabulate.hpp for better readability and performance, correct a misleading variable name (sum_max_time_rank_getter) in the performance report, and fix the calculation of per-rank allocation time percentages by using rank-specific injection times instead of a global value.
Workflow reportworkflow report corresponding to commit 75b6e21 Pre-commit check reportPre-commit check: ✅ Test pipeline can run. Clang-tidy diff reportNo relevant changes found. You should now go back to your normal life and enjoy a hopefully sunny day while waiting for the review. Doxygen diff with
|
No description provided.