Fix MPI communication issues in Dust problems #1614

Draft
chongchonghe wants to merge 5 commits into development from chong/dust/MPI

@chongchonghe (Contributor)

Description

Related issues

Are there any GitHub issues that are fixed by this pull request? Add a link to them here.

Checklist

Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an x inside the square brackets [ ] in the Markdown source below:

  • I have added a description (see above).
  • I have added a link to any related issues (if applicable; see above).
  • I have read the Contributing Guide.
  • I have added tests for any new physics that this PR adds to the code.
  • (For quokka-astro org members) I have manually triggered the GPU tests with the magic comment /azp run.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @chongchonghe, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to resolve potential MPI communication issues specifically within the DustDamping problem setup. It achieves this by removing a computeBeforeTimestep function that might have been redundant or problematic for parallel execution and by introducing a GPU synchronization step to ensure data consistency across processes after each timestep. These changes collectively enhance the stability and correctness of the simulation in a distributed computing environment.

Highlights

  • Removed computeBeforeTimestep Function: The computeBeforeTimestep function, which was responsible for extracting and storing initial physical quantities for analysis at t=0, has been entirely removed from the testDustDamping.cpp file. This likely streamlines the simulation's initialization or resolves conflicts with parallel execution.
  • Added GPU Synchronization: A call to amrex::Gpu::synchronize() was introduced within the computeAfterTimestep function. This ensures that all GPU operations are completed before the CPU proceeds, which is crucial for maintaining data consistency and preventing race conditions in a parallel computing environment, especially when dealing with MPI communication.
  • Refactored fextract Call: The fextract function call in computeAfterTimestep was updated to explicitly ignore the position return value using auto [_, values] = ..., indicating that this specific return value was not being utilized.
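
The last highlight relies on C++17 structured bindings. As a minimal sketch of that idiom only: `extract_profile` and `first_cell_velocity` below are hypothetical stand-ins invented for illustration, not Quokka's `fextract` or its return type. Before C++26, `_` is an ordinary identifier, so it simply names the binding that is never read.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for an extraction routine returning (positions, values).
// values is indexed as [component][cell].
std::pair<std::vector<double>, std::vector<std::vector<double>>> extract_profile()
{
	std::vector<double> positions = {0.25, 0.75};
	std::vector<std::vector<double>> values = {{1.0, 1.0},	// component 0: density
						   {0.5, 0.4}}; // component 1: x-momentum
	return {positions, values};
}

// Velocity in the first cell, computed as momentum / density.
double first_cell_velocity()
{
	// `_` receives the positions, which are not needed here
	auto [_, values] = extract_profile();
	const double density = values.at(0)[0];
	const double momentum = values.at(1)[0];
	assert(density > 0.0); // guard the division below
	return momentum / density;
}
```

Calling `first_cell_velocity()` with the sample data above divides 0.5 by 1.0.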

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request aims to fix MPI communication issues in dust-related problems. The main change is in src/problems/DustDamping/testDustDamping.cpp, where computeBeforeTimestep is removed and amrex::Gpu::synchronize() is added to computeAfterTimestep.

The removal of computeBeforeTimestep likely resolves an MPI deadlock caused by calling a collective function within a block that only executes on the IO processor. However, this also removes the logic for recording the initial state at t=0, which affects the test's error calculation. My review includes a suggestion to reintroduce computeBeforeTimestep with a corrected implementation that properly handles MPI communication, ensuring the test's integrity is maintained.

The other changes, such as using _ for an unused variable and adding a GPU synchronization point, are good improvements for code clarity and correctness.

I am having trouble creating individual review comments. Click here to see my feedback.

src/problems/DustDamping/testDustDamping.cpp (160-194)

Priority: high

The removal of computeBeforeTimestep fixes a potential MPI deadlock, which occurs if fextract is a collective operation but was called only on the IO processor. However, this removal also eliminates the recording of the simulation's initial state at t=0, which can affect the accuracy of the test's error analysis.

A better solution would be to correct the implementation of computeBeforeTimestep to handle MPI communication properly. This involves calling fextract on all ranks and then processing the results on the IO processor. This would fix the deadlock while preserving the test's integrity.
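
The mismatch described here can be illustrated without MPI. The sketch below is a sequential simulation under stated assumptions: `CollectiveLog`, `buggy_pattern_completes`, and `fixed_pattern_completes` are hypothetical names, and the "collective" is modeled only as an operation that completes when every rank has entered it. It does not call AMReX or MPI.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a collective: it can complete only if every rank enters it.
// Entries are logged so the mismatch can be detected instead of hanging.
struct CollectiveLog {
	std::vector<bool> entered;
	explicit CollectiveLog(std::size_t nranks) : entered(nranks, false) {}
	void enter(std::size_t rank) { entered.at(rank) = true; }
	bool would_complete() const
	{
		for (bool e : entered) {
			if (!e) {
				return false;
			}
		}
		return true;
	}
};

constexpr std::size_t kIoRank = 0; // assumed IO rank for this sketch

// Problematic pattern: the collective sits inside the IO-only branch.
bool buggy_pattern_completes(std::size_t nranks)
{
	assert(nranks > 0);
	CollectiveLog log(nranks);
	for (std::size_t rank = 0; rank < nranks; ++rank) {
		if (rank == kIoRank) {
			log.enter(rank); // only the IO rank reaches the collective
		}
	}
	return log.would_complete();
}

// Corrected pattern: every rank enters; only the IO rank records the result.
bool fixed_pattern_completes(std::size_t nranks, std::vector<double> &recorded)
{
	assert(nranks > 0);
	CollectiveLog log(nranks);
	for (std::size_t rank = 0; rank < nranks; ++rank) {
		log.enter(rank);
		if (rank == kIoRank) {
			recorded.push_back(0.0); // e.g. the initial time
		}
	}
	return log.would_complete();
}
```

With a single rank the problematic pattern still completes, which is one reason this kind of bug tends to surface only in multi-rank runs.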

I recommend re-adding computeBeforeTimestep with the following corrected implementation:

template <>
void QuokkaSimulation<DustDamping>::computeBeforeTimestep()
{
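	// extract initial physical quantities at t=0
	if (userData_.t_vec_.empty()) {
		// call fextract on every rank, since it may be a collective operation;
		// the returned positions are not used here
		auto [_, values] = fextract(state_new_cc_[0], Geom(0), 0, 0.5);

		// record t=0 on every rank so the empty() guard above evaluates the same
		// way everywhere and no rank re-enters this branch on a later step
		userData_.t_vec_.push_back(0.0); // initial time t=0

		if (amrex::ParallelDescriptor::IOProcessor()) {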

			// extract physical quantities
			const double density = values.at(HydroSystem<DustDamping>::density_index)[0];
			const double momentum_x = values.at(HydroSystem<DustDamping>::x1Momentum_index)[0];
			const double Egas_total = values.at(HydroSystem<DustDamping>::energy_index)[0];

			// store gas velocity
			const double v_gas = momentum_x / density;
			userData_.v_gas_vec_.push_back(v_gas);

			// store gas total energy
			userData_.E_gas_vec_.push_back(Egas_total);

			if constexpr (Physics_Traits<DustDamping>::is_dust_enabled) {
				// store dust1 velocity
				const double dust1_density = values.at(HydroSystem<DustDamping>::dustDensity_index)[0];
				const double dust1_momentum_x = values.at(HydroSystem<DustDamping>::x1DustMomentum_index)[0];
				const double v_dust1 = dust1_momentum_x / dust1_density;
				userData_.v_dust1_vec_.push_back(v_dust1);

				// store dust2 velocity
				const double dust2_density = values.at(HydroSystem<DustDamping>::dustDensity_index + numDustVars)[0];
				const double dust2_momentum_x = values.at(HydroSystem<DustDamping>::x1DustMomentum_index + numDustVars)[0];
				const double v_dust2 = dust2_momentum_x / dust2_density;
				userData_.v_dust2_vec_.push_back(v_dust2);
			}
		}
	}
}
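
One property worth checking in a guarded collective call like the one above is that the guard evaluates identically on every rank at every step. Whether that holds here depends on how `t_vec_` is filled elsewhere in the test, which this page does not show. The following sequential sketch is an assumption-laden illustration only: `RankState` and `guard_stays_matched` are hypothetical, and each simulated rank owns its own history vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-rank state: each simulated rank has its own time history.
struct RankState {
	std::vector<double> t_vec;
};

// Simulates nsteps guarded collective calls. Returns true if, at every step,
// either all ranks or no ranks entered the collective.
bool guard_stays_matched(std::size_t nranks, int nsteps, bool record_on_all_ranks)
{
	assert(nranks > 0);
	const std::size_t io_rank = 0; // assumed IO rank for this sketch
	std::vector<RankState> ranks(nranks);
	for (int step = 0; step < nsteps; ++step) {
		std::size_t entered = 0;
		for (std::size_t r = 0; r < nranks; ++r) {
			if (ranks[r].t_vec.empty()) { // guard evaluated per rank
				++entered;	      // this rank enters the collective
				if (record_on_all_ranks || r == io_rank) {
					ranks[r].t_vec.push_back(0.0);
				}
			}
		}
		if (entered != 0 && entered != nranks) {
			return false; // some ranks entered, others did not
		}
	}
	return true;
}
```

In this model, recording only on the IO rank keeps the first step matched but leaves the other ranks' guards true on the second step, while recording on all ranks keeps every step matched.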
