Fix bug in restarting with different MPI processes#1545
Fix bug in restarting with different MPI processes#1545chongchonghe wants to merge 17 commits intodevelopmentfrom
Conversation
Summary of ChangesHello @chongchonghe, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical bug that prevented simulations from correctly restarting when the number of MPI processes differed from the original checkpoint. The changes introduce comprehensive handling for particle data files to ensure consistency across varying MPI ranks during restart. Additionally, the PR includes updates to the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a fix for restarting simulations with a different number of MPI processes, particularly for particle data. The core logic involves creating dummy particle data files for new MPI ranks to prevent crashes on restart. While the approach is sound, I've identified a critical security vulnerability and a potential crash due to a division-by-zero error in the implementation. Additionally, a change in one of the test files effectively disables its ability to report failures. My review provides suggestions to address these issues.
| // For each level, ensure it exists and has the correct number of DATA files | ||
| for (int lev = 0; lev <= finest_level; ++lev) { | ||
| std::string level_path = pc_path + "/Level_" + std::to_string(lev); | ||
| if (!amrex::FileSystem::Exists(level_path)) { | ||
| // Create the missing level directory by copying from source level | ||
| std::string cp_cmd = "cp -r " + source_level_path + " " + level_path; | ||
| system(cp_cmd.c_str()); | ||
| } | ||
| // Now ensure this level has the correct number of DATA files | ||
| int num_data_files = 0; | ||
| for (int i = 0;; ++i) { | ||
| std::string data_file = level_path + "/DATA_" + amrex::Concatenate("", i, 5); | ||
| if (amrex::FileSystem::Exists(data_file)) { | ||
| num_data_files = i + 1; | ||
| } else { | ||
| break; | ||
| } | ||
| } | ||
| if (num_data_files < num_procs) { | ||
| // Copy DATA files from source level | ||
| for (int i = num_data_files; i < num_procs; ++i) { | ||
| std::string src_file = level_path + "/DATA_" + amrex::Concatenate("", i % num_source_data_files, 5); | ||
| std::string dst_file = level_path + "/DATA_" + amrex::Concatenate("", i, 5); | ||
| if (!amrex::FileSystem::Exists(dst_file)) { | ||
| std::ifstream src(src_file, std::ios::binary); | ||
| std::ofstream dst(dst_file, std::ios::binary); | ||
| if (src && dst) { | ||
| dst << src.rdbuf(); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
This block of code has two critical issues:
- Command Injection Vulnerability: The use of
system("cp -r ...")with a path constructed from user input (restart_chkfile) is a serious security risk. A malicious checkpoint file name could lead to arbitrary command execution. - Division by Zero: If
num_source_data_filesis 0 (i.e., no particle data files are found in the source level), the code will crash due to a division by zero ini % num_source_data_files.
I've provided a suggestion that fixes both issues by:
- Replacing the
systemcall withstd::filesystem::copy, which is safer and more portable. The<filesystem>header is already included in this file. - Adding a check to ensure
num_source_data_files > 0before attempting to use it in the modulo operation.
if (num_source_data_files > 0) {
// For each level, ensure it exists and has the correct number of DATA files
for (int lev = 0; lev <= finest_level; ++lev) {
std::string level_path = pc_path + "/Level_" + std::to_string(lev);
if (!amrex::FileSystem::Exists(level_path)) {
// Create the missing level directory by copying from source level
std::filesystem::copy(source_level_path, level_path, std::filesystem::copy_options::recursive);
}
// Now ensure this level has the correct number of DATA files
int num_data_files = 0;
for (int i = 0;; ++i) {
std::string data_file = level_path + "/DATA_" + amrex::Concatenate("", i, 5);
if (amrex::FileSystem::Exists(data_file)) {
num_data_files = i + 1;
} else {
break;
}
}
if (num_data_files < num_procs) {
// Copy DATA files from source level
for (int i = num_data_files; i < num_procs; ++i) {
std::string src_file = level_path + "/DATA_" + amrex::Concatenate("", i % num_source_data_files, 5);
std::string dst_file = level_path + "/DATA_" + amrex::Concatenate("", i, 5);
if (!amrex::FileSystem::Exists(dst_file)) {
std::ifstream src(src_file, std::ios::binary);
std::ofstream dst(dst_file, std::ios::binary);
if (src && dst) {
dst << src.rdbuf();
}
}
}
}
}
}| } | ||
|
|
||
| return status; | ||
| return 0; |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| return status; | ||
| return 0; |
There was a problem hiding this comment.
Propagate particle radiation test failures
The particle radiation test sets status = 1 when the energy tolerance check fails (lines 259–263), but the function now unconditionally returns 0 at the end. Any restart/energy regression that sets status will no longer fail the test harness, silently masking errors this change is meant to catch.
Useful? React with 👍 / 👎.
| if (!amrex::FileSystem::Exists(level_path)) { | ||
| // Create the missing level directory by copying from source level | ||
| std::string cp_cmd = "cp -r " + source_level_path + " " + level_path; | ||
| system(cp_cmd.c_str()); |
There was a problem hiding this comment.
We should not manipulate the on-disk AMReX format. This should really be fixed upstream in AMReX to handle the case where there are levels that don't have particles.
for more information, see https://pre-commit.ci
| if 'mass' in components: | ||
| mass_idx = 3 + components.index('mass') | ||
| masses = data[:, mass_idx] / M_SUN # Convert to M_sun | ||
| mass_min = np.min(masses) |
| mass_idx = 3 + components.index('mass') | ||
| masses = data[:, mass_idx] / M_SUN # Convert to M_sun | ||
| mass_min = np.min(masses) | ||
| mass_max = np.max(masses) |
| masses = data[:, mass_idx] / M_SUN # Convert to M_sun | ||
| mass_min = np.min(masses) | ||
| mass_max = np.max(masses) | ||
| mass_median = np.median(masses) |
| vy_idx = 3 + components.index('vy') | ||
| vz_idx = 3 + components.index('vz') | ||
| velocities = data[:, [vx_idx, vy_idx, vz_idx]] | ||
| vel_disp = np.std(velocities, axis=0) / KM_S |
| if 'mass' in components: | ||
| mass_idx = 3 + components.index('mass') | ||
| masses = data[:, mass_idx] / M_SUN # Convert to M_sun | ||
| mass_min = np.min(masses) |
Check notice
Code scanning / CodeQL
Unused local variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
To fix the problem, the code should either remove the unused variable or, more appropriately here, use the computed mass statistics in output. The best fix without changing existing functionality is to replace the placeholder print(".1f") with meaningful, formatted printing of mass_min, mass_max, and mass_median. This both removes the unused-variable issue and makes the function behave as its name and surrounding code suggest. Similarly, the velocity statistics branch should be updated to print values instead of print(".1f"), so that the computed vel_disp is also used.
Concretely, in scripts/python/create_particles.py, inside print_particle_info:
- Replace the single
print(".1f")under the mass statisticsifblock with one or moreprintstatements that includemass_min,mass_max, andmass_median, formatted in solar masses with:.1fas implied. - Replace the
print(".1f")under the velocity statisticsifblock with aprintthat includesvel_disp(e.g., per component) also with:.1fformatting and in km/s, consistent with the units used forDEFAULT_VELOCITY_DISP.
No new imports or additional helper functions are required; all necessary constants (M_SUN, KM_S) and numpy functions are already available in this file.
| @@ -294,7 +294,8 @@ | ||
| mass_min = np.min(masses) | ||
| mass_max = np.max(masses) | ||
| mass_median = np.median(masses) | ||
| print(".1f") | ||
| print(f"Mass range: [{mass_min:.1f}, {mass_max:.1f}] M_sun") | ||
| print(f"Mass median: {mass_median:.1f} M_sun") | ||
|
|
||
| # Velocity statistics (if applicable) | ||
| if 'vx' in components: | ||
| @@ -303,7 +304,12 @@ | ||
| vz_idx = 3 + components.index('vz') | ||
| velocities = data[:, [vx_idx, vy_idx, vz_idx]] | ||
| vel_disp = np.std(velocities, axis=0) / KM_S | ||
| print(".1f") | ||
| print( | ||
| "Velocity dispersion (km/s): " | ||
| f"sigma_x={vel_disp[0]:.1f}, " | ||
| f"sigma_y={vel_disp[1]:.1f}, " | ||
| f"sigma_z={vel_disp[2]:.1f}" | ||
| ) | ||
|
|
||
| def main(): | ||
| parser = argparse.ArgumentParser( |
| mass_idx = 3 + components.index('mass') | ||
| masses = data[:, mass_idx] / M_SUN # Convert to M_sun | ||
| mass_min = np.min(masses) | ||
| mass_max = np.max(masses) |
Check notice
Code scanning / CodeQL
Unused local variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, to fix an unused local variable you either remove the assignment if the value is not needed, or you use the variable in a meaningful way (e.g., include it in logging or further calculations). If the variable is intentionally unused (for API or unpacking reasons), you rename it to an “unused” pattern like _ or unused_mass_max.
Here, the best fix without changing the script’s overall behavior is to integrate mass_min, mass_max, and mass_median into the printed particle summary instead of leaving a placeholder print(".1f"). This aligns with the function’s stated purpose (“Print summary information about generated particles”), provides useful information to the user, and ensures mass_max (and the other statistics) are actually used.
Concretely, within print_particle_info in scripts/python/create_particles.py, we will:
- Replace the placeholder
print(".1f")line in the mass statistics block with a formatted print that includesmass_min,mass_max, andmass_median(e.g., in solar masses with one decimal place). - Similarly, replace the placeholder
print(".1f")in the velocity block with a print that usesvel_disp, for consistency and to avoid future unused-variable issues.
No new imports or helper methods are needed; we just change the print statements in that function.
| @@ -294,7 +294,7 @@ | ||
| mass_min = np.min(masses) | ||
| mass_max = np.max(masses) | ||
| mass_median = np.median(masses) | ||
| print(".1f") | ||
| print(f"Mass statistics (M_sun): min={mass_min:.1f}, max={mass_max:.1f}, median={mass_median:.1f}") | ||
|
|
||
| # Velocity statistics (if applicable) | ||
| if 'vx' in components: | ||
| @@ -303,7 +303,8 @@ | ||
| vz_idx = 3 + components.index('vz') | ||
| velocities = data[:, [vx_idx, vy_idx, vz_idx]] | ||
| vel_disp = np.std(velocities, axis=0) / KM_S | ||
| print(".1f") | ||
| print(f"Velocity dispersion (km/s): " | ||
| f"sigma_x={vel_disp[0]:.1f}, sigma_y={vel_disp[1]:.1f}, sigma_z={vel_disp[2]:.1f}") | ||
|
|
||
| def main(): | ||
| parser = argparse.ArgumentParser( |
| masses = data[:, mass_idx] / M_SUN # Convert to M_sun | ||
| mass_min = np.min(masses) | ||
| mass_max = np.max(masses) | ||
| mass_median = np.median(masses) |
Check notice
Code scanning / CodeQL
Unused local variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, to fix an unused local variable you either remove the variable (and any associated computation) if the result is not needed, or you actually use it (for example, in a return value or a log/print statement) if the computation is meaningful. Here, mass_min, mass_max, and mass_median are all computed, but none are used; instead, there is a placeholder print(".1f"). Similarly, for velocities, vel_disp is computed but only followed by print(".1f"). The function is documented as printing summary information, so the best fix that preserves and completes the intended functionality is to incorporate these statistics into formatted print statements and remove the placeholder print(".1f") calls.
Concretely, in print_particle_info in scripts/python/create_particles.py, replace the mass block so that it prints the min, max, and median masses (in solar masses) using the computed values and delete the print(".1f") line. Then, in the velocity block, similarly replace the placeholder print(".1f") with a formatted print that uses vel_disp to show the velocity dispersion along each axis, likely in km/s (since we divide by KM_S). No new imports or helper methods are needed; we only add more detailed print statements using the existing variables.
| @@ -294,7 +294,8 @@ | ||
| mass_min = np.min(masses) | ||
| mass_max = np.max(masses) | ||
| mass_median = np.median(masses) | ||
| print(".1f") | ||
| print(f"Mass range: [{mass_min:.1f}, {mass_max:.1f}] M_sun") | ||
| print(f"Mass median: {mass_median:.1f} M_sun") | ||
|
|
||
| # Velocity statistics (if applicable) | ||
| if 'vx' in components: | ||
| @@ -303,7 +304,12 @@ | ||
| vz_idx = 3 + components.index('vz') | ||
| velocities = data[:, [vx_idx, vy_idx, vz_idx]] | ||
| vel_disp = np.std(velocities, axis=0) / KM_S | ||
| print(".1f") | ||
| print( | ||
| "Velocity dispersion: " | ||
| f"sigma_x={vel_disp[0]:.1f} km/s, " | ||
| f"sigma_y={vel_disp[1]:.1f} km/s, " | ||
| f"sigma_z={vel_disp[2]:.1f} km/s" | ||
| ) | ||
|
|
||
| def main(): | ||
| parser = argparse.ArgumentParser( |
| vy_idx = 3 + components.index('vy') | ||
| vz_idx = 3 + components.index('vz') | ||
| velocities = data[:, [vx_idx, vy_idx, vz_idx]] | ||
| vel_disp = np.std(velocities, axis=0) / KM_S |
Check notice
Code scanning / CodeQL
Unused local variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, to fix an unused local variable you either (a) remove the assignment if the result isn’t needed and there are no side effects, or (b) use the variable meaningfully (e.g., in logging, return values, or further calculations). If the variable is intentionally unused, you rename it to something like _ or unused_vel_disp.
Here, the best fix that keeps and clarifies existing intended functionality is to actually use vel_disp when printing velocity statistics. The surrounding code in print_particle_info prints detailed position statistics and sets up placeholders in the mass and velocity sections. The variable vel_disp holds the 1D standard deviation of velocity components divided by KM_S, so it represents the velocity dispersion per component in km/s. The minimal, non-breaking improvement is to replace the placeholder print(".1f") with a formatted print that uses vel_disp, similar in spirit to how mass statistics would be printed.
Concretely:
- In
scripts/python/create_particles.py, withinprint_particle_info, replace theprint(".1f")in the velocity statistics block with something like:print(f"Velocity dispersion: vx={vel_disp[0]:.1f} km/s, " f"vy={vel_disp[1]:.1f} km/s, vz={vel_disp[2]:.1f} km/s")
- This change requires no new imports or definitions; it only uses existing variables and constants.
- We leave the calculation of
vel_dispintact and simply use it, resolving the CodeQL finding and improving the information printed.
| @@ -303,7 +303,8 @@ | ||
| vz_idx = 3 + components.index('vz') | ||
| velocities = data[:, [vx_idx, vy_idx, vz_idx]] | ||
| vel_disp = np.std(velocities, axis=0) / KM_S | ||
| print(".1f") | ||
| print(f"Velocity dispersion: vx={vel_disp[0]:.1f} km/s, " | ||
| f"vy={vel_disp[1]:.1f} km/s, vz={vel_disp[2]:.1f} km/s") | ||
|
|
||
| def main(): | ||
| parser = argparse.ArgumentParser( |
|



Description
Related issues
#1544
Checklist
Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an
xinside the square brackets[ ]in the Markdown source below:/azp run.