add script to compute radial profiles #1597
Conversation
Summary of Changes

Hello @BenWibking, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a new Python utility script that computes radial profiles from 2D histograms of simulation data. It extracts the mean and median of a variable as a function of radius, producing both tabular output and plots, and it can compare the results against theoretical disk models and external datasets.

Highlights
Code Review
The pull request introduces a Python script to compute radial profiles from 2D histograms. The script includes functionality for reading data, performing interpolations, computing profiles (mean/median), and plotting. Overall, the script is functional and addresses the stated objective. However, there are several areas for improvement related to code duplication, constant management, and efficiency that would enhance maintainability and performance.
```python
header = read_header(filename)
var_names = header.get("variables", [])
if len(var_names) != 2:
    raise ValueError(f"(unknown): expected 2D histogram, found variables={var_names}")
if args.radius_name not in var_names:
    raise ValueError(f"(unknown): radius variable '{args.radius_name}' not found in {var_names}")

is_log = [bool(v) for v in header.get("is_log_spaced", [0, 0])]
data = np.genfromtxt(filename, names=True, skip_header=4)
columns = list(data.dtype.names)
weight_col = find_weight_column(columns)

xvar, yvar = var_names
x_idx = data[xvar + "_idx"].astype(int)
y_idx = data[yvar + "_idx"].astype(int)
nx = int(x_idx.max()) + 1
ny = int(y_idx.max()) + 1

weights = np.zeros((nx, ny))
weights[x_idx, y_idx] = data[weight_col]

x_mins, x_maxs = extract_bin_edges(data, xvar, nx)
y_mins, y_maxs = extract_bin_edges(data, yvar, ny)
x_centers = bin_centers(x_mins, x_maxs, is_log[0])
y_centers = bin_centers(y_mins, y_maxs, is_log[1])

if args.radius_name == xvar:
    radius_mins, radius_maxs = x_mins, x_maxs
    radius_centers = x_centers
    other_name = yvar
    other_centers = y_centers
    radius_axis = 0
else:
    radius_mins, radius_maxs = y_mins, y_maxs
    radius_centers = y_centers
    other_name = xvar
    other_centers = x_centers
    radius_axis = 1

sum_w, mean_vals, median_vals = compute_profiles(weights, other_centers, radius_axis)
```
This entire block — reading the header, parsing the data, extracting bin edges, and computing profiles — duplicates what the load_profile call on line 462 already does. This is significant code duplication and wasted work: the prof variable already holds all the necessary processed data from the initial load_profile(filename) call.
```python
# All necessary data is already in 'prof' from the previous load_profile(filename) call.
# The following lines are redundant and can be removed.
# header = read_header(filename)
# var_names = header.get("variables", [])
# if len(var_names) != 2:
#     raise ValueError(f"(unknown): expected 2D histogram, found variables={var_names}")
# if args.radius_name not in var_names:
#     raise ValueError(f"(unknown): radius variable '{args.radius_name}' not found in {var_names}")
# is_log = [bool(v) for v in header.get("is_log_spaced", [0, 0])]
# data = np.genfromtxt(filename, names=True, skip_header=4)
# columns = list(data.dtype.names)
# weight_col = find_weight_column(columns)
# xvar, yvar = var_names
# x_idx = data[xvar + "_idx"].astype(int)
# y_idx = data[yvar + "_idx"].astype(int)
# nx = int(x_idx.max()) + 1
# ny = int(y_idx.max()) + 1
# weights = np.zeros((nx, ny))
# weights[x_idx, y_idx] = data[weight_col]
# x_mins, x_maxs = extract_bin_edges(data, xvar, nx)
# y_mins, y_maxs = extract_bin_edges(data, yvar, ny)
# x_centers = bin_centers(x_mins, x_maxs, is_log[0])
# y_centers = bin_centers(y_mins, y_maxs, is_log[1])
# if args.radius_name == xvar:
#     radius_mins, radius_maxs = x_mins, x_maxs
#     radius_centers = x_centers
#     other_name = yvar
#     other_centers = y_centers
#     radius_axis = 0
# else:
#     radius_mins, radius_maxs = y_mins, y_maxs
#     radius_centers = y_centers
#     other_name = xvar
#     other_centers = x_centers
#     radius_axis = 1
# sum_w, mean_vals, median_vals = compute_profiles(weights, other_centers, radius_axis)
```

```python
def float_if_possible(element: str):
    try:
        return float(element)
    except ValueError:
        return element
```
The function name float_if_possible suggests it returns a boolean, but it actually returns a float or the original string. This can be misleading. Additionally, the type hint for element is str, but the original function signature had any. It should consistently be str if it's always expecting a string input.
Consider renaming the function to something like try_parse_float or convert_to_float_or_keep_string for better clarity.
```diff
-def float_if_possible(element: str):
+def try_parse_float(element: str):
     try:
         return float(element)
     except ValueError:
         return element
```
```python
def read_inputs_param(filename: str, key: str):
    try:
        with open(filename, "r", encoding="utf-8") as handle:
            for line in handle:
                stripped = line.split("#", 1)[0].strip()
                if not stripped:
                    continue
                if "=" not in stripped:
                    continue
                lhs, rhs = stripped.split("=", 1)
                if lhs.strip() == key:
                    return float(rhs.strip())
    except OSError:
        return None
    return None


def read_inputs_param_str(filename: str, key: str):
    try:
        with open(filename, "r", encoding="utf-8") as handle:
            for line in handle:
                stripped = line.split("#", 1)[0].strip()
                if not stripped:
                    continue
                if "=" not in stripped:
                    continue
                lhs, rhs = stripped.split("=", 1)
                if lhs.strip() == key:
                    return rhs.strip().strip('"').strip("'")
    except OSError:
        return None
    return None
```
The read_inputs_param and read_inputs_param_str functions share a significant amount of duplicated logic for file reading and line parsing. This could be refactored into a single helper function that handles the common file I/O and line processing, and then specialized functions for type conversion (float or string). This would improve maintainability and reduce redundancy.
```python
def _read_param_from_file(filename: str, key: str):
    try:
        with open(filename, "r", encoding="utf-8") as handle:
            for line in handle:
                stripped = line.split("#", 1)[0].strip()
                if not stripped:
                    continue
                if "=" not in stripped:
                    continue
                lhs, rhs = stripped.split("=", 1)
                if lhs.strip() == key:
                    return rhs.strip()
    except OSError:
        return None
    return None


def read_inputs_param(filename: str, key: str):
    value = _read_param_from_file(filename, key)
    return float(value) if value is not None else None


def read_inputs_param_str(filename: str, key: str):
    value = _read_param_from_file(filename, key)
    return value.strip('"').strip("'") if value is not None else None
```

```python
def disk_density_spherical_avg(radius_kpc, r_scale_kpc, z_scale_kpc, rho0_cgs):
    kpc_cm = 1.0e3 * 3.085677581e18
    r_cm = np.asarray(radius_kpc) * kpc_cm
    r_scale_cm = r_scale_kpc * kpc_cm
    z_scale_cm = z_scale_kpc * kpc_cm

    theta = np.linspace(0.0, math.pi, 512)
    sin_t = np.sin(theta)
    abs_cos_t = np.abs(np.cos(theta))
    r_cm_col = r_cm[:, None]
    rho = rho0_cgs * np.exp(-(r_cm_col * sin_t) / r_scale_cm) * np.exp(-(r_cm_col * abs_cos_t) / z_scale_cm)

    # Spherical average: (1/2) * ∫ rho(r,theta) sin(theta) dtheta
    integrand = rho * sin_t
    avg = 0.5 * np.trapz(integrand, theta, axis=1)
    return avg


def disk_pressure_midplane(radius_kpc, r_scale_kpc, rho0_cgs, t_disk_k):
    m_p = 1.67262192369e-24
    mu = 0.61
    rho_mid = rho0_cgs * np.exp(-np.asarray(radius_kpc) / r_scale_kpc)
    return rho_mid * t_disk_k / (mu * m_p)
```
The physical constants kpc_cm, m_p, and mu are hardcoded within these functions and also duplicated later in the main function. It's good practice to define such constants at the module level (e.g., at the top of the file) to improve readability, make them easily discoverable, and ensure consistency across the script. This also makes it easier to modify them if precision requirements change or if they need to be configurable.
```python
KPC_CM = 1.0e3 * 3.085677581e18
M_P = 1.67262192369e-24
MU = 0.61


def disk_density_spherical_avg(radius_kpc, r_scale_kpc, z_scale_kpc, rho0_cgs):
    r_cm = np.asarray(radius_kpc) * KPC_CM
    r_scale_cm = r_scale_kpc * KPC_CM
    z_scale_cm = z_scale_kpc * KPC_CM
    theta = np.linspace(0.0, math.pi, 512)
    sin_t = np.sin(theta)
    abs_cos_t = np.abs(np.cos(theta))
    r_cm_col = r_cm[:, None]
    rho = rho0_cgs * np.exp(-(r_cm_col * sin_t) / r_scale_cm) * np.exp(-(r_cm_col * abs_cos_t) / z_scale_cm)
    # Spherical average: (1/2) * ∫ rho(r,theta) sin(theta) dtheta
    integrand = rho * sin_t
    avg = 0.5 * np.trapz(integrand, theta, axis=1)
    return avg


def disk_pressure_midplane(radius_kpc, r_scale_kpc, rho0_cgs, t_disk_k):
    rho_mid = rho0_cgs * np.exp(-np.asarray(radius_kpc) / r_scale_kpc)
    return rho_mid * t_disk_k / (MU * M_P)
```

```python
def compute_profiles(weights, other_centers, radius_axis: int):
    if radius_axis == 0:
        sum_w = weights.sum(axis=1)
        mean_vals = np.full(weights.shape[0], np.nan)
        median_vals = np.full(weights.shape[0], np.nan)
        for i in range(weights.shape[0]):
            w = weights[i, :]
            total = sum_w[i]
            if total > 0.0:
                mean_vals[i] = np.sum(w * other_centers) / total
                cdf = np.cumsum(w)
                idx = int(np.searchsorted(cdf, 0.5 * total))
                if idx < len(other_centers):
                    median_vals[i] = other_centers[idx]
        return sum_w, mean_vals, median_vals

    sum_w = weights.sum(axis=0)
    mean_vals = np.full(weights.shape[1], np.nan)
    median_vals = np.full(weights.shape[1], np.nan)
    for i in range(weights.shape[1]):
        w = weights[:, i]
        total = sum_w[i]
        if total > 0.0:
            mean_vals[i] = np.sum(w * other_centers) / total
            cdf = np.cumsum(w)
            idx = int(np.searchsorted(cdf, 0.5 * total))
            if idx < len(other_centers):
                median_vals[i] = other_centers[idx]
    return sum_w, mean_vals, median_vals
```
The logic for calculating mean_vals and median_vals is almost identical for radius_axis == 0 and radius_axis == 1. This duplication can be avoided by abstracting the common logic or by using a more generalized approach to array indexing. For example, you could transpose the weights array if radius_axis is 1, and then apply the same logic.
```python
def compute_profiles(weights, other_centers, radius_axis: int):
    if radius_axis == 1:
        weights = weights.T  # Transpose weights to unify logic
    sum_w = weights.sum(axis=1)
    mean_vals = np.full(weights.shape[0], np.nan)
    median_vals = np.full(weights.shape[0], np.nan)
    for i in range(weights.shape[0]):
        w = weights[i, :]
        total = sum_w[i]
        if total > 0.0:
            mean_vals[i] = np.sum(w * other_centers) / total
            cdf = np.cumsum(w)
            idx = int(np.searchsorted(cdf, 0.5 * total))
            if idx < len(other_centers):
                median_vals[i] = other_centers[idx]
    return sum_w, mean_vals, median_vals
```

```python
M_solar = 1.98847e33
kpc_cm = 1.0e3 * 3.085677581e18
rho0 = (disk_mass * M_solar) / (4.0 * math.pi * (disk_rscale * kpc_cm) ** 2 * (disk_zscale * kpc_cm))
```
The constants M_solar and kpc_cm are hardcoded here. Similar to the constants in disk_density_spherical_avg, these should be defined at the module level to avoid repetition and improve maintainability.
```python
rho0 = (disk_mass * M_SOLAR) / (4.0 * math.pi * (disk_rscale * KPC_CM) ** 2 * (disk_zscale * KPC_CM))
```

```python
m_p = 1.67262192369e-24
mu = 0.61
```

```python
if profile_data is not None:
    m_p = 1.67262192369e-24
    mu = 0.61
    vals = profile_data["rho"] * profile_data["temp"] / (mu * m_p)
```
This block of code is duplicated within the plot_profile function. The logic for calculating vals based on other == "pressure" is repeated. This could be extracted into a helper function or handled more efficiently to avoid repetition.
```python
if other == "pressure":
    vals = profile_data["rho"] * profile_data["temp"] / (MU * M_P)
```
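One way to consolidate the repeated pressure computation is a small helper used at both call sites. This is only a sketch: the constants `MU` and `M_P` and the `profile_data` keys `"rho"`/`"temp"` are taken from the snippets above, and the helper name is hypothetical.

```python
# Hypothetical helper consolidating the repeated ideal-gas computation (cgs units).
M_P = 1.67262192369e-24  # proton mass [g]
MU = 0.61                # mean molecular weight

def pressure_over_kb(rho_cgs, temp_k):
    """Return P/k_B = rho * T / (mu * m_p), i.e. total particle density times T, in K cm^-3."""
    return rho_cgs * temp_k / (MU * M_P)

# Both duplicated sites would then reduce to:
# vals = pressure_over_kb(profile_data["rho"], profile_data["temp"])
```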
@chongchonghe: @BenWibking I'm not sure what this means exactly. It's not possible to get a 'radial profile' from an N-D histogram of quantities vs. volume, because the volume carries no information about position. Can you clarify?
@BenWibking: @chongchonghe You can if one of the binning dimensions is radius. You can then take the median or mean within each radial bin, which gives a radial profile accurate up to the resolution of the bins.
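To illustrate that point concretely, here is a standalone sketch with synthetic data (not the script's actual I/O format): given a 2D histogram binned in (radius, temperature) with mass weights, collapsing the temperature axis with a weighted mean within each radial bin yields a radial profile.

```python
import numpy as np

# Synthetic stand-in for simulation cells (hypothetical values, for illustration only)
rng = np.random.default_rng(42)
radius = rng.uniform(0.0, 10.0, 10_000)   # kpc
temp = 1.0e4 * (1.0 + radius)             # fake monotonic radius-temperature relation [K]
mass = np.ones_like(radius)               # per-cell weights

# 2D histogram: rows index radial bins, columns index temperature bins
weights, r_edges, t_edges = np.histogram2d(radius, temp, bins=(20, 50), weights=mass)
t_centers = 0.5 * (t_edges[:-1] + t_edges[1:])
r_centers = 0.5 * (r_edges[:-1] + r_edges[1:])

# Collapse the non-radius axis: mass-weighted mean temperature per radial bin
sum_w = weights.sum(axis=1)
mean_temp = np.where(sum_w > 0, (weights * t_centers).sum(axis=1) / sum_w, np.nan)
# mean_temp[i] is the profile value at r_centers[i], accurate to the bin resolution
```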
chongchonghe left a comment
Gemini's comments about code duplication make sense to me. Can you address them?
In particular, since the script is ~600 lines of code, it's worth modularizing it to avoid duplication.
Yes, I'll clean this up tomorrow.



Description
Adds a Python script to compute radial profiles of a given variable from 2D histograms (https://quokka-astro.github.io/quokka/insitu_analysis/#2d-slices) of radius and that variable.
Related issues
N/A
Checklist
Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an `x` inside the square brackets `[ ]` in the Markdown source below.