Skip to content

add script to compute radial profiles#1597

Open
BenWibking wants to merge 1 commit intodevelopmentfrom
disk-prs/radial-profile-script
Open

add script to compute radial profiles#1597
BenWibking wants to merge 1 commit intodevelopmentfrom
disk-prs/radial-profile-script

Conversation

@BenWibking
Copy link
Collaborator

@BenWibking BenWibking commented Jan 20, 2026

Description

Adds a Python script to compute radial profiles of a given variable from 2D histograms (https://quokka-astro.github.io/quokka/insitu_analysis/#2d-slices) of radius and that variable.

Related issues

N/A

Checklist

Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an x inside the square brackets [ ] in the Markdown source below:

  • I have added a description (see above).
  • I have added a link to any related issues (if applicable; see above).
  • I have read the Contributing Guide.
  • I have added tests for any new physics that this PR adds to the code.
  • (For quokka-astro org members) I have manually triggered the GPU tests with the magic comment /azp run.

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 20, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @BenWibking, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new Python utility script that streamlines the analysis of simulation data by computing radial profiles from 2D histograms. It provides a robust method for extracting mean and median values of a variable as a function of radius, offering both tabular data and visual representations. The script also enhances analytical capabilities by allowing comparison with theoretical disk models and external datasets, making it a valuable tool for astrophysical data interpretation.

Highlights

  • New Radial Profile Script: Introduces radial_profile_from_pdf.py, a new Python script designed to compute radial mean and median profiles from 2D histogram data.
  • 2D Histogram Processing: The script processes 2D histogram data (DiagPDF outputs) by marginalizing the probability distribution function to derive 1D radial profiles.
  • Output Generation: It generates both CSV files containing the computed profile data and corresponding PNG plots for visualization.
  • Analytical Model Comparison: The script can optionally load external profile data and compute analytical disk density and pressure profiles for direct comparison with the derived simulation data.
  • Performance Optimization: Includes optional Numba JIT compilation for performance-critical interpolation functions, if Numba is available.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@dosubot dosubot bot added the enhancement New feature or request label Jan 20, 2026
@sonarqubecloud
Copy link

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces a Python script to compute radial profiles from 2D histograms. The script includes functionality for reading data, performing interpolations, computing profiles (mean/median), and plotting. Overall, the script is functional and addresses the stated objective. However, there are several areas for improvement related to code duplication, constant management, and efficiency that would enhance maintainability and performance.

Comment on lines +463 to +503

header = read_header(filename)
var_names = header.get("variables", [])
if len(var_names) != 2:
raise ValueError(f"{filename}: expected 2D histogram, found variables={var_names}")
if args.radius_name not in var_names:
raise ValueError(f"{filename}: radius variable '{args.radius_name}' not found in {var_names}")

is_log = [bool(v) for v in header.get("is_log_spaced", [0, 0])]
data = np.genfromtxt(filename, names=True, skip_header=4)
columns = list(data.dtype.names)
weight_col = find_weight_column(columns)

xvar, yvar = var_names
x_idx = data[xvar + "_idx"].astype(int)
y_idx = data[yvar + "_idx"].astype(int)
nx = int(x_idx.max()) + 1
ny = int(y_idx.max()) + 1

weights = np.zeros((nx, ny))
weights[x_idx, y_idx] = data[weight_col]

x_mins, x_maxs = extract_bin_edges(data, xvar, nx)
y_mins, y_maxs = extract_bin_edges(data, yvar, ny)
x_centers = bin_centers(x_mins, x_maxs, is_log[0])
y_centers = bin_centers(y_mins, y_maxs, is_log[1])

if args.radius_name == xvar:
radius_mins, radius_maxs = x_mins, x_maxs
radius_centers = x_centers
other_name = yvar
other_centers = y_centers
radius_axis = 0
else:
radius_mins, radius_maxs = y_mins, y_maxs
radius_centers = y_centers
other_name = xvar
other_centers = x_centers
radius_axis = 1

sum_w, mean_vals, median_vals = compute_profiles(weights, other_centers, radius_axis)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The entire block of code for reading the header, parsing data, extracting bin edges, and computing profiles is duplicated from the load_profile function call on line 462. This is a significant code duplication and inefficiency. The prof variable already holds all the necessary processed data from the initial load_profile(filename) call.

        # All necessary data is already in 'prof' from the previous load_profile(filename) call.
        # The following lines are redundant and can be removed.
        # header = read_header(filename)
        # var_names = header.get("variables", [])
        # if len(var_names) != 2:
        #     raise ValueError(f"{filename}: expected 2D histogram, found variables={var_names}")
        # if args.radius_name not in var_names:
        #     raise ValueError(f"{filename}: radius variable '{args.radius_name}' not found in {var_names}")

        # is_log = [bool(v) for v in header.get("is_log_spaced", [0, 0])]
        # data = np.genfromtxt(filename, names=True, skip_header=4)
        # columns = list(data.dtype.names)
        # weight_col = find_weight_column(columns)

        # xvar, yvar = var_names
        # x_idx = data[xvar + "_idx"].astype(int)
        # y_idx = data[yvar + "_idx"].astype(int)
        # nx = int(x_idx.max()) + 1
        # ny = int(y_idx.max()) + 1

        # weights = np.zeros((nx, ny))
        # weights[x_idx, y_idx] = data[weight_col]

        # x_mins, x_maxs = extract_bin_edges(data, xvar, nx)
        # y_mins, y_maxs = extract_bin_edges(data, yvar, ny)
        # x_centers = bin_centers(x_mins, x_maxs, is_log[0])
        # y_centers = bin_centers(y_mins, y_maxs, is_log[1])

        # if args.radius_name == xvar:
        #     radius_mins, radius_maxs = x_mins, x_maxs
        #     radius_centers = x_centers
        #     other_name = yvar
        #     other_centers = y_centers
        #     radius_axis = 0
        # else:
        #     radius_mins, radius_maxs = y_mins, y_maxs
        #     radius_centers = y_centers
        #     other_name = xvar
        #     other_centers = x_centers
        #     radius_axis = 1

        # sum_w, mean_vals, median_vals = compute_profiles(weights, other_centers, radius_axis)

Comment on lines +19 to +23
def float_if_possible(element: str):
try:
return float(element)
except ValueError:
return element
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function name float_if_possible suggests it returns a boolean, but it actually returns a float or the original string. This can be misleading. Additionally, the type hint for element is str, but the original function signature had any. It should consistently be str if it's always expecting a string input.

Consider renaming the function to something like try_parse_float or convert_to_float_or_keep_string for better clarity.

Suggested change
def float_if_possible(element: str):
try:
return float(element)
except ValueError:
return element
def try_parse_float(element: str):
try:
return float(element)
except ValueError:
return element

Comment on lines +38 to +69
def read_inputs_param(filename: str, key: str):
try:
with open(filename, "r", encoding="utf-8") as handle:
for line in handle:
stripped = line.split("#", 1)[0].strip()
if not stripped:
continue
if "=" not in stripped:
continue
lhs, rhs = stripped.split("=", 1)
if lhs.strip() == key:
return float(rhs.strip())
except OSError:
return None
return None


def read_inputs_param_str(filename: str, key: str):
try:
with open(filename, "r", encoding="utf-8") as handle:
for line in handle:
stripped = line.split("#", 1)[0].strip()
if not stripped:
continue
if "=" not in stripped:
continue
lhs, rhs = stripped.split("=", 1)
if lhs.strip() == key:
return rhs.strip().strip('"').strip("'")
except OSError:
return None
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The read_inputs_param and read_inputs_param_str functions share a significant amount of duplicated logic for file reading and line parsing. This could be refactored into a single helper function that handles the common file I/O and line processing, and then specialized functions for type conversion (float or string). This would improve maintainability and reduce redundancy.

def _read_param_from_file(filename: str, key: str):
    try:
        with open(filename, "r", encoding="utf-8") as handle:
            for line in handle:
                stripped = line.split("#", 1)[0].strip()
                if not stripped:
                    continue
                if "=" not in stripped:
                    continue
                lhs, rhs = stripped.split("=", 1)
                if lhs.strip() == key:
                    return rhs.strip()
    except OSError:
        return None
    return None

def read_inputs_param(filename: str, key: str):
    value = _read_param_from_file(filename, key)
    return float(value) if value is not None else None

def read_inputs_param_str(filename: str, key: str):
    value = _read_param_from_file(filename, key)
    return value.strip('"').strip("'') if value is not None else None

Comment on lines +148 to +170
def disk_density_spherical_avg(radius_kpc, r_scale_kpc, z_scale_kpc, rho0_cgs):
kpc_cm = 1.0e3 * 3.085677581e18
r_cm = np.asarray(radius_kpc) * kpc_cm
r_scale_cm = r_scale_kpc * kpc_cm
z_scale_cm = z_scale_kpc * kpc_cm

theta = np.linspace(0.0, math.pi, 512)
sin_t = np.sin(theta)
abs_cos_t = np.abs(np.cos(theta))
r_cm_col = r_cm[:, None]
rho = rho0_cgs * np.exp(-(r_cm_col * sin_t) / r_scale_cm) * np.exp(-(r_cm_col * abs_cos_t) / z_scale_cm)

# Spherical average: (1/2) * ∫ rho(r,theta) sin(theta) dtheta
integrand = rho * sin_t
avg = 0.5 * np.trapz(integrand, theta, axis=1)
return avg


def disk_pressure_midplane(radius_kpc, r_scale_kpc, rho0_cgs, t_disk_k):
m_p = 1.67262192369e-24
mu = 0.61
rho_mid = rho0_cgs * np.exp(-np.asarray(radius_kpc) / r_scale_kpc)
return rho_mid * t_disk_k / (mu * m_p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The physical constants kpc_cm, m_p, and mu are hardcoded within these functions and also duplicated later in the main function. It's good practice to define such constants at the module level (e.g., at the top of the file) to improve readability, make them easily discoverable, and ensure consistency across the script. This also makes it easier to modify them if precision requirements change or if they need to be configurable.

KPC_CM = 1.0e3 * 3.085677581e18
M_P = 1.67262192369e-24
MU = 0.61

def disk_density_spherical_avg(radius_kpc, r_scale_kpc, z_scale_kpc, rho0_cgs):
    r_cm = np.asarray(radius_kpc) * KPC_CM
    r_scale_cm = r_scale_kpc * KPC_CM
    z_scale_cm = z_scale_kpc * KPC_CM

    theta = np.linspace(0.0, math.pi, 512)
    sin_t = np.sin(theta)
    abs_cos_t = np.abs(np.cos(theta))
    r_cm_col = r_cm[:, None]
    rho = rho0_cgs * np.exp(-(r_cm_col * sin_t) / r_scale_cm) * np.exp(-(r_cm_col * abs_cos_t) / z_scale_cm)

    # Spherical average: (1/2) * ∫ rho(r,theta) sin(theta) dtheta
    integrand = rho * sin_t
    avg = 0.5 * np.trapz(integrand, theta, axis=1)
    return avg


def disk_pressure_midplane(radius_kpc, r_scale_kpc, rho0_cgs, t_disk_k):
    rho_mid = rho0_cgs * np.exp(-np.asarray(radius_kpc) / r_scale_kpc)
    return rho_mid * t_disk_k / (MU * M_P)

Comment on lines +200 to +228
def compute_profiles(weights, other_centers, radius_axis: int):
if radius_axis == 0:
sum_w = weights.sum(axis=1)
mean_vals = np.full(weights.shape[0], np.nan)
median_vals = np.full(weights.shape[0], np.nan)
for i in range(weights.shape[0]):
w = weights[i, :]
total = sum_w[i]
if total > 0.0:
mean_vals[i] = np.sum(w * other_centers) / total
cdf = np.cumsum(w)
idx = int(np.searchsorted(cdf, 0.5 * total))
if idx < len(other_centers):
median_vals[i] = other_centers[idx]
return sum_w, mean_vals, median_vals

sum_w = weights.sum(axis=0)
mean_vals = np.full(weights.shape[1], np.nan)
median_vals = np.full(weights.shape[1], np.nan)
for i in range(weights.shape[1]):
w = weights[:, i]
total = sum_w[i]
if total > 0.0:
mean_vals[i] = np.sum(w * other_centers) / total
cdf = np.cumsum(w)
idx = int(np.searchsorted(cdf, 0.5 * total))
if idx < len(other_centers):
median_vals[i] = other_centers[idx]
return sum_w, mean_vals, median_vals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for calculating mean_vals and median_vals is almost identical for radius_axis == 0 and radius_axis == 1. This duplication can be avoided by abstracting the common logic or by using a more generalized approach to array indexing. For example, you could transpose the weights array if radius_axis is 1, and then apply the same logic.

def compute_profiles(weights, other_centers, radius_axis: int):
    if radius_axis == 1:
        weights = weights.T # Transpose weights to unify logic

    sum_w = weights.sum(axis=1)
    mean_vals = np.full(weights.shape[0], np.nan)
    median_vals = np.full(weights.shape[0], np.nan)

    for i in range(weights.shape[0]):
        w = weights[i, :]
        total = sum_w[i]
        if total > 0.0:
            mean_vals[i] = np.sum(w * other_centers) / total
            cdf = np.cumsum(w)
            idx = int(np.searchsorted(cdf, 0.5 * total))
            if idx < len(other_centers):
                median_vals[i] = other_centers[idx]
    return sum_w, mean_vals, median_vals

Comment on lines +306 to +308
M_solar = 1.98847e33
kpc_cm = 1.0e3 * 3.085677581e18
rho0 = (disk_mass * M_solar) / (4.0 * math.pi * (disk_rscale * kpc_cm) ** 2 * (disk_zscale * kpc_cm))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The constants M_solar and kpc_cm are hardcoded here. Similar to the constants in disk_density_spherical_avg, these should be defined at the module level to avoid repetition and improve maintainability.

        rho0 = (disk_mass * M_SOLAR) / (4.0 * math.pi * (disk_rscale * KPC_CM) ** 2 * (disk_zscale * KPC_CM))

Comment on lines +398 to +399
m_p = 1.67262192369e-24
mu = 0.61
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The constants m_p and mu are hardcoded here. Similar to other constants, these should be defined at the module level to avoid repetition and improve maintainability.

                pressure = profile_data["rho"] * profile_data["temp"] / (MU * M_P)

Comment on lines +413 to +416
if profile_data is not None:
m_p = 1.67262192369e-24
mu = 0.61
vals = profile_data["rho"] * profile_data["temp"] / (mu * m_p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block of code is duplicated within the plot_profile function. The logic for calculating vals based on other == "pressure" is repeated. This could be extracted into a helper function or handled more efficiently to avoid repetition.

                if other == "pressure":
                    vals = profile_data["rho"] * profile_data["temp"] / (MU * M_P)

@chongchonghe
Copy link
Contributor

Adds a Python script to compute radial profiles of a given variable from 2D histograms (https://quokka-astro.github.io/quokka/insitu_analysis/#2d-slices) of radius and that variable.

@BenWibking I'm not sure that this means exactly. It's not possible to get a 'radial profile' from a N-D histogram of Quantitys v.s. volume, because the volume does not have information about position. Can you clarify?

@BenWibking
Copy link
Collaborator Author

BenWibking commented Jan 21, 2026

Adds a Python script to compute radial profiles of a given variable from 2D histograms (https://quokka-astro.github.io/quokka/insitu_analysis/#2d-slices) of radius and that variable.

@BenWibking I'm not sure that this means exactly. It's not possible to get a 'radial profile' from a N-D histogram of Quantitys v.s. volume, because the volume does not have information about position. Can you clarify?

@chongchonghe You can if one of the binning dimensions is radius. So you can then take the median or mean within a radial bin, and you get a radial profile accurate up to the resolution of the bins.

Copy link
Contributor

@chongchonghe chongchonghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gemini's comments about code duplication makes sense to me. Can you address them?

@chongchonghe
Copy link
Contributor

Particularly, since the script is ~600 lines of code, it's worth it to modularize it and avoid duplication.

@BenWibking
Copy link
Collaborator Author

Yes, I'll clean this up tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants