Skip to content

Conversation

@chengzhuzhang
Copy link
Contributor

Description

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@chengzhuzhang
Copy link
Contributor Author

I may discovered a bug in xcdat that we need to fix for merging this PR.

@chengzhuzhang
Copy link
Contributor Author

New issues when adding U_at_850hPa which has missing values. Error shown below:

2025-05-20 12:48:41,027 [INFO]: dataset_xr.py(_get_dataset_with_derived_ts_var:877) >> Deriving the time series variable using the source variables: ('U_at_850hPa',)
2025-05-20 12:48:45,985 [INFO]: zwf_functions.py(spacetime_power:453) >> Data reduced by latitude bounds. Size is Frozen({'time': 2161, 'lat': 30, 'lon': 360})
2025-05-20 12:48:46,131 [ERROR]: core_parameter.py(_run_diag:353) >> Error in e3sm_diags.driver.tropical_subseasonal_driver
Traceback (most recent call last):
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/e3sm_diags/parameter/core_parameter.py", line 350, in _run_diag
    single_result = module.run_diag(self)
                    ^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 50, in run_diag
    test_spectrum, test_start, test_end = calculate_spectrum(test_ds, variable)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 159, in calculate_spectrum
    spec_all = wf_analysis(var, **opt)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 210, in wf_analysis
    z2 = wf.spacetime_power(x, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/e3sm_diags/driver/utils/zwf_functions.py", line 465, in spacetime_power
    xdetr = detrend(data.values, axis=0, type="linear")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/scipy/signal/_signaltools.py", line 3628, in detrend
    coef, resids, rank, s = linalg.lstsq(A, newdata[sl])
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/scipy/linalg/_basic.py", line 1253, in lstsq
    b1 = _asarray_validated(b, check_finite=check_finite)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/scipy/_lib/_util.py", line 321, in _asarray_validated
    a = toarray(a)
        ^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py", line 649, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs

@chengzhuzhang
Copy link
Contributor Author

@jjbenedict hey Jim, in this PR, I'm working on have e3sm_diags to understand eamxx for tropical variability diagnostics. It works for precipitation and outgoing longwave so far, results can be found here. However the U850 output from eamxx has missing values (I think the EAM version output as extrapolation done at some point, thus no missing), and the wf_analysis function doesn't support data with missing values. See comment #982 (comment). At this point, I'm trying to come up with a way to treat the missing. Could you advise on this? Thanks!

@jjbenedict
Copy link
Collaborator

@chengzhuzhang Yes, the Fourier analyses require that all input fields must have no missing data. A simple and fast solution would be to linearly interpolate in U850 longitude to fill in the missing data "gaps". A more elegant solution might be to use something like a Poisson grid fill -- here's one version from NCL:

https://www.ncl.ucar.edu/Document/Functions/Built-in/poisson_grid_fill.shtml

I've used this NCL method previously, and it works well. Python must have something similar, though I haven't needed to use it yet.

But to be honest, if the Fourier analysis is being applied to a smoothly varying field like U850 for daily averaged data, and we're plotting zonal wavenumbers -15 to +15, using the simple linear gap-fill would probably be sufficient.

@chengzhuzhang
Copy link
Contributor Author

chengzhuzhang commented May 23, 2025

Hey Jim, thanks for your comment! Yes, agreed. I now realize if we only target 15N-15S, a simple interpolation method should just work fine..I will add a interpolation routine to work around missing data. Thanks!

@chengzhuzhang chengzhuzhang marked this pull request as ready for review May 23, 2025 23:00
@chengzhuzhang chengzhuzhang requested a review from jjbenedict May 23, 2025 23:03
Comment on lines +210 to +216
# Interpolate missing values along longitude before spectral analysis
if np.any(np.isnan(x)):
logger.info(
"Interpolating missing values along longitude before spectral analysis"
)
x = x.interpolate_na(dim="lon", method="linear", fill_value="extrapolate")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple linear interpolation to fill in missing for U850

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this linear interpolation unintentionally affect other variables, or do we want other variables with missing values to be handled too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. I think IMERG precipitation data also has some missing values. Though this gap-filing method seems to be low risk to affect results. I will keep an eye on image tests for potential impact.

@chengzhuzhang
Copy link
Contributor Author

chengzhuzhang commented May 23, 2025

@jjbenedict Now works for 3 variables (results see here)
@tomvothecoder this is mostly an FYI, let me know if you notice anything bad code-wise. Thank you!
Note to myself, I still need to fix the daily IMERG data with some bad time values in February, so that xcdat won't triple on it. xCDAT/xcdat#762

@tomvothecoder tomvothecoder requested a review from Copilot May 26, 2025 20:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Integrate EAMxx-specific NE256 workflow into tropical_subseasonal diagnostics by adding example scripts, refactoring the driver to use xarray datasets, and extending variable derivations.

  • Add a Bash script to generate climatology, diurnal cycle, and time-series inputs for EAMxx NE256.
  • Introduce a Python driver to run core, diurnal, and tropical_subseasonal diagnostics for EAMxx output.
  • Refactor tropical_subseasonal_driver to accept xarray datasets and update the spectrum calculation interface.
  • Extend the derivations mapping to support the U850 alias for 850 hPa zonal wind.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
examples/e3sm_diags_for_eamxx/test_eamxx_ne256.sh New Bash workflow for generating EAMxx NE256 climatology and time-series inputs
examples/e3sm_diags_for_eamxx/run_e3sm_diags_eamxx256_climo.py Driver script to run EAMxx-specific diagnostics including tropical_subseasonal
e3sm_diags/driver/tropical_subseasonal_driver.py Refactor to use dataset objects and update calculate_spectrum signature
e3sm_diags/derivations/derivations.py Add a derivation mapping for U850 alias
Comments suppressed due to low confidence (3)

examples/e3sm_diags_for_eamxx/test_eamxx_ne256.sh:17

  • [nitpick] The variable name drc is ambiguous. Consider renaming it to something more descriptive like native_climo_dir.
drc=${drc_out}/native/climo

e3sm_diags/driver/tropical_subseasonal_driver.py:55

  • The variable season is not defined in this scope. You need to loop over seasons or reference parameter.seasons.
parameter.test_name_yrs = test_data.get_name_yrs_attr(season)

e3sm_diags/driver/tropical_subseasonal_driver.py:58

  • The variable run_type is undefined here. Consider passing it in or using a parameter attribute for this condition.
if run_type == "model_vs_model":

@tomvothecoder tomvothecoder force-pushed the feature/tropical-subseasonal-eamxx-support branch from 4644db3 to 4fb0a3f Compare May 26, 2025 20:09
Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @chengzhuzhang, I had one question about the interpolation. Other than that, looks good to me. Thanks.

parameter.test_end_yr,
)
# Get test dataset
test_ds = test_data.get_time_series_dataset(variable, single_point=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, this line replaces the I/O that was being performed in calculate_spectrum to get the variable. This looks much cleaner now

parameter.ref_start_yr,
parameter.ref_end_yr,
)
ref_ds = ref_data.get_time_series_dataset(variable, single_point=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this line too.

Comment on lines +150 to +156

# Extract variable data from dataset and subset to tropical latitudes
var = ds[variable].sel(lat=slice(-15, 15))

# Get actual time range from the data
actual_start = var.time.dt.year.values[0]
actual_end = var.time.dt.year.values[-1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calculate_spectrum looks a lot simpler than before!

Comment on lines +210 to +216
# Interpolate missing values along longitude before spectral analysis
if np.any(np.isnan(x)):
logger.info(
"Interpolating missing values along longitude before spectral analysis"
)
x = x.interpolate_na(dim="lon", method="linear", fill_value="extrapolate")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this linear interpolation unintentionally affect other variables, or do we want other variables with missing values to be handled too?

@chengzhuzhang
Copy link
Contributor Author

@tomvothecoder thanks for the review!
I have updated the IMERG daily data to fix the 2002 feb time values (documented xCDAT/xcdat#760). It is now fixed in /lcrc/group/e3sm/public_html/diagnostics/observations/Atm/time-series/IMERG_Daily.

@chengzhuzhang chengzhuzhang merged commit 4364561 into main May 27, 2025
6 checks passed
@chengzhuzhang chengzhuzhang deleted the feature/tropical-subseasonal-eamxx-support branch May 27, 2025 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement]: Make tropical_subseasonal set support EAMxx high frequency output

4 participants