[Feature]: Improve on the performance of Metrics Function #1305

@zhangshixuan1987

Description

Is your feature request related to a problem?

Hi Jiwoo,

While working on portrait plots for the E3SMv3 and CMIP6 mean climate metrics, we ran into what looks like a deficiency in the Metrics class that PMP provides for processing the metrics data:

from pcmdi_metrics.graphics import Metrics
from pcmdi_metrics.graphics import combine_ref_dicts, read_mean_clim_json_files
 
cmip_lib = Metrics(cmip_files)

Here, cmip_files is a list of JSON files that are passed to the Metrics object and internally read using read_mean_clim_json_files. The issue arises because this function uses the first file in the list as the base and merges the remaining files into it. This strategy causes problems when the first file lacks metrics for certain models.

For example, in one of our metrics datasets, the UKESM model has no metrics data for variables such as ta-850 or ta-200. So, if a file like ta-200.cmip6.amip.regrid2.2p5x2p5.v20250702.json is the first element in cmip_files, UKESM is completely excluded from the final cmip_lib, even though it is present in other files.
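To make the failure mode concrete, here is a minimal sketch with toy data. It is not the actual PMP code: the `{variable: {model: value}}` layout, the helper `merge_first_as_base`, the model names `MODEL-A` and `MODEL-B`, and all numbers are made up for illustration. It only mimics the behavior described above, where the first file decides which models survive.

```python
# Toy stand-in for parsed metrics files: {variable: {model: value}}.
# The layout and the numbers are invented for illustration only.
files = {
    "ta-200": {"MODEL-A": 1.2, "MODEL-B": 0.9},
    "pr": {"MODEL-A": 0.4, "MODEL-B": 0.7, "UKESM": 0.5},
}


def merge_first_as_base(files, order):
    """Hypothetical base-first merge: only models in the first file are kept."""
    base_models = list(files[order[0]])
    return {
        model: {var: files[var][model] for var in order if model in files[var]}
        for model in base_models
    }


# With ta-200 first, UKESM never enters the result.
merged_ta_first = merge_first_as_base(files, ["ta-200", "pr"])
# With pr first, UKESM is kept (it simply has no ta-200 entry).
merged_pr_first = merge_first_as_base(files, ["pr", "ta-200"])

print(sorted(merged_ta_first))
print(sorted(merged_pr_first))
```

The same inputs give a different set of models depending only on the order of the list, which is the behavior reported here.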

In practice, the user can manually place the precipitation (pr) or surface temperature (ts) metrics file first in the list when calling Metrics(cmip_files). This is likely to maximize the number of models shown in the portrait plots or parallel coordinate plots, and may be good enough to produce the expected plots.

However, I believe this behavior points to a limitation in the current merge strategy of read_mean_clim_json_files, in two respects: (a) no warning or message tells the user which file serves as the "base" for the merge, or which group of models ends up in the Metrics data; and (b) it is not flexible enough for some scenarios, for example a comparison that excludes ts and pr.
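For point (a), a small diagnostic along these lines might be enough. This is only a sketch using the standard `warnings` module; `report_model_coverage` and its input (a mapping of file name to the set of model names found in that file) are hypothetical and not part of PMP.

```python
import warnings


def report_model_coverage(models_by_file):
    """Warn about models that are missing from the first (base) file.

    models_by_file: hypothetical mapping of file name -> set of model names,
    in the same order as the list passed to Metrics.
    Returns the sorted list of models absent from the base file.
    """
    base = next(iter(models_by_file))  # first file acts as the base
    all_models = set().union(*models_by_file.values())
    missing = sorted(all_models - set(models_by_file[base]))
    if missing:
        warnings.warn(
            f"Base file {base!r} lacks {len(missing)} model(s) "
            f"found in other files: {missing}"
        )
    return missing
```

Calling it before the merge would tell the user which file is the base and which models are about to be dropped.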

One possible solution is to have Metrics loop through every available .json metrics file, collect the maximum overlap of models that appear in these files, use that set as the base, and then merge all the data accordingly.
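A rough sketch of that idea follows, reading "maximum overlap" as the union of models across all files so that no model is dropped. That reading is my assumption; if the intersection is what is wanted, `set.intersection` could be swapped in. The `{variable: {model: value}}` layout and `merge_on_union` are hypothetical and do not reflect the actual PMP JSON structure or API.

```python
def merge_on_union(files):
    """Merge {variable: {model: value}} into {model: {variable: value}}.

    The base is the union of models over all files, so the result does not
    depend on file order. Missing model/variable pairs are filled with None.
    """
    # Iterating a dict yields its keys, so this collects every model name.
    all_models = sorted(set().union(*files.values()))
    return {
        model: {var: per_model.get(model) for var, per_model in files.items()}
        for model in all_models
    }


# Toy input with invented numbers; UKESM has no ta-200 entry.
merged = merge_on_union(
    {
        "ta-200": {"MODEL-A": 1.2},
        "pr": {"MODEL-A": 0.4, "UKESM": 0.5},
    }
)
print(merged["UKESM"])
```

Whether the plotting functions can handle the resulting gaps is something I have not checked.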

I am reporting this here in case it is useful for making PMP more robust.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response
