docs/paper.md: 51 additions & 48 deletions
@@ -1,57 +1,60 @@
1
-
---
2
-
title: 'Community Analysis Pipeline: A Python package for processing Mars climate model data'
3
-
tags:
4
-
- Python
5
-
- astronomy
6
-
- Mars global climate model
7
-
- data processing
8
-
- data visualization
9
-
authors:
10
-
- name: Alexandre M. Kling
1
+
---
2
+
title: 'Community Analysis Pipeline: A Python package for processing Mars climate model data'
3
+
tags:
4
+
- Python
5
+
- astronomy
6
+
- Mars global climate model
7
+
- data processing
8
+
- data visualization
9
+
authors:
10
+
- name: Alexandre M. Kling
11
11
orcid: 0000-0002-2980-7743
12
-
equal-contrib: true
13
-
affiliation: 1
14
-
corresponding: true
15
-
- name: Courtney M. L. Batterson
16
-
orcid: 0000-0001-5894-095X
17
-
equal-contrib: true
18
-
affiliation: 1
19
-
- name: Richard A. Urata
20
-
orcid: 0000-0001-8497-5718
21
-
equal-contrib: true
22
-
affiliation: 1
23
-
- name: Victoria L. Hartwick
24
-
orcid: 0000-0002-2082-8986
25
-
equal-contrib: true
26
-
affiliation: 3
27
-
- name: Melinda A. Kahre
12
+
equal-contrib: true
13
+
affiliation: 1
14
+
corresponding: true
15
+
- name: Courtney M. L. Batterson
16
+
orcid: 0000-0001-5894-095X
17
+
equal-contrib: true
18
+
affiliation: 1
19
+
- name: Richard A. Urata
20
+
orcid: 0000-0001-8497-5718
21
+
equal-contrib: true
22
+
affiliation: 1
23
+
- name: Victoria L. Hartwick
24
+
orcid: 0000-0002-2082-8986
25
+
equal-contrib: true
26
+
affiliation: 3
27
+
- name: Melinda A. Kahre
28
28
orcid: 0000-0002-0935-5532
29
-
equal-contrib: true
30
-
affiliation: 2
31
-
affiliations:
32
-
- name: Bay Area Environmental Research Institute, United States
33
-
index: 1
34
-
- name: NASA Ames Research Center, United States
35
-
index: 2
36
-
- name: Southwest Research Institute, United States
37
-
index: 3
38
-
date: 9 May 2025
29
+
equal-contrib: true
30
+
affiliation: 2
31
+
affiliations:
32
+
- name: Bay Area Environmental Research Institute, United States
33
+
index: 1
34
+
ror: 024tt5x58
35
+
- name: NASA Ames Research Center, United States
36
+
index: 2
37
+
ror: 02acart68
38
+
- name: Southwest Research Institute, United States
39
+
index: 3
40
+
ror: 03tghng59
41
+
date: 9 May 2025
39
42
bibliography: paper.bib
40
43
41
44
---
42
45
43
46
# Summary
44
47
45
-
The Community Analysis Pipeline (CAP) is a Python package designed to streamline and simplify the complex process of analyzing large datasets created by global climate models (GCMs). CAP consists of a suite of tools that manipulate NetCDF files in order to produce secondary datasets and figures useful for science and engineering applications. CAP also facilitates inter-model and model-observation comparisons, and it is the first software of its kind to standardize these comparisons. The goal is to enable users with varying levels of programming experience to work with complex data products from a variety of GCMs and thereby lower the barrier to entry for planetary science research.
48
+
The Community Analysis Pipeline (CAP) is a Python package designed to streamline and simplify the complex process of analyzing large datasets created by global climate models (GCMs). CAP consists of a suite of tools that manipulate NetCDF files in order to produce secondary datasets and figures useful for science and engineering applications. CAP also facilitates inter-model and model-observation comparisons, and it is the first software of its kind to standardize these comparisons. The goal is to enable users with varying levels of programming experience to work with complex data products from a variety of GCMs and thereby lower the barrier to entry for planetary science research.
46
49
47
50
# Statement of need
48
51
49
-
GCMs perform numerical simulations that describe the evolution of climate systems on planetary bodies. GCMs simulate physical processes within the atmosphere (and, if applicable, within the surface of the planet, ocean, and any interactions therein), calculate radiative transfer within those mediums, and use a computational fluid dynamics (CFD) solver (the “dynamical core”) to predict the transport of heat and momentum within the atmosphere. Typical GCM products include surface and atmospheric variables such as wind, temperature, and aerosol concentrations. While GCMs have been applied to planetary bodies in our Solar System (e.g. Earth, Venus, Pluto) and in other stellar systems (e.g. [@Hartwick:2023]), CAP is currently compatible with Mars GCMs (MGCMs). Several MGCMs are actively in use and under development in the Mars community, including the NASA Ames MGCM (Legacy and FV3-based versions), NASA Goddard ROCKE-3D, the Laboratoire de Météorologie Dynamique (LMD) Mars Planetary Climate Model (PCM), the Open University OpenMars, NCAR MarsWRF, NCAR MarsCAM, GFDL Mars GCM, Harvard DRAMATIC Mars GCM, Max Planck Institute Mars GCM, and GEM-Mars. Of these, CAP is compatible with four models so far: the NASA Ames MGCM, PCM, OpenMars, and MarsWRF.
52
+
GCMs perform numerical simulations that describe the evolution of climate systems on planetary bodies. GCMs simulate physical processes within the atmosphere (and, if applicable, within the surface of the planet, ocean, and any interactions therein), calculate radiative transfer within those mediums, and use a computational fluid dynamics (CFD) solver (the “dynamical core”) to predict the transport of heat and momentum within the atmosphere. Typical GCM products include surface and atmospheric variables such as wind, temperature, and aerosol concentrations. While GCMs have been applied to planetary bodies in our Solar System (e.g. Earth, Venus, Pluto) and in other stellar systems (e.g. [@Hartwick:2023]), CAP is currently compatible with Mars GCMs (MGCMs). Several MGCMs are actively in use and under development in the Mars community, including the NASA Ames MGCM (Legacy and FV3-based versions), NASA Goddard ROCKE-3D, the Laboratoire de Météorologie Dynamique (LMD) Mars Planetary Climate Model (PCM), the Open University OpenMars, NCAR MarsWRF, NCAR MarsCAM, GFDL Mars GCM, Harvard DRAMATIC Mars GCM, Max Planck Institute Mars GCM, and GEM-Mars. Of these, CAP is compatible with four models so far: the NASA Ames MGCM, PCM, OpenMars, and MarsWRF.
50
53
51
54
MGCM output is complex in both size and structure. Analyzing the output requires GCM-specific domain knowledge. We identify the following major challenges for working with MGCM output:
52
55
53
-
\* Files tend to be fairly complex in structure, with output fields represented by multiple variables (e.g. air vs surface temperature), varying units (e.g. Kelvin), complex dimensional structures (e.g. 2–5 dimensions), and a variety of sampling frequencies (e.g. temporally averaged or instantaneous) on different horizontal and vertical grids.
54
-
\* File sizes typically range from \~10 Gb–10 Tb for simulations describing the Martian climate over a full orbit around the Sun (depending on the number of atmospheric fields being analyzed, time sampling, and the horizontal and vertical resolutions of the run). Large files require curated processing pipelines in order to manage memory storage. This can be particularly challenging for users that do not have access to academic or enterprise clusters or supercomputers for their analyses.
56
+
\* Files tend to be fairly complex in structure, with output fields represented by multiple variables (e.g. air vs surface temperature), varying units (e.g. Kelvin), complex dimensional structures (e.g. 2–5 dimensions), and a variety of sampling frequencies (e.g. temporally averaged or instantaneous) on different horizontal and vertical grids.
57
+
\* File sizes typically range from \~10 GB to 10 TB for simulations describing the Martian climate over a full orbit around the Sun (depending on the number of atmospheric fields being analyzed, the time sampling, and the horizontal and vertical resolutions of the run). Large files require curated processing pipelines in order to manage memory and storage. This can be particularly challenging for users who do not have access to academic or enterprise clusters or supercomputers for their analyses.
55
58
\* Domain-specific knowledge is required to derive secondary variables, manipulate complex data structures, and visualize results. Working with MGCM data is especially difficult for users unfamiliar with the fields commonly output by MGCMs or the mathematical methods used in climate science.
56
59
57
60
CAP offers a streamlined workflow for processing and analyzing MGCM data products by providing a set of libraries and executables that facilitate file manipulation and data visualization from the command line. This benefits existing modelers by automating both routine and sophisticated post-processing tasks. It also expands access to MGCM products by removing some of the technical roadblocks associated with processing these complex data products.
@@ -62,43 +65,43 @@ CAP has been used in multiple research projects that have been published and/or
62
65
63
66
# Functionality
64
67
65
-
CAP consists of six command-line executables that can be used sequentially or individually to derive secondary data products, thus offering a high level of flexibility. A configuration text file is provided so that users can define the input file structure (e.g., variable names, longitudinal structure, and interpolation levels) and preferred plotting style (e.g., time axis units) for their analysis. The six executables in CAP are described below:
68
+
CAP consists of six command-line executables that can be used sequentially or individually to derive secondary data products, thus offering a high level of flexibility. A configuration text file is provided so that users can define the input file structure (e.g., variable names, longitudinal structure, and interpolation levels) and preferred plotting style (e.g., time axis units) for their analysis. The six executables in CAP are described below:
66
69
67
70
## MarsPull
68
71
69
-
MarsPull is a data pipeline utility for downloading MGCM data products from the NAS Data Portal ([https://data.nas.nasa.gov/](https://data.nas.nasa.gov/)). Recognizing that each member within the science and engineering community has their own requirements for hosting proprietary Mars climate datasets (e.g. institutional servers, Zenodo, GitHub, etc.), MarsPull is intended to be a mechanism for interfacing those datasets. MarsPull enables users to query data meeting a specific criteria, such as a date range (e.g., solar longitude), which allows users to parse repositories first and download only the necessary data, thus avoiding downloading entire repositories which can be large (\>\>15Gb). A typical application of MarsPull is:
72
+
MarsPull is a data pipeline utility for downloading MGCM data products from the NAS Data Portal ([https://data.nas.nasa.gov/](https://data.nas.nasa.gov/)). Recognizing that each member of the science and engineering community has their own requirements for hosting proprietary Mars climate datasets (e.g. institutional servers, Zenodo, GitHub, etc.), MarsPull is intended to be a mechanism for interfacing with those datasets. MarsPull enables users to query data meeting specific criteria, such as a date range (e.g., solar longitude), which allows users to parse repositories first and download only the necessary data, thus avoiding downloading entire repositories, which can be large (\>\>15 GB). A typical application of MarsPull is:
MarsFormat is a utility for converting non-NASA Ames MGCM products into NASA Ames-like MGCM products for compatibility with CAP. MarsFormat reorders dimensions, adds standardized coordinates that are expected by other executables for various computations (e.g., pressure interpolation), converts variable units to conform to the International System of Units (e.g., Pa for pressure), and reorganizes coordinate values as needed (e.g., reversing the vertical pressure array for plotting). Additional, model-specific operations are performed as necessary. For example, MarsWRF data requires un-staggering latitude-longitude grids and calculating absolute fields from perturbation fields. A typical application of MarsFormat is:
76
79
77
-
`> MarsFormat MGCM_file.nc -gcm model_name`
80
+
`MarsFormat MGCM_file.nc -gcm model_name`
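
Two of the operations described above, unit conversion and reversal of the vertical axis, can be illustrated with a short NumPy sketch. This is not CAP's implementation: the function name `standardize_vertical`, the unit table, and the choice to order levels by increasing pressure are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical conversion factors to Pa (illustrative, not taken from CAP).
TO_PASCAL = {"Pa": 1.0, "hPa": 100.0, "mbar": 100.0}


def standardize_vertical(p_levels, field, units):
    """Convert pressure levels to Pa and order them by increasing pressure.

    The vertical axis is assumed to be the first axis of `field`.
    """
    p_pa = np.asarray(p_levels, dtype=float) * TO_PASCAL[units]
    field = np.asarray(field, dtype=float)
    if p_pa[0] > p_pa[-1]:
        # Flip the coordinate and the data together so they stay aligned.
        p_pa = p_pa[::-1]
        field = field[::-1]
    return p_pa, field


# Three levels given in mbar, highest pressure first.
p_pa, temp = standardize_vertical([6.0, 3.0, 1.0], [220.0, 200.0, 180.0], "mbar")
```

The key point is that the coordinate array and every field defined on it must be reordered together; flipping only one of them silently misaligns the data.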
78
81
79
82
## MarsFiles
80
83
81
84
MarsFiles provides several tools for file manipulation such as file size reduction, temporal and spatial filtering, and splitting or concatenating data along specified dimensions. Operations performed by MarsFiles are applied to entire NetCDF files producing new data structures with amended file names. A typical application of MarsFiles is:
82
85
83
-
`> MarsFiles MGCM_file.nc -flags`
86
+
`MarsFiles MGCM_file.nc -flags`
84
87
85
88
## MarsVars
86
89
87
90
MarsVars performs variable operations such as adding, removing, and editing variables and computing column integrations. It is standard practice within the modeling community to avoid outputting variables that can be derived outside of the MGCM in order to minimize file size. For example, atmospheric density (rho) is easily derived from temperature and pressure and therefore typically not included in output files. MarsVars derives rho from temperature and pressure and adds it to the file with a single command line argument. A typical application of MarsVars is:
88
91
89
-
`> MarsVars MGCM_file.nc –add rho`
92
+
`MarsVars MGCM_file.nc -add rho`
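
As an illustration of the kind of derivation involved, density follows from temperature and pressure through the ideal gas law, rho = p / (R T). The sketch below is a minimal NumPy version under stated assumptions: the function name is hypothetical, and the specific gas constant of roughly 189 J kg^-1 K^-1 is the value for pure CO2, not necessarily the constant CAP uses.

```python
import numpy as np

# Assumed specific gas constant for pure CO2 [J kg^-1 K^-1]; illustrative only.
R_SPECIFIC = 189.0


def density_from_ideal_gas(pressure_pa, temperature_k, r_specific=R_SPECIFIC):
    """Return density [kg m^-3] from pressure [Pa] and temperature [K]."""
    pressure_pa = np.asarray(pressure_pa, dtype=float)
    temperature_k = np.asarray(temperature_k, dtype=float)
    return pressure_pa / (r_specific * temperature_k)


# Two sample points: near-surface and aloft.
rho = density_from_ideal_gas([610.0, 300.0], [210.0, 180.0])
```

Because the calculation is element-wise, the same function works unchanged on full multi-dimensional model fields as long as pressure and temperature share a shape or broadcast against each other.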
90
93
91
94
## MarsInterp
92
95
93
96
MarsInterp interpolates the vertical coordinate to a standard grid: pressure, altitude, or altitude above ground level. Vertical grids vary considerably from model to model. Most MGCMs use a pressure or hybrid pressure vertical coordinate (e.g. terrain-following, pure pressure levels, or sigma levels) in which the geometric heights and mid-layer pressures of the atmospheric layers vary in latitude and longitude. It is therefore necessary to interpolate to a standard vertical grid in order to do any rigorous spatial averaging or inter-model or observation-to-model comparisons. A typical application of MarsInterp is:
94
97
95
-
`> MarsInterp MGCM_file.nc -t pstd`
98
+
`MarsInterp MGCM_file.nc -t pstd`
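
The idea of interpolating a model column onto a standard pressure grid can be sketched as follows. This is a simplified, single-column example that interpolates linearly in log-pressure with `numpy.interp`; the function name and the sample values are hypothetical, and CAP's own interpolation scheme may differ.

```python
import numpy as np


def interp_to_pressure(field, p_model, p_target):
    """Interpolate one vertical column onto target pressure levels.

    Interpolation is linear in log-pressure. Target levels outside the
    model's pressure range are returned as NaN rather than extrapolated.
    """
    field = np.asarray(field, dtype=float)
    p_model = np.asarray(p_model, dtype=float)
    p_target = np.asarray(p_target, dtype=float)
    order = np.argsort(p_model)  # np.interp needs increasing x values
    log_p = np.log(p_model[order])
    return np.interp(np.log(p_target), log_p, field[order],
                     left=np.nan, right=np.nan)


# Mid-layer pressures [Pa] and temperatures [K] for one model column.
p_model = np.array([600.0, 300.0, 100.0, 10.0])
temp = np.array([220.0, 200.0, 180.0, 150.0])

# Standard grid: one level below the lowest model layer, two within range.
p_std = np.array([700.0, 300.0, 50.0])
temp_std = interp_to_pressure(temp, p_model, p_std)
```

In a real model, the mid-layer pressures vary with latitude, longitude, and time, so the same operation has to be repeated for every column before fields from different models can be averaged or compared on a common grid.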
96
99
97
100
## MarsPlot
98
101
99
102
MarsPlot is the plotting utility for CAP. It accepts a modifiable text template containing a list of plots to generate (Custom.in) as input and outputs graphics to PDF or PNG. It supports multiple types of 1-D and 2-D plots, color schemes, and map projections, and allows customization of axis ranges, plot titles, and contour intervals. It also supports some simple math functions to derive secondary fields not supported by MarsVars. A typical application of MarsPlot is: