CDAT Migration FY24 ‐ General Guide
This is a general guide on how to refactor diagnostic sets. It will cover how to get started, how to refactor code (generally), and how to perform regression testing.
GitHub Project Tracker
- This project tracker is used to map out progress and milestones.
- Root Development Branch: cdat-migration-fy24
  - This branch stores all of the developmental work for this task. We will merge this branch progressively into `main` when sets are refactored and pass regression testing.
- Check out the CDAT migration development branch: `git checkout cdat-migration-fy24`
- Create a branch stemming from `cdat-migration-fy24`: `git checkout -b refactor/<ISSUE#>-<SET-NAME>`
  - Example: `git checkout -b refactor/658-lat-lon-set`
- Create the development conda environment for your branch: `mamba env create -f conda-env/dev.yml -n e3sm_diags_dev_658`
- Install the local development version of `e3sm_diags` for your branch: `python -m pip install .`
  - This ensures supplementary files are installed (e.g., `.cfg` files).
  - WARNING: if you make any changes to supplementary files and/or run e3sm_diags via the CLI, you must repeat this command for those changes to be reflected in your environment.
- Set up a test script (or scripts) for the set you are refactoring (see the sketch after this list)
  - Running a test script provides quicker feedback on your code.
  - For example, I used `ex1.py` for refactoring the lat_lon set.
  - Additionally, I combine test scripts with VS Code's debugging capabilities to step through the code for real-time feedback. This results in an even more efficient development experience compared to writing print/logger statements and waiting for a response.
- Create a draft pull request early using `cdat-migration-fy24` as the base branch
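For reference, a minimal test script might look like the sketch below. It is loosely modeled on the `ex1.py` example mentioned above; the data paths, `test_name`, and `results_dir` values are placeholders you would replace with your own.

```python
# Minimal test-script sketch (placeholder paths and names, adapt to your data).
from e3sm_diags.parameter.core_parameter import CoreParameter
from e3sm_diags.run import runner

param = CoreParameter()
param.test_data_path = "/path/to/test/climatology"      # placeholder
param.reference_data_path = "/path/to/ref/climatology"  # placeholder
param.test_name = "my_model_run"                        # placeholder
param.seasons = ["ANN"]
param.results_dir = "/path/to/results"                  # placeholder

# Run only the set being refactored to keep the feedback loop short.
runner.sets_to_run = ["lat_lon"]
runner.run_diags([param])
```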
There are three steps in this process: 1. Refactor CDAT logic, 2. Clean up and refactor some more, 3. Regression testing.
Objective: Refactor CDAT logic with Xarray/xCDAT and successfully produce metrics (`.json`) and plot (`.png`) files
- Find the description of the diagnostic set you will be refactoring here.
- The core components of a set consist of a driver, plotter, viewer, and some utilities.
- We'll figure out how to refactor the viewer at a later time in #628.
- (RECOMMENDED) Analyze the core components for any direct code or utility imports that use CDAT and make an outline
- (RECOMMENDED) Plan how you will refactor the CDAT portions
- If possible, write failing unit tests beforehand to cover edge cases (test-driven development)
- Start refactoring CDAT logic with Xarray/xCDAT (a sketch of a typical replacement follows this list)
- Refer to the lat_lon set (PR #677) for some guidance
- Try to reuse as much code as possible. For example, general classes like `dataset_xr.Dataset` and utility functions in `metrics.py`, `io.py`, and `regrid.py`
- Run your test scripts to get feedback
  - Read the stack trace, understand how the new code is behaving, and fix any issues
  - NOTE: Make sure to `python -m pip install .` if running via `python <script_name>.py` to get the latest code changes in your environment. You don't need to do this if you're running with VS Code's Python interactive console and debugger because imports from the local package directory will take precedence.
- Repeat these steps until the diagnostic set can produce metrics (`.json`) and plots (`.png`)
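As a concrete illustration of the kind of replacement involved, the sketch below swaps a CDAT-style spatial average for its xCDAT equivalent. The file path and variable name are hypothetical; the exact functions you touch will depend on the set you are refactoring.

```python
# Hedged before/after sketch: replacing CDAT spatial averaging with xCDAT.
# "sample.nc" and the "ts" variable are placeholders for illustration only.

# Old CDAT-style pattern (for comparison):
#   import cdms2, cdutil
#   ds = cdms2.open("sample.nc")
#   ts_global_mean = cdutil.averager(ds("ts"), axis="xy")

# Xarray/xCDAT replacement:
import xcdat

ds = xcdat.open_dataset("sample.nc")                # opens the dataset and handles bounds used for weighting
ds_avg = ds.spatial.average("ts", axis=["X", "Y"])  # area-weighted spatial mean
ts_global_mean = ds_avg["ts"]
```

xCDAT's accessor returns a Dataset, so the averaged variable is pulled back out by name.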
Objective: Implement readable, maintainable, and testable code
- Refactor sub-optimal or hard-to-understand code
  - Excessive for loops, repeated logic
- Break up large functions into smaller, maintainable functions
- Write/fix unit tests for refactored code, if possible (see the pytest sketch after this list)
  - `/tests/e3sm_diags` stores unit test `.py` files
  - The `pytest` command runs unit tests and generates a code coverage report (`tests_coverage_report/`)
- ALTERNATIVE: Write a `TODO: ...` statement
  - Get back to refactoring at a later time
- If you are not confident in rewriting cleaner code for an implementation, skip it for now.
  - Additional refactoring can be risky because there is minimal unit test coverage (it is easy to unknowingly introduce incorrect behaviors, side effects, etc.)
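Below is a minimal pytest sketch of what a unit test for a refactored helper could look like. The `convert_units` function here is a hypothetical stand-in for whichever utility you refactor; real tests live under `/tests/e3sm_diags`.

```python
# Minimal pytest sketch; `convert_units` is a hypothetical stand-in for a
# refactored utility function, not an actual e3sm_diags helper.
import numpy as np
import xarray as xr


def convert_units(da: xr.DataArray, target_units: str) -> xr.DataArray:
    """Hypothetical helper: convert meters to millimeters."""
    if da.attrs.get("units") == "m" and target_units == "mm":
        da = da * 1000.0
        da.attrs["units"] = "mm"
    return da


class TestConvertUnits:
    def test_converts_meters_to_millimeters(self):
        da = xr.DataArray([1.0, 2.0], dims="x", attrs={"units": "m"})

        result = convert_units(da, "mm")

        np.testing.assert_allclose(result.values, [1000.0, 2000.0])
        assert result.attrs["units"] == "mm"
```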
(This section is a work-in-progress)
Objective: Regression testing is performed to ensure that the metrics a diagnostic set produces with your branch's refactored code are reasonably close to those from `main`
You will be using the `auxiliary_tools/issue-658-compare-metrics.py` script, which compares the `.json` metrics of the dev branch against `main` (a simplified sketch of this comparison appears after the lists below). Overview of this script:
- Takes two paths to `.json` files, one for the dev branch and the other for `main`
- Produces an Excel sheet listing the absolute and relative differences for the metrics of each variable
  - Relative differences are more useful in most cases because they measure the SCALE of the difference in percentage terms, rather than just a raw number.
  - Absolute diff threshold: 10^-5 (0.00001)
  - Relative diff threshold: 0.01 (1%)
- Pinpoint the largest differences (if any) and try to debug why they are happening
  - Are the metric functions producing the same outputs?
  - Are derived variables correct?
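For orientation, the sketch below shows the general shape of such a comparison. It is not the actual script: the flat `{metric_name: value}` JSON layout and the file paths are assumptions for illustration.

```python
# Simplified sketch of a metrics comparison (not the real script).
# Assumes flat {metric_name: value} JSON files; adjust for the real layout.
import json


def compare_metrics(dev_path: str, main_path: str, rel_tol: float = 0.01) -> None:
    with open(dev_path) as f:
        dev = json.load(f)
    with open(main_path) as f:
        base = json.load(f)

    for key in sorted(set(dev) & set(base)):
        abs_diff = abs(dev[key] - base[key])
        # Relative difference scales the raw diff by the baseline value.
        rel_diff = abs_diff / abs(base[key]) if base[key] != 0 else float("inf")
        flag = "  <-- exceeds threshold" if rel_diff > rel_tol else ""
        print(f"{key}: abs_diff={abs_diff:.2e}, rel_diff={rel_diff:.2%}{flag}")


# Example usage with placeholder paths:
compare_metrics("dev/lat_lon_metrics.json", "main/lat_lon_metrics.json")
```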
- Follow this guide to get started with VS Code and the recommended extensions. Also grab the Remote SSH extension to use VS Code with remote machines via SSH.
- If you are using VS Code, open the `e3sm_diags.code-workspace` file, which automatically configures VS Code for you.
- Create your mamba development environment (this should already be done if you followed "Getting Started")
- Configure VS Code with your mamba environment
VS Code offers the ability to debug by stepping through the code stack in real time.
By installing the Python extension, you will automatically have access to this feature. I use this feature extensively to develop and debug e3sm_diags using test scripts.
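If your test script runs on a remote machine, one optional pattern (an assumption on my part, not part of the e3sm_diags tooling) is to use the `debugpy` package so VS Code can attach over SSH. A minimal sketch:

```python
# Optional sketch: let VS Code attach its debugger to a remotely running
# test script via debugpy. Port 5678 is an arbitrary choice.
import debugpy

debugpy.listen(("localhost", 5678))  # start the debug server
print("Waiting for VS Code to attach...")
debugpy.wait_for_client()            # pause until the debugger attaches
debugpy.breakpoint()                 # stop here once attached

# ... the rest of your test script runs under the debugger ...
```

In VS Code this pairs with a remote-attach launch configuration pointing at the same host and port.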
