Add `--on-var-failure` flag to handle failure behaviors #323

tomvothecoder · 2025-10-09T17:23:06Z

Motivation

Previously, e3sm_to_cmip always returned an exit code 0 even if multiple handlers failed.
This made it difficult for calling workflows or CI/CD pipelines to detect and respond to errors automatically.

This enhancement introduces explicit, user-controlled failure semantics while keeping the default behavior unchanged for backward compatibility.

Overview

This PR improves how e3sm_to_cmip reports and responds to handler failures across all run modes. It introduces a new command-line flag, --on-var-failure, which allows users to control how e3sm_to_cmip behaves when one or more variable handlers fail during CMORization or info-mode checks.

The new flag replaces the implicit always-succeed behavior with three explicit modes:

| Option | Description | Exit Code on Failure |
| --- | --- | --- |
| `ignore` (default) | Continue processing regardless of any handler failures. | 0 |
| `fail` | Process all handlers, but exit with code 1 if any fail. | 1 |
| `stop` | Exit immediately when the first handler fails. | 1 |

Comparison of before and after this PR:

| Aspect | Before this PR | After this PR |
| --- | --- | --- |
| Failure handling | Always returned exit code 0, even if handlers failed. | Exit code reflects failures based on `--on-var-failure` (`ignore`, `fail`, `stop`). |
| User control | No way to stop or fail early; failures were only logged. | Users can choose: continue (`ignore`), exit with code 1 after processing all handlers (`fail`), or stop immediately (`stop`). |
| Parallel mode behavior | No early exit on the first failure. | With `--on-var-failure=stop`, gracefully cancels pending jobs but allows active ones to complete before exiting. |
| Info mode consistency | Did not respect failure semantics. | Fully honors `--on-var-failure`, logging and exiting consistently. |
| Exit codes | Always 0. | 0 (success), 1 (any failure, depending on mode). |

Result: More predictable, script-friendly behavior, improved workflow integration, and safer, cleaner exits during parallel CMORization runs.

Closes #272

Details

Added a new CLI argument:
```
 --on-var-failure {ignore,fail,stop}
```
Updated _run_serial(), _run_parallel(), and _run_info_mode() to honor self.on_var_failure.
Introduced two helper methods for consistent behavior:
- _handle_failed_handler() — logs, records, and optionally triggers immediate exit.
- _finalize_failure_exit() — applies final exit logic at the end of processing.
Refactored shared failure logic to reduce duplication and improve testability.
Preserved all existing behaviors when --on-var-failure=ignore (default).
Maintained backward compatibility — no existing workflows are broken.
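
As a rough illustration of the new argument, here is a minimal `argparse` sketch. It is a stand-alone stand-in, not the actual contents of `e3sm_to_cmip/argparser.py`; the `build_parser()` helper and the help text are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser; the real parser defines many more arguments.
    parser = argparse.ArgumentParser(prog="e3sm_to_cmip")
    parser.add_argument(
        "--on-var-failure",
        choices=["ignore", "fail", "stop"],
        default="ignore",  # default keeps the previous always-continue behavior
        help="How to respond when a variable handler fails (assumed wording).",
    )
    return parser


# argparse exposes the flag as `on_var_failure` (dashes become underscores).
args = build_parser().parse_args(["--on-var-failure", "stop"])
print(args.on_var_failure)
```

Restricting the flag with `choices` means an unsupported value is rejected at parse time rather than reaching the runner.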

How Failures Are Handled with Parallel Jobs with `stop`

With `--on-var-failure=stop`, the parallel runner gracefully cancels pending jobs but allows active ones to complete before exiting.

Compared to exiting immediately, as serial and info modes do, this gives cleaner shutdowns, fewer partial outputs, and consistent logs and progress updates during parallel CMORization runs.
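
The general "cancel pending, let running jobs finish" pattern can be sketched with `concurrent.futures` as below. This is an illustration under assumptions, not the code in `runner.py`: it uses a `ThreadPoolExecutor`, a hypothetical `run_until_first_failure()` helper, and jobs that simply return `True` or `False`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_until_first_failure(jobs, max_workers=2):
    """Run `jobs` (name -> callable returning bool) and stop at the first failure.

    Returns (failed_name_or_None, finished_names, cancelled_names).
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = {pool.submit(fn): name for name, fn in jobs.items()}
    failed = None

    for future in as_completed(futures):
        if not future.result():
            failed = futures[future]
            # cancel() only succeeds for jobs that have not started yet;
            # jobs that are already running are left alone.
            for other in futures:
                other.cancel()
            break

    # Block until the jobs that were already running have completed.
    pool.shutdown(wait=True)

    cancelled = [name for f, name in futures.items() if f.cancelled()]
    finished = [name for f, name in futures.items() if not f.cancelled()]
    return failed, finished, cancelled
```

Because `Future.cancel()` cannot interrupt a running job, in-flight work always finishes and writes complete output, which is the property the `stop` mode relies on in parallel runs.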

Implementation Notes

self.on_var_failure is now a class attribute shared across serial, parallel, and info modes.
Each handler failure is recorded in failed_handlers and processed through _handle_failed_handler().
_finalize_failure_exit() centralizes the exit decision logic.
All progress-bar and logging behavior remains unchanged.
_run_info_mode() now also respects --on-var-failure, treating missing variables or invalid table entries as handler failures.
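
To make the three modes concrete, here is a simplified sketch of the decision flow for a serial run. It is an assumption-laden illustration: `run_serial_sketch()` is a hypothetical function that returns an exit code instead of calling `sys.exit()`, and it folds the per-failure and end-of-run decisions described above into one loop.

```python
def run_serial_sketch(handlers, on_var_failure="ignore"):
    """Process (name, cmorize) pairs; return (exit_code, failed_handler_names).

    `cmorize` is any callable returning True on success, False on failure.
    """
    failed_handlers = []

    for name, cmorize in handlers:
        if cmorize():
            continue

        # Every failure is recorded, regardless of mode.
        failed_handlers.append(name)

        # "stop": give up at the first failure without running the rest.
        if on_var_failure == "stop":
            return 1, failed_handlers

    # "fail": everything was attempted, but any failure yields exit code 1.
    if failed_handlers and on_var_failure == "fail":
        return 1, failed_handlers

    # "ignore" (default): failures are reported but the exit code stays 0.
    return 0, failed_handlers


handlers = [("pr", lambda: True), ("clt", lambda: False), ("tas", lambda: True)]
for mode in ("ignore", "fail", "stop"):
    print(mode, run_serial_sketch(handlers, mode))
```

The variable names `pr`, `clt`, and `tas` are placeholders chosen only for the demonstration.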

Backward Compatibility

Default remains --on-var-failure=ignore.
No changes required for existing workflows or scripts.
Pipelines can now use non-zero exit codes for error handling if desired.
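
For a calling workflow written in Python, reacting to the exit code might look like the following sketch. The `run_step()` helper is hypothetical, and the other arguments a real `e3sm_to_cmip` run requires are deliberately left out.

```python
import subprocess
import sys


def run_step(cmd: list[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    result = subprocess.run(cmd, check=False)
    if result.returncode != 0:
        print(f"Step failed with exit code {result.returncode}", file=sys.stderr)
        return False
    return True


# Example call shape (other_args stands for the rest of the real command line):
# ok = run_step(["e3sm_to_cmip", "--on-var-failure", "fail", *other_args])
```

With the default `ignore` mode such a check always passes, so a pipeline that wants to detect failures needs to opt in to `fail` or `stop`.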

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules

If applicable:

New and existing unit tests pass with my changes (locally and CI/CD build)
I have added tests that prove my fix is effective or that my feature works
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

tomvothecoder · 2025-10-09T17:31:43Z

TonyB9000 · 2025-10-09T17:45:31Z

@tomvothecoder Outstanding work, both for exit handling and for elucidating the "--info" mode options. (I can't imagine testing all combinations.)

tomvothecoder · 2025-10-09T18:17:00Z

> @tomvothecoder Outstanding work, both for exit handling and for elucidating the "--info" mode options.

I'm happy to get this going and hope it significantly improves the efficiency of debugging publication workflows!

> (I can't imagine testing all combinations.)

Thankfully in past refactoring efforts, I generated test cases with regridded data that I can re-use to test these cases. I'll post the test scripts above once I complete testing.

TonyB9000 · 2025-10-09T18:43:45Z

I have v3 data where I know failures will occur (CFmon.clisccp "NaNs" issue is one, and piClim-control-iceini triggers CMOR failure due to bad user-metadata file). The former may fail in NCO phase - before e2c is called, however.

Previously, when info-mode failed, I was not catching that fact, and was passing an empty var-list to ncclimo, which responds by extracting ALL variables and attempting regrid on everything - not the best default in my view.

You can induce a metadata failure by introducing a term in the "activity_id" that is not in the current CV:

<   "activity_id": "RFMIP AerChemMIP",
---
>   "activity_id": "RFMIP",

TonyB9000 · 2025-10-10T00:31:43Z

@tomvothecoder The code looks very good! I downloaded the branch so I could follow the logic more completely. (Minor initial confusion on terms: I knew that a failed "info" mode would indicate a "handler" failure (failure to resolve a handler), but did not think of a runtime CMOR failure (perhaps bad data) as being a "failed handler". I get it now: all failures pass through "finalize_failure_exit".)

I will create a dev env to install and test this behavior (both info mode and runtime CMOR errors).

tomvothecoder · 2025-10-10T19:08:12Z

> @tomvothecoder The code looks very good! I downloaded the branch so I could follow the logic more completely. (Minor initial confusion on terms: I knew that a failed "info" mode would indicate a "handler" failure (failure to resolve a handler), but did not think of a runtime CMOR failure (perhaps bad data) as being a "failed handler". I get it now: all failures pass through "finalize_failure_exit".)
>
> I will create a dev env to install and test this behavior (both info mode and runtime CMOR errors).

@TonyB9000 Heads up, I'm still working on the code to ensure the implementation is complete and correct. I'll tag you again as needed. Thanks for being eager to test!

tomvothecoder

@TonyB9000 In my most recent commit, 258794a (#323), I applied the behaviors of stop and fail on the initial process of deriving variable handlers.

These two cases will now end with sys.exit(1):

- No handler is defined for a variable (i.e., the handler is missing).
- A handler is defined for a variable, but the input dataset(s) don't have the necessary raw E3SM variables.

I've highlighted the relevant code below in my review.

e3sm_to_cmip/cmor_handlers/utils.py

e3sm_to_cmip/runner.py

- Fix bug in `_get_handlers()` not instantiating `missing_handlers` after `_get_mpas_handlers()` call
- Add FIXME: comments for duplicate code
- Extract stop behaviors to `_stop_with_failed_handler()` and `_stop_with_failed_handler_parallel()`

e3sm_to_cmip/cmor_handlers/utils.py

Copilot

Pull Request Overview

This PR adds a new --on-var-failure command-line flag to provide explicit control over how e3sm_to_cmip handles variable handler failures, replacing the previous always-succeed behavior with user-configurable exit modes.

Key changes:

Added --on-var-failure flag with three modes: ignore (default), fail, and stop
Updated all run modes (serial, parallel, info) to respect failure semantics and exit appropriately
Enhanced logging and error reporting to provide clearer feedback on handler failures

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| e3sm_to_cmip/argparser.py | Adds the new `--on-var-failure` command-line argument |
| e3sm_to_cmip/runner.py | Core logic updates for failure handling across all run modes |
| e3sm_to_cmip/util.py | Adds convenience functions for consistent exit behavior |
| e3sm_to_cmip/cmor_handlers/utils.py | Updates handler loading functions to return missing/non-derivable handlers |
| e3sm_to_cmip/cmor_handlers/handler.py | Adds type alias for handler dictionaries |
| tests/cmor_handlers/test_utils.py | Updates test assertions to handle new tuple return values |
| docs/source/usage.rst | Documents the new flag and its behavior |

e3sm_to_cmip/runner.py

e3sm_to_cmip/cmor_handlers/utils.py

tomvothecoder

Hey @TonyB9000, this PR is ready for your code review and testing.

I've tested most of these cases successfully. The only ones that need more thorough testing are CMOR failures during "stop" mode with parallel processing. The run should stop gracefully by allowing running jobs to complete and cancelling pending jobs, so that outputs and logs are never left partial. Can you try to run this case? For example, run 2-3 variables and introduce faulty values for one of the variables so that it causes CMOR errors.

tomvothecoder · 2025-10-10T22:14:40Z

e3sm_to_cmip/runner.py

+            "Variable Failure Behavior (--on-var-failure)": self.on_var_failure,
+            "Variable List (--var-list)": f"{self.var_list} ({len(self.var_list)})",
+            "Input Path (--input-path)": self.input_path,
+            "Output Path (--output-path)": self.output_path,
+            "Precheck Path (--precheck)": self.precheck_path,
+            "Log Path (--logdir)": self.log_path,
+            "CMOR Log Path (--logdir)": self.cmor_log_dir,
+            "CMIP Metadata Path (--user-metadata)": self.new_metadata_path,
            "Temp Path for Processing MPAS Files": self.temp_path,
-            "Frequency": self.freq,
-            "Realm": self.realm,
+            "Frequency (--freq)": self.freq,
+            "Realm (--realm)": self.realm,


Improved the log summary for the e2c configuration; each entry now includes its CLI argument where applicable.

tomvothecoder · 2025-10-10T22:15:07Z

e3sm_to_cmip/runner.py

        if self.info_mode:
            self._run_info_mode()
-            sys.exit(0)
+            exit_success()


Replaced sys.exit(0) and sys.exit(1) with exit_success() and exit_failure(), respectively.
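
A minimal sketch of what such helpers could look like is shown below. The success message matches the log line `Exiting with success code (0).` seen later in this thread; the failure message and the exact function bodies are assumptions, not the contents of `e3sm_to_cmip/util.py`.

```python
import logging
import sys

logger = logging.getLogger(__name__)


def exit_success() -> None:
    # Log before exiting so the final log line records the outcome.
    logger.info("Exiting with success code (0).")
    sys.exit(0)


def exit_failure() -> None:
    # Assumed wording, mirroring the success message.
    logger.error("Exiting with failure code (1).")
    sys.exit(1)
```

Routing every exit through two small functions keeps the exit codes and the closing log message consistent across serial, parallel, and info modes.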

tomvothecoder · 2025-10-10T22:15:43Z

e3sm_to_cmip/runner.py

+    def _log_handler_summary(self):
+        """
+        Logs a summary of the derived CMOR handlers, including any missing or
+        non-derivable handlers.
+        """
+        if self.handlers:
+            cmip_to_e3sm_vars = {
+                handler["name"]: handler["raw_variables"] for handler in self.handlers
+            }
+
+            logger.info("--------------------------------------")
+            logger.info("| SUCCESS: Derived Variable Handlers")
+            logger.info("--------------------------------------")
+            logger.info(f"  * Count: {len(self.handlers)}")
+            logger.info("  * Variable Mappings (CMIP to E3SM):")
+            for k, v in cmip_to_e3sm_vars.items():
+                logger.info(f"    * '{k}' -> {v}")
+
+        if self.missing_handlers:
+            logger.error("--------------------------------------")
+            logger.error("| NOTICE: Missing Handlers")
+            logger.error("---------------------------------------")
+            logger.error(
+                "Solution: Make sure handlers for these variables are defined "
+                "in `handlers.yaml`."
+            )
+            logger.error(f"  * Count: {len(self.missing_handlers)}")
+            logger.error(f"  * Variables: {self.missing_handlers}")
+
+        if self.non_derivable_handlers:
+            logger.error("--------------------------------------")
+            logger.error("| NOTICE: Non-derivable Handlers")
+            logger.error("---------------------------------------")
+            logger.error(
+                "Handlers were defined for these variables, but they could not "
+                "be derived using the input E3SM datasets."
+            )
+            logger.error(
+                "Possible Reasons: 1) No matching CMIP table was found for the "
+                "requested frequency or 2) The input E3SM datasets don't have "
+                "the required variables."
+            )
+            logger.error(f"  * Count: {len(self.non_derivable_handlers)}")
+            logger.error(f"  * Variables: {self.non_derivable_handlers}")


This method logs the summary for handlers after attempting to derive them for each of the variables (--var-list).

tomvothecoder · 2025-10-10T22:16:32Z

e3sm_to_cmip/runner.py

+    def _exit_due_to_handler_issues(self) -> bool:
+        """
+        Determines if the program should exit due to missing or non-derivable
+        handlers based on the ``on_var_failure`` setting.
+
+        Returns
+        -------
+        bool
+            True if the program should exit, False otherwise.
+        """
+        if not self.handlers:
+            logger.error(
+                "No variable handlers are defined or derivable from the raw "
+                "variables found in the E3SM input datasets."
+            )
+            return True
+
+        if self.missing_handlers or self.non_derivable_handlers:
+            if self.on_var_failure in ["stop", "fail"]:
+                logger.error(
+                    "Exiting due to missing or non-derivable handlers with "
+                    f"--on-var-failure={self.on_var_failure}."
+                )
+
+                return True
+
+        return False


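This method determines whether to exit based on the status of the handlers after the derivation attempt. It always exits when no handlers could be derived at all; when some handlers are missing or non-derivable, it exits only if --on-var-failure is "stop" or "fail".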
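
The outcomes of that check can be tabulated with a small stand-alone function that mirrors the logic in the diff above. `should_exit()` is a hypothetical free-function restatement for illustration, with logging omitted; the variable names `pr` and `foo` are placeholders.

```python
def should_exit(handlers, missing, non_derivable, on_var_failure):
    """Return True if the run should exit before CMORization starts."""
    # Nothing to process: exit regardless of the configured mode.
    if not handlers:
        return True

    # Some requested variables have no usable handler.
    if (missing or non_derivable) and on_var_failure in ("stop", "fail"):
        return True

    return False


# One derived handler plus one missing handler, under each mode.
for mode in ("ignore", "fail", "stop"):
    print(mode, should_exit(["pr"], ["foo"], [], mode))
# ignore False
# fail True
# stop True
```

Note that "fail" and "stop" behave identically at this stage, since handler derivation happens before any handler is run.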

tomvothecoder · 2025-10-10T22:16:51Z

e3sm_to_cmip/runner.py

+                    # FIXME: This check is duplicated in mode 3 below. Refactor.
+                    # --- DUPLICATE CODE ---


Added FIXME comments flagging duplicate code from a previous PR so that it can be refactored out later.

tomvothecoder · 2025-10-10T22:20:36Z

e3sm_to_cmip/runner.py

+                if not is_cmor_successful:
+                    self._stop_with_failed_handler(handler["name"])
+


Stop behavior for failed handler during cmorizing.

tomvothecoder · 2025-10-10T22:21:01Z

e3sm_to_cmip/runner.py

-            return False
-
        self._log_final_result(num_handlers, num_success, failed_handlers)
+        self._finalize_on_failure(failed_handlers)


Finalize on "fail" mode.

tomvothecoder · 2025-10-10T22:21:37Z

e3sm_to_cmip/runner.py

+            if not future_result:
+                self._stop_with_failed_handler_parallel(
+                    handler_name, pool, pbar, futures
+                )


Gracefully stop parallel jobs with "stop" mode.

tomvothecoder · 2025-10-10T22:21:46Z

e3sm_to_cmip/runner.py

        pbar.close()
        pool.shutdown()
        self._log_final_result(num_handlers, num_success, failed_handlers)
+        self._finalize_on_failure(failed_handlers)


Finalize on "fail" mode.

tomvothecoder · 2025-10-10T22:22:10Z

e3sm_to_cmip/runner.py

    def _log_final_result(
        self, num_handlers: int, num_successes: int, failed_handlers: list[str]
    ):
        """
        Logs the final result of the CMORization process.

        Parameters
        ----------
        num_handlers : int
            The total number of handlers that were processed.
        num_successes : int
            The number of handlers that completed successfully.
        failed_handlers : list[str]
            A list of handler names that failed during processing.
        """
-        logger.info("========== FINAL RUN RESULTS ==========")
-        logger.info(f"* {num_successes} of {num_handlers} handlers succeeded.")
+        logger.info("")
+        logger.info("=======================================")
+        logger.info("| FINAL RUN SUMMARY")
+        logger.info("---------------------------------------")
+        logger.info(f"  * Total variables (--var-list): {len(self.var_list)}")
+        logger.info(f"  * Total handlers successfully derived: {num_handlers}")
+        logger.info(
+            f"  * Total handlers successfully cmorized: {num_successes} / {num_handlers}"
+        )

        if failed_handlers:
            logger.error(
-                "* The following handlers failed: "
-                + ", ".join(str(h) for h in failed_handlers)
+                f"  * Total handlers failed to cmorize: {len(failed_handlers)}"
            )
-        else:
-            logger.info("* All handlers completed successfully.")
+            logger.error(f"    - Failed variables: {failed_handlers}")
+
+        if self.missing_handlers:
+            logger.error(
+                f"  * Total handlers missing (not defined in handlers.yaml): "
+                f"{len(self.missing_handlers)}"
+            )
+            logger.error(f"    - Includes: {self.missing_handlers}")
+
+        if self.non_derivable_handlers:
+            logger.error(
+                f"  * Total handlers non-derivable (defined but not derivable): "
+                f"{len(self.non_derivable_handlers)}"
+            )
+            logger.error(f"    - Includes: {self.non_derivable_handlers}")
+
        logger.info("=======================================")


Improved final run summary formatting with helpful info on missing and non-derivable handlers.

Copilot

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

e3sm_to_cmip/runner.py

e3sm_to_cmip/cmor_handlers/utils.py

TonyB9000 · 2025-10-10T22:54:48Z

@tomvothecoder I did some testing earlier and the new exit (fail) mode works great (both for info-mode and run-mode).

I will re-run these tests with the latest PR. I assume this gets me there:

(dsm_v3_gen) [ac.bartoletti1@chrlogin1 e3sm_to_cmip]$ git status
On branch 272-force-non-zero
Your branch is up to date with 'origin/enhancement/272-force-non-zero'.

nothing to commit, working tree clean
(dsm_v3_gen) [ac.bartoletti1@chrlogin1 e3sm_to_cmip]$ git pull
remote: Enumerating objects: 129, done.
remote: Counting objects: 100% (129/129), done.
remote: Compressing objects: 100% (47/47), done.
remote: Total 129 (delta 93), reused 116 (delta 82), pack-reused 0 (from 0)
Receiving objects: 100% (129/129), 38.07 KiB | 2.93 MiB/s, done.
Resolving deltas: 100% (93/93), completed with 13 local objects.
From https://github.com/E3SM-Project/e3sm_to_cmip
   258794a..bf9bf82  enhancement/272-force-non-zero -> origin/enhancement/272-force-non-zero
 * [new branch]      bump/v0.13.0                   -> origin/bump/v0.13.0
   7878b86..eb844fd  jinboxie_qboi                  -> origin/jinboxie_qboi
   19e3ee8..f90e4ab  master                         -> origin/master
 * [new branch]      preserve-legacy-xr-settings    -> origin/preserve-legacy-xr-settings
 * [new tag]         v1.13.0rc1                     -> v1.13.0rc1
Updating 258794a..bf9bf82
Fast-forward
 e3sm_to_cmip/cmor_handlers/handler.py |   7 ++++--
 e3sm_to_cmip/cmor_handlers/utils.py   | 101 +++++++++++++++++++++++++++++++++++++++++++------------------------------------
 e3sm_to_cmip/runner.py                | 353 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------------------------------------------------------------------------------------
 e3sm_to_cmip/util.py                  |  16 ++++++++-----
 tests/cmor_handlers/test_utils.py     |  37 ++++++++++++++++++++---------
 5 files changed, 293 insertions(+), 221 deletions(-)

pip install, etc ...

Successfully installed e3sm_to_cmip-1.13.0rc1

TonyB9000 · 2025-10-10T23:21:52Z

@tomvothecoder There is an issue with info-mode - maybe not directly related to the exit handling.

I ran the command:

e3sm_to_cmip --info --on-var-failure fail -v pr --freq 3hr --realm atm -t /lcrc/group/e3sm2/DSM/Staging/Resource/cmor/cmip6-cmor-tables/Tables --map no_map --info-out 3hr_pr.yaml

the output to the console reads:

2025-10-10 18:09:51.861280 [INFO]: runner.py(__init__:159) >> --------------------------------------
2025-10-10 18:09:51.861473 [INFO]: runner.py(__init__:160) >> | E3SM to CMIP Configuration
2025-10-10 18:09:51.861542 [INFO]: runner.py(__init__:161) >> --------------------------------------
2025-10-10 18:09:51.866843 [INFO]: runner.py(__init__:187) >>   * Timestamp: 20251010_230951_858896
2025-10-10 18:09:51.866912 [INFO]: runner.py(__init__:187) >>   * Version Info: version 1.13.0rc1
2025-10-10 18:09:51.866957 [INFO]: runner.py(__init__:187) >>   * Mode: Info
2025-10-10 18:09:51.866998 [INFO]: runner.py(__init__:187) >>   * Variable Failure Behavior (--on-var-failure): fail
2025-10-10 18:09:51.867038 [INFO]: runner.py(__init__:187) >>   * Variable List (--var-list): ['pr'] (1)
2025-10-10 18:09:51.867077 [INFO]: runner.py(__init__:187) >>   * Input Path (--input-path): None
2025-10-10 18:09:51.867115 [INFO]: runner.py(__init__:187) >>   * Output Path (--output-path): /lcrc/group/e3sm2/DSM/Ops/DSM_Manager/e3sm_to_cmip_run_20251010_230951_858896
2025-10-10 18:09:51.867154 [INFO]: runner.py(__init__:187) >>   * Precheck Path (--precheck): None
2025-10-10 18:09:51.867192 [INFO]: runner.py(__init__:187) >>   * Log Path (--logdir): /lcrc/group/e3sm2/DSM/Ops/DSM_Manager/e3sm_to_cmip_run_20251010_230951_858896/20251010_230951_858896.log
2025-10-10 18:09:51.867236 [INFO]: runner.py(__init__:187) >>   * CMOR Log Path (--logdir): /lcrc/group/e3sm2/DSM/Ops/DSM_Manager/e3sm_to_cmip_run_20251010_230951_858896/cmor_logs
2025-10-10 18:09:51.867282 [INFO]: runner.py(__init__:187) >>   * CMIP Metadata Path (--user-metadata): /lcrc/group/e3sm2/DSM/Ops/DSM_Manager/e3sm_to_cmip_run_20251010_230951_858896/user_metadata_2743611.json
2025-10-10 18:09:51.867322 [INFO]: runner.py(__init__:187) >>   * Temp Path for Processing MPAS Files: None
2025-10-10 18:09:51.867360 [INFO]: runner.py(__init__:187) >>   * Frequency (--freq): 3hr
2025-10-10 18:09:51.867398 [INFO]: runner.py(__init__:187) >>   * Realm (--realm): atm
2025-10-10 18:09:52.128810 [INFO]: runner.py(_log_handler_summary:523) >> --------------------------------------
2025-10-10 18:09:52.128919 [INFO]: runner.py(_log_handler_summary:524) >> | SUCCESS: Derived Variable Handlers
2025-10-10 18:09:52.128969 [INFO]: runner.py(_log_handler_summary:525) >> --------------------------------------
2025-10-10 18:09:52.129019 [INFO]: runner.py(_log_handler_summary:526) >>   * Count: 2
2025-10-10 18:09:52.129066 [INFO]: runner.py(_log_handler_summary:527) >>   * Variable Mappings (CMIP to E3SM):
2025-10-10 18:09:52.129116 [INFO]: runner.py(_log_handler_summary:529) >>     * 'pr' -> ['PRECC', 'PRECL']
2025-10-10 18:09:52.180292 [INFO]: util.py(exit_success:80) >> Exiting with success code (0).

The output to the file 3hr_pr.yaml reads:

- CMIP6 Name: pr
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECT

Now, the "--help" never says that --realm or --freq do not apply to info mode (but they should, in any case).

(It turns out that for my "Bad_Metadata" test, the first dataset I tried was 3hr.pr; otherwise I would have missed this.)

I'll conduct a test with a different variable - but now I think we need to test all "ambiguous var-name" sets (where multiple frequencies are involved).

TonyB9000 · 2025-10-11T21:57:25Z

@tomvothecoder Update. This may not be a real problem. There are only two handlers for "pr", mon.json and day.json. There is no 3hr.json. There should be one, for name consistency. Currently the one labeled "Amon_day.json" is really "Amon_sub_mon.json".

I will continue to investigate (look for failures).

tomvothecoder · 2025-10-20T16:35:15Z

> @tomvothecoder Update. This may not be a real problem. There are only two handlers for "pr", mon.json and day.json. There is no 3hr.json. There should be one, for name consistency. Currently the one labeled "Amon_day.json" is really "Amon_sub_mon.json".
>
> I will continue to investigate (look for failures).

Hey @TonyB9000, any updates on your review for this PR? If you approve, I will do a final self-review before merging.

Add --on-var-failure flag to handle failure behaviors

7b937ab

- Defaults to ignoring failures

tomvothecoder self-assigned this Oct 9, 2025

tomvothecoder added the enhancement New feature or request label Oct 9, 2025

github-project-automation bot added this to E3SM to CMIP Development Oct 9, 2025

github-project-automation bot moved this to In progress in E3SM to CMIP Development Oct 9, 2025

tomvothecoder mentioned this pull request Oct 9, 2025

[Feature]: Flag to force non-zero exit status if any variable failed. #272

Open

Update sys.exit behavior based on derived handlers

258794a

tomvothecoder commented Oct 10, 2025

tomvothecoder added 3 commits October 10, 2025 12:12

Update e3sm_to_cmip/cmor_handlers/utils.py

cbf2930

Fix unit tests & replace load_module()

153174e

tomvothecoder requested a review from Copilot October 10, 2025 22:12

tomvothecoder commented Oct 10, 2025

Apply suggestion from @tomvothecoder

638b1fe

Copilot AI reviewed Oct 10, 2025

tomvothecoder commented Oct 10, 2025

tomvothecoder marked this pull request as ready for review October 10, 2025 22:29

tomvothecoder requested a review from Copilot October 10, 2025 22:29

Adddress copilot review comments

bf9bf82

Copilot AI reviewed Oct 10, 2025

tomvothecoder mentioned this pull request Oct 20, 2025

326 force nan replacement #327

Closed

5 tasks

tomvothecoder mentioned this pull request Oct 30, 2025

Bump to v1.13.0 #332

Merged

9 tasks

Integration Test Checklist

General Behavior

Serial Mode (_run_serial)

Parallel Mode (_run_parallel)

Info Mode (_run_info_mode)

Logging & Exit Behavior

Regression & Compatibility
