
JP-4103: Set FAILED status for steps that did not succeed#10497

Open
melanieclarke wants to merge 2 commits into spacetelescope:main from melanieclarke:jp-4103

Conversation

@melanieclarke
Collaborator

@melanieclarke melanieclarke commented Apr 29, 2026

Resolves JP-4103

Update stpipe utility functions record_step_status and query_step_status to expect three possible values for calibration step status: COMPLETE, SKIPPED, or FAILED.

The record_step_status function currently has one status-controlling parameter (success), which sets COMPLETE if True and SKIPPED if False. In most use cases for this function, success=False indicates that the step attempted to run but could not complete, either because of an internal error or because the input data was invalid for the processing. The status "FAILED" seems more appropriate in these cases, so I changed the result for success=False from SKIPPED to FAILED. I also added an option to directly set a status instead, and used it in a few places where SKIPPED seemed more appropriate than FAILED.
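For concreteness, here is a minimal sketch of the behavior described above. These are simplified stand-ins, not the actual jwst/stpipe source: the real functions operate on full datamodels and have additional handling, and the `SimpleNamespace` stand-in for a datamodel is purely for illustration.

```python
from types import SimpleNamespace

def record_step_status(datamodel, cal_step, success=True, status=None):
    """Map success=True -> COMPLETE, success=False -> FAILED (previously SKIPPED);
    an explicit `status` string overrides the boolean mapping."""
    if status is None:
        status = "COMPLETE" if success else "FAILED"
    setattr(datamodel.meta.cal_step, cal_step, status)

def query_step_status(datamodel, cal_step):
    """Return the recorded status for a step, or None if never recorded."""
    return getattr(datamodel.meta.cal_step, cal_step, None)

# Minimal stand-in for a datamodel with a meta.cal_step namespace
model = SimpleNamespace(meta=SimpleNamespace(cal_step=SimpleNamespace()))

record_step_status(model, "tweakreg", success=False)          # attempted, did not complete
record_step_status(model, "pixel_replace", status="SKIPPED")  # explicitly not attempted
print(query_step_status(model, "tweakreg"))       # FAILED
print(query_step_status(model, "pixel_replace"))  # SKIPPED
```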

Steps affected by this default change are:

  • assign_mtwcs
  • cube_build
  • master_background
  • master_background_mos
  • outlier_detection
  • pixel_replace
  • tweakreg

The other steps directly set SKIPPED or COMPLETE -- we can update those on a case by case basis as needed.

Requires spacetelescope/stdatamodels#722 (merged)

Tasks

  • If you have a specific reviewer in mind, tag them.
  • add a build milestone, i.e. Build 12.0 (use the latest build if not sure)
  • Does this PR change user-facing code / API? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see changelog readme for instructions)
      • if your change breaks step-level or public API (as defined in the docs), also add a changes/<PR#>.breaking.rst news fragment
    • update or add relevant tests
    • update relevant docstrings and / or docs/ page
    • start a regression test and include a link to the running job (click here for instructions)
      • Do truth files need to be updated ("okified")?
        • after the reviewer has approved these changes, run okify_regtests to update the truth files
  • if a JIRA ticket exists, make sure it is resolved properly

@melanieclarke melanieclarke changed the title JP-4103: Set FAILED status steps that did not succeed JP-4103: Set FAILED status for steps that did not succeed Apr 29, 2026
@melanieclarke melanieclarke added this to the Build 13.0 milestone Apr 29, 2026
@melanieclarke
Collaborator Author

Regtests:
https://github.com/spacetelescope/RegressionTests/actions/runs/25119632091

Diffs show the most likely difference users might see: tweakreg and outlier_detection now report FAILED instead of SKIPPED for a single input file. Open to suggestions, if folks think this use case should still be "SKIPPED".

@codecov

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.39%. Comparing base (f86541f) to head (8f92830).

Files with missing lines                          Patch %   Lines
jwst/master_background/master_background_step.py  60.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10497      +/-   ##
==========================================
- Coverage   86.40%   86.39%   -0.01%     
==========================================
  Files         373      373              
  Lines       40112    40114       +2     
==========================================
  Hits        34658    34658              
- Misses       5454     5456       +2     

☔ View full report in Codecov by Sentry.
@emolter
Collaborator

emolter commented May 4, 2026

I haven't reviewed this yet but I have some general questions about the expectations here.

What step status do we expect to report for failures due to e.g. the wrong input datamodel type, failing to find a reference file, etc., i.e. failures in the step setup? That is, as opposed to some algorithmic component of the step. I think there are a good number of these in various places, and changing all of them from SKIPPED to FAILED seems both unnecessary and maybe more confusing.

Does this elevate to a "breaking change"? More so than other keywords, I'd think it could be somewhat likely that downstream user workflows would be querying status keywords to decide on what further processing is needed.

Should we just make it so record_step_status always requires an input string? With this change it seems like success=bool is fundamentally no longer a good idea, and rather than patching around it by adding an option we should maybe just rethink it.

In some cases where, say, multiple slits in a MultiSlitModel are processed, is the expectation that some would receive a FAILED and some would pass through fine?

@melanieclarke
Collaborator Author

melanieclarke commented May 4, 2026

What step status do we expect to report for failures due to e.g. the wrong input datamodel type, failing to find a reference file, etc., i.e. failures in the step setup? That is, as opposed to some algorithmic component of the step. I think there are a good number of these in various places, and changing all of them from SKIPPED to FAILED seems both unnecessary and maybe more confusing.

The two currently supported requests from INS were to report FAILED when:

  1. background subtraction is attempted for WFSS but does not complete due to insufficient background pixels
  2. clean_flicker_noise is attempted for NIRSpec but does not complete because a WCS could not be assigned.

Both of these were requested so that users would know that the step was attempted, but was not successfully applied for some particular input data.

Extending these requests, the distinction I was aiming for here is that SKIPPED should be used if the step was not attempted at all, and FAILED should be used if it was tried (e.g. it's on by default for this mode) but it could not be successfully completed. I think wrong datamodel, no reference file, etc. should fall under FAILED for this definition. I think this change would have minimal user impact in those cases, though, and I would expect it's low priority to make sure we're consistent with this choice everywhere. Even with this distinction, there are some gray areas I'm not sure about, e.g. the master background subtraction handling here.

I'm open to suggestions, if we prefer a different definition for FAILED. I'd also be okay with being conservative with it, so that we only set FAILED if it's specifically requested.
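To make the proposed SKIPPED-vs-FAILED distinction concrete, here is a toy step (not actual jwst code; the step name, threshold, and helper are all illustrative) following the convention that SKIPPED means "not attempted" and FAILED means "attempted but could not complete":

```python
from types import SimpleNamespace

def record_step_status(datamodel, cal_step, success=True, status=None):
    # Simplified stand-in for the stpipe helper discussed in this PR
    if status is None:
        status = "COMPLETE" if success else "FAILED"
    setattr(datamodel.meta.cal_step, cal_step, status)

def run_background_step(model, enabled, n_background_pixels, min_pixels=100):
    """Toy step: SKIPPED = never attempted, FAILED = attempted but could
    not complete for this input, COMPLETE = applied successfully."""
    if not enabled:
        record_step_status(model, "bkg_subtract", status="SKIPPED")
    elif n_background_pixels < min_pixels:
        # Attempted, but insufficient background pixels for this input
        record_step_status(model, "bkg_subtract", success=False)
    else:
        # ... actual subtraction would happen here ...
        record_step_status(model, "bkg_subtract", success=True)
    return model

model = SimpleNamespace(meta=SimpleNamespace(cal_step=SimpleNamespace()))
run_background_step(model, enabled=True, n_background_pixels=10)
print(model.meta.cal_step.bkg_subtract)  # FAILED
```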

Does this elevate to a "breaking change"? More so than other keywords, I'd think it could be somewhat likely that downstream user workflows would be querying status keywords to decide on what further processing is needed.

I don't think so. It's not a breaking change for the software -- there's no reason this change should trigger a major release for this package. Significant data changes are communicated by other means.

Should we just make it so record_step_status always requires an input string? With this change it seems like success=bool is fundamentally no longer a good idea, and rather than patching around it by adding an option we should maybe just rethink it.

I went back and forth on this one. I settled on the current implementation because it seemed useful to me to define a default behavior for when a step is "not successful", and because it's less disruptive to current usage, but I'd be okay with always requiring the input string too.

In some cases where, say, multiple slits in a MultiSlitModel are processed, is the expectation that some would receive a FAILED and some would pass through fine?

The current standard in several steps where this is relevant is to mark the step COMPLETE if any slit completes successfully and SKIPPED otherwise. If we move to FAILED instead of SKIPPED in some cases, the same rule could apply.
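The aggregation rule described here reduces to a one-liner; this sketch (an illustration, not jwst source) shows the FAILED variant, where FAILED replaces the previous SKIPPED default:

```python
def aggregate_slit_status(slit_succeeded):
    """Aggregate per-slit outcomes for a MultiSlitModel-style input:
    COMPLETE if any slit was processed successfully, FAILED otherwise.

    slit_succeeded: iterable of bool, one entry per slit."""
    return "COMPLETE" if any(slit_succeeded) else "FAILED"

print(aggregate_slit_status([True, False, False]))  # COMPLETE
print(aggregate_slit_status([False, False]))        # FAILED
```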
