JP-4260 - Enable stfitsdiff in nightly regression tests#10300

Merged
melanieclarke merged 7 commits into spacetelescope:main from stscirij:enable_stfitsdiff
Apr 8, 2026

Conversation

@stscirij
Contributor

@stscirij stscirij commented Mar 5, 2026

Resolves JP-4260

This PR enables stfitsdiff in the nightly regression tests and the stfitsdiff script.

Tasks

  • If you have a specific reviewer in mind, tag them.
  • add a build milestone, i.e. Build 12.0 (use the latest build if not sure)
  • Does this PR change user-facing code / API? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see changelog readme for instructions)
      • if your change breaks step-level or public API (as defined in the docs), also add a changes/<PR#>.breaking.rst news fragment
    • update or add relevant tests
    • update relevant docstrings and / or docs/ page
    • start a regression test and include a link to the running job (click here for instructions)
      • Do truth files need to be updated ("okified")?
        • after the reviewer has approved these changes, run okify_regtests to update the truth files
  • if a JIRA ticket exists, make sure it is resolved properly

@stscirij
Contributor Author

stscirij commented Mar 5, 2026

Regression tests all passed with no errors or warnings: https://github.com/spacetelescope/RegressionTests/actions/runs/22682759463

@codecov

codecov Bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.77%. Comparing base (2672c8b) to head (2dd91da).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #10300   +/-   ##
=======================================
  Coverage   85.76%   85.77%           
=======================================
  Files         372      372           
  Lines       40032    40032           
=======================================
+ Hits        34334    34337    +3     
+ Misses       5698     5695    -3     


Collaborator

@melanieclarke melanieclarke left a comment

Thanks, Robert! I'd like to leave this open for now and rerun the tests a few times when we expect there to be real regtest differences, so we can verify the differences are as expected.

Comment thread jwst/regtest/st_fitsdiff.py
@melanieclarke melanieclarke added this to the Build 12.3 milestone Mar 5, 2026
@melanieclarke
Collaborator

melanieclarke commented Mar 6, 2026

#10299 will merge shortly and make some new regtest differences. I'll okify those diffs then kick off some more regtests here (before merging main in). I expect the report from stfitsdiff should be the same as this run, but with truth and new products reversed: https://github.com/spacetelescope/RegressionTests/actions/runs/22773838631

Stfitsdiff tests here:
https://github.com/spacetelescope/RegressionTests/actions/runs/22779628316

These tests ran into some new changes on main, so they include both the trace_model diffs expected, and also the same diffs as this run on main: https://github.com/spacetelescope/RegressionTests/actions/runs/22783421753

@melanieclarke
Collaborator

Reviewing the trace_model diffs, comparing: https://github.com/spacetelescope/RegressionTests/actions/runs/22773838631 and
https://github.com/spacetelescope/RegressionTests/actions/runs/22779628316

I think the file diffs reported are correct. I see the same number of pixels different in every case. The max absolute diff values and percent different also all match up. The reported max relative diffs are different, but I think that's because I swapped A and B files for this comparison.
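To illustrate why swapping the A and B files would change only the relative numbers, here is a minimal sketch. It assumes the relative difference is normalized by the second file's value; that normalization is an assumption for illustration only, not a statement of how stfitsdiff computes it. The function names and sample values are hypothetical.

```python
def n_different(a, b):
    """Number of element pairs that differ (symmetric in a and b)."""
    return sum(x != y for x, y in zip(a, b))


def max_abs_diff(a, b):
    """Largest |a - b| (symmetric in a and b)."""
    return max(abs(x - y) for x, y in zip(a, b))


def max_rel_diff(a, b):
    """Largest |a - b| / |b|, skipping zeros in b (NOT symmetric).

    Assumption: normalizing by the second argument is chosen here only
    to show the asymmetry; the real tool may define this differently.
    """
    return max(abs(x - y) / abs(y) for x, y in zip(a, b) if y != 0)


# Hypothetical pixel values standing in for a truth file and a new product.
truth = [1.0, 2.0, 4.0]
new = [1.0, 2.5, 5.0]

for label, (a, b) in {"A=truth, B=new": (truth, new),
                      "A=new, B=truth": (new, truth)}.items():
    print(label, n_different(a, b), max_abs_diff(a, b), max_rel_diff(a, b))
```

Under this assumption, the pixel count and the max absolute difference are identical in both orderings, while the max relative difference changes because the denominator changes.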

The x1d reports are so much more useful than the fitsdiff versions! And the report suppression for NaN-only changes looks good.

Reviewing the NIRCam image diffs, comparing:
https://github.com/spacetelescope/RegressionTests/actions/runs/22779628316 and
https://github.com/spacetelescope/RegressionTests/actions/runs/22783421753

Same. All the diffs look right. The max relative diffs match up in this report, since As and Bs are the same for this report.

@melanieclarke
Collaborator

melanieclarke commented Mar 10, 2026

Running again for some wfss_contam diffs from #10315, before merging in main. The diffs have been okified, so the new run should be equal and opposite to https://github.com/spacetelescope/RegressionTests/actions/runs/22875594770

Stfitsdiff run here:
https://github.com/spacetelescope/RegressionTests/actions/runs/22913847006

The WFSS contam diffs look right; the others are holdovers from other changes. The tables print a bit funny in the first part of the report -- I think it's trying to interpret them as a markup language or something? But they are clear in the second part of the report, so I don't think it's stfitsdiff's fault.

@melanieclarke
Collaborator

Comparing some table diffs for a NIRISS SOSS specprofile reference change:
Scheduled run: https://github.com/spacetelescope/RegressionTests/actions/runs/23122804722
Stfitsdiff run: https://github.com/spacetelescope/RegressionTests/actions/runs/23146532890

Image differences are the same in both reports.

There is a reporting difference for multi-dimensional table entries that we should look into. For test_niriss_soss_stage3_x1dints: jwst.regtest.test_niriss_soss HDU 2, fitsdiff reports 10 different table elements (3.7% different) and stfitsdiff reports 17738 (no percentage reported). I think both are probably right -- there are most likely 17738 total numerical differences, but they are found in 10 different table cells (all the flux entries, for the 10-row table).

I find the stfitsdiff report clearer about what's actually going on here -- it's nearly all of the flux values that have changed, which is easier to overlook in the fitsdiff report. But I think it might be helpful, if we can, to report the number of table cells affected, as well as the total number of numerical differences. I don't think this is a blocker for this PR, though.
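Reporting both counts side by side could look something like the following. This is a minimal sketch, assuming each cell of a multi-dimensional column holds a sequence of values. The function name and the sample data are hypothetical, and this is not the stfitsdiff implementation.

```python
def count_table_differences(column_a, column_b):
    """Return (cells_changed, elements_changed) for two versions of a column.

    Each column is a list of cells, and each cell is a sequence of numbers.
    NaN handling is deliberately omitted: NaN != NaN is True in Python, so
    real data would need an explicit NaN-aware comparison.
    """
    cells_changed = 0
    elements_changed = 0
    for cell_a, cell_b in zip(column_a, column_b):
        # Count element-level differences inside this one cell.
        n = sum(x != y for x, y in zip(cell_a, cell_b))
        elements_changed += n
        if n > 0:
            cells_changed += 1
    return cells_changed, elements_changed


# Hypothetical 3-row flux column; each cell is a short array.
flux_truth = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
flux_new = [[1.1, 2.1, 3.0], [4.0, 5.0, 6.0], [7.5, 8.5, 9.5]]

cells, elements = count_table_differences(flux_truth, flux_new)
print(f"{cells} cells changed, {elements} element differences")
```

With this toy data the cell count is 2 and the element count is 5, which mirrors the 10-versus-17738 situation above at a smaller scale.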

@melanieclarke
Collaborator

melanieclarke commented Apr 8, 2026

Collaborator

@melanieclarke melanieclarke left a comment

Reports look good in all cases I've tested, so I think we're ready to go ahead and merge this.

@melanieclarke melanieclarke enabled auto-merge (squash) April 8, 2026 17:12
@melanieclarke melanieclarke merged commit 95af53b into spacetelescope:main Apr 8, 2026
41 of 42 checks passed

2 participants