Skip to content

Refactor model.update#710

Merged
emolter merged 11 commits into
spacetelescope:mainfrom
emolter:issue-690
Apr 22, 2026
Merged

Refactor model.update#710
emolter merged 11 commits into
spacetelescope:mainfrom
emolter:issue-690

Conversation

@emolter
Copy link
Copy Markdown
Contributor

@emolter emolter commented Apr 7, 2026

Closes #690

This PR modifies model.update to set attributes on the datamodel object nodes directly instead of into its instance dictionary, except in the case of extra_fits when requested, which exist outside the schema.

There are two major effects of the change.

First, validation now occurs attribute-by-attribute on assignment, replacing a call to model.validate() at the end. This is beneficial because it avoids triggering validation warnings/errors that were not introduced by the update call itself. See #690 for a concrete example. This also makes model.update respect the validate_on_assignment flag.

Second, update() will no longer add non-schema-defined attributes to the target model. For example,

import stdatamodels.jwst.datamodels as dm
model = dm.ImageModel()
model.meta.wcsinfo.v2_ref = 100.0
model2 = dm.SpecModel() # wcsinfo is not schema-defined
model2.update(model)
model2.meta.wcsinfo.v2ref

on main the last line returns 100.0, i.e., propagates the value from source to target. On the PR branch, the last line will raise an AttributeError - that attribute was not updated since it wasn't in the schema.

My opinion is that the new behavior is better; however, it seems the old behavior was leveraged in the pipeline to give wcsinfo to MultiSlitModel in the extract2d step. That was the cause of these errors: https://github.com/spacetelescope/RegressionTests/actions/runs/24104400120/attempts/2

An AI tool was used to help prepare this PR.

Tasks

  • update or add relevant tests
  • update relevant docstrings and / or docs/ page
  • Does this PR change any API used downstream? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • run jwst regression tests with this branch installed ("git+https://github.com/<fork>/stdatamodels@<branch>")
news fragment change types...
  • changes/<PR#>.feature.rst: new feature
  • changes/<PR#>.bugfix.rst: fixes an issue
  • changes/<PR#>.doc.rst: documentation change
  • changes/<PR#>.removal.rst: deprecation or removal of public API
  • changes/<PR#>.misc.rst: infrastructure or miscellaneous change

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.33%. Comparing base (75c2006) to head (dd73b72).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #710   +/-   ##
=======================================
  Coverage   90.32%   90.33%           
=======================================
  Files          99       99           
  Lines        4641     4563   -78     
=======================================
- Hits         4192     4122   -70     
+ Misses        449      441    -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@emolter
Copy link
Copy Markdown
Contributor Author

emolter commented Apr 7, 2026

@melanieclarke
Copy link
Copy Markdown
Contributor

This sounds reasonable to me in principle although I haven't reviewed the code changes in detail.

Does the change to the validation process make a difference in performance? I know validation can be expensive sometimes, I'm wondering how many separate validation calls compares to one big one.

@emolter
Copy link
Copy Markdown
Contributor Author

emolter commented Apr 14, 2026

Thanks for suggesting looking at runtime @melanieclarke , as you maybe suspected yes this new method is slower. To update a blank RampModel with the metadata from a random real-data RampModel in the regression tests, I'm getting ~15ms spent inside update() on main versus ~52ms on this PR branch. Most of that is validation in both cases.

You are right that in certain circumstances validation can be a substantial fraction of model overhead, e.g. if data arrays are very small, so this slowdown is definitely worth considering.

I may have found a way to make validation a bit faster - I'll see if that's worth pursuing and then push up a separate PR.
edit: #714 gets some of that back: for the same test as above, the update call takes ~26ms. Still worse, but I'd worry about it less.

I do think something should be done here though. I think I've convinced myself that allowing model.update to silently copy over things that are in the schema of the source datamodel but not the schema of the target datamodel, as was done with wcsinfo in several places, is simply a bug.

@emolter
Copy link
Copy Markdown
Contributor Author

emolter commented Apr 17, 2026

@melanieclarke had asked whether the MultiSlitModel use of wcsinfo in the pipeline could be avoided somehow. I think the answer is that I don't see an easy/elegant solution for that. The update call responsible for this lives in extract_2d. The wcsinfo is requested first on this line, and while it would be possible to give the new model calib_mos the wcsinfo from the input data instead of the MultiSlitModel named calibrated, it wouldn't help because calib_mos is itself a MultiSlitModel so you'd still be assigning non-schema-defined things. So we'd need a way to pass wcsinfo into the downstream steps in a different way.

@tapastro had asked if the MultiSlitModels that would newly receive wcsinfo when it's added to the schema are limited to the Spec2Pipeline, and the answer is yes; see the regression test diffs above. I agree with Tyler that in spec2, because all the slits come from the same input rate file, they should all have the same wcsinfo and therefore it's logical to keep that in the top-level MultiSlitModel. The other option I see would be to assign wcsinfo to each slit, but that seems redundant.

@melanieclarke
Copy link
Copy Markdown
Contributor

Thanks for checking @emolter - that all makes sense to me.

@emolter emolter mentioned this pull request Apr 17, 2026
5 tasks
@emolter
Copy link
Copy Markdown
Contributor Author

emolter commented Apr 17, 2026

I thought it might be good to address the schema update in a separate PR: #712

That way we can more easily ensure this PR doesn't have other unanticipated impacts. But let me know if you think this is not necessary.

@emolter emolter marked this pull request as ready for review April 17, 2026 14:15
@emolter emolter requested a review from a team as a code owner April 17, 2026 14:15
@emolter
Copy link
Copy Markdown
Contributor Author

emolter commented Apr 17, 2026

instead of updating multislit schema, now pointing to a jwst branch where we attempt never to set multislit wcsinfo: https://github.com/spacetelescope/RegressionTests/actions/runs/24575930853

@emolter emolter mentioned this pull request Apr 17, 2026
10 tasks
Copy link
Copy Markdown
Collaborator

@braingram braingram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall code changes look good to me. One request for updating the update docstring. Also how does the performance look now?

Finally, are the regtest failures all expected with this PR (and the linked jwst one)?

Comment thread src/stdatamodels/model_base.py
@emolter
Copy link
Copy Markdown
Contributor Author

emolter commented Apr 21, 2026

The JWST PR shows the same failures when pointed to stdatamodels main. My expectation is that once that's merged, a regtest rerun of jwst main against this branch will be clean.

The speed numbers from #710 (comment) are still accurate: before this PR, update was taking 15ms to populate a blank RampModel with real regtest RampModel metadata. After, along with #714, it takes 26ms. Nearly all of the difference is still in validation.

Copy link
Copy Markdown
Collaborator

@braingram braingram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This greatly simplifies update. I may poke at the update performance to see if there are any easy wins there but those don't need to hold up this PR.

@emolter
Copy link
Copy Markdown
Contributor Author

emolter commented Apr 22, 2026

companion jwst PR is merged now. https://github.com/spacetelescope/RegressionTests/actions/runs/24787022754 should check for any unexpected differences before merging this.

@emolter emolter enabled auto-merge (squash) April 22, 2026 16:43
@emolter emolter merged commit a9b2808 into spacetelescope:main Apr 22, 2026
21 of 22 checks passed
@emolter emolter deleted the issue-690 branch April 22, 2026 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

validate method override plus model.update leads to confusing/fragile behavior

4 participants