
Conversation

@NickGeneva
Collaborator

Earth2Studio Pull Request

Description

A PR to discuss the proposal of adding a new type of core component/model to the package: an analysis model.
A markdown file in this PR discusses the general idea, but the more immediate use would be to better support things like TC trackers and other non-AI processes that can be executed in pipelines and modularized, but don't really fit the mold of producing ND arrays.

Related PR: #485

Hopefully this could also open the package up to supporting in-pipeline impact models, or at least provide easy-to-use APIs to leverage them.

More information to be added.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

Analysis models do not need to inherit this protocol; it is simply used to define the
required APIs.
The key distinguishing feature is that the `__call__` method returns `None`, making
these models inference sinks rather than data sources or transformers.
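A minimal sketch of what such a protocol might look like, using `typing.Protocol`; the class names and call signatures here are assumptions for illustration, not the actual Earth2Studio API.

```python
# Hypothetical sketch of an analysis-model protocol; names and signatures
# are illustrative, not the Earth2Studio implementation.
from collections import OrderedDict
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class AnalysisModel(Protocol):
    """A pipeline endpoint: consumes data and returns nothing."""

    def __call__(self, x: np.ndarray, coords: OrderedDict) -> None:
        """Consume `x` and its coordinate system; no data is returned."""
        ...


class SaveToDisk:
    """Toy analysis model that satisfies the protocol structurally."""

    def __call__(self, x: np.ndarray, coords: OrderedDict) -> None:
        np.save("output.npy", x)  # pure sink: write out, return nothing
```

Because the protocol is `runtime_checkable`, structural conformance can be checked with `isinstance` without requiring inheritance, matching the "do not need to inherit" note above.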
Collaborator

I guess my one complaint about this being a requirement is that in the more interactive/notebook setting, it's nicer to return some analysis "result" directly rather than dumping it to disk and then re-reading it back in later. But I'm not sure if there is a great way to handle arbitrary return types.

Collaborator

e.g., taking the two use cases you mentioned, the "validate" one is easily handled: you could just have a console log/print statement show the result. But for something like a TC track/location, it'd be nice to be able to do something like `tracks = TCTracks(c, coords)` and have `tracks` contain the identified TC tracks. Obviously this is more of a nit/preference thing, though; there are pretty easy workarounds.
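The interactive pattern described here might look like the following sketch, where a hypothetical `TCTracks` analysis model returns its result in-memory rather than writing to disk. The class name and the toy "detection" logic are illustrative assumptions, not the actual Earth2Studio API.

```python
# Hypothetical TCTracks-style model that returns its analysis result
# directly, for interactive/notebook use. All names are illustrative.
from collections import OrderedDict

import numpy as np


class TCTracks:
    def __call__(self, x: np.ndarray, coords: OrderedDict) -> list[dict]:
        # Stand-in "detection": report the grid-point maximum per lead time
        tracks = []
        for i, lead in enumerate(coords.get("lead_time", range(x.shape[0]))):
            j = int(np.argmax(x[i]))
            tracks.append({"lead_time": lead, "flat_index": j})
        return tracks


x = np.random.rand(4, 32, 64)
coords = OrderedDict(lead_time=[0, 6, 12, 18], lat=range(32), lon=range(64))
tracks = TCTracks()(x, coords)  # result is available in-memory, no disk I/O
```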

Collaborator

I might take the stronger view that it is not appropriate to have a "model" that returns None; this is really a signal that something else is wrong with our approach to these types of models, and the None return is just a band-aid. I also don't really know how this generalizes beyond TC tracks/locations, which I would argue are fine to organize as ND arrays anyway. Perhaps we need to better understand this class of models and what an appropriate output type might be (a dataframe, for example).

Collaborator

I would be in favor of having the model either return the input data (passthrough) or the analysis result (e.g., pd.DataFrame). If the model returns the input data, one could do something like:

for model in pipeline:
    x, coords = model(x, coords)

without having to check the model type (AnalysisModel or not). I believe returning some actual output data (e.g., pd.DataFrame) would be preferable, though. Returning only None does not seem very convenient in any way. Since, according to the current description, an AnalysisModel is a designated end-of-pipeline element, I don't think the return type would be a major issue: the returned data does not have to be compatible with any downstream pipeline elements.
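The passthrough option above can be sketched as follows, assuming hypothetical stand-in models (`Scale`, `PrintMean`) rather than actual Earth2Studio components: the analysis model returns its input unchanged, so the generic pipeline loop needs no type check.

```python
# Sketch of the passthrough idea: an analysis model returns its input
# unchanged, so the loop treats every model uniformly. Names are
# illustrative, not the Earth2Studio API.
from collections import OrderedDict

import numpy as np


class Scale:
    """Stand-in transform model."""

    def __call__(self, x, coords):
        return x * 2.0, coords


class PrintMean:
    """Stand-in analysis model: logs a statistic, passes data through."""

    def __call__(self, x, coords):
        print(f"mean = {x.mean():.3f}")  # side effect only
        return x, coords  # passthrough keeps the loop uniform


x = np.ones((2, 2))
coords = OrderedDict(lat=[0, 1], lon=[0, 1])
pipeline = [Scale(), PrintMean(), Scale()]
for model in pipeline:
    x, coords = model(x, coords)  # no isinstance(model, AnalysisModel) check
```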

Collaborator

Being generally open regarding the return type would already help a lot, especially for cases in which the shape of the returned data is not known a priori.
Object tracking is an obvious example: right now the pressing issue is around TC trackers, but with more regional and high-resolution models, cloud tracking could become interesting at some point as well. And I'm sure there are other occasions that are not on my radar.

However, to make the situation even more complicated, analysis models might be lightweight enough that they can run on the host while the GPU continues with the forecast.
This is now the case with the TempestExtremes implementation: in that mode it can run on top of the inference with effectively zero overhead, compressing PBs of data into MBs, but the result is returned at an arbitrary point in time, making it hard to catch in an inference loop. What could be returned is either the Future (as is done now) or the input data, as Stefan suggested. That would at least avoid returning None, but it's not especially helpful for the remainder of the inference.
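The Future-returning pattern might be sketched like this, with a thread-pool worker standing in for the host-side tracker; the class name and "tracking" logic are illustrative assumptions, not the TempestExtremes implementation.

```python
# Sketch of host-side analysis overlapping the forecast loop: the model
# submits work to a worker and returns a Future instead of None.
# Names and logic are illustrative.
from concurrent.futures import Future, ThreadPoolExecutor

import numpy as np

executor = ThreadPoolExecutor(max_workers=1)


class AsyncTracker:
    def __call__(self, x: np.ndarray, coords) -> Future:
        # Copy to host memory, then submit; the forecast loop is not blocked
        return executor.submit(self._track, np.asarray(x).copy())

    @staticmethod
    def _track(x: np.ndarray) -> dict:
        return {"max": float(x.max())}  # stand-in for the real tracker


tracker = AsyncTracker()
futures = [tracker(np.random.rand(8, 8), None) for _ in range(3)]
results = [f.result() for f in futures]  # collect at end of inference
executor.shutdown(wait=True)
```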

Overall, data sinks could also nudge users towards plugging their downstream models directly into the pipeline, in cases where that makes sense from a compute-resources perspective.

Analysis models in Earth2Studio provide a set of models that serve as inference sinks
and pipeline endpoints.
Unlike {ref}`diagnostic_model_userguide` and {ref}`prognostic_model_userguide`, which
transform and return data, analysis models consume data without returning modified
Collaborator

The "modified" does not make clear whether the model returns anything at all.


5 participants