Aurora Air Pollution value explosion and collapse #161

Hi all,

I've been using the air pollution fine-tune of Aurora and found that values in many variables explode and then collapse (into NaN) somewhere between rollout steps 60 and 80.

For context, I'm researching the model for use in long-term climate applications, so the rollouts I'm running are far longer than the model's intended 5-day horizon. I've seen that the paper caveats the air pollution fine-tune in several ways, so perhaps this result isn't unexpected, but I've encountered no such issue with `AuroraPretrained` over rollouts exceeding one year (1,460 rollout steps).

Given the same initial state, explosion and collapse always occurred earlier with `simulate_indexing_bug=True` than with `simulate_indexing_bug=False`. The location of the collapse also differed: the former produced increasingly extreme values over the Himalayas / Tibet, and the latter over Northern China and South America. In all cases, the same pixels appear to trigger the explosion every time, with increasingly extreme values propagating outwards from them over 10-20 steps to produce the blocky artefacts seen in the images below.

`simulate_indexing_bug=True`:
<img width="822" height="390" alt="Image" src="https://github.com/user-attachments/assets/ad8da8f8-8d81-4557-ae74-e75cfbd178ce" />

`simulate_indexing_bug=False`:
<img width="822" height="390" alt="Image" src="https://github.com/user-attachments/assets/1904db73-ee90-443b-9c36-c99cf86bc1b4" />
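To pin down which pixels trigger the explosion, something like the following could be run over the saved per-step 2t fields. It's a minimal NumPy sketch; `first_blowup` and its `max_abs_change` threshold are hypothetical names for illustration, not part of Aurora.

```python
from typing import Iterable, Optional, Tuple

import numpy as np


def first_blowup(
    fields: Iterable[np.ndarray], max_abs_change: float
) -> Optional[Tuple[int, Tuple[int, int]]]:
    """Find the first rollout step at which any pixel departs from the initial
    field by more than `max_abs_change` (NaN and inf always count).

    `fields` yields 2D (lat, lon) arrays; the first one is the initial state.
    Returns (step, (lat_index, lon_index)) of the largest departure at that
    step, or None if no step exceeds the threshold.
    """
    it = iter(fields)
    reference = np.asarray(next(it), dtype=float)
    for step, field in enumerate(it, start=1):
        field = np.asarray(field, dtype=float)
        # Non-finite predictions are treated as infinitely far from the reference.
        change = np.where(np.isfinite(field), np.abs(field - reference), np.inf)
        if np.any(change > max_abs_change):
            i, j = np.unravel_index(np.argmax(change), change.shape)
            return step, (int(i), int(j))
    return None
```

Each entry of `fields` would be a single 2D slice such as `pred.surf_vars["2t"][0, 0]` moved to CPU and converted to NumPy, assuming the `(batch, time, lat, lon)` layout of `Batch.surf_vars`. Running it with the same threshold under both `simulate_indexing_bug` settings would show whether the first flagged pixel really is identical across runs.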

So far, the only fix I've found is to filter the 2t surface variable. If I apply a Gaussian or uniform filter to 2t at each rollout step and assign the result back to the prediction `Batch` object's `.surf_vars["2t"]`, I can run rollouts of arbitrary length.
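The per-step filtering can be sketched roughly as follows. This is a minimal NumPy version, and the details are assumptions rather than anything prescribed by Aurora: `smooth_latlon` is a hypothetical helper, `sigma` is in grid cells, longitude wraps across the dateline, and latitude is edge-padded at the poles.

```python
import numpy as np


def gaussian_kernel1d(sigma: float, truncate: float = 4.0) -> np.ndarray:
    """Normalised 1D Gaussian kernel, truncated at `truncate` standard deviations."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_latlon(field: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian smoothing over the last two axes, assumed to be (lat, lon).

    Longitude is treated as periodic (wraps across the dateline); latitude is
    padded by repeating the first and last rows. `sigma` is in grid cells.
    """
    field = np.asarray(field, dtype=float)
    if sigma <= 0:
        return field.copy()
    kernel = gaussian_kernel1d(sigma)
    r = len(kernel) // 2

    def conv(v: np.ndarray) -> np.ndarray:
        # "valid" mode drops the padding again, so the output length matches the input.
        return np.convolve(v, kernel, mode="valid")

    lead = [(0, 0)] * (field.ndim - 2)
    # Longitude: periodic padding, then convolve along the last axis.
    out = np.pad(field, lead + [(0, 0), (r, r)], mode="wrap")
    out = np.apply_along_axis(conv, -1, out)
    # Latitude: edge padding, then convolve along the second-to-last axis.
    out = np.pad(out, lead + [(r, r), (0, 0)], mode="edge")
    return np.apply_along_axis(conv, -2, out)
```

At each step the smoothed field would replace the 2t prediction before the next step is run (for a `Batch`, that means converting `pred.surf_vars["2t"]` to NumPy, smoothing it, and writing a tensor back). On a regular lat-lon grid a fixed `sigma` in grid cells covers less physical distance east-west towards the poles; the sketch accepts that for simplicity.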

What else I've tried:
- Using CAMS data from the training and test periods, as well as data outside them: no effect
- Replacing the CAMS climate variables with coarsened 0.25° ERA5 data: no effect
- Clamping 2t to sensible values (the record global extremes): no effect
- Per-variable re-use of initial-state data (i.e. replacing each rollout step's prediction for a given variable with that variable's initial state): doing this for 2t alone resolved the issue
- Reducing the window size: predictions varied less between steps, and collapse came sooner
- Changing the timestep: a 6-hour timestep accelerated collapse
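For reference, the clamping and per-variable re-use experiments listed above amount to roughly the following per-step operations on the predicted surface variables. This is a minimal NumPy sketch: the helper names, the plain `dict` of arrays standing in for `Batch.surf_vars`, and the exact bounds (the commonly cited record extremes of -89.2 °C and 56.7 °C) are assumptions for illustration.

```python
import numpy as np

# Assumed bounds: commonly cited record near-surface air temperatures,
# -89.2 °C and 56.7 °C, converted to kelvin.
T2M_MIN_K = 273.15 - 89.2
T2M_MAX_K = 273.15 + 56.7


def clamp_2t(surf_vars: dict) -> dict:
    """Return a copy of `surf_vars` with 2t clipped to [T2M_MIN_K, T2M_MAX_K].

    Note that np.clip leaves NaN values unchanged.
    """
    out = dict(surf_vars)
    out["2t"] = np.clip(surf_vars["2t"], T2M_MIN_K, T2M_MAX_K)
    return out


def reuse_initial(surf_vars: dict, initial: dict, names: list) -> dict:
    """Return a copy of `surf_vars` where each variable in `names` is
    overwritten with a fresh copy of its initial-state field."""
    out = dict(surf_vars)
    for name in names:
        out[name] = np.array(initial[name], copy=True)
    return out
```

One thing the sketch makes visible: `np.clip` passes NaN through unchanged, so clamping on its own cannot remove values that have already collapsed to NaN.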

Just looking for your perspectives or thoughts on this, thanks!
