Skip to content

Conversation

@ClementinLeron
Copy link
Collaborator

Description

It extends the time interpolation and scaling capabilities of OpenAirClim to support individual scaling factors per species in the emission inventory.
This allows, for instance, different scaling factors to be applied when using multiple fuel types or accounting for species-specific emission changes.

This improvement addresses the TODO note previously present in apply_scaling:

"TODO: implement scaling for individual species"

Main changes

  • interpolate_time.py
    Updated scale_inv and apply_scaling to handle multiple emission scaling factors (one per species).
  • create_time_evolution.py
    Modified to generate a coherent example including species-dependent scaling.
  • interpolate_time_test.py
    Updated unit tests to reflect the new behavior.

Type of change

  • New feature (non-breaking change which adds functionality)

How has this been tested?

  • Updated the existing test suite in interpolate_time_test.py to validate per-species scaling.
  • Verified functionality manually using a sample NETCDF4 file with the following structure:
    <class 'netCDF4.Dataset'> root group (NETCDF4 data model, file format HDF5): Title: Time scaling example Convention: CF-XXX Type: scaling dimensions(sizes): time(210), species(5) variables(dimensions): float32 scaling(time, species), int64 time(time), <class 'str'> species(species) groups:

Example
print(file2read.variables['species'][:]) # ['fuel' 'CO2' 'H2O' 'NOx' 'distance']

To reproduce:

  1. Create a NetCDF file with the structure above.
  2. Run OpenAirClim using the same .toml configuration as for the standard scaling example.

Test configuration:

  • Operating system: win32
  • Environment: environment_dev.yaml

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any changed dependencies have been added or removed correctly
  • main branch: I have updated the CHANGELOG
  • main branch: I have updated the version numbering in __about__.py according to the semantic versioning scheme

Clementin Leron added 3 commits October 15, 2025 09:25
implementation scaling for individual species
tests and example updated to match the new scaling function
Change the values in create_time_evoltion.py to match a more eco-friendly evolution.
Update the calculations for scaling in interpolate_time.py to have a more sustainable code.
@ClementinLeron ClementinLeron self-assigned this Nov 5, 2025
@stefan-voelk stefan-voelk self-requested a review November 6, 2025 14:27
Copy link
Collaborator

@stefan-voelk stefan-voelk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The preferred workflow is that the feature branch is created within our dlr-pa repository. You would clone the existing repository to your local computer and syncing the newly created local branch with remote on dlr-pa. A major advantage of this approach is that reviewers could more easily execute the modified code within an existing OpenAirClim installation. Only the particular feature branch has to be activated ("checkout") then.
  • If you don't want to perform a Pull Request (PR) immediately to main, we usually create a Draft Pull Request to start a conversation with the reviewers.
  • The actual PR to the feature branch on dlr-pa does not need approval by a reviewer. Therefore, you can go ahead with the merge to the feature branch and my comments provided. Once a PR to main is created, an approval is needed before the merge.
  • The previous PR #94 can be closed.
  • One of the automatic checks failed. This must be fixed before merging to main. The particular check is related to a workflow creating the demonstration examples for the website documentation. Since the scaling method changed, but the example was not updated, this check failed. The documentation itself on the time evolution scaling option should be updated as well.

SCALING_TIME = np.arange(1990, 2200, 1)
SCALING_ARR = np.sin(SCALING_TIME * 0.2) * 0.6 + 1.0
SCALING_ARR = SCALING_ARR.astype("float32")
SPECIES = ["fuel", "CO2", "H2O", "NOx", "distance"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the times, I named lists and numpy arrays xxx_arr in OpenAirClim. In addition, species can be a singular and plural word. Therefore, I suggest renaming this list to SPECIES_ARR.



def plot_time_scaling(scaling_time: np.ndarray, scaling_arr: np.ndarray):
def plot_time_scaling(scaling_time: np.ndarray, scaling: np.ndarray, species: list):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated above. I suggest renaming to scaling_arr and species_arr. I used the notation _arr for arrays of any dimension throughout OpenAirClim.

ax.set_xlabel("year")
ax.set_ylabel("scaling factor")
plt.figure(figsize=(8, 4))
for i, specie in enumerate(species):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good example why the notation _arr is unambiguous. First, the reader can better distinguish the two objects species_arr (np.ndarray) and species (str) in the code. Second, the word specie (without "s" at the end) means something different, a coin ;-)

ax.plot(scaling_time, scaling_arr)
ax.set_xlabel("year")
ax.set_ylabel("scaling factor")
plt.figure(figsize=(8, 4))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To better distinguish the two plots which are opened upon execution of this script, I suggest to add a string identifier, e.g. "Scaling" as first argument to function plt.figure(), see : https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html
Instead of "Figure 1", a nice name would show up then.


def create_time_scaling_xr(
scaling_time: np.ndarray, scaling_arr: np.ndarray
scaling_time: np.ndarray, scaling: np.ndarray, species: list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above, suggest renaming variables to xxx_arr

Author="Stefan Völk",
Contact="stefan.voelk@dlr.de",
Author="Stefan Völk, Clémentin Léron",
Contact="openairclim@dlr.de",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for updating the attributes/metadata 👍

np.linspace(1.0, 1.5, len(SCALING_TIME)), # H2O: +50% growth
np.linspace(1.0, 0.8, len(SCALING_TIME)), # NOx: -20% reduction
np.linspace(1.0, 1.2, len(SCALING_TIME)) # distance: +120% growth
]).astype("float32")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating the scaling factors. Looks much more realistic!

inv_years = np.array(list(inv_dict.keys()))

# Interpolate scaling factors linearly on time_range
time_range, evo_interp_dict = interp_evolution(config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works fine, even with higher dimensional scaling factors array and with time steps > 1 year.

"dir": "../oac/example/input/",
"file": "time_scaling_example.nc",
"range": [2020, 2051, 1],
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some guidelines and good practices for software testing. I can provide you with some additional documentation on that topic if you like:

  • According to the testing pyramid there are three types: unit tests (base), integration tests (middle) and end-to-end tests (top). The base comes with the highest amount of tests, top has the fewest. This means one should focus on the unit tests. These tests are fast, isolated and check individual aspects of the individual implemented functions.
  • Decouple tests from external dependencies, e.g. other files in the repository. This concept is known as test isolation. This particular test would fail, if the utility script generating time_scaling_example.nc has not executed beforehand. Test isolation can be achieved by the concept of test doubles (e.g. mock objects). If the focus is unit testing instead of end-to-end, maybe the file itself is not needed anymore and it would be enough to code a minimal xarray Dataset which contains an example time evolution.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Pull Request (PR) is from an external repository to the dlr-pa repository. Once the branch is within dlr-pa, reviewers can perform much easier checks with the new code. Otherwise, a fresh clone and install of OpenAirClim from the external repository is needed.

@stefan-voelk stefan-voelk mentioned this pull request Nov 18, 2025
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants