Open
51 commits
aa88afc
doc: modify parameter.md and parameter.py
finozzifa Nov 5, 2025
ab40cc5
doc: move models.md to docs/user_guide
finozzifa Nov 5, 2025
003aca8
doc: enter datapackage.md
finozzifa Nov 6, 2025
f2f2d8a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 6, 2025
62c2aa1
doc: add placeholder text to home/users.md
finozzifa Nov 6, 2025
e717a8d
merge from origin
finozzifa Nov 6, 2025
b8da952
doc: modify parameter.md
finozzifa Nov 6, 2025
35f6d01
doc: add source.md
finozzifa Nov 6, 2025
52c8275
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 6, 2025
5234e92
changes to source.md
finozzifa Nov 6, 2025
8873c19
Merge branch 'td_doc' of https://github.com/open-energy-transition/te…
finozzifa Nov 6, 2025
ed5274a
Merge branch 'prototype-2' of https://github.com/open-energy-transiti…
finozzifa Nov 6, 2025
a6cb163
Merge branch 'prototype-2' of https://github.com/open-energy-transiti…
finozzifa Nov 6, 2025
0e3907d
doc: add source_collection.md
finozzifa Nov 6, 2025
71f0ab5
doc: add new section to parameter.md
finozzifa Nov 6, 2025
cd1ad12
pre-commit changes
finozzifa Nov 6, 2025
b709cfb
doc: add doc for technology_collection.md
finozzifa Nov 6, 2025
8116c09
doc: add technology.md
finozzifa Nov 6, 2025
5576be9
doc: add technology.md to mkdocs.yaml
finozzifa Nov 6, 2025
a7009e4
doc: add CITATIONS.cff
finozzifa Nov 11, 2025
f5535ec
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 11, 2025
af1ad10
add FF orcid
finozzifa Nov 11, 2025
9b08a04
add FF orcid
finozzifa Nov 11, 2025
91031cc
merge conflict
finozzifa Nov 11, 2025
8412d7c
Merge branch 'prototype-2' of https://github.com/open-energy-transiti…
finozzifa Nov 19, 2025
4b15cae
source api reference'
finozzifa Nov 20, 2025
60f5837
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2025
0546563
doc: add api reference
finozzifa Nov 26, 2025
5f13f83
doc: add api reference for other classes
finozzifa Nov 26, 2025
97964ce
doc: add api reference for other classes
finozzifa Nov 26, 2025
374c593
doc: update documentation
finozzifa Nov 26, 2025
9ec63b6
doc: update docs
finozzifa Nov 26, 2025
dd1a5bb
update docs (again)
finozzifa Nov 26, 2025
74a43ad
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 26, 2025
17210da
pre-commit hook
finozzifa Nov 26, 2025
5674e45
Merge branch 'td_doc' of https://github.com/open-energy-transition/te…
finozzifa Nov 26, 2025
cef95c1
doc: update doc with comments from PR
finozzifa Nov 26, 2025
75050a8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 26, 2025
f5b1a81
pre-commit hooks
finozzifa Nov 26, 2025
ea21f9b
Merge branch 'td_doc' of https://github.com/open-energy-transition/te…
finozzifa Nov 26, 2025
4273b0a
doc: update datapackage.md
finozzifa Nov 26, 2025
829bd14
add new sectioj?
finozzifa Nov 27, 2025
d082fdc
doc: update dea_energy_storage.py
finozzifa Nov 28, 2025
d21a260
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 28, 2025
f7e4e76
doc: update dea_storage.md
finozzifa Dec 1, 2025
376b48f
doc: update doc
finozzifa Dec 1, 2025
e5a824d
doc: update of the dea_storage.md
finozzifa Dec 1, 2025
6aa9724
doc: update dea_storage.md doc
finozzifa Dec 1, 2025
e30aa8e
docs: add utils references
finozzifa Dec 1, 2025
2be0e7a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 1, 2025
3137b46
Merge branch 'prototype-2' of https://github.com/open-energy-transiti…
finozzifa Dec 8, 2025
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -72,4 +72,4 @@ repos:
rev: v0.0.99 # Use the latest release tag
hooks:
- id: rumdl
args: ["--fix", "--disable=MD013, MD024, MD030"]
args: ["--fix", "--disable=MD006, MD013, MD024, MD030, MD041"]
17 changes: 17 additions & 0 deletions CITATIONS.cff
@@ -0,0 +1,17 @@
# SPDX-FileCopyrightText: The technology-data authors
#
# SPDX-License-Identifier: MIT

cff-version: 1.2.0
message: "If you use this package, we suggest the following way of citing it."
title: "technology-data: Data for Energy Systems Models"
repository: https://github.com/pypsa/technology-data
version: v2-prototype # TODO: Update automatically using a release script
license: MIT
authors:
  - family-names: Hampp
    given-names: Johannes
    orcid: "https://orcid.org/0000-0002-1776-116X"
  - family-names: Finozzi
    given-names: Fabrizio
    orcid: "https://orcid.org/0009-0002-3876-2564"
1 change: 1 addition & 0 deletions docs/api/commons/commons.md
@@ -0,0 +1 @@
::: technologydata.utils.commons.Commons
1 change: 1 addition & 0 deletions docs/api/commons/dateformatenum.md
@@ -0,0 +1 @@
::: technologydata.utils.commons.DateFormatEnum
1 change: 1 addition & 0 deletions docs/api/commons/fileextensionenum.md
@@ -0,0 +1 @@
::: technologydata.utils.commons.FileExtensionEnum
1 change: 1 addition & 0 deletions docs/api/datapackage.md
@@ -0,0 +1 @@
::: technologydata.datapackage.DataPackage
1 change: 1 addition & 0 deletions docs/api/parameter.md
@@ -0,0 +1 @@
::: technologydata.parameter.Parameter
1 change: 1 addition & 0 deletions docs/api/source.md
@@ -0,0 +1 @@
::: technologydata.source.Source
1 change: 1 addition & 0 deletions docs/api/source_collection.md
@@ -0,0 +1 @@
::: technologydata.source_collection.SourceCollection
1 change: 1 addition & 0 deletions docs/api/technology.md
@@ -0,0 +1 @@
::: technologydata.technology.Technology
1 change: 1 addition & 0 deletions docs/api/technology_collection.md
@@ -0,0 +1 @@
::: technologydata.technology_collection.TechnologyCollection
1 change: 1 addition & 0 deletions docs/api/units/customundefineduniterror.md
@@ -0,0 +1 @@
::: technologydata.utils.units.CustomUndefinedUnitError
1 change: 1 addition & 0 deletions docs/api/units/specialunitregistry.md
@@ -0,0 +1 @@
::: technologydata.utils.units.SpecialUnitRegistry
2 changes: 1 addition & 1 deletion docs/contributing/instructions.md
@@ -17,7 +17,7 @@ We enthusiastically invite anyone interested in `technologydata` to share new id

## Where to go for help

- To **discuss** with other `technologydata` users, organise projects, share news, and get in touch with the community, please refer to the [Contacts](/docs/home/contacts.md) page.
- To **discuss** with other `technologydata` users, organise projects, share news, and get in touch with the community, please refer to the [Contacts](../home/contacts.md) page.
- For **guidelines to contribute**, stay right here.

## Code contributions
80 changes: 80 additions & 0 deletions docs/examples/dea_storage.md
@@ -0,0 +1,80 @@
# Danish Energy Agency Parser Documentation

## Overview

The Danish Energy Agency (DEA) data parser `dea_energy_storage.py` demonstrates a full data-cleaning and transformation pipeline for converting raw tabular data into the `technologydata` schema files `technologies.json` and `sources.json`. The parser is implemented in `src/technologydata/package_data/dea_energy_storage/dea_energy_storage.py`.

## Dataset Description

The original dataset is available at this [link](https://ens.dk/media/6589/download). A full description of the dataset is available at this [link](https://ens.dk/media/6588/download). The raw source file is included in the repository at `src/technologydata/package_data/raw/Technology_datasheet_for_energy_storage.xlsx`.

The dataset is in Excel format, and it includes, under the data sheet `alldata_flat`, a flat table of technology parameters for a range of energy storage technologies. Columns include `Technology`, `ws`, `par` (parameter name), `val` (value), `unit`, `year`, `est` (case/estimate), `priceyear`, plus metadata columns such as `cat`, `ref`, `note`. Rows are individual parameter records (parameter value + unit + context) for technologies and estimation cases.

## Parser description

The parser proceeds through the following steps.

### Command line argument parsing

Function `parse_input_arguments()` defines and parses the command-line arguments:
- `--num_digits` (int): number of decimal places used when rounding numeric values. The default value is 4.
- `--store_source` (boolean flag): whether to store the source on the Wayback Machine. The default value is `false`.
- `--filter_params` (boolean flag): whether to limit exported parameters to a fixed allowed set. The default value is `false`.
- `--export_schema` (boolean flag): whether to export the JSON schema files. The default value is `false`.
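
The options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the parser's actual implementation: the `argv` parameter and the help texts are assumptions added for illustration.

```python
import argparse


def parse_input_arguments(argv=None):
    """Declare and parse the four options listed above (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Parse the DEA energy storage dataset (sketch)."
    )
    parser.add_argument(
        "--num_digits",
        type=int,
        default=4,
        help="number of decimal places used when rounding numeric values",
    )
    # Boolean flags default to False and become True when passed.
    parser.add_argument("--store_source", action="store_true")
    parser.add_argument("--filter_params", action="store_true")
    parser.add_argument("--export_schema", action="store_true")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)


args = parse_input_arguments(["--num_digits", "3", "--filter_params"])
print(args.num_digits, args.store_source, args.filter_params, args.export_schema)
```

Passing an explicit list to `parse_input_arguments()` is only a convenience for trying the sketch without a shell.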

### Read the raw data

The script reads the raw data from `src/technologydata/package_data/raw/Technology_datasheet_for_energy_storage.xlsx`, sheet `alldata_flat`, into a `pandas` DataFrame using `pandas.read_excel(..., engine="calamine", dtype=str)`. All entries are initially handled as strings.

### Data cleaning, validation and dealing with missing/null values

Data cleaning and validation proceed through the following steps.

Function `drop_invalid_rows(df)` checks that the required columns are present. It drops rows with missing, null, or empty critical fields (`Technology`, `par`, `val`, `year`) and keeps only rows where `year` contains a 4-digit year and `val` contains numeric characters and no comparator symbols (`<`, `>`, `≤`, `≥`).
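
The row-level rules above can be sketched as a single predicate. This is an assumption-laden illustration: the real function operates on a `pandas` DataFrame, while the hypothetical `is_valid_record()` below checks one record given as a plain dictionary.

```python
import re

REQUIRED_FIELDS = ("Technology", "par", "val", "year")
FOUR_DIGIT_YEAR = re.compile(r"\d{4}")
HAS_DIGIT = re.compile(r"\d")
COMPARATOR = re.compile(r"[<>≤≥]")


def is_valid_record(record):
    """Return True if a record passes the validity rules (hypothetical helper)."""
    # Critical fields must be present and non-empty.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            return False
    # The year must contain four consecutive digits.
    if not FOUR_DIGIT_YEAR.search(str(record["year"])):
        return False
    # The value must contain a digit and no comparator symbol.
    val = str(record["val"])
    return bool(HAS_DIGIT.search(val)) and not COMPARATOR.search(val)


good = {"Technology": "151b Hydrogen Storage - LOHC", "par": "capacity", "val": "1,5", "year": "2030"}
bad = dict(good, val="<5")
print(is_valid_record(good), is_valid_record(bad))
```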

Function `clean_technology_string()` normalizes text fields by removing leading 3-digit numeric codes, trimming whitespace, and lower-casing the string for consistent matching. It is applied to the columns `Technology` and `ws`. As an example, `clean_technology_string()` converts `151b Hydrogen Storage - LOHC` to `hydrogen storage - lohc`.
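
A regular-expression sketch of this normalization is shown below. The pattern is an assumption inferred from the example above (a 3-digit code with an optional letter suffix such as `151b`), not the pattern used in the parser.

```python
import re

# Assumed shape of the leading code: three digits, an optional letter, then whitespace.
LEADING_CODE = re.compile(r"^\s*\d{3}[a-zA-Z]?\s+")


def clean_technology_string(text):
    """Strip the leading code, trim whitespace, and lower-case (sketch)."""
    return LEADING_CODE.sub("", text).strip().lower()


print(clean_technology_string("151b Hydrogen Storage - LOHC"))
```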

Function `extract_year()` extracts the first sequence of digits from the `year` column and converts it to an integer. The column contains entries such as `Uncertainty (2050)` (str), which are converted to `2050` (int).
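
The extraction can be sketched with a single regular-expression search. The behaviour for strings without digits (raising `ValueError`) is an assumption made for this sketch.

```python
import re


def extract_year(text):
    """Return the first run of digits in `text` as an integer (sketch)."""
    match = re.search(r"\d+", text)
    if match is None:
        # Assumed behaviour: the source does not say what happens without digits.
        raise ValueError(f"no digits found in {text!r}")
    return int(match.group())


print(extract_year("Uncertainty (2050)"))
```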

Function `clean_parameter_string()` removes leading hyphens, removes text inside square brackets (units/notes), collapses extra spaces and lower-cases the parameter name. It is applied to the `par` column.
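
The four cleaning steps can be sketched as a chain of substitutions. The patterns and the sample input are assumptions chosen to match the description, not code taken from the parser.

```python
import re


def clean_parameter_string(text):
    """Normalize a parameter name following the steps described above (sketch)."""
    text = re.sub(r"^\s*-+\s*", "", text)     # remove leading hyphens
    text = re.sub(r"\[.*?\]", "", text)       # remove text inside square brackets
    text = re.sub(r"\s+", " ", text).strip()  # collapse extra spaces
    return text.lower()


print(clean_parameter_string("- Energy storage capacity for one unit [MWh]"))
```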

Function `standardize_units()` is applied to columns `par` and `unit`. It fills in missing units based on the parameter name via a parameter-to-unit map (e.g., `energy storage capacity for one unit` is mapped to the unit `MWh`). It also replaces known incorrect unit strings, such as `⁰C` → `C` or `m2` → `meter**2`. The unit substitutions follow the `pint` default unit definitions available at this [link](https://github.com/hgrecco/pint/blob/master/pint/default_en.txt).
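
A dictionary-based sketch of the two mechanisms (filling in missing units, replacing incorrect unit strings) follows. The maps contain only the entries mentioned above, and the hypothetical `standardize_unit()` works on a single `par`/`unit` pair rather than on DataFrame columns.

```python
# Only the examples named in the text; the parser's real maps are larger.
PARAMETER_TO_UNIT = {"energy storage capacity for one unit": "MWh"}
UNIT_REPLACEMENTS = {"⁰C": "C", "m2": "meter**2"}


def standardize_unit(par, unit):
    """Return a cleaned unit string for one parameter record (sketch)."""
    if unit is None or not unit.strip():
        # Missing unit: look it up from the parameter name, if known.
        return PARAMETER_TO_UNIT.get(par, unit)
    for wrong, right in UNIT_REPLACEMENTS.items():
        unit = unit.replace(wrong, right)
    return unit


print(standardize_unit("energy storage capacity for one unit", ""))
print(standardize_unit("some parameter", "EUR/m2"))
```

Plain substring replacement is used here for brevity; a stricter version would match whole unit tokens so that unrelated strings containing `m2` are left alone.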

Function `Commons.update_unit_with_currency_year(unit, priceyear)` appends the `priceyear` information, if present, to currency units. This is because `technologydata` follows the currency pattern `\b(?P<cu_iso3>[A-Z]{3})_(?P<year>\d{4})\b`, for example `EUR_2021`.
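
The snippet below shows what the currency pattern accepts, together with a hypothetical `append_price_year()` helper that mimics the described behaviour. The helper and its list of bare currency codes are assumptions for illustration; they are not the implementation of `Commons.update_unit_with_currency_year`.

```python
import re

# Pattern quoted in the text above.
CURRENCY_YEAR = re.compile(r"\b(?P<cu_iso3>[A-Z]{3})_(?P<year>\d{4})\b")
# Hypothetical: bare currency codes that still need a price year.
BARE_CURRENCY = re.compile(r"\b(?P<cu_iso3>EUR|USD)\b")


def append_price_year(unit, priceyear):
    """Append the price year to bare currency codes in `unit` (sketch)."""
    if priceyear is None or CURRENCY_YEAR.search(unit):
        return unit  # nothing to add, or the unit already carries a year
    return BARE_CURRENCY.sub(lambda m: f"{m.group('cu_iso3')}_{priceyear}", unit)


match = CURRENCY_YEAR.search("EUR_2021/MWh")
print(match.group("cu_iso3"), match.group("year"))
print(append_price_year("EUR/MWh", 2020))
print(CURRENCY_YEAR.search("MEUR_2020"))
```

Note that the pattern does not match prefixed units such as `MEUR_2020`, because the three-letter code must start at a word boundary.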

Function `format_val_number(value, num_decimals)` parses numeric formats, including comma decimal separators and scientific notation variants (e.g., `×10`), converts them to float, and rounds them to `num_decimals` decimal places.
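
A minimal sketch of such a parser is given below. It assumes that a comma is always a decimal separator and that the `×10` variants look like `3×10^2` or `3x10-2`; the formats handled by the real function may differ.

```python
import re

# Assumed shape of the "×10" notation: mantissa, "×" or "x", "10", optional "^", exponent.
TIMES_TEN = re.compile(r"^([+-]?\d+(?:\.\d+)?)\s*[×x]\s*10\^?([+-]?\d+)$")


def format_val_number(value, num_decimals):
    """Parse a numeric string and round it to `num_decimals` places (sketch)."""
    text = value.strip().replace(",", ".")  # comma decimal separator
    match = TIMES_TEN.match(text)
    if match:
        # Rewrite "a×10^b" as the standard "aeb" form understood by float().
        text = f"{match.group(1)}e{match.group(2)}"
    return round(float(text), num_decimals)


print(format_val_number("1,2345678", 4))
print(format_val_number("3×10^2", 2))
```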

The parser also applies the following corrections and substitutions:
- Convert `MEUR_2020` and `kEUR_2020`/`KEUR_2020` to `EUR_2020` and scale numeric `val` accordingly (×1e6 or ×1e3).
- Specific unit fixes (example: `mol/s/m/MPa1/2` → `mol/s/m/Pa` with value scaling).
- Certain `par` values (e.g., `energy storage capacity for one unit`, `tank volume of example`) are normalized to `capacity`.
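
The currency rescaling in the first correction can be sketched as follows. The hypothetical `normalize_currency_prefix()` handles one value/unit pair, whereas the parser applies the equivalent logic to DataFrame columns.

```python
# Scale factors for the prefixed currency units named above.
PREFIX_FACTORS = {"MEUR_2020": 1e6, "kEUR_2020": 1e3, "KEUR_2020": 1e3}


def normalize_currency_prefix(value, unit):
    """Convert MEUR/kEUR units to EUR and scale the value accordingly (sketch)."""
    for prefixed, factor in PREFIX_FACTORS.items():
        if prefixed in unit:
            return value * factor, unit.replace(prefixed, "EUR_2020")
    return value, unit


print(normalize_currency_prefix(1.5, "MEUR_2020/MW"))
print(normalize_currency_prefix(2.0, "kEUR_2020/MWh"))
```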

Function `clean_est_string()` normalizes the `est` column by casefolding it and by replacing `ctrl` with `control`.
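
In code, this normalization amounts to two string operations. The sample inputs below are made up for illustration.

```python
def clean_est_string(text):
    """Casefold the estimate label and expand `ctrl` to `control` (sketch)."""
    return text.casefold().replace("ctrl", "control")


print(clean_est_string("Ctrl"))
print(clean_est_string("Upper"))
```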

Function `filter_parameters(df, filter_flag)`, if `filter_flag` is true, keeps only an allowed set of parameters (e.g., `technical lifetime`, `fixed o&m`, `specific investment`, `variable o&m`, `charge efficiency`, `discharge efficiency`, `capacity`). Otherwise, it returns the full set.

### Populate and export the source and technology collections

Function `build_technology_collection()`:
- if `store_source` is set, constructs a `Source` object for the DEA dataset, calls `ensure_in_wayback()` and writes `sources.json`; otherwise reads an existing `sources.json`.
- groups the cleaned DataFrame by `est`, `year`, `ws`, `Technology`.
- for each group, builds a dictionary of `Parameter` objects (each with `magnitude`, `units`, `sources`, `provenance`).
- creates a `Technology` object for each group, with `name` = `ws`, `detailed_technology` = `Technology`, `year` = `year`, `region` = `EU`, `case` = `est`, and collects them into a `TechnologyCollection` object.
- writes the `TechnologyCollection` object to `technologies.json`.
- if `--export_schema` is used, schema files produced during export are moved to the sub-folder `src/technologydata/package_data/schemas`.
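
The grouping and field mapping in the steps above can be pictured with plain dictionaries. This sketch deliberately avoids the real `Technology`, `Parameter`, and `TechnologyCollection` classes: the helper names and the sample records are made up, and only the grouping keys and the field mapping come from the description.

```python
from collections import defaultdict


def group_records(records):
    """Group parameter records by (est, year, ws, Technology)."""
    groups = defaultdict(dict)
    for record in records:
        key = (record["est"], record["year"], record["ws"], record["Technology"])
        # A plain dict stands in for a Parameter object.
        groups[key][record["par"]] = {"magnitude": record["val"], "units": record["unit"]}
    return dict(groups)


def to_technology(key, parameters):
    """Map one group to the Technology fields listed above (dict stand-in)."""
    est, year, ws, technology = key
    return {
        "name": ws,
        "detailed_technology": technology,
        "year": year,
        "region": "EU",
        "case": est,
        "parameters": parameters,
    }


common = {"est": "base", "ws": "hydrogen storage", "Technology": "hydrogen storage - lohc"}
records = [
    dict(common, year=2030, par="capacity", val=10.0, unit="MWh"),
    dict(common, year=2030, par="technical lifetime", val=25.0, unit="year"),
    dict(common, year=2050, par="capacity", val=12.0, unit="MWh"),
]
technologies = [to_technology(k, p) for k, p in group_records(records).items()]
print(len(technologies))
```

Each distinct combination of estimate, year, `ws`, and `Technology` yields one entry, so the three sample records produce two technologies.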

## Running the parser

### Execution instructions

From repository root:
- Basic run: `python src/technologydata/package_data/dea_energy_storage/dea_energy_storage.py`
- Example with options: `python src/technologydata/package_data/dea_energy_storage/dea_energy_storage.py --num_digits 3 --store_source --filter_params --export_schema`

### Outputs

The parser generates the following outputs:
- `src/technologydata/package_data/dea_energy_storage/technologies.json`.
- `src/technologydata/package_data/dea_energy_storage/sources.json`.
- Optional schema files moved to `src/technologydata/package_data/schemas` when `--export_schema` is used.
10 changes: 9 additions & 1 deletion docs/home/citing.md
@@ -1,3 +1,11 @@
# Citing

TODO
If you use `technologydata` for your research, we suggest citing it as follows.

```text
technology-data: Data for Energy Systems Models.

The package is available at: https://github.com/open-energy-transition/technology-data/tree/prototype-2.

Authors: Johannes Hampp, Fabrizio Finozzi
```
12 changes: 9 additions & 3 deletions docs/home/users.md
@@ -8,12 +8,18 @@ We would love to hear about your use case and stay in touch, in order to develop

## Academia and Research institutions

TODO
<!-- TODO -->

For the moment, this is a prototype of the new version of `technologydata`. Users will be specified at a later stage.

## Industry

TODO
<!-- TODO -->

For the moment, this is a prototype of the new version of `technologydata`. Users will be specified at a later stage.

## Institutions and Organisations

TODO
<!-- TODO -->

For the moment, this is a prototype of the new version of `technologydata`. Users will be specified at a later stage.
112 changes: 112 additions & 0 deletions docs/user_guide/datapackage.md
@@ -0,0 +1,112 @@
# `DataPackage` Class Documentation

<!--
SPDX-FileCopyrightText: The technology-data authors

SPDX-License-Identifier: MIT

-->

## Overview

The `DataPackage` class in `technologydata` provides a container for managing collections of `Technology` and `Source` objects, supporting batch operations and import/export utilities. It is designed to facilitate the organization, sharing, and processing of technology datasets, including provenance tracking and source management.

## Features

- **Technology Collection**: Stores a collection of `Technology` objects via the `TechnologyCollection` class.
- **Source Collection**: Stores a collection of `Source` objects via the `SourceCollection` class.
- **Batch Operations**: Supports batch export to JSON and CSV formats.
- **Source Extraction**: Automatically extracts and aggregates sources from all parameters in the technology collection.
- **Loading Utilities**: Provides methods to load a data package from JSON files.

## Usage Examples

### Creating a DataPackage

You can create a `DataPackage` by instantiating it directly or by loading from JSON files.

```python
from technologydata import DataPackage, TechnologyCollection, SourceCollection

# Create a DataPackage with existing collections
dp = DataPackage(
    technologies=TechnologyCollection(...),
    sources=SourceCollection(...),
)
```

### Loading from JSON

To load a `DataPackage` from a folder containing `technologies.json` and (optionally) `sources.json`:

```python
from technologydata import DataPackage
dp = DataPackage.from_json("path/to/data_package_folder")
```

If `sources.json` is not present, the sources are extracted automatically from the technologies.

### Exporting to JSON

Export the data package to JSON files in a specified folder:

```python
from technologydata import DataPackage, TechnologyCollection, SourceCollection

# Create a DataPackage with existing collections
dp = DataPackage(
    technologies=TechnologyCollection(...),
    sources=SourceCollection(...),
)
dp.to_json("path/to/output_folder")
```

### Exporting to CSV

Export the data package to CSV files:

```python
from technologydata import DataPackage, TechnologyCollection, SourceCollection

# Create a DataPackage with existing collections
dp = DataPackage(
    technologies=TechnologyCollection(...),
    sources=SourceCollection(...),
)

dp.to_csv("path/to/output_folder")
# Creates technologies.csv and sources.csv in the output folder
```

### Extracting Source Collection

The `sources` attribute of the `DataPackage` can be automatically populated by extracting the sources from the `TechnologyCollection`.

In this context, "extracting" means scanning the `TechnologyCollection` for all `Source` references that appear in the technology parameters and aggregating them into a single `SourceCollection`. The extraction yields a collection of unique sources: two sources count as duplicates only if all of their attributes match.

```python
from technologydata.datapackage import DataPackage
from technologydata.technology_collection import TechnologyCollection

# Create a DataPackage with existing collections
dp = DataPackage(
    technologies=TechnologyCollection(...),
)

# Populate dp.sources with all unique sources from the technology collection
dp.get_source_collection()
```
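
The de-duplication rule can be illustrated independently of the package. In the sketch below, `FakeSource` is a made-up stand-in for `Source` with hypothetical fields, and `unique_sources()` is not part of the `technologydata` API; the point is only that equality on all attributes decides what counts as a duplicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeSource:
    """Hypothetical stand-in for Source; frozen dataclasses compare on all fields."""

    title: str
    authors: str
    url: str = ""


def unique_sources(sources_per_parameter):
    """Collect sources from every parameter, keeping the first copy of each."""
    seen = {}
    for sources in sources_per_parameter:
        for source in sources:
            seen.setdefault(source, source)  # dicts preserve insertion order
    return list(seen)


dea = FakeSource("DEA storage catalogue", "Danish Energy Agency")
dea_copy = FakeSource("DEA storage catalogue", "Danish Energy Agency")
dea_other_url = FakeSource("DEA storage catalogue", "Danish Energy Agency", url="https://ens.dk")
print(len(unique_sources([[dea], [dea_copy, dea_other_url]])))
```

Here `dea` and `dea_copy` collapse into one entry, while `dea_other_url` is kept because one attribute differs.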

Extracting the source collection can be useful in scenarios such as:
- When loading a data package that does not include a `sources.json` file, to ensure that all sources referenced in the technologies are captured.
- Before exporting the data package (to `sources.json`, CSV, or for sharing) so the package includes a consistent, central catalog of sources.
- When you need to produce provenance, citation lists, or run validations that require an explicit `SourceCollection`.

## API Reference

Please refer to the [API documentation](../api/datapackage.md) for detailed information on the `DataPackage` class methods and attributes.

## Limitations & Notes

- **Error Handling**: If neither technologies nor sources are available, source extraction will raise a `ValueError`.
- **No Data Validation**: The class assumes that the underlying `TechnologyCollection` and `SourceCollection` are valid and compatible.
2 changes: 1 addition & 1 deletion docs/models.md → docs/user_guide/models.md
@@ -1,4 +1,4 @@
# Title
# Growth Models Documentation

<!--
SPDX-FileCopyrightText: 2025 The technology-data authors