open-energy-transition · finozzifa · Nov 5, 2025 · Nov 6, 2025
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -72,4 +72,4 @@ repos:
   rev: v0.0.99  # Use the latest release tag
   hooks:
   - id: rumdl
-    args: ["--fix", "--disable=MD013, MD024, MD030"]
+    args: ["--fix", "--disable=MD006, MD013, MD024, MD030, MD041"]
diff --git a/CITATIONS.cff b/CITATIONS.cff
@@ -0,0 +1,17 @@
+# SPDX-FileCopyrightText: The technology-data authors
+#
+# SPDX-License-Identifier: MIT
+
+cff-version: 1.2.0
+message: "If you use this package, we suggest the following way of citing it."
+title: "technology-data: Data for Energy Systems Models"
+repository: https://github.com/pypsa/technology-data
+version: v2-prototype # TODO: Update automatically using a release script
+license: MIT
+authors:
+  - family-names: Hampp
+    given-names: Johannes
+    orcid: 0000-0002-1776-116X
+  - family-names: Finozzi
+    given-names: Fabrizio
+    orcid: 0009-0002-3876-2564
diff --git a/docs/api/commons/commons.md b/docs/api/commons/commons.md
@@ -0,0 +1 @@
+::: technologydata.utils.commons.Commons
diff --git a/docs/api/commons/dateformatenum.md b/docs/api/commons/dateformatenum.md
@@ -0,0 +1 @@
+::: technologydata.utils.commons.DateFormatEnum
diff --git a/docs/api/commons/fileextensionenum.md b/docs/api/commons/fileextensionenum.md
@@ -0,0 +1 @@
+::: technologydata.utils.commons.FileExtensionEnum
diff --git a/docs/api/datapackage.md b/docs/api/datapackage.md
@@ -0,0 +1 @@
+::: technologydata.datapackage.DataPackage
diff --git a/docs/api/parameter.md b/docs/api/parameter.md
@@ -0,0 +1 @@
+::: technologydata.parameter.Parameter
diff --git a/docs/api/source.md b/docs/api/source.md
@@ -0,0 +1 @@
+::: technologydata.source.Source
diff --git a/docs/api/source_collection.md b/docs/api/source_collection.md
@@ -0,0 +1 @@
+::: technologydata.source_collection.SourceCollection
diff --git a/docs/api/technology.md b/docs/api/technology.md
@@ -0,0 +1 @@
+::: technologydata.technology.Technology
diff --git a/docs/api/technology_collection.md b/docs/api/technology_collection.md
@@ -0,0 +1 @@
+::: technologydata.technology_collection.TechnologyCollection
diff --git a/docs/api/units/customundefineduniterror.md b/docs/api/units/customundefineduniterror.md
@@ -0,0 +1 @@
+::: technologydata.utils.units.CustomUndefinedUnitError
diff --git a/docs/api/units/specialunitregistry.md b/docs/api/units/specialunitregistry.md
@@ -0,0 +1 @@
+::: technologydata.utils.units.SpecialUnitRegistry
diff --git a/docs/contributing/instructions.md b/docs/contributing/instructions.md
@@ -17,7 +17,7 @@ We enthusiastically invite anyone interested in `technologydata` to share new id
 
 ## Where to go for help
 
-- To **discuss** with other `technologydata` users, organise projects, share news, and get in touch with the community, please refer to the [Contacts](/docs/home/contacts.md) page.
+- To **discuss** with other `technologydata` users, organise projects, share news, and get in touch with the community, please refer to the [Contacts](../home/contacts.md) page.
 - For **guidelines to contribute**, stay right here.
 
 ## Code contributions

diff --git a/docs/examples/dea_storage.md b/docs/examples/dea_storage.md
@@ -0,0 +1,80 @@
+# Danish Energy Agency Parser Documentation
+
+## Overview
+
+The Danish Energy Agency (DEA) data parser `dea_energy_storage.py` demonstrates a full data-cleaning and transformation pipeline for converting raw tabular data into the `technologydata` schema files `technologies.json` and `sources.json`. The parser is implemented in `src/technologydata/package_data/dea_energy_storage/dea_energy_storage.py`.
+
+## Dataset Description
+
+The original dataset is [available for download](https://ens.dk/media/6589/download) from the DEA, together with a [full description of the dataset](https://ens.dk/media/6588/download). The raw source file is included in the repository at `src/technologydata/package_data/raw/Technology_datasheet_for_energy_storage.xlsx`.
+
+The dataset is an Excel workbook whose sheet `alldata_flat` contains a flat table of technology parameters for a range of energy storage technologies. Its columns include `Technology`, `ws`, `par` (parameter name), `val` (value), `unit`, `year`, `est` (case/estimate) and `priceyear`, plus metadata columns such as `cat`, `ref` and `note`. Each row is an individual parameter record (value, unit and context) for a given technology and estimation case.
+
+## Parser description
+
+The parser is organised into the following steps.
+
+### Command line argument parsing
+
+Function `parse_input_arguments()` defines and parses the following command-line arguments:
+
+- `--num_digits` (int): number of decimals used when rounding numeric values. The default value is 4.
+- `--store_source` (boolean flag): whether to store the source on the Wayback Machine. The default value is `false`.
+- `--filter_params` (boolean flag): whether to limit the exported parameters to a fixed allowed set. The default value is `false`.
+- `--export_schema` (boolean flag): whether to export the JSON schema files. The default value is `false`.
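+
+The options above could be declared with the standard-library `argparse` module roughly as follows. This is a sketch rather than the script's actual code: it assumes the boolean flags use `action="store_true"`, and the optional `argv` parameter is added here only so that the function can be called with an explicit argument list.
+
+```python
+import argparse
+
+
+def parse_input_arguments(argv=None):
+    """Parse the DEA parser command-line options (sketch)."""
+    parser = argparse.ArgumentParser(description="Parse the DEA energy storage dataset.")
+    # Number of decimals used when rounding numeric values
+    parser.add_argument("--num_digits", type=int, default=4)
+    # Boolean flags: False unless passed on the command line
+    parser.add_argument("--store_source", action="store_true")
+    parser.add_argument("--filter_params", action="store_true")
+    parser.add_argument("--export_schema", action="store_true")
+    # With argv=None, argparse reads the arguments from sys.argv[1:]
+    return parser.parse_args(argv)
+
+
+args = parse_input_arguments(["--num_digits", "3", "--filter_params"])
+print(args.num_digits, args.store_source, args.filter_params)  # 3 False True
+```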
+
+### Read the raw data
+
+The script reads the sheet `alldata_flat` of `src/technologydata/package_data/raw/Technology_datasheet_for_energy_storage.xlsx` into a `pandas` DataFrame using `pandas.read_excel(..., engine="calamine", dtype=str)`, so all entries are initially handled as strings.
+
+### Data cleaning, validation and dealing with missing/null values
+
+Data cleaning and validation proceed in the following steps.
+
+Function `drop_invalid_rows(df)` checks that the required columns are present. It then drops rows whose critical fields (`Technology`, `par`, `val`, `year`) are missing, null or empty, and keeps only rows where `year` contains a 4-digit year and `val` contains numeric characters but no comparator symbols (`<`, `>`, `≤`, `≥`).
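+
+The row-level rules can be sketched as a plain-Python predicate. The names `is_valid_row`, `REQUIRED_COLUMNS` and `COMPARATORS` are hypothetical, and the regular expressions are assumptions that implement the rules as described, not the parser's exact checks.
+
+```python
+import re
+
+# Hypothetical names for the critical fields and the rejected comparator symbols
+REQUIRED_COLUMNS = ("Technology", "par", "val", "year")
+COMPARATORS = ("<", ">", "≤", "≥")
+
+
+def is_valid_row(row) -> bool:
+    """Return True if a raw DEA record passes the validation rules (sketch)."""
+    for column in REQUIRED_COLUMNS:
+        value = row.get(column)
+        # Missing, null (NaN != NaN) or empty critical fields invalidate the row
+        if value is None or value != value or str(value).strip() == "":
+            return False
+    # The year field must contain a 4-digit year, e.g. "2030" or "Uncertainty (2050)"
+    if not re.search(r"\b\d{4}\b", str(row["year"])):
+        return False
+    val = str(row["val"])
+    # The value must contain digits and must not be a bound such as "<5"
+    return bool(re.search(r"\d", val)) and not any(c in val for c in COMPARATORS)
+```
+
+With `pandas`, such a predicate could be applied as `df[df.apply(is_valid_row, axis=1)]`; a vectorised version based on `Series.str` methods would be faster on large tables.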
+
+Function `clean_technology_string()` normalizes text fields by removing leading 3-digit numeric codes (including suffixed codes such as `151b`), trimming whitespace and lower-casing the string for consistent matching. It is applied to the columns `Technology` and `ws`. For example, `clean_technology_string()` converts `151b Hydrogen Storage - LOHC` to `hydrogen storage - lohc`.
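+
+A minimal sketch of this normalization using the `re` module. Allowing an optional letter after the 3-digit code is an assumption inferred from the `151b` example.
+
+```python
+import re
+
+
+def clean_technology_string(text: str) -> str:
+    """Normalize a technology name for consistent matching (sketch)."""
+    # Drop a leading 3-digit code, optionally followed by a letter (e.g. "151b")
+    text = re.sub(r"^\s*\d{3}[a-zA-Z]?\s+", "", text)
+    # Trim surrounding whitespace and lower-case the result
+    return text.strip().lower()
+
+
+print(clean_technology_string("151b Hydrogen Storage - LOHC"))  # hydrogen storage - lohc
+```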
+
+Function `extract_year()` extracts the first sequence of digits from the `year` column and converts it to an integer. This is needed because the column contains entries such as `Uncertainty (2050)` (str), which are converted to `2050` (int).
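+
+A minimal sketch of the year extraction. Returning `None` when the string contains no digits is an assumption; such rows should already have been removed by `drop_invalid_rows(df)`.
+
+```python
+import re
+from typing import Optional
+
+
+def extract_year(text) -> Optional[int]:
+    """Return the first run of digits in `text` as an int, or None (sketch)."""
+    match = re.search(r"\d+", str(text))
+    return int(match.group()) if match else None
+
+
+print(extract_year("Uncertainty (2050)"))  # 2050
+```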
+
+Function `clean_parameter_string()` removes leading hyphens, removes text inside square brackets (units/notes), collapses extra spaces and lower-cases the parameter name. It is applied to the `par` column.
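+
+The cleaning steps for parameter names can be sketched as follows. This is an illustration under assumptions, not the parser's code, and the sample input (including the bracketed unit) is hypothetical.
+
+```python
+import re
+
+
+def clean_parameter_string(text: str) -> str:
+    """Normalize a DEA parameter name (sketch)."""
+    # Remove leading hyphens (after trimming surrounding whitespace)
+    text = text.strip().lstrip("-")
+    # Remove bracketed units or notes, e.g. "[MWh]"
+    text = re.sub(r"\[[^\]]*\]", "", text)
+    # Collapse runs of whitespace into a single space, then lower-case
+    return re.sub(r"\s+", " ", text).strip().lower()
+
+
+print(clean_parameter_string("- Energy storage capacity for one unit [MWh]"))
+# energy storage capacity for one unit
+```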
+
+Function `standardize_units()` is applied to the columns `par` and `unit`. It completes missing units based on the parameter name via a parameter-to-unit map (e.g., `energy storage capacity for one unit` is mapped to the unit `MWh`). It also replaces known incorrect unit strings, e.g. `⁰C` → `C` and `m2` → `meter**2`. The unit substitutions follow the unit names defined in `pint`'s default registry file [`default_en.txt`](https://github.com/hgrecco/pint/blob/master/pint/default_en.txt).
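+
+A sketch of the unit completion and substitution logic. The two lookup tables are hypothetical excerpts containing only the examples mentioned above, and replacing units token by token (split on `/`) is an assumption made so that, for instance, `km2` is not rewritten.
+
+```python
+# Hypothetical excerpts of the lookup tables, limited to the examples above
+PARAMETER_TO_UNIT = {"energy storage capacity for one unit": "MWh"}
+UNIT_REPLACEMENTS = {"⁰C": "C", "m2": "meter**2"}
+
+
+def standardize_unit(par: str, unit) -> str:
+    """Fill a missing unit from the parameter name and fix known bad unit strings (sketch)."""
+    # Complete missing (None, NaN or empty) units from the parameter-to-unit map
+    if unit is None or unit != unit or str(unit).strip() == "":
+        unit = PARAMETER_TO_UNIT.get(par, "")
+    # Replace known incorrect spellings token by token, so that e.g. "km2" is left alone
+    tokens = str(unit).strip().split("/")
+    return "/".join(UNIT_REPLACEMENTS.get(token, token) for token in tokens)
+
+
+print(standardize_unit("energy storage capacity for one unit", ""))  # MWh
+print(standardize_unit("specific investment", "EUR_2020/m2"))  # EUR_2020/meter**2
+```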
+
+Function `Commons.update_unit_with_currency_year(unit, priceyear)` appends the `priceyear`, when present, to currency units. This is required because `technologydata` expects currency units to follow the pattern `\b(?P<cu_iso3>[A-Z]{3})_(?P<year>\d{4})\b`, for example `EUR_2021`.
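+
+The following sketch illustrates how a price year can be appended so that units match this pattern. It is not the implementation of `Commons.update_unit_with_currency_year`: the name `append_currency_year` is hypothetical, and it assumes that any standalone 3-letter uppercase token in the unit is a currency code.
+
+```python
+import re
+
+# Currency pattern expected by technologydata, e.g. "EUR_2021"
+CURRENCY_WITH_YEAR = re.compile(r"\b(?P<cu_iso3>[A-Z]{3})_(?P<year>\d{4})\b")
+# Standalone 3-letter uppercase token, assumed here to be a currency code
+BARE_CURRENCY = re.compile(r"\b(?P<cu_iso3>[A-Z]{3})\b")
+
+
+def append_currency_year(unit: str, priceyear) -> str:
+    """Append the price year to bare currency codes in `unit` (hypothetical helper)."""
+    year = "" if priceyear is None else str(priceyear).strip()
+    if not re.fullmatch(r"\d{4}", year):
+        return unit  # no usable price year: leave the unit unchanged
+    # "EUR_2021" is skipped: there is no word boundary between "R" and "_"
+    return BARE_CURRENCY.sub(lambda m: f"{m.group('cu_iso3')}_{year}", unit)
+
+
+unit = append_currency_year("EUR/MWh", "2020")
+print(unit)  # EUR_2020/MWh
+print(bool(CURRENCY_WITH_YEAR.search(unit)))  # True
+```
+
+Because of the word boundaries, this simplified pattern ignores scaled currencies such as `MEUR` and leaves codes that already carry a year untouched.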
+
+Function `format_val_number(value, num_decimals)` parses numeric strings, including comma decimal separators and scientific-notation variants (e.g., `×10`), converts them to floats and rounds them to `num_decimals` decimal places.
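+
+A sketch of the number parsing, assuming scientific-notation values look like `2,5×10^3` or `1.5x10-2` (mantissa, `×` or `x`, `10`, an optional `^` and a signed exponent) and that commas are only ever used as decimal separators, never as thousands separators.
+
+```python
+import re
+
+# Mantissa, then "×10" or "x10", an optional "^" and a signed exponent
+SCI_NOTATION = re.compile(r"([+-]?\d*\.?\d+)[×x]10\^?([+-]?\d+)")
+
+
+def format_val_number(value, num_decimals: int) -> float:
+    """Convert a raw DEA value string to a rounded float (sketch)."""
+    # Drop spaces and treat a comma as the decimal separator
+    text = str(value).strip().replace(" ", "").replace(",", ".")
+    match = SCI_NOTATION.fullmatch(text)
+    if match:
+        mantissa, exponent = match.groups()
+        number = float(mantissa) * 10 ** int(exponent)
+    else:
+        number = float(text)
+    return round(number, num_decimals)
+
+
+print(format_val_number("1,5", 4))  # 1.5
+print(format_val_number("2,5×10^3", 4))  # 2500.0
+```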
+
+The parser also applies the following corrections and substitutions:
+
+- Convert `MEUR_2020` and `kEUR_2020`/`KEUR_2020` to `EUR_2020` and scale the numeric `val` accordingly (×1e6 or ×1e3).
+- Apply specific unit fixes (e.g., `mol/s/m/MPa1/2` → `mol/s/m/Pa`, with value scaling).
+- Normalize certain `par` values (e.g., `energy storage capacity for one unit`, `tank volume of example`) to `capacity`.
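+
+The currency rescaling in the first bullet can be sketched as a lookup of scaled units and their factors. The names `SCALED_CURRENCIES` and `rescale_currency` are hypothetical, and only the three units listed above are covered.
+
+```python
+# Hypothetical lookup: scaled currency unit -> factor relative to EUR_2020
+SCALED_CURRENCIES = {"MEUR_2020": 1e6, "kEUR_2020": 1e3, "KEUR_2020": 1e3}
+
+
+def rescale_currency(value: float, unit: str) -> tuple[float, str]:
+    """Rewrite MEUR_2020/kEUR_2020/KEUR_2020 as EUR_2020 and rescale the value (sketch)."""
+    for scaled_unit, factor in SCALED_CURRENCIES.items():
+        if scaled_unit in unit:
+            return value * factor, unit.replace(scaled_unit, "EUR_2020")
+    # Units without a scaled currency are returned unchanged
+    return value, unit
+
+
+print(rescale_currency(0.5, "MEUR_2020/MW"))  # (500000.0, 'EUR_2020/MW')
+```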
+
+Function `clean_est_string()` normalizes the `est` column by casefolding it and by replacing `ctrl` with `control`.
+
+Function `filter_parameters(df, filter_flag)` keeps, if `filter_flag` is true, only an allowed set of parameters (e.g., `technical lifetime`, `fixed o&m`, `specific investment`, `variable o&m`, `charge efficiency`, `discharge efficiency`, `capacity`). Otherwise, it returns the full set.
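+
+A sketch of the parameter filter on a list of records rather than a DataFrame. The allowed set below contains only the example names listed above, so it may be incomplete with respect to the parser's actual list.
+
+```python
+# Example names from the list above; the parser's actual allowed set may differ
+ALLOWED_PARAMETERS = {
+    "technical lifetime",
+    "fixed o&m",
+    "specific investment",
+    "variable o&m",
+    "charge efficiency",
+    "discharge efficiency",
+    "capacity",
+}
+
+
+def filter_parameters(records: list[dict], filter_flag: bool) -> list[dict]:
+    """Keep only allowed parameters when filter_flag is True (sketch on records)."""
+    if not filter_flag:
+        return records
+    return [record for record in records if record["par"] in ALLOWED_PARAMETERS]
+```
+
+On a `pandas` DataFrame, the equivalent selection would be `df[df["par"].isin(ALLOWED_PARAMETERS)]`.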
+
+### Populate and export the source and technology collections
+
+Function `build_technology_collection()`:
+
+- if `store_source` is set, constructs a `Source` object for the DEA dataset, calls `ensure_in_wayback()` and writes `sources.json`; otherwise, reads an existing `sources.json`.
+- groups the cleaned DataFrame by `est`, `year`, `ws` and `Technology`.
+- for each group, builds a dictionary of `Parameter` objects (each with `magnitude`, `units`, `sources` and `provenance`).
+- creates a `Technology` object for each group, with `name` = `ws`, `detailed_technology` = `Technology`, `year` = `year`, `region` = `EU` and `case` = `est`, and collects these objects into a `TechnologyCollection`.
+- writes the `TechnologyCollection` to `technologies.json`.
+- if `--export_schema` is used, moves the schema files produced during export to the sub-folder `src/technologydata/package_data/schemas`.
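+
+The grouping step can be sketched with plain dictionaries standing in for the `Parameter` and `Technology` objects. This is an illustration under assumptions: `group_technologies` is a hypothetical name, each record is assumed to hold already-cleaned values, and the `sources` and `provenance` fields are omitted.
+
+```python
+from collections import defaultdict
+
+
+def group_technologies(records: list[dict]) -> list[dict]:
+    """Group cleaned records into technology-like dicts (sketch)."""
+    # (est, year, ws, Technology) -> {parameter name: parameter data}
+    groups = defaultdict(dict)
+    for record in records:
+        key = (record["est"], record["year"], record["ws"], record["Technology"])
+        groups[key][record["par"]] = {"magnitude": record["val"], "units": record["unit"]}
+    # One technology entry per group, with the field mapping described above
+    return [
+        {
+            "name": ws,
+            "detailed_technology": technology,
+            "year": year,
+            "region": "EU",
+            "case": est,
+            "parameters": parameters,
+        }
+        for (est, year, ws, technology), parameters in groups.items()
+    ]
+```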
+
+## Running the parser
+
+### Execution instructions
+
+From the repository root:
+
+- Basic run: `python src/technologydata/package_data/dea_energy_storage/dea_energy_storage.py`
+- Example with options: `python src/technologydata/package_data/dea_energy_storage/dea_energy_storage.py --num_digits 3 --store_source --filter_params --export_schema`
+
+### Outputs
+
+The parser generates the following outputs:
+
+- `src/technologydata/package_data/dea_energy_storage/technologies.json`
+- `src/technologydata/package_data/dea_energy_storage/sources.json`
+- Optional schema files, moved to `src/technologydata/package_data/schemas` when `--export_schema` is used.
diff --git a/docs/home/citing.md b/docs/home/citing.md
@@ -1,3 +1,11 @@
 # Citing
 
-TODO
+If you use `technologydata` for your research, we suggest the following way of citing it.
+
+```text
+technology-data: Data for Energy Systems Models.
+
+The package is available at: https://github.com/open-energy-transition/technology-data/tree/prototype-2.
+
+Authors: Johannes Hampp, Fabrizio Finozzi
+```
diff --git a/docs/home/users.md b/docs/home/users.md
@@ -8,12 +8,18 @@ We would love to hear about your use case and stay in touch, in order to develop
 
 ## Academia and Research institutions
 
-TODO
+<!-- TODO -->
+
+For the moment, this is a prototype of the new version of `technologydata`. Users will be specified at a later stage.
 
 ## Industry
 
-TODO
+<!-- TODO -->
+
+For the moment, this is a prototype of the new version of `technologydata`. Users will be specified at a later stage.
 
 ## Institutions and Organisations
 
-TODO
+<!-- TODO -->
+
+For the moment, this is a prototype of the new version of `technologydata`. Users will be specified at a later stage.
diff --git a/docs/user_guide/datapackage.md b/docs/user_guide/datapackage.md
@@ -0,0 +1,112 @@
+# `DataPackage` Class Documentation
+
+<!--
+SPDX-FileCopyrightText: The technology-data authors
+
+SPDX-License-Identifier: MIT
+
+-->
+
+## Overview
+
+The `DataPackage` class in `technologydata` provides a container for managing collections of `Technology` and `Source` objects, supporting batch operations and import/export utilities. It is designed to facilitate the organization, sharing, and processing of technology datasets, including provenance tracking and source management.
+
+## Features
+
+- **Technology Collection**: Stores a collection of `Technology` objects via the `TechnologyCollection` class.
+- **Source Collection**: Stores a collection of `Source` objects via the `SourceCollection` class.
+- **Batch Operations**: Supports batch export to JSON and CSV formats.
+- **Source Extraction**: Automatically extracts and aggregates sources from all parameters in the technology collection.
+- **Loading Utilities**: Provides methods to load a data package from JSON files.
+
+## Usage Examples
+
+### Creating a DataPackage
+
+You can create a `DataPackage` by instantiating it directly or by loading from JSON files.
+
+```python
+from technologydata import DataPackage, TechnologyCollection, SourceCollection
+
+# Create a DataPackage with existing collections
+dp = DataPackage(
+    technologies=TechnologyCollection(...),
+    sources=SourceCollection(...),
+)
+```
+
+### Loading from JSON
+
+To load a `DataPackage` from a folder containing `technologies.json` and (optionally) `sources.json`:
+
+```python
+from technologydata import DataPackage
+dp = DataPackage.from_json("path/to/data_package_folder")
+```
+
+If `sources.json` is not present, the sources are automatically extracted from the technologies.
+
+### Exporting to JSON
+
+Export the data package to JSON files in a specified folder:
+
+```python
+from technologydata import DataPackage, TechnologyCollection, SourceCollection
+
+# Create a DataPackage with existing collections
+dp = DataPackage(
+    technologies=TechnologyCollection(...),
+    sources=SourceCollection(...),
+)
+dp.to_json("path/to/output_folder")
+```
+
+### Exporting to CSV
+
+Export the data package to CSV files:
+
+```python
+from technologydata import DataPackage, TechnologyCollection, SourceCollection
+
+# Create a DataPackage with existing collections
+dp = DataPackage(
+    technologies=TechnologyCollection(...),
+    sources=SourceCollection(...),
+)
+
+dp.to_csv("path/to/output_folder")
+# Creates technologies.csv and sources.csv in the output folder
+```
+
+### Extracting Source Collection
+
+The `sources` attribute of the `DataPackage` can be automatically populated by extracting the sources from the `TechnologyCollection`.
+
+In this context, *extracting* means scanning the `TechnologyCollection` for all `Source` references that appear in the technology parameters and aggregating them into a single `SourceCollection`. The result contains only unique sources: duplicates are removed by comparing all source attributes.
+
+```python
+from technologydata.datapackage import DataPackage
+from technologydata.technology_collection import TechnologyCollection
+
+# Create a DataPackage with existing collections
+dp = DataPackage(
+    technologies=TechnologyCollection(...),
+)
+
+# Populate dp.sources with all unique sources from the technology collection
+dp.get_source_collection()
+```
+
+Extracting the source collection can be useful in scenarios such as:
+
+- When loading a data package that does not include a `sources.json` file, to ensure that all sources referenced in the technologies are captured.
+- Before exporting the data package (to `sources.json`, CSV, or for sharing) so the package includes a consistent, central catalog of sources.
+- When you need to produce provenance, citation lists, or run validations that require an explicit `SourceCollection`.
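+
+The deduplication based on all source attributes can be illustrated with plain dictionaries standing in for `Source` objects. This is a sketch of the idea only: `unique_sources` is a hypothetical helper, it assumes all attribute values are hashable, and the actual `get_source_collection()` may compare `Source` instances differently.
+
+```python
+def unique_sources(sources: list[dict]) -> list[dict]:
+    """Remove duplicate sources, comparing all attributes (sketch)."""
+    seen = set()
+    unique = []
+    for source in sources:
+        # Two sources are duplicates only if every attribute is equal
+        key = tuple(sorted(source.items()))
+        if key not in seen:
+            seen.add(key)
+            unique.append(source)  # keep the first occurrence, preserving order
+    return unique
+```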
+
+## API Reference
+
+Please refer to the [API documentation](../api/datapackage.md) for detailed information on the `DataPackage` class methods and attributes.
+
+## Limitations & Notes
+
+- **Error Handling**: If neither technologies nor sources are available, source extraction will raise a `ValueError`.
+- **No Data Validation**: The class assumes that the underlying `TechnologyCollection` and `SourceCollection` are valid and compatible.
diff --git a/docs/models.md → docs/user_guide/models.md b/docs/models.md → docs/user_guide/models.md
@@ -1,4 +1,4 @@
-# Title
+# Growth Models Documentation
 
 <!--
 SPDX-FileCopyrightText: 2025 The technology-data authors