Combine datasets to one region or province

Thomas edited this page Oct 12, 2023 · 11 revisions

Invoking the dataset combiner from the command-line

  1. Open your terminal and navigate to your etlocal directory.
  2. Once there, run rails dataset:combine --help to print a full help message that describes how to combine datasets and which arguments the command supports.

For example, to create a combined dataset for the province of Groningen, run the following command:

rails dataset:combine target_dataset_geo_id=PV20 target_area_name=Groningen \
  source_data_year=2019 source_dataset_geo_ids=GM306,GM307,GM308,(...other municipality geo-ids) \
  migration_slug=groningen_2019
  3. The command should have created a new folder in your etlocal/db/migrate directory containing two files: commits.yml and data.csv. Check commits.yml: are all the datasets you wanted to combine listed there? You can also correct any spelling mistakes in this file.
  4. If you are content with the changes, create a new branch based on the production branch.
  5. Run rake db:migrate to apply your newly created dataset.
  6. Commit the new files (including schema.rb!) and create a PR.
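
The check on commits.yml can be partly automated. The snippet below is a minimal sketch, not part of ETLocal: missing_terms is a hypothetical helper, and it assumes the datasets you combined appear literally in the file as text (as geo-ids or as area names). The inline sample stands in for a real commits.yml, whose actual layout may differ.

```ruby
# Hypothetical helper, not part of ETLocal: returns every expected term
# (a geo-id or an area name) that does not occur anywhere in the text.
# This is a plain substring search, so it knows nothing about the YAML layout.
def missing_terms(text, expected_terms)
  expected_terms.reject { |term| text.include?(term) }
end

# Made-up sample standing in for the contents of a commits.yml file.
sample = "Combined datasets GM306 and GM307 into PV20"

puts missing_terms(sample, %w[GM306 GM307 GM308]).inspect
```

To check a real file, pass the result of File.read on the generated commits.yml as the first argument. An empty result means every term was found somewhere in the file.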

Combination method: weighted_average

The weighted_average combination method in interface element files now supports a nested aggregation method: sum. To use it, add a key called 'sum' nested under weighted_average, and nest under that key the item keys whose values should be summed. For example:

key: buildings_applications
groups:
  ...
  - header: buildings_electricity_space_heating_distribution
    type: slider
    combination_method:
      weighted_average:
        - sum:
          - input_buildings_electricity_demand
          - buildings_final_demand_electricity_buildings_final_demand_for_space_heating_electricity_parent_share
        - input_buildings_electricity_demand
        - buildings_final_demand_electricity_buildings_final_demand_for_space_heating_electricity_parent_share
  ...

When adding a sum key under the weighted_average as shown above, the dataset combiner will:

  1. Sum the values for the keys under it,
  2. Then use the outcome as one of the values to calculate the weighted average over.
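
The two steps above can be sketched in Ruby as follows. This is an illustration under stated assumptions, not ETLocal's actual implementation: the helper names (resolve_entry, weight_for, weighted_average) and the example keys are made up, and the sketch assumes that a dataset's weight is the product of its resolved entries, which this page does not confirm.

```ruby
# Illustrative sketch only; not ETLocal's actual implementation.

# Step 1: resolve one entry under weighted_average to a single number.
# A plain key is looked up directly; a { "sum" => [...] } entry is
# reduced to the sum of the values of the keys nested under it.
def resolve_entry(entry, dataset)
  if entry.is_a?(Hash) && entry.key?("sum")
    entry["sum"].sum { |key| dataset.fetch(key) }
  else
    dataset.fetch(entry)
  end
end

# ASSUMPTION: the weight of a dataset is the product of its resolved entries.
def weight_for(entries, dataset)
  entries.map { |entry| resolve_entry(entry, dataset) }.reduce(1.0) { |acc, v| acc * v }
end

# Step 2: ordinary weighted average of value_key over the source datasets.
def weighted_average(datasets, value_key, entries)
  weights = datasets.map { |dataset| weight_for(entries, dataset) }
  total = weights.sum
  return 0.0 if total.zero?

  datasets.zip(weights).sum { |dataset, weight| dataset.fetch(value_key) * weight } / total
end

# Made-up keys and values for two source datasets.
entries = [{ "sum" => %w[demand_a demand_b] }, "share"]
datasets = [
  { "demand_a" => 10.0, "demand_b" => 30.0, "share" => 0.5, "slider_value" => 0.2 },
  { "demand_a" => 20.0, "demand_b" => 40.0, "share" => 1.0, "slider_value" => 0.8 }
]

puts weighted_average(datasets, "slider_value", entries).round(2)
```

In this example the sums are 40 and 60, so the assumed weights are 40 × 0.5 = 20 and 60 × 1.0 = 60, and the result is (0.2 × 20 + 0.8 × 60) / 80 = 0.65.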