Welcome to workflow.scenario.preparation! This tool is designed to streamline the preparation of input scenario datasets for use in either workflow.data.preparation or the r2dii R packages.
Ensure the following files exist in your input directory (default ./inputs):
For GECO 2022, prepare the following files (TODO: Enhance this section):
- geco2022_automotive_stocks_geco2021_retirement_rates_CORRECTED.csv
- GECO2022_Aviation_processed_data.csv
- geco2022_15c_ff_rawdata.csv
- geco2022_ndc_ff_rawdata.csv
- geco2022_ref_ff_rawdata.csv
- geco2022_15c_power_rawdata_region.csv
- geco2022_ndc_power_rawdata_region.csv
- geco2022_ref_power_rawdata_region.csv
- GECO2022_Steel_processed_data.csv
Use the R_CONFIG_ACTIVE environment variable to specify the active configuration in config.yml. This configuration file determines the active quarter, expected scenarios, and the location of raw scenario files.
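Since R_CONFIG_ACTIVE is the environment variable read by the R config package, config.yml follows that package's conventions: a default block plus one named block per quarter. As a rough sketch only (the key names below are illustrative assumptions, not the actual schema):

```yaml
# Hypothetical sketch of config.yml -- key names are assumptions, not the real schema.
default:
  scenario_preparation_inputs_path: "./inputs"
  scenario_preparation_outputs_path: "./outputs"

2023Q4:
  quarter: "2023Q4"
  expected_scenarios: ["geco_2022", "weo_2022"]
```

Setting R_CONFIG_ACTIVE to a block name (e.g. 2023Q4) makes that block's values override the defaults.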
This file is only necessary when running with Docker.
Create a .env file in the root directory with the following structure:
```
SCENARIO_PREPARATION_INPUTS_PATH=/PATH/TO/SCENARIO/DATA/INPUTS
SCENARIO_PREPARATION_OUTPUTS_PATH=/PATH/TO/SCENARIO/DATA/OUTPUTS
R_CONFIG_ACTIVE=YYYYQQ
```

You can use the example.env file as a template.
This file specifies the input/output directories and the active configuration (see config.yml for details).
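As a quick sanity check before launching the container, you can verify that the active configuration name is well formed. This snippet is not part of the repo; it assumes quarters are written like "2023Q4" (four-digit year plus Q1-Q4):

```shell
# Hypothetical sanity check: confirm R_CONFIG_ACTIVE matches the YYYYQQ pattern,
# assuming quarters look like "2023Q4". Exits non-zero on a malformed value.
R_CONFIG_ACTIVE="2023Q4"
if printf '%s' "$R_CONFIG_ACTIVE" | grep -Eq '^[0-9]{4}Q[1-4]$'; then
  echo "R_CONFIG_ACTIVE looks valid"
else
  echo "R_CONFIG_ACTIVE is malformed" >&2
  exit 1
fi
```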
Execute docker-compose up from the root directory to build the Docker image (if necessary) and run the scenario preparation process.
To force a rebuild of the Docker image, use docker-compose build --no-cache.
Running in RStudio supports input/output data in the ./inputs/ and ./outputs/ directories, respectively (relative to the root directory).
Set R_CONFIG_ACTIVE:
```r
Sys.setenv(R_CONFIG_ACTIVE = "YYYYQQ")
```

Then, source main.R:

```r
source("main.R")
```

Alternatively, you can step through the script line-by-line for debugging.
Alternatively, you can read in the .env file (specified above for the Docker process) and run the process with:

```r
readRenviron(".env")
source("main.R")
```

Note that docker-compose will also read the .env file if you try to build/run the Docker container from there.
A parameter file with the values that the RMI-PACTA team uses for extracting data is available at azure-deploy.rmi-pacta.parameters.json.
```sh
# run from repo root
# change this value as needed.
RESOURCEGROUP="RMI-SP-PACTA-DEV"
# Users with access to the RMI-PACTA Azure subscription can run:
az deployment group create --resource-group "$RESOURCEGROUP" --template-file azure-deploy.json --parameters azure-deploy.rmi-pacta.parameters.json
```
For security, the RMI-PACTA parameters file makes heavy use of secrets extracted from an Azure Key Vault, but an example file that passes parameters "in the clear" is available as azure-deploy.example.parameters.json.
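For orientation, ARM parameters files share a standard outer structure; only the entries under parameters vary. The parameter name below is a placeholder, not the actual parameter this template expects:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "exampleParameter": {
      "value": "example-value"
    }
  }
}
```

Consult azure-deploy.json itself for the real parameter names and types.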
Non-RMI-PACTA users can define their own parameters and invoke the ARM template with:

```sh
# Otherwise: prompts for parameters without defaults
az deployment group create --resource-group "$RESOURCEGROUP" --template-file azure-deploy.json

# if you have created your own parameters file:
az deployment group create --resource-group "$RESOURCEGROUP" --template-file azure-deploy.json --parameters @azure-deploy.parameters.json
```

The GitHub Actions workflow that runs this process starts an Azure Container Instance. To prepare the Azure landscape:
- Create a User Assigned Managed Identity for the repo as described here
- Manually start a container group with azure-deploy.json as documented above
- Grant the Contributor role on the new Container Group to the Managed Identity
- Grant the Managed Application Contributor Role to the Managed Identity for the Resource Group in which the Container Group will run
- Ensure the Managed Identity has deploy permissions to the Key Vault (if needed)
- Ensure the Managed Identity has the Managed Identity Operator role for the managed identity used by the container group (specified with the identity parameter in the deploy template)
See the Microsoft documentation for more information on setting up GH Actions.
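The GitHub Actions side of this setup typically authenticates via OIDC with the azure/login action and then starts the container group. A hedged sketch, with all resource names and secret names as placeholders:

```yaml
# Hypothetical workflow excerpt -- names, secrets, and the container group are
# placeholders; see the Microsoft documentation for the authoritative setup.
jobs:
  run-scenario-preparation:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for OIDC federated login
      contents: read
    steps:
      - uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Start container group
        run: az container start --resource-group "RMI-SP-PACTA-DEV" --name "scenario-preparation"
```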