Retrieve DTE outage data.
The `main.mjs` script runs every hour on GitHub Actions.

- The script retrieves high-level data from the internal DTE Kubra API and writes to `data-api.json`.
- The script also retrieves high-level data from the external-facing DTE dashboard and writes to `data-home.json`.
- In addition, the script writes to `data.csv` with more granular Kubra data based on ZIP Code.
The script overwrites these files on each run, but we can still recover historical data from the Git commit history using the `git-history` package authored by Simon Willison.
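As a rough illustration only, the retrieve-and-overwrite pattern looks something like the sketch below. The endpoint is left as a placeholder (read from an environment variable here) rather than the actual Kubra URL that `main.mjs` requests; the dashboard and ZIP Code retrievals follow the same shape.

```js
// sketch.mjs — a minimal sketch of one retrieval, not the actual main.mjs.
// Requires Node 18+ for the global fetch.
import { writeFile } from "node:fs/promises";

// Placeholder: the real Kubra endpoint is defined in main.mjs.
const KUBRA_ENDPOINT = process.env.KUBRA_ENDPOINT;

const response = await fetch(KUBRA_ENDPOINT);
if (!response.ok) throw new Error(`Kubra request failed: ${response.status}`);

// Overwrite the snapshot on every run; old versions live on in Git commit history.
await writeFile("data-api.json", JSON.stringify(await response.json(), null, 2));
```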
- Run `python3 -m venv venv` to create a virtual environment.
- Run `source venv/bin/activate` to activate the virtual environment.
- Run `pip install -r requirements.txt` to install dependencies.
- Run `./data-api.sh` to generate a SQLite database at `data-api.sqlite` of historical data based on `data-api.json` commit history.
- Run `./data-home.sh` to generate a SQLite database at `data-home.sqlite` of historical data based on `data-home.json` commit history.
- Run `./data.sh` to generate a SQLite database at `data.sqlite` of historical data based on `data.csv` commit history.
All pertinent data is stored in the `item` table.
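For a quick look at what `git-history` produced, a short Node script along these lines works. It assumes the `better-sqlite3` npm package, which is not a dependency of this repo; the `sqlite3` command-line shell works just as well.

```js
// inspect.mjs — peek at the item table in one of the generated databases.
import Database from "better-sqlite3";

const db = new Database("data.sqlite", { readonly: true });

// The columns mirror the shape of data.csv, so list them first...
console.log(db.pragma("table_info(item)").map((col) => col.name));

// ...then sample a few rows.
console.log(db.prepare("SELECT * FROM item LIMIT 5").all());
```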
These scripts will likely take several seconds to execute. Note that for both files, there will be a gap in the data from March 1 to April 26, 2023, during which the `data-api.json` and `data-home.json` files did not exist in Git history. However, you can use the `archive/data-api.csv` and `archive/data-home.csv` files to fill in these gaps. The `data.csv` file does not have such a gap.
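If an analysis needs continuous coverage across that gap, the archive file can be filtered down to the missing window, roughly like this; the `timestamp` column name is an assumption, so check the archive file's header for the real one.

```js
// gapfill.mjs — pull the March 1 – April 26, 2023 window out of archive/data-api.csv.
import { readFile } from "node:fs/promises";

const [header, ...rows] = (await readFile("archive/data-api.csv", "utf8")).trim().split("\n");
const columns = header.split(","); // naive CSV parsing; assumes no quoted commas
const tsIndex = columns.indexOf("timestamp"); // hypothetical column name

const gapRows = rows
  .map((line) => line.split(","))
  .filter((cells) => {
    const t = new Date(cells[tsIndex]);
    return t >= new Date("2023-03-01") && t < new Date("2023-04-27"); // inclusive of April 26
  });

console.log(`rows covering the gap: ${gapRows.length}`);
```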
The `data-api.csv`, `data-home.csv` and `history.csv` files are analogous to the main `data-api.json`, `data-home.json` and `data.csv` files, respectively. These CSV files store data from around late February 2023 to late April 2023. During that time range, these were the primary storage files. On each workflow run, a line (or several lines in the case of `history.csv`) would be appended to the file. This was useful for live visualization, but it also led to large file and repository sizes that slowed down the workflow. Consider that `history.csv` is 16 MB!
The `history-reduced.csv` file is a version of the `history.csv` file where entries from the same timestamp are compressed into one data row, summing across the total number of customers affected. This was useful for visualization performance purposes as it meant we could load a smaller file on the client.
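The reduction itself is simple enough to sketch: group rows by timestamp and sum the customer counts. The column names below (`timestamp`, `customers_affected`) are assumptions; match them to the actual `history.csv` header.

```js
// reduce-sketch.mjs — collapse rows that share a timestamp into a single row.
import { readFile, writeFile } from "node:fs/promises";

const [header, ...rows] = (await readFile("history.csv", "utf8")).trim().split("\n");
const columns = header.split(",");
const tsIdx = columns.indexOf("timestamp");            // hypothetical column name
const custIdx = columns.indexOf("customers_affected"); // hypothetical column name

// Sum customers affected across all rows sharing a timestamp.
const totals = new Map();
for (const line of rows) {
  const cells = line.split(","); // naive CSV parsing; assumes no quoted commas
  totals.set(cells[tsIdx], (totals.get(cells[tsIdx]) ?? 0) + Number(cells[custIdx]));
}

const out = [
  "timestamp,customers_affected",
  ...[...totals.entries()].map(([ts, sum]) => `${ts},${sum}`),
];
await writeFile("history-reduced.csv", out.join("\n") + "\n");
```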
The files in the `backfill` directory were used to backfill DTE outage data from their Kubra API that we had not been retrieving. In general, it is difficult to get historical data out of DTE. However, in our main script (`main.mjs`), we log a Kubra URL slug that can be used to retrieve historical data, and GitHub Actions stores a record of anything logged by a workflow run.
- First, we retrieved the identifier of each GitHub Actions workflow run with `gh run list --limit 1000 > action-runs.txt`. At the time we wanted to conduct the backfill, there were fewer than 1,000 runs.
- After compiling a list of run identifiers, we ran `./action-slugs.sh` to retrieve the Kubra URL slug logged from each run. The slugs were written to `action-slugs.txt`.
- Then, we ran `node data-api.mjs` to backfill the `data-api.csv` file.
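Schematically, the backfill amounts to looping over the logged slugs and requesting each historical snapshot. The slug-to-URL construction below is a stand-in (`buildKubraUrl` is hypothetical), and mapping the response onto `data-api.csv` columns is the part left out; see the actual backfill script for the real logic.

```js
// backfill-sketch.mjs — the general shape of the backfill, not the actual script.
import { readFile } from "node:fs/promises";

const slugs = (await readFile("action-slugs.txt", "utf8")).trim().split("\n");

// Hypothetical stand-in for however the backfill script turns a logged slug into a URL.
const buildKubraUrl = (slug) => `https://kubra.io/${slug}`;

for (const slug of slugs) {
  const res = await fetch(buildKubraUrl(slug));
  if (!res.ok) continue; // some older snapshots may no longer resolve
  const snapshot = await res.json();
  // Mapping snapshot fields onto data-api.csv rows is the part elided here.
  console.log(slug, Object.keys(snapshot));
}
```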
The `reconcile.mjs` script was used to merge a local temporary historical file with the main file. It was a one-off script.
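As a rough illustration of that kind of merge (the file names and shared-header assumption below are hypothetical, not taken from `reconcile.mjs`):

```js
// reconcile-sketch.mjs — combine a temporary historical CSV into the main one,
// dropping exact duplicate rows. Assumes both files share the same header and columns.
import { readFile, writeFile } from "node:fs/promises";

const readRows = async (path) => (await readFile(path, "utf8")).trim().split("\n");

const [header, ...mainRows] = await readRows("data-api.csv");
const [, ...tempRows] = await readRows("temp-historical.csv"); // hypothetical file name

// A Set keeps the first occurrence of each line and drops exact duplicates.
const merged = [...new Set([...mainRows, ...tempRows])];
await writeFile("data-api.csv", [header, ...merged].join("\n") + "\n");
```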
- Shao Hsuan Wu from The Michigan Daily wrote a news briefing on power outages in Southeastern Michigan during February 2023. Eric Lau contributed data visualizations using this data.
- Eric Lau visualized discrepancies between external and internal DTE outage data.
- Eric Lau presented on the data to the Michigan Public Service Commission (MPSC) as part of a technical conference to address energy resiliency in Michigan.
- Simon Willison has written several blog posts on using GitHub Actions and `git-history` to scrape data.
- Open Kentuckiana, a group of technologists from Kentucky, conducted a similar power outage analysis using the Kubra API. Their analysis is more granular with respect to geospatial clusters of outages.