The Federal Energy Regulatory Commission (FERC) has moved to collecting and distributing data using XBRL. XBRL is primarily designed for financial reporting and has been adopted by regulators in the US and other countries. Much of the tooling in the XBRL ecosystem is aimed at filers, or at rendering individual filings in a human-readable way; very little is aimed at accessing and analyzing large collections of filings.
The FERC XBRL Extractor is designed to provide that functionality for FERC XBRL data. The library can extract data from a set of XBRL filings and write that data to SQLite or DuckDB databases whose structure is derived from an XBRL taxonomy. Each XBRL instance contains a reference to the taxonomy version it was created against; the tool takes a path to an archive of taxonomies and uses it to interpret the instances being processed, so the output database has a consistent structure even when the filings span multiple taxonomy versions. For more information on the technical details of the XBRL extraction, see the docs.
Catalyst Cooperative is currently using this tool to extract and publish the following FERC data. These outputs are updated at least annually, and typically quarterly.
The package can be installed from PyPI or conda-forge using your package manager of choice:

```sh
pip install catalystcoop.ferc-xbrl-extractor
uv pip install catalystcoop.ferc-xbrl-extractor
conda install catalystcoop.ferc_xbrl_extractor
mamba install catalystcoop.ferc_xbrl_extractor
pixi add catalystcoop.ferc_xbrl_extractor
```

The FERC XBRL Extractor is generally intended to consume raw XBRL filings and taxonomy information from one of the archives Catalyst Cooperative has published on Zenodo. Each supported form has its own archive lineage, with new snapshots captured from FERC's XBRL filing RSS feeds on a regular basis (see the links in the table above). The tool also expects to receive a zipfile containing archived taxonomies.
The archived filings and taxonomies are both produced using the pudl-archiver. The extractor will parse all taxonomies in the archive, then use the taxonomy referenced in each filing while parsing it.
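To see what such a taxonomy archive looks like, you can peek inside the example archive bundled with this repo using nothing but the Python standard library:

```python
import zipfile

# List the contents of the archived-taxonomies zipfile that ships
# in this repo's examples directory.
with zipfile.ZipFile("examples/ferc1-xbrl-taxonomies.zip") as archive:
    for name in archive.namelist():
        print(name)
```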
This tool can be used as a library, as it is in PUDL, and it also provides a CLI for interacting with XBRL data. The only required CLI options are a path to the filings to be extracted and a path to the output database. The filings path can point to a directory of XBRL filings, a single XBRL filing, or a zipfile of XBRL filings. If the specified output database already exists, it will be overwritten. The most basic invocation looks like:
```sh
xbrl_extract {path_to_filings} --sqlite-path {path_to_database}
```
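If you'd rather drive the extractor from a Python script without relying on its internal API, one minimal approach is to invoke the documented CLI via subprocess. This is just a sketch; the paths below are placeholders:

```python
import subprocess

# Shell out to the documented CLI; replace the placeholder paths
# with a real filings archive and output location.
subprocess.run(
    [
        "xbrl_extract",
        "path/to/filings.zip",
        "--sqlite-path", "path/to/output.sqlite",
    ],
    check=True,  # raise CalledProcessError if extraction fails
)
```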
This repo contains a small selection of FERC Form 1 filings from 2021, along with an archive of taxonomies, in the examples directory. To test the tool on these filings, use the command:
```sh
xbrl_extract examples/ferc1-2021-sample.zip \
  --sqlite-path ./ferc1-2021-sample.sqlite \
  --taxonomy examples/ferc1-xbrl-taxonomies.zip
```
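Once the sample database exists, the Python standard library is enough for a quick sanity check. This sketch simply lists the tables the extractor created; the specific names you see are derived from the taxonomy rather than hard-coded in the tool:

```python
import sqlite3

# List the tables in the extracted sample database.
conn = sqlite3.connect("ferc1-2021-sample.sqlite")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
).fetchall()
for (name,) in tables:
    print(name)
conn.close()
```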
Parsing XBRL filings can be a time-consuming and CPU-heavy task, so this tool implements some basic multiprocessing, using a process pool, to speed it up. There are two options for configuring the pool: --batch-size and --workers. The batch size sets how many filings each child process handles at a time, and --workers sets how many child processes the pool creates. It may take some experimentation to configure these options well. The following command uses 5 worker processes to process batches of 50 filings at a time, and writes both SQLite and DuckDB outputs (a sketch of the batching pattern follows the command):
```sh
xbrl_extract examples/ferc1-2021-sample.zip \
  --sqlite-path ferc1-2021-sample.sqlite \
  --duckdb-path ferc1-2021-sample.duckdb \
  --taxonomy examples/ferc1-xbrl-taxonomies.zip \
  --workers 5 \
  --batch-size 50
```
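The pattern behind these options can be sketched with the standard library. This is purely illustrative of the batch/worker tradeoff, not the tool's actual implementation; parse_batch here is a stand-in for the real parsing work:

```python
from concurrent.futures import ProcessPoolExecutor

def parse_batch(filings: list[str]) -> int:
    # Stand-in for parsing a batch of filings in a child process;
    # the real tool does the XBRL parsing here.
    return len(filings)

def run_pool(filings: list[str], workers: int = 5, batch_size: int = 50) -> int:
    # Split the filings into batches so each task sent to a child
    # process amortizes its startup and serialization overhead.
    batches = [filings[i : i + batch_size] for i in range(0, len(filings), batch_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(parse_batch, batches))

if __name__ == "__main__":
    print(run_pool([f"filing_{i}.xbrl" for i in range(500)]))
```

Larger batches reduce per-task overhead but make load balancing across workers coarser, which is why some experimentation is usually needed.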
There are also several options for extracting metadata from the taxonomy. First is the --datapackage-path option, which saves a frictionless datapackage descriptor as JSON, annotating the generated SQLite database. There is also the --metadata-path option, which writes more extensive taxonomy metadata to a JSON file, grouped by table name. See the ferc_xbrl_extractor.arelle_interface module for more information on the extracted metadata. To create both of these files using the example filings and taxonomy, run the following command (a sketch of reading the datapackage descriptor follows it):
```sh
xbrl_extract examples/ferc1-2021-sample.zip \
  --sqlite-path ./ferc1-2021-sample.sqlite \
  --taxonomy examples/ferc1-xbrl-taxonomies.zip \
  --metadata-path metadata.json \
  --datapackage-path datapackage.json
```
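Since the datapackage descriptor is plain JSON conforming to the Frictionless Data spec, a few lines of standard-library Python are enough to see which tables it annotates. A minimal sketch, assuming the command above has been run:

```python
import json

# Each resource in the datapackage describes one table in the
# generated SQLite database.
with open("datapackage.json") as f:
    package = json.load(f)

for resource in package["resources"]:
    print(resource["name"])
```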
This project uses uv for dependency management and Hatch for environment and task management. It also includes several git pre-commit hooks that help enforce standard coding practices. To set up the development environment, first make sure you have uv installed, then:

```sh
# Clone the repository to your local machine
git clone https://github.com/catalyst-cooperative/ferc-xbrl-extractor.git
cd ferc-xbrl-extractor

# Create the development environment with hatch
uv tool install hatch
hatch env create

# Install the pre-commit hooks
hatch run pre-commit install
```

All available development environments and commands can be shown with:
```sh
hatch env show
```

Some of the available commands:
```sh
# Run all tests and collect coverage
hatch run test:all

# Run only unit tests
hatch run test:unit

# Run only integration tests
hatch run test:integration

# Run linters and formatters
hatch run lint:all

# Check code without modifying
hatch run lint:check

# Format code
hatch run lint:format

# Build documentation
hatch run docs:build

# Check documentation formatting
hatch run docs:check
```

Code style is enforced using ruff, with configuration in pyproject.toml.
This package is part of the Public Utility Data Liberation (PUDL) project.
The PUDL Sustainers provide ongoing financial support to ensure the open data keeps flowing and the project remains sustainable long term. They're also involved in our quarterly planning process. To learn more, see the PUDL Project on Open Collective.