The Federal Energy Regulatory Commission (FERC) has moved to collecting and distributing data using XBRL. XBRL is primarily designed for financial reporting and has been adopted by regulators in the US and other countries. Much of the tooling in the XBRL ecosystem is aimed at filers, or at rendering individual filings in a human-readable way; very little is aimed at accessing and analyzing large collections of filings.
The FERC XBRL Extractor is designed to provide that functionality for FERC XBRL data. The library can extract data from a set of XBRL filings and write that data to SQLite or DuckDB databases whose structure is derived from an XBRL taxonomy. Each XBRL instance contains a reference to the taxonomy version it was created against; the tool takes a path to an archive of taxonomies and uses it to interpret the instances being processed, so the output database has a consistent structure even when the filings span multiple taxonomy versions. For more information on the technical details of the XBRL extraction, see the docs.
Catalyst Cooperative is currently using this tool to extract and publish the following FERC data. These outputs are updated at least annually, and typically quarterly.
The package can be installed from PyPI or conda-forge using your package manager of choice:

```sh
pip install catalystcoop.ferc-xbrl-extractor
uv pip install catalystcoop.ferc-xbrl-extractor
conda install catalystcoop.ferc_xbrl_extractor
mamba install catalystcoop.ferc_xbrl_extractor
pixi add catalystcoop.ferc_xbrl_extractor
```

The FERC XBRL Extractor is generally intended to consume raw XBRL filings and taxonomy information from one of the archives Catalyst Cooperative has published on Zenodo. Each supported form has its own archive lineage, with new snapshots captured from FERC's XBRL filing RSS feeds on a regular basis (see the links in the table above). The tool also expects to receive a zipfile containing archived taxonomies.
The archived filings and taxonomies are both produced using the pudl-archiver. The extractor will parse all taxonomies in the archive, then use the taxonomy referenced in each filing while parsing it.
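To see what such a taxonomy archive looks like, you can peek inside the example archive bundled with this repo using nothing but the Python standard library:

```python
import zipfile

# List the contents of the archived-taxonomies zipfile that ships
# in this repo's examples directory.
with zipfile.ZipFile("examples/ferc1-xbrl-taxonomies.zip") as archive:
    for name in archive.namelist():
        print(name)
```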
This tool can be used as a library, as it is in PUDL, and it also provides a CLI for interacting with XBRL data. The only required CLI options are a path to the filings to be extracted and a path to the output database. The filings path can point to a directory of XBRL filings, a single XBRL filing, or a zipfile of XBRL filings. If the specified output database already exists, it will be overwritten. The most basic invocation looks like:
```sh
xbrl_extract {path_to_filings} --sqlite-path {path_to_database}
```
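If you'd rather drive the extractor from a Python script without relying on its internal API, one minimal approach is to invoke the documented CLI via subprocess. This is just a sketch; the paths below are placeholders:

```python
import subprocess

# Shell out to the documented CLI; replace the placeholder paths
# with a real filings archive and output location.
subprocess.run(
    [
        "xbrl_extract",
        "path/to/filings.zip",
        "--sqlite-path", "path/to/output.sqlite",
    ],
    check=True,  # raise CalledProcessError if extraction fails
)
```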
This repo contains a small selection of FERC Form 1 filings from 2021, along with an archive of taxonomies, in the examples directory. To test the tool on these filings, use the command:
```sh
xbrl_extract examples/ferc1-2021-sample.zip \
  --sqlite-path ./ferc1-2021-sample.sqlite \
  --taxonomy examples/ferc1-xbrl-taxonomies.zip
```
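Once the sample database exists, the Python standard library is enough for a quick sanity check. This sketch simply lists the tables the extractor created; the specific names you see are derived from the taxonomy rather than hard-coded in the tool:

```python
import sqlite3

# List the tables in the extracted sample database.
conn = sqlite3.connect("ferc1-2021-sample.sqlite")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
).fetchall()
for (name,) in tables:
    print(name)
conn.close()
```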
Parsing XBRL filings can be a time-consuming and CPU-heavy task, so this tool implements some basic multiprocessing, using a process pool, to speed it up. There are two options for configuring the pool: --batch-size and --workers. The batch size sets how many filings each child process handles at a time, and --workers sets how many child processes the pool creates. It may take some experimentation to configure these options well. The following command uses 5 worker processes to process batches of 50 filings at a time, and writes both SQLite and DuckDB outputs (a sketch of the batching pattern follows the command):
```sh
xbrl_extract examples/ferc1-2021-sample.zip \
  --sqlite-path ferc1-2021-sample.sqlite \
  --duckdb-path ferc1-2021-sample.duckdb \
  --taxonomy examples/ferc1-xbrl-taxonomies.zip \
  --workers 5 \
  --batch-size 50
```
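The pattern behind these options can be sketched with the standard library. This is purely illustrative of the batch/worker tradeoff, not the tool's actual implementation; parse_batch here is a stand-in for the real parsing work:

```python
from concurrent.futures import ProcessPoolExecutor

def parse_batch(filings: list[str]) -> int:
    # Stand-in for parsing a batch of filings in a child process;
    # the real tool does the XBRL parsing here.
    return len(filings)

def run_pool(filings: list[str], workers: int = 5, batch_size: int = 50) -> int:
    # Split the filings into batches so each task sent to a child
    # process amortizes its startup and serialization overhead.
    batches = [filings[i : i + batch_size] for i in range(0, len(filings), batch_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(parse_batch, batches))

if __name__ == "__main__":
    print(run_pool([f"filing_{i}.xbrl" for i in range(500)]))
```

Larger batches reduce per-task overhead but make load balancing across workers coarser, which is why some experimentation is usually needed.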
There are also several options for extracting metadata from the taxonomy. First is the --datapackage-path option, which saves a frictionless datapackage descriptor as JSON, annotating the generated SQLite database. There is also the --metadata-path option, which writes more extensive taxonomy metadata to a JSON file, grouped by table name. See the ferc_xbrl_extractor.arelle_interface module for more information on the extracted metadata. To create both of these files using the example filings and taxonomy, run the following command (a sketch of reading the datapackage descriptor follows it):
```sh
xbrl_extract examples/ferc1-2021-sample.zip \
  --sqlite-path ./ferc1-2021-sample.sqlite \
  --taxonomy examples/ferc1-xbrl-taxonomies.zip \
  --metadata-path metadata.json \
  --datapackage-path datapackage.json
```
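Since the datapackage descriptor is plain JSON conforming to the Frictionless Data spec, a few lines of standard-library Python are enough to see which tables it annotates. A minimal sketch, assuming the command above has been run:

```python
import json

# Each resource in the datapackage describes one table in the
# generated SQLite database.
with open("datapackage.json") as f:
    package = json.load(f)

for resource in package["resources"]:
    print(resource["name"])
```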
This project uses uv for dependency management and Hatch for environment and task management. It also includes several git pre-commit hooks that help enforce standard coding practices. To set up the development environment, first make sure you have uv installed, then:

```sh
# Clone the repository to your local machine
git clone https://github.com/catalyst-cooperative/ferc-xbrl-extractor.git
cd ferc-xbrl-extractor

# Create the development environment with hatch
uv tool install hatch
hatch env create

# Install the pre-commit hooks
hatch run pre-commit install
```

All available development environments and commands can be shown with:
```sh
hatch env show
```

Some of the available commands:
```sh
# Run all tests and collect coverage
hatch run test:all

# Run only unit tests
hatch run test:unit

# Run only integration tests
hatch run test:integration

# Run linters and formatters
hatch run lint:all

# Check code without modifying
hatch run lint:check

# Format code
hatch run lint:format

# Build documentation
hatch run docs:build

# Check documentation formatting
hatch run docs:check
```

Code style is enforced using ruff, with configuration in pyproject.toml.
This package is part of the Public Utility Data Liberation (PUDL) project.
The PUDL Sustainers provide ongoing financial support to ensure the open data keeps flowing and the project remains sustainable long term. They're also involved in our quarterly planning process. To learn more, see the PUDL Project on Open Collective.