
Project Management

The people working on PUDL are distributed all over North America, so collaboration takes place online. We make extensive use of GitHub's built-in project management tools and we work in public. You can follow our progress in our GitHub Projects.

Issues and Project Tracking

We use GitHub issues to track bugs, enhancements, support requests, and just about any other work that goes into the project. Try to make sure that issues have informative labels so we can find them easily.

Bug triage

When a new issue is discovered, we need to determine how urgent it is to address.

Our core offering is complete, connected, and granular data. Issues that interrupt the availability of that data are of highest importance.

However, not all datasets in PUDL are the same; some are mature and have many downstream users; others are more experimental. We've split them into "tier 1" and "tier 2" groups below.

Tier 1 datasets:

  • FERC 1 schedules XYZ
  • EIA 860 - XYZ tables
  • EIA 923 - XYZ tables
  • EPA CEMS

Tier 2 datasets: everything else.

This then informs some reliability goals:

For tier 1 tables:

  • latest source data is incorporated into PUDL within 1 month of publication
  • nightly data build using latest PUDL code is available within 3 business days of any code changes
  • missing/incorrect data starts to be addressed within 2 weeks

For tier 2 tables we only shoot for nightly builds being available within 3 business days.

Which then, in turn, informs our bug triage guidelines:

Urgent (find some way to address ASAP):

  • nightly build failures
  • datasette not available
  • incorrect data in distribution buckets

High (prioritize in the upcoming sprint planning):

  • missing/incorrect data in Tier 1 tables
  • new Tier 1 source data available

Medium (stuff in a backlog and don't forget about it):

  • new Tier 2 source data available
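The triage guidelines above amount to a lookup from the kind of issue to a priority. Here is a minimal sketch in Python. The issue-kind strings, the ``Priority`` enum, and the ``triage`` function are hypothetical names made up for illustration and are not part of PUDL or its tooling.

```python
from enum import Enum


class Priority(Enum):
    """Priority levels, paraphrasing the triage guidelines."""

    URGENT = "find some way to address ASAP"
    HIGH = "prioritize in the upcoming sprint planning"
    MEDIUM = "keep in the backlog"


# Hypothetical issue kinds; the keys are illustrative labels only.
TRIAGE_RULES = {
    "nightly_build_failure": Priority.URGENT,
    "datasette_unavailable": Priority.URGENT,
    "incorrect_data_in_distribution_buckets": Priority.URGENT,
    "tier1_missing_or_incorrect_data": Priority.HIGH,
    "tier1_new_source_data": Priority.HIGH,
    "tier2_new_source_data": Priority.MEDIUM,
}


def triage(issue_kind: str) -> Priority:
    """Return the priority for an issue kind covered by the guidelines.

    Raises:
        KeyError: If the guidelines do not cover ``issue_kind``.
    """
    return TRIAGE_RULES[issue_kind]
```

Issue kinds that the guidelines do not mention raise a ``KeyError`` in this sketch rather than receiving a guessed priority, since those cases still need a human decision.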

Our GitHub Workflow

  • We have 3 persistent branches: main (the default branch), nightly, and stable.
  • We create temporary feature branches off of main and make pull requests into main throughout our two-week sprints. All code that's merged into main should have passed our CI tests and been reviewed by at least one other person.
  • Every night the main branch is used to run the :ref:`nightly-data-builds`. If the builds are successful, then the nightly branch is automatically updated to point to the latest commit on main. If the builds fail, then the nightly branch is left unchanged.
  • Every time we do a versioned data release, the stable branch is updated to point to the commit associated with the most recent release.
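The relationship between the three persistent branches can be modeled as a small state update. The sketch below is only an illustration of the rules listed above: the ``Branches`` class and both functions are hypothetical, the commit identifiers are plain strings, and nothing here calls git or GitHub.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Branches:
    """Commit that each persistent branch currently points to."""

    main: str
    nightly: str
    stable: str


def after_nightly_build(branches: Branches, build_succeeded: bool) -> Branches:
    """Move nightly to the head of main only if the nightly build passed."""
    if build_succeeded:
        return replace(branches, nightly=branches.main)
    return branches


def after_release(branches: Branches, release_commit: str) -> Branches:
    """Point stable at the commit of the most recent versioned release."""
    return replace(branches, stable=release_commit)
```

For example, starting from ``Branches(main="c3", nightly="c1", stable="c0")``, a failed build leaves ``nightly`` at ``"c1"``, while a successful one moves it to ``"c3"``.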

Pull Requests

  • Before making a PR, make sure the tests run and pass locally, including the code linters and pre-commit hooks. See :ref:`linting` for details.
  • Don't forget to merge any new commits from the main branch into your feature branch before making a PR.
  • If for some reason the continuous integration tests fail for your PR, try to figure out why and fix it, or ask for help. If the tests fail, we don't want to merge the PR into main. You can see the status of the CI builds in the GitHub Actions for the PUDL repo.
  • Please don't decrease the overall test coverage -- if you introduce new code, it also needs to be exercised by the tests. See :doc:`testing` for details.
  • Write good docstrings using the Google format.
  • Pull Requests should update the documentation to reflect changes to the code, especially if it changes something user-facing, like how one of the command line scripts works.
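As an illustration of the Google docstring format mentioned above, here is a short sketch. The ``heat_rate`` function is a hypothetical example written for this page and is not taken from the PUDL codebase.

```python
def heat_rate(fuel_consumed_mmbtu: float, net_generation_mwh: float) -> float:
    """Calculate a heat rate from fuel consumption and net generation.

    Args:
        fuel_consumed_mmbtu: Fuel energy consumed, in MMBtu.
        net_generation_mwh: Net electricity generated, in MWh. Must be
            positive.

    Returns:
        The heat rate in MMBtu per MWh.

    Raises:
        ValueError: If ``net_generation_mwh`` is not positive.
    """
    if net_generation_mwh <= 0:
        raise ValueError("net_generation_mwh must be positive")
    return fuel_consumed_mmbtu / net_generation_mwh
```

The ``Args``, ``Returns``, and ``Raises`` sections are the parts of the format that documentation tools typically pick up, so they are worth filling in even for small functions.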

Releases

  • The PUDL data processing pipeline isn't intended to be used as a library that other Python packages depend on. Rather, it's an end-use application that produces data which other applications and analyses can consume. Because of this, we no longer release installable packages on PyPI or conda-forge.
  • Periodically, we tag a versioned release on main using a calendar-based version, like v2023.07.15. This triggers a snapshot of the repository being archived on Zenodo.
  • The nightly build outputs associated with any tagged release will also get archived on Zenodo and be made available longer term in the AWS Open Data Registry.
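A calendar-based tag such as v2023.07.15 can be read back as a date. The sketch below assumes every tag has the form ``v`` followed by ``YYYY.MM.DD``, which is inferred from that single example. The helper names are hypothetical and not part of PUDL.

```python
from datetime import date, datetime

# Assumed tag layout, inferred from the example v2023.07.15.
TAG_FORMAT = "v%Y.%m.%d"


def parse_release_tag(tag: str) -> date:
    """Convert a calendar-based release tag into a date.

    Raises:
        ValueError: If the tag does not match the assumed layout.
    """
    return datetime.strptime(tag, TAG_FORMAT).date()


def make_release_tag(day: date) -> str:
    """Build a calendar-based release tag for the given date."""
    return day.strftime(TAG_FORMAT)
```

Parsing tags into dates makes it straightforward to sort releases chronologically or to find the most recent one.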

User Support

We don't (yet) have funding to do user support, so it's currently all community and volunteer based. To ensure that others can find the answers to questions that have already been asked, we try to do all support in public using GitHub Discussions.