Description
Some stats we might want to track:
- Time from GitHub PR opened to PR merged
- Time from GitHub PR opened to PR pipeline started
- Time from GitHub PR opened to PR pipeline succeeded
- Pipeline runtime
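As a rough sketch of how the stats above could be derived from raw timestamps (the `PipelineTimestamps` class and all of its field names are hypothetical, not taken from an existing schema):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class PipelineTimestamps:
    # Hypothetical field names, for illustration only.
    pr_opened_at: datetime
    pr_merged_at: Optional[datetime] = None
    pipeline_started_at: Optional[datetime] = None
    pipeline_finished_at: Optional[datetime] = None
    pipeline_succeeded: bool = False


def _delta(start: Optional[datetime], end: Optional[datetime]) -> Optional[timedelta]:
    # Return None when either timestamp is missing (e.g. PR not merged yet).
    if start is None or end is None:
        return None
    return end - start


def pipeline_stats(ts: PipelineTimestamps) -> Dict[str, Optional[timedelta]]:
    # "Opened to succeeded" is only meaningful if the pipeline actually succeeded.
    succeeded_at = ts.pipeline_finished_at if ts.pipeline_succeeded else None
    return {
        "opened_to_merged": _delta(ts.pr_opened_at, ts.pr_merged_at),
        "opened_to_pipeline_started": _delta(ts.pr_opened_at, ts.pipeline_started_at),
        "opened_to_pipeline_succeeded": _delta(ts.pr_opened_at, succeeded_at),
        "pipeline_runtime": _delta(ts.pipeline_started_at, ts.pipeline_finished_at),
    }
```

Missing timestamps yield `None` rather than raising, so a pipeline whose GitHub data has not been ingested yet can still report its runtime.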
If we start tracking pipeline data, we might want a separate webhook that fires upon pipeline completion. We'll want to ingest the GitHub data separately, since GitHub can be flaky and we don't want a whole job to fail because of it. We could have a Celery task that runs on a cron schedule, looks for pipelines with missing GitHub PR data, and attempts to ingest that data. If an attempt fails because GitHub is down, the task will simply try again on its next run.
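A minimal sketch of the backfill logic such a periodic task might run. It is written as a plain function so it does not depend on Celery itself; `PipelineRecord`, `GitHubUnavailable`, and the `fetch_pr` callable are all hypothetical stand-ins for whatever models and GitHub client we end up using:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class GitHubUnavailable(Exception):
    """Raised by the (hypothetical) fetcher when GitHub cannot be reached."""


@dataclass
class PipelineRecord:
    # Hypothetical stand-in for a stored pipeline row.
    pipeline_id: int
    pr_number: int
    pr_data: Optional[dict] = None


def backfill_missing_pr_data(
    pipelines: Iterable[PipelineRecord],
    fetch_pr: Callable[[int], dict],
) -> int:
    """Try to fill in PR data for pipelines that lack it.

    Returns the number of pipelines filled in during this run. Pipelines
    whose fetch fails are left untouched, so the next run picks them up.
    """
    filled = 0
    for pipeline in pipelines:
        if pipeline.pr_data is not None:
            continue  # already ingested
        try:
            pipeline.pr_data = fetch_pr(pipeline.pr_number)
        except GitHubUnavailable:
            continue  # leave it missing; the next scheduled run retries
        filled += 1
    return filled
```

Because the function only touches records that are still missing data, running it repeatedly is safe, which is what makes a simple cron-style schedule (for example via Celery beat) sufficient without any per-pipeline retry bookkeeping.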
This is making me think that what we really want is a Pipeline dimension that would contain information about a specific (PR) pipeline. Then we could either store extra data pertaining to GitHub PRs on that dimension, or have a separate PullRequest dimension.
If we get to a point where we're storing lots of numeric data that we're aggregating over, it would potentially warrant a separate Fact table for pipelines, or for whatever it is we're storing data about. Just something to keep in mind.
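To make the dimension/fact split concrete, here is a small sketch using an in-memory SQLite database. Every table and column name below is an assumption made for illustration; none of it reflects an existing schema in this project:

```python
import sqlite3
from typing import Dict


def create_schema(conn: sqlite3.Connection) -> None:
    # Dimensions hold descriptive attributes; the fact table holds the
    # numeric measurements we would aggregate over.
    conn.executescript(
        """
        CREATE TABLE pull_request_dimension (
            id INTEGER PRIMARY KEY,
            pr_number INTEGER NOT NULL UNIQUE
        );
        CREATE TABLE pipeline_dimension (
            id INTEGER PRIMARY KEY,
            pipeline_id INTEGER NOT NULL UNIQUE
        );
        CREATE TABLE pipeline_fact (
            pipeline_dim_id INTEGER NOT NULL REFERENCES pipeline_dimension(id),
            pull_request_dim_id INTEGER REFERENCES pull_request_dimension(id),
            runtime_seconds REAL NOT NULL
        );
        """
    )


def record_pipeline(
    conn: sqlite3.Connection, pipeline_id: int, pr_number: int, runtime_seconds: float
) -> None:
    # Reuse the PR dimension row if this PR has been seen before.
    conn.execute(
        "INSERT OR IGNORE INTO pull_request_dimension (pr_number) VALUES (?)",
        (pr_number,),
    )
    pr_dim_id = conn.execute(
        "SELECT id FROM pull_request_dimension WHERE pr_number = ?", (pr_number,)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO pipeline_dimension (pipeline_id) VALUES (?)", (pipeline_id,)
    )
    conn.execute(
        "INSERT INTO pipeline_fact (pipeline_dim_id, pull_request_dim_id, runtime_seconds) "
        "VALUES (?, ?, ?)",
        (cur.lastrowid, pr_dim_id, runtime_seconds),
    )


def average_runtime_by_pr(conn: sqlite3.Connection) -> Dict[int, float]:
    # Aggregate the numeric fact column, grouped by a dimension attribute.
    rows = conn.execute(
        """
        SELECT pr.pr_number, AVG(f.runtime_seconds)
        FROM pipeline_fact AS f
        JOIN pull_request_dimension AS pr ON pr.id = f.pull_request_dim_id
        GROUP BY pr.pr_number
        """
    ).fetchall()
    return dict(rows)
```

Here `pull_request_dim_id` is nullable on purpose, so a pipeline fact can be stored before its GitHub data has been ingested and linked up later.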