Track data about Pipelines and Github PRs #1319

@jjnesbitt

Some stats we might want to track:

  • Time from GitHub PR opened to PR merged
  • Time from GitHub PR opened to PR pipeline started
  • Time from GitHub PR opened to PR pipeline succeeded
  • Pipeline runtime
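
As a rough sketch of how these four stats could be derived once the raw timestamps are available: the `PipelineTimestamps` record and its field names below are hypothetical, not anything that exists in the analytics app today. Any stat whose inputs are not known yet comes back as `None`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class PipelineTimestamps:
    """Hypothetical record of the raw timestamps for one PR pipeline."""
    pr_opened_at: datetime
    pr_merged_at: Optional[datetime] = None
    pipeline_started_at: Optional[datetime] = None
    pipeline_finished_at: Optional[datetime] = None
    pipeline_succeeded: bool = False


def elapsed(start: Optional[datetime], end: Optional[datetime]) -> Optional[timedelta]:
    """Return end - start, or None if either timestamp is missing."""
    if start is None or end is None:
        return None
    return end - start


def compute_stats(ts: PipelineTimestamps) -> Dict[str, Optional[timedelta]]:
    """Compute the four stats listed above for a single pipeline."""
    # Only count the finish time as a "success" time if the pipeline passed.
    succeeded_at = ts.pipeline_finished_at if ts.pipeline_succeeded else None
    return {
        "open_to_merge": elapsed(ts.pr_opened_at, ts.pr_merged_at),
        "open_to_pipeline_start": elapsed(ts.pr_opened_at, ts.pipeline_started_at),
        "open_to_pipeline_success": elapsed(ts.pr_opened_at, succeeded_at),
        "pipeline_runtime": elapsed(ts.pipeline_started_at, ts.pipeline_finished_at),
    }
```

Keeping the raw timestamps and deriving the durations from them means new stats can be added later without re-ingesting anything.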

If we start tracking pipeline data, we might want a separate webhook that fires upon pipeline completion. We'll want to ingest the GitHub data separately, since GitHub can be flaky and we don't want to block a whole job on it. We could have a Celery task that runs on a cron schedule, looks for pipelines with missing GitHub PR data, and attempts to ingest it. If an attempt fails because GitHub is down, the task will simply try again on its next run.
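
A minimal sketch of that retry-by-rerun idea, written as plain Python rather than an actual Celery task so it stays self-contained. `PipelineRecord`, `backfill_pr_data`, and the `fetch_pr` callable are all hypothetical names; in practice the function body would presumably live inside the periodically scheduled task, and `fetch_pr` would wrap the real GitHub call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class PipelineRecord:
    """Hypothetical stand-in for a stored pipeline row."""
    pipeline_id: int
    pr_number: int
    pr_data: Optional[dict] = None  # None means "GitHub data not ingested yet"


def backfill_pr_data(
    pipelines: Iterable[PipelineRecord],
    fetch_pr: Callable[[int], dict],
) -> int:
    """Try to fill in missing PR data; return how many records were filled.

    Failures are swallowed on purpose: the record keeps pr_data=None, so the
    next scheduled run picks it up again.
    """
    filled = 0
    for pipeline in pipelines:
        if pipeline.pr_data is not None:
            continue  # already ingested, nothing to do
        try:
            pipeline.pr_data = fetch_pr(pipeline.pr_number)
        except Exception:
            # GitHub may be down or rate limiting us; retry on the next run.
            continue
        filled += 1
    return filled
```

Because the task only looks at records that are still missing data, running it repeatedly is safe, and a GitHub outage never blocks the pipeline ingestion itself.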

This is making me think that what we really want is a Pipeline dimension that would contain information about a specific (PR) pipeline. Then, we could either store extra data on it pertaining to GitHub PRs, or have a separate PullRequest dimension.
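
To make the second option concrete, here is a sketch of a Pipeline dimension with a separate, optional PullRequest dimension. It uses `sqlite3` only so the example runs on its own; the table and column names are assumptions, and the real thing would presumably be Django models in the analytics app.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
SCHEMA = """
CREATE TABLE pull_request_dimension (
    id INTEGER PRIMARY KEY,
    number INTEGER NOT NULL,
    opened_at TEXT NOT NULL,
    merged_at TEXT
);
CREATE TABLE pipeline_dimension (
    id INTEGER PRIMARY KEY,
    pipeline_id INTEGER NOT NULL UNIQUE,
    started_at TEXT,
    finished_at TEXT,
    status TEXT,
    -- Nullable so a pipeline can be stored before its PR data is ingested.
    pull_request_id INTEGER REFERENCES pull_request_dimension(id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both dimension tables."""
    conn.executescript(SCHEMA)


def attach_pull_request(conn, pipeline_id, number, opened_at, merged_at=None):
    """Insert a PR row and link it to an already stored pipeline."""
    cur = conn.execute(
        "INSERT INTO pull_request_dimension (number, opened_at, merged_at) "
        "VALUES (?, ?, ?)",
        (number, opened_at, merged_at),
    )
    conn.execute(
        "UPDATE pipeline_dimension SET pull_request_id = ? WHERE pipeline_id = ?",
        (cur.lastrowid, pipeline_id),
    )
```

The nullable link is what lets pipeline ingestion succeed on its own, with the PR side filled in later.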

If we get to a point where we're storing lots of numeric data (that we're aggregating over), it would potentially warrant having a separate Fact table for pipelines, or whatever it is we're storing data about. Just something to keep in mind.
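
For reference, a sketch of what such a fact table might look like: numeric measures in their own table, keyed to the dimension, and aggregated with plain SQL. Again this uses `sqlite3` just to be runnable, and the names (`pipeline_fact`, `runtime_seconds`, `ref`) are assumptions rather than a proposed final schema.

```python
import sqlite3

# Hypothetical schema: a slim dimension plus a fact table of numeric measures.
FACT_SCHEMA = """
CREATE TABLE pipeline_dimension (
    id INTEGER PRIMARY KEY,
    ref TEXT NOT NULL
);
CREATE TABLE pipeline_fact (
    id INTEGER PRIMARY KEY,
    pipeline_id INTEGER NOT NULL REFERENCES pipeline_dimension(id),
    runtime_seconds REAL NOT NULL,
    seconds_from_pr_open_to_start REAL
);
"""


def create_fact_schema(conn: sqlite3.Connection) -> None:
    """Create the dimension and fact tables."""
    conn.executescript(FACT_SCHEMA)


def average_runtime_by_ref(conn: sqlite3.Connection) -> dict:
    """Aggregate the runtime measure over the dimension's ref attribute."""
    rows = conn.execute(
        "SELECT d.ref, AVG(f.runtime_seconds) "
        "FROM pipeline_fact f "
        "JOIN pipeline_dimension d ON d.id = f.pipeline_id "
        "GROUP BY d.ref"
    ).fetchall()
    return {ref: avg for ref, avg in rows}
```

The dimension stays descriptive (what the pipeline was), while the fact table holds only the numbers we aggregate over.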

Labels: analytics-db (Relating to the Analytics Database and/or Django application)
