-
Notifications
You must be signed in to change notification settings - Fork 19
[Project Automation] Create High level tracker workflow + readme + helper script #288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-25.04
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a first pass at this, will do a more thorough review next week. Just publishing a non-blocking review now so you can see my first few comments.
inputs: | ||
PROJECT_ID: | ||
description: "Project ID" | ||
default: "PVT_kwDOAp2shc4ArK9w" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment for all of these default:
values with what look like identifiers from a service.... what are they? Where did the values come from? Are they intended to be overridden?
If every user of this is expected to provide these values at the site where this is called, then I think that:
- the defaults should be removed
- the inputs should all have
required: true
- docs should be added here explaining how to set values for them (either how to find the things to hard-code, or how to get these via templating with the various runtime contexts GitHub Actions provides: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/accessing-contextual-information-about-workflow-runs#github-context)
If these values are intended to be used literally and never overridden, then I think that:
- they should be moved out of inputs and instead either hard-coded below or shared via the
env.
context (https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/accessing-contextual-information-about-workflow-runs#env-context)
If these values could be used literally but might be overridden, then I think that:
- the inputs should all have
required: true
- docs should be added here explaining how to set values for them
- code comments should be added explaining where these defaults came from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the identifiers
They are graphQL nodeID
s; they are static values associated with GitHub projects. Unfortunately, there's no clean way of dealing with these; either we hard-code them to store them, or we re-query their values using logic shown in the helper python script in docs every time we run them. While they look like private keys etc, they're totally public and anyone can query those values.
On defaults:
- To make it so caller workflows can be as simple as possible
- To allow for flexibility should others want to maintain their own projects using the same structure
If this setup isn't something we're happy with, then I'd go with the hard-code in env. Goal 1 is more important than goal 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyone can query those values.
Ok but how? Let's imagine you're not available and this breaks complaining about a project with this ID not existing... how would I go find the appropriate values?
The other thing I'm confused about... if it's ok to hard-code a single project's ID, then why is this workflow getting checked into shared-workflows
(which contains workflows that are intended to be re-used in many places), instead of in whatever repo the project board is associated with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using logic shown in the helper python script
Ok, I see now that you have docs on how to get these values. Great!
It'd still be helpful to have comments in these files indicated where these specific defaults came from. That's information I'd need to know to update these, or to choose whether or not to override those defaults in calls to these workflows from other repos.
docs/project-automation-readme.md
Outdated
- Linking PRs to issues and keeping them in sync requires complex operations | ||
- Work often begins in an Issue, but then is completed in the PR, requiring this synchronization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand these workflows and the GraphQL code very well, so apologies if this is an ignorant question... does any of this automation assume that 1 issue will be resolved by exactly 1 PR?
The language here (talking about "the" PR) makes me think that it might.
I think that's an assumption that's likely to be violated often, especially for large efforts like rapidsai/cuvs#735
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah totally agree it's a little confusing; for any given workflow-triggering PR the update-linked-issues
workflow synchronizes all issues that are linked to that PR (assuming they are in the same project as the PR).
Ultimately, GitHub itself doesn't have a strong mechanism for delineating the 1-to-many or many-to-many relationship of PRs and Issues in their linkages. If you link 5 PRs to 1 issue, closing any of the PRs closes all of the issues.
The linkage between the two lives inside the PR, so I say "the PR" referring to the PR that triggers the workflow, but can totally see changing it or adding some clarity!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some text, lmk if it adds any clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some text, lmk if it adds any clarity
If you link 5 PRs to 1 issue, closing any of the PRs closes all of the issues.
Thanks. I re-read the docs today and it's good to see they no longer imply a 1-to-1 relationship between PRs and issues.
I think the biggest thing I'm still really unsure about it what you mean (in comments here and in the docs) by "link to".
For example, if I leave a comment like "Contributes to #17963"
in a PR's description or in a comment, I have "linked" the PR to the issue.. GitHub will generate an update on the corresponding issues like these:

(ref: microsoft/LightGBM#3867)
But those types of linkages don't result in the behavior "closing the PR closes the issue".
I suspect you mean specifically using one of these keywords: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue
If so, I'd add a link to that in the docs.
if [[ "$CURRENT_DATE" < "$DEV_START" ]]; then | ||
STAGE_ID="${{ inputs.STAGE_PLANNED_NAME }}" | ||
echo "Stage: Planned" | ||
elif [[ "$CURRENT_DATE" < "$BURNDOWN_START" ]]; then | ||
STAGE_ID="${{ inputs.STAGE_DEVELOPMENT_NAME }}" | ||
echo "Stage: Development" | ||
elif [[ "$CURRENT_DATE" < "$FREEZE_START" ]]; then | ||
STAGE_ID="${{ inputs.STAGE_BURNDOWN_NAME }}" | ||
echo "Stage: Burndown" | ||
elif [[ "$CURRENT_DATE" < "$RELEASE_DATE" ]]; then | ||
STAGE_ID="${{ inputs.STAGE_CODE_FREEZE_NAME }}" | ||
echo "Stage: Code Freeze" | ||
else | ||
STAGE_ID="${{ inputs.STAGE_HOTFIX_NAME }}" | ||
echo "Stage: Hotfix" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move as many of the ${{ }}
pre-processor lines as possible to environment variables to prevent script injection attacks.
That goes for this block and any other lines in this workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah fair, if these stay as variables vs going hard-coded per James' comment above I'll update.
I don't think this is a vector for script injection though because the only way these variables get set is inside the action (they aren't affected by a user defined field in a workflow-triggering PR). If a bad actor attempted to set the values in their own branch, it wouldn't have an effect unless we merged it.
One of the frustrating parts of these workflows is that testing is challenging because they need to be merged into the default branch to be functional, but it does provide protection against script injection as a benefit in many cases!
This pull request introduces a new GitHub Actions workflow for high-level project tracking and provides supporting documentation and scripts. The main changes include the addition of the workflow configuration, a detailed README explaining the workflow, and a helper script to obtain necessary project and field IDs.
New GitHub Actions Workflow:
.github/workflows/project-high-level-tracker.yaml
: Adds a comprehensive workflow to automate the management of GitHub Projects, including steps to process branch names, determine development stages, and update project fields based on release schedules.Documentation:
docs/project-automation-readme.md
: Provides an overview of the new workflow, its architecture, and configuration details. It includes a mermaid diagram to illustrate the workflow and instructions on how to obtain necessary project and field IDs.Helper Script:
docs/project-graphql-helper.py
: Adds a script to fetch project and field IDs from the GitHub GraphQL API, which are required for configuring the workflow.