Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: set up new github workflow for only ontologies that have changed #264

Merged
merged 16 commits into from
Feb 14, 2025

Conversation

joyceyan
Copy link
Contributor

@joyceyan joyceyan commented Feb 12, 2025

Reason for Change

  • now that we go through all of the ancestors in NCBITaxon, it takes about two hours for the generate_all_ontology.yml job to wrap up, which is.. quite long if we just want to do simple things like bump versions of smaller ontology graphs

note: i did try to put all of this into one workflow, but i couldn't figure out how to check if only ontology_info.json was updated and the other files were not updated. there potentially is a way for this to be done so that if we update ontology_info.json and all_ontology_generator.py, then we don't have to run both workflows. that being said, i do think that there are some benefits to having it be a separate workflow entirely. it's immediately clear to the person writing the PR which version of the script we're executing since we would start the "Updates to Diff'd Ontology Files" job instead of the "Updates to Ontology Files" job

Changes

  • creates a new GHA workflow generate_diff_ontology so that we only download / process ontologies that have changed since the last run by executing the script with a --diff flag if the only file we've changed is ontology_info.json
  • the latest ontologies we processed are now stored in latest_ontology_info.json
  • if any changes are made to ontology_info_schema.json, all_ontology_schema.json, or all_ontology_generator.py, then we re-generate all ontology files without the --diff flag by executing the old generate_all_ontology workflow

Testing steps

  • i ran the script locally with the download/parsing commented out to verify that we write to latest_ontology_info.json correctly
  • i also fiddled around locally with changes to latest_ontology_info.json to verify that we're only picking up diffed changes
  • going to be setting up PR's against this branch to verify the GHA performs as expected:
    • this PR only bumps the CL version in ontology_info.json, and only the generate_diff_ontology workflow is executed. the ontology-assets/latest_ontology_info.json file was also automatically generated and added by the workflow
    • this PR updates the all ontology generator script, so it executes the original workflow which re-generates everything

Copy link

codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 15.78947% with 16 lines in your changes missing coverage. Please review.

Project coverage is 88.56%. Comparing base (0ac2219) to head (faa7659).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...ols/ontology-builder/src/all_ontology_generator.py 11.11% 16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #264      +/-   ##
==========================================
- Coverage   89.27%   88.56%   -0.71%     
==========================================
  Files          19       19              
  Lines        1343     1356      +13     
  Branches      121      122       +1     
==========================================
+ Hits         1199     1201       +2     
- Misses        124      135      +11     
  Partials       20       20              
Flag Coverage Δ
unittests 88.56% <15.78%> (-0.71%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@joyceyan joyceyan changed the title feat: set up new github workflow for only ontologies that have changed feat: set up new github workflow for only ontologies that have changed [WIP] Feb 12, 2025
@joyceyan joyceyan force-pushed the joyce/diff-ontologies branch 2 times, most recently from c9d0c5b to 7caa371 Compare February 13, 2025 03:46
@joyceyan joyceyan changed the title feat: set up new github workflow for only ontologies that have changed [WIP] feat: set up new github workflow for only ontologies that have changed Feb 13, 2025
@joyceyan joyceyan force-pushed the joyce/diff-ontologies branch from 3772954 to 70d1dac Compare February 13, 2025 22:13
on:
push:
paths:
- "**/ontology-assets/ontology_info.json"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know what happens if we update both ontology_info and all_ontology_generator.py? do we need to add some sort of "if ontology_info is updated but not all_ontology_generator" conditional logic here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if both the files are updated, then both workflows will be executed. it's not ideal, but i spent 1-2 hours trying to prevent that from happening and i couldn't figure out how to do it. i'm also not super well versed in how to write GHA workflows with conditional logic, so it certainly could be possible with more investigative effort.

what i was thinking for now is that if you do enter this edge case, the developer writing the PR can just cancel the workflow that's not needed, or just allow both to run. allowing both jobs to run shouldn't cause any issues in the expected generated ontology assets, it would just waste some time and some GHA resources. if that ends up being too annoying, we can patch this edge case as a follow up.

since trying to handle this edge case took up about half the total time i spent on this, i figured it's still better to get this change out first so we see an immediate benefit to lower runtime for ~most PR's generating ontology assets--especially since this was already annoying me this week while i was working on adding in the CL terms into EFO for perturbations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, thanks for looking into it

Copy link
Collaborator

@nayib-jose-gloria nayib-jose-gloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thank you! left a note about one github workflow edge case

Copy link
Collaborator

@nayib-jose-gloria nayib-jose-gloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change looks good, optionally mind adding a TODO or jira ticket or something to document that we're aware of the double update edge case?

@joyceyan joyceyan merged commit 45d0fce into main Feb 14, 2025
8 of 10 checks passed
@joyceyan joyceyan deleted the joyce/diff-ontologies branch February 14, 2025 18:36
@github-actions github-actions bot mentioned this pull request Feb 14, 2025
@github-actions github-actions bot mentioned this pull request Mar 4, 2025
ejmolinelli added a commit that referenced this pull request Mar 6, 2025
🤖 I have created a release *beep* *boop*
---


<details><summary>python-api: 1.6.0-alpha</summary>

##
[1.6.0-alpha](python-api-v1.5.0-alpha...python-api-v1.6.0-alpha)
(2025-03-05)


### Features

* pinned ontologies for schema 5.3.0
([#275](#275))
([cd6a15e](cd6a15e))
* set up new github workflow for only ontologies that have changed
([#264](#264))
([45d0fce](45d0fce))


### Misc

* remove 5.3.1-alpha
([#273](#273))
([a930332](a930332))
</details>

<details><summary>ontology-assets: 1.5.0-alpha</summary>

##
[1.5.0-alpha](ontology-assets-v1.4.0-alpha...ontology-assets-v1.5.0-alpha)
(2025-03-05)


### Features

* pinned ontologies for schema 5.3.0
([#275](#275))
([cd6a15e](cd6a15e))
* set up new github workflow for only ontologies that have changed
([#264](#264))
([45d0fce](45d0fce))


### Misc

* remove 5.3.1-alpha
([#273](#273))
([a930332](a930332))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Evan Molinelli <[email protected]>
@github-actions github-actions bot mentioned this pull request Mar 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants