
Commit c461251

joyceyan and github-actions authored
fix: parse through NCBITaxon ancestors (#259)
## Reason for Change

previously, we skipped over parsing through `NCBITaxon` ancestors since we had no prior use for this. now that we do have a use, we should actually parse through this so that we understand the ancestor / descendant terms. this retains the filter to only go through a subgraph of the `NCBITaxon` graph that's `NCBITaxon:33208` (Animal) or below.

note that previously, the `ontology-processing` job took 24 mins. with this change, it will take 2 hrs 5 mins. the generated `NCBITaxon-ontology.json` also increased in size from 140 MB to 990 MB.

## Testing

download ontology asset and inspect "NCBITaxon:10090" (mus musculus). previously, this is what we had:

```
"NCBITaxon:10090": {
  "ancestors": {},
  "label": "Mus musculus",
  "synonyms": [
    "house mouse",
    "mouse"
  ],
  "deprecated": false
},
```

now, we have:

```
"NCBITaxon:100900": {
  "ancestors": {
    "NCBITaxon:2162899": 1,
    "NCBITaxon:39087": 2,
    "NCBITaxon:337677": 3,
    "NCBITaxon:337687": 4,
    "NCBITaxon:1963758": 5,
    "NCBITaxon:9989": 6,
    "NCBITaxon:314147": 7,
    "NCBITaxon:314146": 8,
    "NCBITaxon:1437010": 9,
    "NCBITaxon:9347": 10,
    "NCBITaxon:32525": 11,
    "NCBITaxon:40674": 12,
    "NCBITaxon:32524": 13,
    "NCBITaxon:32523": 14,
    "NCBITaxon:1338369": 15,
    "NCBITaxon:8287": 16,
    "NCBITaxon:117571": 17,
    "NCBITaxon:117570": 18,
    "NCBITaxon:7776": 19,
    "NCBITaxon:7742": 20,
    "NCBITaxon:89593": 21,
    "NCBITaxon:7711": 22,
    "NCBITaxon:33511": 23,
    "NCBITaxon:33213": 24,
    "NCBITaxon:6072": 25,
    "NCBITaxon:33208": 26,
    "NCBITaxon:33154": 27,
    "NCBITaxon:2759": 28,
    "NCBITaxon:131567": 29,
    "NCBITaxon:1": 30
  },
  "label": "Alexandromys middendorffii",
  "synonyms": [
    "Middendorf's vole"
  ],
  "deprecated": false
},
```

---------

Co-authored-by: github-actions <[email protected]>
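For readers who want to consume the new `ancestors` map, here is a minimal sketch of two lookups against an entry shaped like the one shown in the commit message. It assumes the integer stored for each ancestor is its distance from the term (the sample output is consistent with that, but the commit does not state it). The helper names `is_descendant_of` and `ancestors_by_distance` are hypothetical and not part of the repository, and the sample data is a trimmed excerpt of the entry above.

```python
from typing import Any, Dict, List

Ontology = Dict[str, Dict[str, Any]]

# Trimmed, illustrative excerpt shaped like an entry in NCBITaxon-ontology.json.
SAMPLE: Ontology = {
    "NCBITaxon:100900": {
        "ancestors": {
            "NCBITaxon:2162899": 1,
            "NCBITaxon:39087": 2,
            "NCBITaxon:33208": 26,
            "NCBITaxon:1": 30,
        },
        "label": "Alexandromys middendorffii",
        "deprecated": False,
    }
}


def is_descendant_of(ontology: Ontology, term_id: str, ancestor_id: str) -> bool:
    """Return True if ancestor_id appears in the term's ancestors map (hypothetical helper)."""
    return ancestor_id in ontology.get(term_id, {}).get("ancestors", {})


def ancestors_by_distance(ontology: Ontology, term_id: str) -> List[str]:
    """List ancestor IDs ordered from smallest to largest stored value.

    Assumes the stored integer is the distance from the term.
    """
    ancestors = ontology.get(term_id, {}).get("ancestors", {})
    return sorted(ancestors, key=ancestors.get)


if __name__ == "__main__":
    print(is_descendant_of(SAMPLE, "NCBITaxon:100900", "NCBITaxon:33208"))
    print(ancestors_by_distance(SAMPLE, "NCBITaxon:100900"))
```

With the pre-change output (`"ancestors": {}`), `is_descendant_of` would return `False` for every `NCBITaxon` term, which is why lookups of this kind were not possible before.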
1 parent d27fe21 commit c461251


3 files changed, +3 -6 lines changed

+1 -1
@@ -1 +1 @@
-016968365dd92ee02154dc95656535c0e26c8eff
+4656b5d9baf5e10c5838ed9813ed94c563560515

Binary file not shown.

tools/ontology-builder/src/all_ontology_generator.py

+2 -5
@@ -245,15 +245,12 @@ def _extract_ontology_term_metadata(
         # Gets ancestors
         ancestors = _get_ancestors(onto_term, allowed_ontologies)
 
-        # Special Case: skip the current term if it is an NCBI Term, but not a descendant of 'NCBITaxon:33208'.
+        # Special Case: skip the current term if it is an NCBI Term, but not a descendant of 'NCBITaxon:33208' (Animal)
         if onto.name == "NCBITaxon" and "NCBITaxon:33208" not in ancestors:
             continue
 
         term_dict[term_id] = dict()
-
-        # only write the ancestors if it's not NCBITaxon, as this saves a lot of disk space and there is
-        # no current use-case for NCBITaxon
-        term_dict[term_id]["ancestors"] = {} if onto.name == "NCBITaxon" else ancestors
+        term_dict[term_id]["ancestors"] = ancestors
 
         if cross_ontology_terms := _extract_cross_ontology_terms(term_id, map_to_cross_ontologies, cross_ontology_map):
             term_dict[term_id]["cross_ontology_terms"] = cross_ontology_terms
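To make the post-change behavior concrete, the following is a small self-contained sketch of the same idea: compute an ancestor-to-distance map for each term, skip `NCBITaxon` terms that do not have `NCBITaxon:33208` among their ancestors, and store the full map otherwise. The body of `_get_ancestors` is not shown in this diff, so the breadth-first `get_ancestors` below is a hypothetical stand-in operating on a plain parent dictionary, not the repository's implementation. The term IDs `NCBITaxon:A` and `NCBITaxon:B` are made up for illustration.

```python
from collections import deque
from typing import Dict, List

ANIMAL = "NCBITaxon:33208"


def get_ancestors(term_id: str, parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Breadth-first walk up the parent links, recording the shortest distance.

    Hypothetical stand-in for the real _get_ancestors, which is not shown in the diff.
    """
    distances: Dict[str, int] = {}
    queue = deque((parent, 1) for parent in parents.get(term_id, []))
    while queue:
        current, dist = queue.popleft()
        if current in distances:
            continue  # already reached by a path that is at least as short
        distances[current] = dist
        queue.extend((parent, dist + 1) for parent in parents.get(current, []))
    return distances


def build_term_dict(parents: Dict[str, List[str]], ontology_name: str) -> Dict[str, dict]:
    """Mirror the diff's logic: filter to the Animal subgraph, then keep all ancestors."""
    term_dict: Dict[str, dict] = {}
    for term_id in parents:
        ancestors = get_ancestors(term_id, parents)
        if ontology_name == "NCBITaxon" and ANIMAL not in ancestors:
            continue
        term_dict[term_id] = {"ancestors": ancestors}
    return term_dict


# Toy graph: A sits under Animal, B hangs directly off the root.
TOY_PARENTS = {
    "NCBITaxon:1": [],
    ANIMAL: ["NCBITaxon:1"],
    "NCBITaxon:A": [ANIMAL],
    "NCBITaxon:B": ["NCBITaxon:1"],
}

if __name__ == "__main__":
    print(build_term_dict(TOY_PARENTS, "NCBITaxon"))
```

In this sketch the Animal term itself is skipped, because a term is not its own ancestor. Whether the real `_get_ancestors` behaves the same way cannot be determined from the diff alone.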
