-
Notifications
You must be signed in to change notification settings - Fork 1
updated taxonomy query #30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ssion_id, abbreviation, cell_type_set
Summary of ChangesHello @puja-trivedi, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly upgrades the system's ability to retrieve and process taxonomy data by expanding the information gathered for cell type taxa. It updates the core SPARQL query to include Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request updates the SPARQL query for taxonomy data to include additional information. However, the Python code that processes the query results has not been updated accordingly, which will lead to runtime errors. I've identified this critical issue and provided a corrected implementation. Additionally, I've pointed out a potential issue with the removal of the GRAPH clause in the query which might cause it to return no data, and suggested several readability improvements.
| #! TODO: Update lines 119-126 with this: | ||
| # data = {} | ||
| # for row in response: | ||
| # id_, name, accession_id, parentNode, abbrNode, abbrMeaning, abbrTerm, parcellationTerm, geneAnnotation, cellType, hex_, setAccessionId = row | ||
| # id_ = str(id_) | ||
| # name = str(name) if name else None | ||
| # accession_id = str(accession_id) if accession_id else None | ||
| # parentNode = str(parentNode) if parentNode else None | ||
| # abbrNode = str(abbrNode) if abbrNode else None | ||
| # abbrMeaning = str(abbrMeaning) if abbrMeaning else None | ||
| # abbrTerm = str(abbrTerm) if abbrTerm else None | ||
| # denotes = str(parcellationTerm) if parcellationTerm else None | ||
| # denotes = str(geneAnnotation) if geneAnnotation else denotes | ||
| # denotes = str(cellType) if cellType else denotes | ||
| # hex_ = str(hex_) if hex_ else None | ||
|
|
||
| # if str(id_) in data: | ||
| # if abbrNode in data[id_]["abbreviations"]: | ||
| # if denotes: | ||
| # data[id_]["abbreviations"][abbrNode]["denotes"].append(denotes) | ||
| # else: | ||
| # data[id_]["abbreviations"][abbrNode] = {"term": abbrTerm, "meaning": abbrMeaning, "denotes": [denotes]} | ||
| # else: | ||
| # data[id_] = { | ||
| # "name": name, | ||
| # "accession_id": accession_id, | ||
| # "parent": parentNode, | ||
| # "abbreviations": dict({abbrNode: {"term": abbrTerm, "meaning": abbrMeaning, "denotes": [denotes]}}) if abbrNode else {}, | ||
| # "hex": hex_, | ||
| # "belongs_to_set": setAccessionId | ||
| # } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO correctly identifies that the data processing logic (lines 119-126) needs to be updated to match the new query. The current, old logic is still in place and will cause runtime errors because it references fields that are no longer returned by the query.
Additionally, the proposed logic in this commented-out block has some issues:
- It iterates over
responsedirectly, but it should iterate overresponse["message"]["results"]["bindings"]. - It attempts to unpack each
rowas a tuple, but the rows are dictionaries. - The logic for handling
denotesis flawed and will overwrite values.
Since I cannot comment on the unchanged lines (119-128), I'm providing a corrected implementation here. Please replace lines 119-128 with the following code:
response_taxonomy = {}
for taxon_info in response["message"]["results"]["bindings"]:
tax_id = taxon_info.get("id", {}).get("value")
if not tax_id:
continue
if tax_id not in response_taxonomy:
response_taxonomy[tax_id] = {
"id": tax_id,
"parent": taxon_info.get("parentNode", {}).get("value"),
"name": taxon_info.get("name", {}).get("value"),
"hex": taxon_info.get("hex", {}).get("value"),
"accession_id": taxon_info.get("accession_id", {}).get("value"),
"belongs_to_set": taxon_info.get("setAccessionId", {}).get("value"),
"abbreviations": {}
}
abbr_node = taxon_info.get("abbrNode", {}).get("value")
if abbr_node and abbr_node not in response_taxonomy[tax_id]["abbreviations"]:
denotes = []
if "parcellationTerm" in taxon_info:
denotes.append(taxon_info["parcellationTerm"]["value"])
if "geneAnnotation" in taxon_info:
denotes.append(taxon_info["geneAnnotation"]["value"])
if "cellType" in taxon_info:
denotes.append(taxon_info["cellType"]["value"])
response_taxonomy[tax_id]["abbreviations"][abbr_node] = {
"term": taxon_info.get("abbrTerm", {}).get("value"),
"meaning": taxon_info.get("abbrMeaning", {}).get("value"),
"denotes": denotes
}
processed_taxonomy = taxonomy_postprocessing(response_taxonomy)
return processed_taxonomy| SELECT ?id ?parent ?name ?hex | ||
| SELECT ?id ?name ?accession_id ?parentNode ?abbrNode ?abbrMeaning ?abbrTerm ?parcellationTerm ?geneAnnotation ?cellType ?hex ?setAccessionId | ||
| WHERE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GRAPH clause has been removed from this query. The previous version specified GRAPH <http://hmbataxonomy20250927.com/>. If the taxonomy data resides exclusively within that named graph, this query will no longer find it because it now targets the default graph. This could lead to empty results. If the data is in that named graph, the GRAPH clause should be restored around the WHERE clause body.
| PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> | ||
| PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> | ||
| PREFIX bican: <https://identifiers.org/brain-bican/vocab/> | ||
| PREFIX DHBA: <https://purl.brain-bican.org/ontology/dmbao/DMBA_> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if el['parent'] is None: | ||
| par_id = "root" | ||
| par_nm, par_col = "root", '#ffffff' | ||
| par_name, par_color, par_accession_id, par_abbreviations, par_belongs_to_set = "root", '#ffffff', None, [], None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is quite long and assigns multiple variables at once, which can harm readability. To improve clarity and adhere to common style guidelines (like PEP 8's line length recommendations), consider breaking this into multiple assignment statements.
| par_name, par_color, par_accession_id, par_abbreviations, par_belongs_to_set = "root", '#ffffff', None, [], None | |
| par_name = "root" | |
| par_color = '#ffffff' | |
| par_accession_id = None | |
| par_abbreviations = [] | |
| par_belongs_to_set = None |
| par_id = el['parent'] | ||
| par_nm, par_col = items[par_id]["name"], items[par_id]["hex"] | ||
| par_name, par_color, par_accession_id, par_abbreviations, par_belongs_to_set = items[par_id]["name"], items[par_id]["hex"], items[par_id]["accession_id"], items[par_id]["abbreviations"].values(), items[par_id]["belongs_to_set"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the line above, this line is very long and assigns many variables from a dictionary lookup. This can be hard to read and debug. Breaking it down into individual assignments would improve maintainability.
| par_name, par_color, par_accession_id, par_abbreviations, par_belongs_to_set = items[par_id]["name"], items[par_id]["hex"], items[par_id]["accession_id"], items[par_id]["abbreviations"].values(), items[par_id]["belongs_to_set"] | |
| par_name = items[par_id]["name"] | |
| par_color = items[par_id]["hex"] | |
| par_accession_id = items[par_id]["accession_id"] | |
| par_abbreviations = items[par_id]["abbreviations"].values() | |
| par_belongs_to_set = items[par_id]["belongs_to_set"] |
| children_list_current = [] | ||
| update_childrens(children_list_current, child_id, taxon_children_dict) | ||
| children_list.append({"name": taxon_children_dict[child_id]["meta"]["name"], "nodeColor": taxon_children_dict[child_id]["meta"]["color"], "children": children_list_current}) | ||
| children_list.append({"name": taxon_children_dict[child_id]["meta"]["name"], "nodeColor": taxon_children_dict[child_id]["meta"]["color"], "accession_id": taxon_children_dict[child_id]["meta"]["accession_id"], "abbreviations": taxon_children_dict[child_id]["meta"]["abbreviations"], "belongs_to_set": taxon_children_dict[child_id]["meta"]["belongs_to_set"], "children": children_list_current}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line has become very long with the addition of new keys. To improve readability, consider creating the dictionary first and then appending it to the list. This also allows you to reduce repetition by getting the meta dictionary once.
| children_list.append({"name": taxon_children_dict[child_id]["meta"]["name"], "nodeColor": taxon_children_dict[child_id]["meta"]["color"], "accession_id": taxon_children_dict[child_id]["meta"]["accession_id"], "abbreviations": taxon_children_dict[child_id]["meta"]["abbreviations"], "belongs_to_set": taxon_children_dict[child_id]["meta"]["belongs_to_set"], "children": children_list_current}) | |
| child_meta = taxon_children_dict[child_id]["meta"] | |
| child_node = { | |
| "name": child_meta["name"], | |
| "nodeColor": child_meta["color"], | |
| "accession_id": child_meta["accession_id"], | |
| "abbreviations": child_meta["abbreviations"], | |
| "belongs_to_set": child_meta["belongs_to_set"], | |
| "children": children_list_current | |
| } | |
| children_list.append(child_node) |
|
@puja-trivedi could you address the issues highlighted by gemini? |
now data contains additonal information: accession_id, abbreviation, cell_type_set
TODO: add parsing of
fetch_data_gdb(query_taxonomy)based off new query