
Conversation

@bsantan (Collaborator) commented Feb 25, 2025

  • HGNC gene IDs were only being mapped for genes that exist in the MONDO ontology
  • Now ingest the HGNC OWL file and map all human UniProt IDs to HGNC IDs

@turbomam turbomam requested a review from Copilot August 13, 2025 17:07
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR transitions from using the MONDO ontology for HGNC gene mappings to directly using the Node Normalizer API to map all human UniProt proteins to their corresponding HGNC gene IDs. Previously, only genes present in MONDO were mapped; this change enables complete HGNC gene coverage for human UniProt entries.

Key changes:

  • Replaced MONDO-based gene mapping with Node Normalizer API calls
  • Added HGNC ontology processing to support comprehensive gene mappings
  • Updated Node Normalizer API endpoint to version 1.5
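The core of the change is extracting an HGNC identifier from a Node Normalizer response. A minimal sketch of that parsing step, separated from the HTTP call so it can be tested offline: the function name `extract_hgnc_id` is hypothetical, and the payload shape is assumed from the code under review (a mapping from the queried CURIE to an entry with an `equivalent_identifiers` list).

```python
HGNC_PREFIX = "HGNC:"  # matches the PR's HGNC_NEW_PREFIX constant


def extract_hgnc_id(response_json, node_curie):
    """Return the HGNC CURIE equivalent to node_curie, or None if absent."""
    entry = response_json.get(node_curie)
    if not entry or "equivalent_identifiers" not in entry:
        return None
    for iden in entry["equivalent_identifiers"]:
        # Each equivalent identifier is a dict with an "identifier" CURIE
        if iden["identifier"].startswith(HGNC_PREFIX):
            return iden["identifier"]
    return None


# Example payload shaped like a Node Normalizer response (illustrative data)
payload = {
    "UniProtKB:P04637": {
        "equivalent_identifiers": [
            {"identifier": "UniProtKB:P04637"},
            {"identifier": "HGNC:11998"},
        ]
    }
}
print(extract_hgnc_id(payload, "UniProtKB:P04637"))  # HGNC:11998
```

Keeping the parsing pure like this lets the HTTP layer (URL construction, timeout, `raise_for_status`) be handled separately, which is the direction the review comments below push toward.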

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Summary per file:

  • kg_microbe/utils/uniprot_utils.py: implements the Node Normalizer API function and updates gene parsing logic
  • kg_microbe/transform_utils/wallen_etal/wallen_etal.py: adds a debugging statement
  • kg_microbe/transform_utils/uniprot_human/uniprot_human.py: removes the mondo_gene_dict parameter from function calls
  • kg_microbe/transform_utils/ontologies/ontologies_transform.py: adds HGNC ontology processing and filters out gene property nodes
  • kg_microbe/transform_utils/constants.py: adds HGNC-related prefixes and updates the Node Normalizer URL


wallen_etal_df = pd.read_excel(input_file, skiprows=3, sheet_name=WALLEN_ETAL_TAB_NAME)
wallen_etal_df[FDR_COLUMN] = pd.to_numeric(wallen_etal_df[FDR_COLUMN], errors="coerce")

import pdb;pdb.set_trace()

Copilot AI Aug 13, 2025


Debug statement with pdb.set_trace() should not be committed to production code. This will cause the program to halt execution and wait for debugger input.

Suggested change
- import pdb;pdb.set_trace()

CHEBI_REGEX = re.compile(r'/ligand_id="ChEBI:(.*?)";')
GO_REGEX = re.compile(r"\[(.*?)\]")

# Takes cure in the form PREFIX:ID

Copilot AI Aug 13, 2025


Typo in comment: 'cure' should be 'curie'.

Suggested change
- # Takes cure in the form PREFIX:ID
+ # Takes curie in the form PREFIX:ID

except TypeError:
return None

else:

Copilot AI Aug 13, 2025


The function lacks proper error handling for HTTP requests. If the API is unavailable or returns an error status, the function will raise an exception without providing a meaningful error message to help with debugging.

Suggested change
  else:
+     url = NODE_NORMALIZER_URL + node_curie
+     try:
+         # Make the HTTP request to NodeNormalizer
+         response = requests.get(url, timeout=30)
+         response.raise_for_status()
+         # Write response to file if it contains data
+         entries = response.json()[node_curie]
+         if len(entries) > 1:  # .strip().split("\n")
+             for iden in entries["equivalent_identifiers"]:
+                 if iden["identifier"].split(":")[0] + ":" == HGNC_NEW_PREFIX:
+                     norm_node = iden["identifier"]
+                     return norm_node
+         else:
+             return None
+     except requests.exceptions.RequestException as e:
+         logging.error(f"HTTP request failed for node_curie '{node_curie}' at URL '{url}': {e}")
+         return None
+     except (ValueError, KeyError, TypeError) as e:
+         logging.error(f"Error processing response for node_curie '{node_curie}' at URL '{url}': {e}")

response.raise_for_status()

# Write response to file if it contains data
entries = response.json()[node_curie]

Copilot AI Aug 13, 2025


This line assumes the response JSON contains the node_curie key, but this could fail if the API response structure is different or if the key doesn't exist, potentially causing a KeyError.

Suggested change
- entries = response.json()[node_curie]
+ entries = response.json().get(node_curie)
+ if entries is None:
+     return None

# Write response to file if it contains data
entries = response.json()[node_curie]
try:
    if len(entries) > 1:  # .strip().split("\n")

Copilot AI Aug 13, 2025


The condition len(entries) > 1 does not make logical sense here. entries is a dictionary (the code then accesses entries["equivalent_identifiers"]), so len() counts its top-level keys rather than checking for equivalent identifiers. The logic should verify that entries is not None and contains the expected structure.

Suggested change
- if len(entries) > 1:  # .strip().split("\n")
+ if entries and "equivalent_identifiers" in entries and entries["equivalent_identifiers"]:
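To illustrate the reviewer's point with a small demonstration (the payload below is illustrative, not real API output): `len()` on a dict counts top-level keys, so the original condition passes for any response entry with two or more fields, even when no equivalent identifiers were returned.

```python
# entries is a dict, so len(entries) counts keys, not identifiers
entries = {
    "id": {"identifier": "UniProtKB:P04637"},
    "equivalent_identifiers": [],  # empty: no equivalents returned
}
print(len(entries) > 1)  # True, despite there being no identifiers to scan

# The suggested check treats entries as a mapping and inspects the field
ok = bool(entries and entries.get("equivalent_identifiers"))
print(ok)  # False
```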

# Rewrite edges file
with open(edges_file, "w") as new_ef:
    new_ef.write("\t".join(self.edge_header) + "\n")
    # new_ef.write("\t".join(self.edge_header) + "\n")

Copilot AI Aug 13, 2025


Commented-out code should be removed rather than left in the codebase. If this header writing logic is needed, it should be uncommented and used, otherwise it should be deleted.

Suggested change
- # new_ef.write("\t".join(self.edge_header) + "\n")
