@realmarcin
Collaborator

No description provided.

@turbomam turbomam requested a review from Copilot August 13, 2025 16:51
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes issues in the BacDive media processing pipeline by refactoring data accumulation and edge creation logic. The changes address data structure inconsistencies and improve the handling of metabolite utilization and enzyme activity data.

  • Removes unused NCBI_TO_ENZYME_EDGE import and refactors enzyme activity handling
  • Changes data structures from dictionaries to tuples for consistent processing
  • Adds accumulation logic to collect data per NCBITaxon before writing edges

)
for assay_id in info["assays"]:
    # Unpacking the assay information stored as tuples
    assay_curie, assay_value, utilization_type = assay_id

Copilot AI Aug 13, 2025

The tuple unpacking assumes every assay item has exactly 3 elements, but enzyme activities are stored as 2-element tuples (lines 749, 758) while metabolite utilizations are stored as 3-element tuples. Unpacking a 2-element tuple into three names raises a ValueError, so this loop will fail when it reaches enzyme data.

Suggested change
- assay_curie, assay_value, utilization_type = assay_id
+ # Unpacking the assay information stored as tuples (handle both 2- and 3-element tuples)
+ if len(assay_id) == 3:
+     assay_curie, assay_value, utilization_type = assay_id
+ elif len(assay_id) == 2:
+     assay_curie, assay_value = assay_id
+     utilization_type = None
+ else:
+     raise ValueError(f"Unexpected assay tuple length: {len(assay_id)} for {assay_id}")
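For illustration, here is a minimal sketch of the failure and of one way to normalize the tuples before they reach the loop. The helper name `normalize_assay` and the sample CURIEs are hypothetical and not taken from this PR; the real tuple contents may differ.

```python
from typing import Optional, Tuple

Assay = Tuple[str, str, Optional[str]]


def normalize_assay(assay: tuple) -> Assay:
    """Pad a 2-element (curie, value) tuple to (curie, value, None)."""
    if len(assay) == 3:
        return tuple(assay)
    if len(assay) == 2:
        curie, value = assay
        return (curie, value, None)
    raise ValueError(f"Unexpected assay tuple length: {len(assay)} for {assay}")


# Hypothetical sample data: one metabolite utilization, one enzyme activity.
assays = [
    ("CHEBI:17234", "+", "builds acid from"),
    ("EC:3.2.1.23", "+"),
]

# Naive three-way unpacking fails as soon as it meets the 2-element tuple.
unpack_error = None
try:
    for curie, value, utilization_type in assays:
        pass
except ValueError as err:
    unpack_error = str(err)

# After normalization every item can be unpacked into three names.
normalized = [normalize_assay(a) for a in assays]
```

Normalizing at the point where tuples are appended would keep the consuming loop unchanged; branching on `len()` inside the loop, as in the suggestion, works equally well.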

BACDIVE_PREFIX + key,
]
edge_writer.writerow(meta_util_edges_to_write)
for k, _, _ in positive_chebi_activity:

Copilot AI Aug 13, 2025

This line is incorrectly indented and creates a syntax error. It should be aligned with the previous if statement or properly nested within it.
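As a rough illustration of this class of error, the sketch below compiles a hypothetical snippet in which a `for` line is indented deeper than the statement before it. The snippet is invented for the example and is not the PR's code.

```python
# Hypothetical source: the `for` line is indented one level deeper than the
# preceding statement, with no block opener (if/for/def ...) to justify it.
bad_source = (
    "rows = []\n"
    "rows.append(1)\n"
    "    for k in rows:\n"
    "        print(k)\n"
)

try:
    compile(bad_source, "<snippet>", "exec")
    compile_error = None
except IndentationError as err:  # IndentationError is a subclass of SyntaxError
    compile_error = err
```

Because the module fails at compile time, none of its code runs, including the parts that are indented correctly.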

writer_2.writerow(phys_and_meta_data)

if ncbitaxon_id:
    if ncbitaxon_id not in self.ncbitaxon_info:

Copilot AI Aug 13, 2025

This data accumulation logic is duplicated later in the code (lines 604-656 and 767-814). The duplicate code should be consolidated into a single location or extracted into a helper method to improve maintainability.
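One possible shape for such a helper is sketched below. It assumes each per-taxon record is a dict with an `"assays"` list, which matches the `info["assays"]` access shown earlier but is otherwise a guess; the function name `accumulate_assays` and the sample identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def accumulate_assays(
    ncbitaxon_info: Dict[str, dict],
    ncbitaxon_id: str,
    assays: Iterable[Tuple],
) -> None:
    """Append assay tuples to the per-taxon record, creating it on first use."""
    if not ncbitaxon_id:
        return  # Mirrors the `if ncbitaxon_id:` guard in the diff.
    record = ncbitaxon_info.setdefault(ncbitaxon_id, {"assays": []})
    record["assays"].extend(assays)


# Usage with made-up data: two calls for the same taxon share one record.
info: Dict[str, dict] = {}
accumulate_assays(info, "NCBITaxon:562", [("CHEBI:17234", "+", "builds acid from")])
accumulate_assays(info, "NCBITaxon:562", [("EC:3.2.1.23", "+", "enzyme_activity")])
accumulate_assays(info, "", [("CHEBI:15377", "-", None)])
```

Each of the duplicated call sites could then reduce to a single call, so a later change to the record layout would only need to be made in one place.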

edge_writer.writerow(
    [
        ncbitaxon_id,
        NCBI_TO_METABOLITE_UTILIZATION_EDGE,

Copilot AI Aug 13, 2025

All assay data is written with the NCBI_TO_METABOLITE_UTILIZATION_EDGE edge type, but enzyme activities should use a different edge type because they represent a different biological relationship.

Suggested change
- NCBI_TO_METABOLITE_UTILIZATION_EDGE,
+ # Select edge type based on utilization_type
+ if utilization_type == "enzyme_activity":
+     edge_type = ENZYME_TO_ASSAY_EDGE
+ else:
+     edge_type = NCBI_TO_METABOLITE_UTILIZATION_EDGE
+ edge_writer.writerow(
+     [
+         ncbitaxon_id,
+         edge_type,
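Note that the suggestion above targets a single line inside the list passed to `writerow`, so applied verbatim it would appear to place an `if` statement inside a list literal, which Python does not accept. A hedged sketch of the same idea with the selection made before the call follows. The constant values are placeholders, the `"enzyme_activity"` marker is taken from the suggestion rather than confirmed in the code, and the three-column row layout is an assumption.

```python
import csv
import io

# Placeholder values; the real constants are defined elsewhere in the project.
NCBI_TO_METABOLITE_UTILIZATION_EDGE = "placeholder:metabolite_utilization_edge"
ENZYME_TO_ASSAY_EDGE = "placeholder:enzyme_edge"


def edge_type_for(utilization_type):
    """Pick the edge predicate before building the row."""
    if utilization_type == "enzyme_activity":
        return ENZYME_TO_ASSAY_EDGE
    return NCBI_TO_METABOLITE_UTILIZATION_EDGE


buffer = io.StringIO()
edge_writer = csv.writer(buffer, delimiter="\t")

ncbitaxon_id = "NCBITaxon:562"  # made-up example taxon
assays = [
    ("CHEBI:17234", "+", "builds acid from"),
    ("EC:3.2.1.23", "+", "enzyme_activity"),
]

for assay_curie, assay_value, utilization_type in assays:
    edge_writer.writerow(
        [
            ncbitaxon_id,
            edge_type_for(utilization_type),
            assay_curie,
        ]
    )

rows = [line.split("\t") for line in buffer.getvalue().splitlines()]
```

A conditional expression inside the list would also work, but a small function keeps the mapping in one place if more assay kinds are added later.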

3 participants