-
Notifications
You must be signed in to change notification settings - Fork 4
Bacdive media fix #174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Bacdive media fix #174
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes issues in the BacDive media processing pipeline by refactoring data accumulation and edge creation logic. The changes address data structure inconsistencies and improve the handling of metabolite utilization and enzyme activity data.
- Removes unused
NCBI_TO_ENZYME_EDGEimport and refactors enzyme activity handling - Changes data structures from dictionaries to tuples for consistent processing
- Adds accumulation logic to collect data per NCBITaxon before writing edges
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| ) | ||
| for assay_id in info["assays"]: | ||
| # Unpacking the assay information stored as tuples | ||
| assay_curie, assay_value, utilization_type = assay_id |
Copilot
AI
Aug 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tuple unpacking assumes all assay items have exactly 3 elements, but enzyme activities are stored as 2-element tuples (lines 749, 758) while metabolite utilizations are stored as 3-element tuples. This will cause a ValueError when processing enzyme data.
| assay_curie, assay_value, utilization_type = assay_id | |
| # Unpacking the assay information stored as tuples (handle both 2- and 3-element tuples) | |
| if len(assay_id) == 3: | |
| assay_curie, assay_value, utilization_type = assay_id | |
| elif len(assay_id) == 2: | |
| assay_curie, assay_value = assay_id | |
| utilization_type = None | |
| else: | |
| raise ValueError(f"Unexpected assay tuple length: {len(assay_id)} for {assay_id}") |
| BACDIVE_PREFIX + key, | ||
| ] | ||
| edge_writer.writerow(meta_util_edges_to_write) | ||
| for k, _, _ in positive_chebi_activity: |
Copilot
AI
Aug 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is incorrectly indented and creates a syntax error. It should be aligned with the previous if statement or properly nested within it.
| writer_2.writerow(phys_and_meta_data) | ||
|
|
||
| if ncbitaxon_id: | ||
| if ncbitaxon_id not in self.ncbitaxon_info: |
Copilot
AI
Aug 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This data accumulation logic is duplicated later in the code (lines 604-656 and 767-814). The duplicate code should be consolidated into a single location or extracted into a helper method to improve maintainability.
| edge_writer.writerow( | ||
| [ | ||
| ncbitaxon_id, | ||
| NCBI_TO_METABOLITE_UTILIZATION_EDGE, |
Copilot
AI
Aug 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All assay data is being written with NCBI_TO_METABOLITE_UTILIZATION_EDGE edge type, but enzyme activities should use a different edge type since they represent different biological relationships.
| NCBI_TO_METABOLITE_UTILIZATION_EDGE, | |
| # Select edge type based on utilization_type | |
| if utilization_type == "enzyme_activity": | |
| edge_type = ENZYME_TO_ASSAY_EDGE | |
| else: | |
| edge_type = NCBI_TO_METABOLITE_UTILIZATION_EDGE | |
| edge_writer.writerow( | |
| [ | |
| ncbitaxon_id, | |
| edge_type, |
No description provided.