Add IJSEM source #250
Pull Request Overview
This PR adds support for the IJSEM (International Journal of Systematic and Evolutionary Microbiology) data source to the kg-microbe project. The implementation includes a new transform class for processing IJSEM data and integrates it into the existing transformation pipeline.
- Adds IJSEMTransform class with basic structure for processing IJSEM phenotype database files
- Integrates IJSEM into the main transformation registry and constants
- Updates SLURM job scripts with email notifications for job status monitoring
Reviewed Changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| kg_microbe/transform_utils/ijsem/ijsem_transform.py | Implements the main IJSEM transformation class with file processing logic |
| kg_microbe/transform_utils/constants.py | Adds IJSEM-related constants including source name and resource filename |
| kg_microbe/transform.py | Registers IJSEM transform in the main transformation registry |
| download.yaml | Adds download configuration for IJSEM data from Figshare |
| hpc/*.sl | Updates SLURM scripts with email notification settings |
| data/transformed/ijsem/*.tsv | Creates placeholder output files with TSV headers |
@@ -0,0 +1,56 @@
+ """IJSEM tranform script."""
Copilot
AI
Aug 13, 2025
There is a typo in 'tranform' - it should be 'transform'.
Suggested change:
- """IJSEM tranform script."""
+ """IJSEM transform script."""
data = line.strip().split("\t")
data_dict = dict(zip(header, data))

import pdb; pdb.set_trace()
Copilot
AI
Aug 13, 2025
Debug breakpoint (pdb.set_trace()) should not be left in production code. This will cause the program to halt execution and wait for user input.
Suggested change:
- import pdb; pdb.set_trace()
+ # Debug breakpoint removed
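If the breakpoint was there to inspect parsed rows, a debug-level log call gives similar visibility without halting execution. The following is a minimal sketch, assuming tab-separated input whose first line is the header; `parse_rows` is a hypothetical helper for illustration and is not part of the PR.

```python
import logging

logger = logging.getLogger(__name__)


def parse_rows(lines):
    """Map each tab-separated line to a dict keyed by the header row.

    Hypothetical helper for illustration; not part of kg-microbe.
    """
    lines = iter(lines)
    # Strip only the newline so that empty trailing fields are preserved.
    header = next(lines).rstrip("\n").split("\t")
    rows = []
    for line in lines:
        data = line.rstrip("\n").split("\t")
        data_dict = dict(zip(header, data))
        # Emitted only when the logger is set to DEBUG; never blocks.
        logger.debug("Parsed row: %s", data_dict)
        rows.append(data_dict)
    return rows
```

Setting the logger level to `logging.DEBUG` during development shows each parsed row, while production runs stay quiet.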
# Unzip file
if not Path(data_file).is_file():
    os.system(f"unzip {input_file} -d {self.input_base_dir}")
Copilot
AI
Aug 13, 2025
Using os.system() with user-controlled input can lead to command injection vulnerabilities. Consider using subprocess.run() with proper argument handling or the zipfile module instead.
Suggested change:
- os.system(f"unzip {input_file} -d {self.input_base_dir}")
+ with zipfile.ZipFile(input_file, 'r') as zip_ref:
+     zip_ref.extractall(self.input_base_dir)
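The suggestion relies on the standard-library `zipfile` module, which must be imported in the transform module. A minimal standalone sketch of the same idea follows, keeping the original "only unzip if the data file is missing" check; `extract_if_missing` and its parameter names are assumptions made for illustration, not code from the PR.

```python
import zipfile
from pathlib import Path


def extract_if_missing(archive_path, dest_dir, expected_name):
    """Extract the archive into dest_dir unless expected_name already exists.

    Hypothetical helper for illustration; not part of kg-microbe.
    No shell is invoked, so file names are never interpreted as commands.
    """
    expected = Path(dest_dir) / expected_name
    if not expected.is_file():
        with zipfile.ZipFile(archive_path, "r") as zip_ref:
            zip_ref.extractall(dest_dir)
    return expected
```

Because the archive path is passed as a plain argument rather than interpolated into a shell string, spaces or shell metacharacters in the path cannot alter what gets executed.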
class IJSEMTransform(Transform):

    """Template for how the transform class would be designed."""
Copilot
AI
Aug 13, 2025
The docstring describes this as a template but this appears to be the actual IJSEM implementation. Update the docstring to describe the specific purpose of the IJSEM transform.
Suggested change:
- """Template for how the transform class would be designed."""
+ """Transform class for processing and converting IJSEM (International Journal of Systematic and Evolutionary Microbiology) data into standardized node and edge files for downstream use."""
Fixes #180
from @turbomam
see also