Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the Kraken2 report parsing logic to use indentation-based depth calculation instead of rank-based hierarchical tracking. The change simplifies the lineage tracking by inferring taxonomic depth from leading spaces in the scientific name field.
Key Changes:
- Replaced rank-based parsing with indentation-based depth calculation
- Simplified lineage stack management using depth instead of rank order
- Removed rank filtering and global rank order dictionaries
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
scripts/summarize_kraken2_reports.py |
Refactored parsing to calculate depth from indentation, simplified lineage tracking logic, and moved Snakemake execution code into conditional block |
scripts/test_summarize_kraken2_reports.py |
Updated import path and added assertion for debugging |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| reports, _ = reports | ||
| report = reports[0] | ||
| parsed_report = parse_kraken2_tsv_report(open(report)) | ||
| assert False |
There was a problem hiding this comment.
This assert False statement will cause the test to always fail. Remove this debug assertion before merging.
| assert False |
|
|
||
| def consensus_lineage_str(lineage_stack: list[str]) -> str: | ||
| # Fill missing levels with empty placeholders if needed | ||
| full_lineage = lineage_stack + [f"__"] * (max(7 - len(lineage_stack), 0)) |
There was a problem hiding this comment.
The magic number 7 represents the expected taxonomic levels but is unexplained. Consider defining it as a named constant (e.g., EXPECTED_TAXONOMIC_LEVELS = 7) to improve code clarity.
| if len(lineage_stack) <= depth: | ||
| lineage_stack.extend(["__"] * (depth - len(lineage_stack) + 1)) | ||
| lineage_stack = lineage_stack[: depth + 1] | ||
| lineage_stack[depth] = f"__{name}" |
There was a problem hiding this comment.
The lineage format is inconsistent with the placeholder format. Placeholders use '__' but actual entries use '__{name}', creating strings like '__Species_name'. This should likely be f'{rank}__{name}' to match standard taxonomic notation (e.g., 's__Species_name'), but the rank information is no longer available in the current implementation.
No description provided.