Dataops 1178 update checkqc for projman #134
Merged
nkongenelly merged 17 commits into Molmed:master from nkongenelly:DATAOPS_1178_update_checkqc_for_projman on Oct 28, 2025
Changes from 15 commits
Commits
c9bb466 Updating illumina parser to return values for projman (nkongenelly)
09aae6f Corrected tests after adding more details in qcData sequencing_metrics (nkongenelly)
c95b2ed Refactored code (nkongenelly)
0ad571c Testing GHA with python 3.11 (nkongenelly)
ca0a8c5 Testing GHA with python 3.12 (nkongenelly)
a9971fa Testing GHA with python 3.13 (nkongenelly)
4938558 Using python-version matrix in GHA workflow (nkongenelly)
c1c2744 Update .github/workflows/unit_tests.yml (nkongenelly)
1e18686 Update .github/workflows/unit_tests.yml (nkongenelly)
ba0a7ed removed pf_clusters from bclconvert sequencing metrics returned (nkongenelly)
5d7a453 Updated samplesheet v2 structure (nkongenelly)
394f852 Updated test data format (nkongenelly)
e3ae231 Made run_info available in qc_data and test_runfolders available in m… (nkongenelly)
9d971de Added OverrideCycles in bclconvert samplesheet (nkongenelly)
a946b2a Passing runfolder to qc_data_utils (nkongenelly)
861ea51 Added exception for bclconvert_test_runfolder (nkongenelly)
18b6e7e Refactoring code (nkongenelly)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,199 @@ | ||
| import numpy as np | ||
| from checkQC.parsers.illumina import _read_interop_summary | ||
| | ||
| | ||
| def bclconvert_test_runfolder(qc_data, runfolder_path): | ||
| _, _, run_info = _read_interop_summary(runfolder_path) | ||
| flowcell_id = run_info.flowcell_id() | ||
| if "HMTFYDRXX" in flowcell_id: | ||
| return { | ||
| "qc_data": qc_data, | ||
| "expected_instrument": "novaseq_SP", | ||
| "expected_read_length": 36, | ||
| "expected_samplesheet": { | ||
| "len": 4, | ||
| "head": [ | ||
| { | ||
| "lane": 1, | ||
| "sample_id": "Sample_14574-Qiagen-IndexSet1-SP-Lane1", | ||
| "index": "GAACTGAGCG", | ||
| "index2": "TCGTGGAGCG", | ||
| "sample_project": "AB-1234", | ||
| "overridecycles": "Y36;I10;I10", | ||
| "custom_description": "LIBRARY_NAME:test", | ||
| }, | ||
| { | ||
| "lane": 1, | ||
| "sample_id": "Sample_14575-Qiagen-IndexSet1-SP-Lane1", | ||
| "index": "AGGTCAGATA", | ||
| "index2": "CTACAAGATA", | ||
| "sample_project": "CD-5678", | ||
| "overridecycles": "Y36;I10;I10", | ||
| "custom_description": "LIBRARY_NAME:test", | ||
| }, | ||
| { | ||
| "lane": 2, | ||
| "sample_id": "Sample_14574-Qiagen-IndexSet1-SP-Lane2", | ||
| "index": "GAACTGAGCG", | ||
| "index2": "TCGTGGAGCG", | ||
| "sample_project": "AB-1234", | ||
| "overridecycles": "Y36;I10;I10", | ||
| "custom_description": "LIBRARY_NAME:test", | ||
| }, | ||
| { | ||
| "lane": 2, | ||
| "sample_id": "Sample_14575-Qiagen-IndexSet1-SP-Lane2", | ||
| "index": "AGGTCAGATA", | ||
| "index2": "CTACAAGATA", | ||
| "sample_project": "CD-5678", | ||
| "overridecycles": "Y36;I10;I10", | ||
| "custom_description": "LIBRARY_NAME:test", | ||
| }, | ||
| ], | ||
| }, | ||
| "expected_sequencing_metrics": { | ||
| 1: { | ||
| "total_reads_pf": 532_464_327, | ||
| "total_reads": 638_337_024, | ||
| "raw_density": 2_961_270.5, | ||
| "pf_density": 2_470_118.25, | ||
| "yield": 122_605_416, | ||
| "yield_undetermined": 121_940_136, | ||
| "top_unknown_barcodes": { | ||
| "len": 1029, | ||
| "head": [ | ||
| { | ||
| 'index': 'ATATCTGCTT', 'index2': 'TAGACAATCT', | ||
| 'count': 12857, | ||
| }, | ||
| { | ||
| 'index': 'CACCTCTCTT', 'index2': 'CTCGACTCCT', | ||
| 'count': 12406, | ||
| }, | ||
| { | ||
| 'index': 'ATGTAACGTT', 'index2': 'ACGATTGCTG', | ||
| 'count': 12177, | ||
| }, | ||
| { | ||
| 'index': 'TTCGGTGTGA', 'index2': 'GAACAAGTAT', | ||
| 'count': 11590, | ||
| }, | ||
| { | ||
| 'index': 'GGTCCGCTTC', 'index2': 'CTCACACAAG', | ||
| 'count': 11509, | ||
| }, | ||
| ], | ||
| }, | ||
| "reads": { | ||
| 1: { | ||
| "mean_error_rate": np.nan, | ||
| "percent_q30": 95.70932006835938, | ||
| "is_index": False, | ||
| "mean_percent_phix_aligned": 0., | ||
| }, | ||
| 2: { | ||
| "mean_error_rate": np.nan, | ||
| "percent_q30": 92.57965850830078, | ||
| "is_index": True, | ||
| "mean_percent_phix_aligned": np.nan, | ||
| }, | ||
| 3: { | ||
| "mean_error_rate": np.nan, | ||
| "percent_q30": 90.3790283203125, | ||
| "is_index": True, | ||
| "mean_percent_phix_aligned": np.nan, | ||
| }, | ||
| }, | ||
| "reads_per_sample": [ | ||
| { | ||
| "sample_id": "Sample_14574-Qiagen-IndexSet1-SP-Lane1", | ||
| "cluster_count": 9920, | ||
| "percent_of_lane": 0.29, | ||
| "percent_perfect_index_reads": 97.96, | ||
| "mean_q30": 36.37, | ||
| "percent_q30": 96, | ||
| }, | ||
| { | ||
| "sample_id": "Sample_14575-Qiagen-IndexSet1-SP-Lane1", | ||
| "cluster_count": 8560, | ||
| "percent_of_lane": 0.25, | ||
| "percent_perfect_index_reads": 98.15, | ||
| "mean_q30": 36.43, | ||
| "percent_q30": 96, | ||
| }, | ||
| ], | ||
| }, | ||
| 2: { | ||
| "total_reads_pf": 530_917_565, | ||
| "total_reads": 638_337_024, | ||
| "raw_density": 2_961_270.5, | ||
| "pf_density": 2_462_942.5, | ||
| "yield": 124_497_108, | ||
| "yield_undetermined": 123_817_428, | ||
| "top_unknown_barcodes": { | ||
| "len": 1055, | ||
| "head": [ | ||
| { | ||
| 'index': 'ATATCTGCTT', 'index2': 'TAGACAATCT', | ||
| 'count': 13176, | ||
| }, | ||
| { | ||
| 'index': 'ATGTAACGTT', 'index2': 'ACGATTGCTG', | ||
| 'count': 12395, | ||
| }, | ||
| { | ||
| 'index': 'CACCTCTCTT', 'index2': 'CTCGACTCCT', | ||
| 'count': 12247, | ||
| }, | ||
| { | ||
| 'index': 'TTCGGTGTGA', 'index2': 'GAACAAGTAT', | ||
| 'count': 11909, | ||
| }, | ||
| { | ||
| 'index': 'TAATTAGCGT', 'index2': 'TGGTTAAGAA', | ||
| 'count': 11330, | ||
| }, | ||
| ], | ||
| }, | ||
| "reads": { | ||
| 1: { | ||
| "mean_error_rate": np.nan, | ||
| "percent_q30": 95.75276184082031, | ||
| "is_index": False, | ||
| "mean_percent_phix_aligned": 0., | ||
| }, | ||
| 2: { | ||
| "mean_error_rate": np.nan, | ||
| "percent_q30": 92.60448455810547, | ||
| "is_index": True, | ||
| "mean_percent_phix_aligned": np.nan, | ||
| }, | ||
| 3: { | ||
| "mean_error_rate": np.nan, | ||
| "percent_q30": 90.2811050415039, | ||
| "is_index": True, | ||
| "mean_percent_phix_aligned": np.nan, | ||
| }, | ||
| }, | ||
| "reads_per_sample": [ | ||
| { | ||
| "sample_id": "Sample_14574-Qiagen-IndexSet1-SP-Lane2", | ||
| "cluster_count": 10208, | ||
| "percent_of_lane": 0.3, | ||
| "percent_perfect_index_reads": 98.2, | ||
| "mean_q30": 36.4, | ||
| "percent_q30": 96, | ||
| }, | ||
| { | ||
| "sample_id": "Sample_14575-Qiagen-IndexSet1-SP-Lane2", | ||
| "cluster_count": 8672, | ||
| "percent_of_lane": 0.25, | ||
| "percent_perfect_index_reads": 98.29, | ||
| "mean_q30": 36.48, | ||
| "percent_q30": 97, | ||
| }, | ||
| ], | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
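
The expected values above use `np.nan` for `mean_error_rate` and for `mean_percent_phix_aligned` on index reads. Since NaN never compares equal to itself under `==`, a plain equality check on these floats would fail even when the parser output is correct. How the project's tests actually compare these values is not shown in this view; the following is only a minimal sketch of a NaN-aware comparison, and `metrics_match` is a hypothetical helper, not part of checkQC. It relies only on the standard library, assuming the values are plain Python floats (which `np.nan` is).

```python
import math


def metrics_match(expected, actual, rel_tol=1e-6):
    """Recursively compare nested dicts/lists, treating NaN as equal to NaN.

    Hypothetical helper for illustration; not taken from the PR.
    """
    if isinstance(expected, dict):
        return (
            isinstance(actual, dict)
            and expected.keys() == actual.keys()
            and all(metrics_match(expected[k], actual[k], rel_tol) for k in expected)
        )
    if isinstance(expected, (list, tuple)):
        return (
            isinstance(actual, (list, tuple))
            and len(expected) == len(actual)
            and all(metrics_match(e, a, rel_tol) for e, a in zip(expected, actual))
        )
    if isinstance(expected, float) and isinstance(actual, float):
        # NaN only matches NaN; other floats are compared with a tolerance.
        if math.isnan(expected) or math.isnan(actual):
            return math.isnan(expected) and math.isnan(actual)
        return math.isclose(expected, actual, rel_tol=rel_tol)
    # Everything else (ints, bools, strings) uses ordinary equality.
    return expected == actual


# Values copied from read 2 of lane 1 in the diff above.
expected_read = {
    "mean_error_rate": float("nan"),
    "percent_q30": 92.57965850830078,
    "is_index": True,
}
parsed_read = {
    "mean_error_rate": float("nan"),
    "percent_q30": 92.57965850830078,
    "is_index": True,
}
print(metrics_match(expected_read, parsed_read))
```

With NumPy available, `numpy.testing.assert_equal` offers similar NaN-tolerant behaviour for nested structures and may be the simpler choice inside a test suite.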
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,10 @@ | ||
| click~=8.1.1 | ||
| PyYAML~=6.0 | ||
| -interop~=1.3.2 | ||
| +interop~=1.4.0 | ||
| xmltodict~=0.13.0 | ||
| tornado~=6.3.2 | ||
| sample_sheet~=0.13.0 | ||
| pandas~=2.2.2 | ||
| -numpy~=1.26.4 | ||
| +numpy~=2.2.4 | ||
| samshee~=0.2.3 | ||
| jsonschema~=4.23.0 |
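
The pins in this file use the `~=` ("compatible release") operator from PEP 440, so `numpy~=2.2.4` accepts any `2.2.x` at or above `2.2.4` but not `2.3.0`. The sketch below approximates that rule for plain dotted numeric versions only; `compatible_release` is a hypothetical helper written for illustration, and it ignores pre-releases, post-releases, and epochs. For real use, the `packaging` library implements the full specification.

```python
def compatible_release(spec: str, candidate: str) -> bool:
    """Approximate PEP 440 `~=spec` for plain dotted numeric versions.

    Hypothetical helper for illustration only.
    """
    base = [int(part) for part in spec.split(".")]
    cand = [int(part) for part in candidate.split(".")]
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments")

    # Pad with zeros so that "1.4" and "1.4.0" compare as equal.
    width = max(len(base), len(cand))
    base_padded = base + [0] * (width - len(base))
    cand_padded = cand + [0] * (width - len(cand))

    # ~=X.Y.Z means: same X.Y prefix, and not older than X.Y.Z.
    prefix = base[:-1]
    return cand_padded[: len(prefix)] == prefix and cand_padded >= base_padded


for candidate in ("2.2.4", "2.2.6", "2.3.0", "1.26.4"):
    print(candidate, compatible_release("2.2.4", candidate))
```

Under this rule the old `numpy~=1.26.4` pin could never resolve to a 2.x release, which is why the pin itself had to change.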
Great! Just one minor thing: I think it would be nice to throw an exception if the flowcell ID does not match, explaining that the output of this function is adapted for a specific run.
Oh yes, thanks
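
One way the suggestion above could look is sketched here. This is not the code from commit 861ea51, which is not visible in this view; `UnexpectedRunfolderError` and `ensure_expected_flowcell` are hypothetical names, and only the flowcell ID `HMTFYDRXX` is taken from the diff. The idea is to fail loudly instead of silently returning `None` when the helper is pointed at a different run.

```python
EXPECTED_FLOWCELL = "HMTFYDRXX"  # flowcell ID checked in the diff above


class UnexpectedRunfolderError(ValueError):
    """Hypothetical exception type; not taken from the PR."""


def ensure_expected_flowcell(flowcell_id, expected=EXPECTED_FLOWCELL):
    """Return flowcell_id if it matches the expected run, else raise."""
    if expected not in flowcell_id:
        raise UnexpectedRunfolderError(
            f"Expected test values are tailored to flowcell {expected!r}, "
            f"but the runfolder reports {flowcell_id!r}."
        )
    return flowcell_id


try:
    ensure_expected_flowcell("SOMEOTHERID")
except UnexpectedRunfolderError as err:
    print(err)
```

A guard like this would run right after the flowcell ID is read and before the expected-values dictionary is built.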