-
Notifications
You must be signed in to change notification settings - Fork 17
Dataops 1178 update checkqc for projman #134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataops 1178 update checkqc for projman #134
Conversation
12fe762 to
09aae6f
Compare
matrulda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left some comments for you.
tests/test_qc_data.py
Outdated
| "expected_sequencing_metrics": { | ||
| 1: { | ||
| "total_cluster_pf": 532_464_327, | ||
| "pf_clusters": 3_413_232.5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think we should take another look at if these numbers make sense. It differs so much from the total_clusters_pf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right
|
Thanks for the review. I have pushed the updated code and maybe we can have a meeting to discuss the values again since as also mentioned that total_clusters_pf is much bigger than pf_clusters. |
.github/workflows/unit_tests.yml
Outdated
| uses: actions/setup-python@v4 | ||
| with: | ||
| python-version: '3.10' | ||
| python-version: '3.13' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be nice to use a test matrix so that we make sure to test all supported versions in the GHA workflow: https://stackoverflow.com/a/61428673
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow!, this is interesting, thank you for this and I have now pushed the updated code with this change.
Great work! I left a final comment for you and we definitely should look a meeting to look at the values. |
I have booked a meeting and sent you an invite email for this |
matrulda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, but I think you forgot some curly brackets
Co-authored-by: Matilda Åslin <[email protected]>
Co-authored-by: Matilda Åslin <[email protected]>
Oh yes! Sorry for that. I think i was in my branch that i didn't see the unit tests fail. I have now updated this. |
|
Hej @matrulda , I have now pushed the updated code after we finalized samplesheet v2 structure. |
matrulda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, looks good! Just a minor comment.
tests/test_qc_data.py
Outdated
| 1: { | ||
| "total_cluster_pf": 532_464_327, | ||
| "total_reads_pf": 532_464_327, | ||
| "total_reads": 638337024, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to use the "underscore syntax" here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, thanks
|
Hej @matrulda , a kind reminder on this PR so that we can use the master branch when this is merged in projman_filler to install checkqc |
|
I have just pushed some changes to enable projman_filler to;
|
|
I also tried to move the qc_data_utils file outside the checkQC folder, and adding it is setup.py (under the package_data list) but i was unsuccessful in importing it in the projman_filler side And i have now added the OverrideCycles in the bclconvert samplesheet (i had forgotten about it in my previous commit) |
matrulda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I had one question.
checkQC/qc_data_utils.py
Outdated
| import numpy as np | ||
|
|
||
|
|
||
| def bclconvert_test_runfolder(qc_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only issue I have with this function is that it looks like it is a general utility tool, but it expects qc_data to be based on "200624_A00834_0183_BHMTFYTINY" to match the "expected_*" values.
Could we move this part to the function as well?
qc_data = QCData.from_bclconvert(
Path(__file__).parent / "resources/bclconvert/200624_A00834_0183_BHMTFYTINY",
parser_config,
)
I guess we won't be able to parse the CheckQC test resource when using this function in projman_filler, but in that case maybe the input to this function could be the path to 200624_A00834_0183_BHMTFYTINY? It should state clearly in the doc string that this runfolder is expected as input.
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right, this function is only tied to 200624_A00834_0183_BHMTFYTINY. thought of using qc_data to make the function dynamic but then realised that would be comparing 2 qc_data that are generated a fresh each time which may always match.
Yes, i think the function can take runfolder as input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe check if the flowcell_id for the TINY runfolder is "HMTFYDRXX" because the bclconvert runfolder in projman_filler has a slightly different name i.e 200624_A00834_0184_BHMTFYTINY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now updated the code passing the runfolder Path and checking for the flowcell_id. I have also updated projman_filler to pass the runfolder Path
But this is change is also open for discussion 😃
bfde2bb to
a946b2a
Compare
| def bclconvert_test_runfolder(qc_data, runfolder_path): | ||
| _, _, run_info = _read_interop_summary(runfolder_path) | ||
| flowcell_id = run_info.flowcell_id() | ||
| if "HMTFYDRXX" in flowcell_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Just one minor thing, I think it would be nice to throw an exception if the flowcell ID does not match. Explaining that the ouytput of this funtion is adapated for a specific run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, thanks
|
Thanks for the reviews, I have now updated the code. |
checkQC/qc_data_utils.py
Outdated
| raise Exception("Excpected flowcell_id value as 'HMTFYDRXX' only for " | ||
| f"this fuction but got {flowcell_id}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, there is a typo in the exception. Also, I thought the exception could be rephrased to something like this.
| raise Exception("Excpected flowcell_id value as 'HMTFYDRXX' only for " | |
| f"this fuction but got {flowcell_id}" | |
| raise Exception("This function is only compatible with the run with flowcell_id: 'HMTFYDRXX', " | |
| f"the supplied runfolder has flowcell_id: {flowcell_id}" |
|
Just one super minor thing, then we can merge this! :) |
|
Thanks for the reviews. I have now updated the code |
Jira ticket
This version updates the sequencing_metrics returned by the illumina parser to contain all metrics needed by projman_filler to save flowcell_lane and sample metrics