Skip to content

Conversation

@nkongenelly
Copy link
Contributor

@nkongenelly nkongenelly commented Aug 27, 2025

Jira ticket

This version updates the sequencing_metrics returned by the illumina parser to contain all metrics needed by projman_filler to save flowcell_lane and sample metrics

@nkongenelly nkongenelly self-assigned this Aug 27, 2025
@nkongenelly nkongenelly force-pushed the DATAOPS_1178_update_checkqc_for_projman branch 3 times, most recently from 12fe762 to 09aae6f Compare August 28, 2025 05:59
@nkongenelly nkongenelly marked this pull request as ready for review August 28, 2025 08:40
@nkongenelly nkongenelly requested a review from matrulda August 28, 2025 08:58
Copy link
Collaborator

@matrulda matrulda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Left some comments for you.

"expected_sequencing_metrics": {
1: {
"total_cluster_pf": 532_464_327,
"pf_clusters": 3_413_232.5,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think we should take another look at if these numbers make sense. It differs so much from the total_clusters_pf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right

@nkongenelly
Copy link
Contributor Author

Thanks for the review. I have pushed the updated code and maybe we can have a meeting to discuss the values again since as also mentioned that total_clusters_pf is much bigger than pf_clusters.

uses: actions/setup-python@v4
with:
python-version: '3.10'
python-version: '3.13'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be nice to use a test matrix so that we make sure to test all supported versions in the GHA workflow: https://stackoverflow.com/a/61428673

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow!, this is interesting, thank you for this and I have now pushed the updated code with this change.

@matrulda
Copy link
Collaborator

matrulda commented Sep 5, 2025

Thanks for the review. I have pushed the updated code and maybe we can have a meeting to discuss the values again since as also mentioned that total_clusters_pf is much bigger than pf_clusters.

Great work! I left a final comment for you and we definitely should look a meeting to look at the values.

@nkongenelly
Copy link
Contributor Author

Thanks for the review. I have pushed the updated code and maybe we can have a meeting to discuss the values again since as also mentioned that total_clusters_pf is much bigger than pf_clusters.

Great work! I left a final comment for you and we definitely should look a meeting to look at the values.

I have booked a meeting and sent you an invite email for this

Copy link
Collaborator

@matrulda matrulda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, but I think you forgot some curly brackets

@nkongenelly
Copy link
Contributor Author

Nice, but I think you forgot some curly brackets

Oh yes! Sorry for that. I think i was in my branch that i didn't see the unit tests fail. I have now updated this.

@nkongenelly
Copy link
Contributor Author

Hej @matrulda , I have now pushed the updated code after we finalized samplesheet v2 structure.

Copy link
Collaborator

@matrulda matrulda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, looks good! Just a minor comment.

1: {
"total_cluster_pf": 532_464_327,
"total_reads_pf": 532_464_327,
"total_reads": 638337024,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to use the "underscore syntax" here as well.

Copy link
Contributor Author

@nkongenelly nkongenelly Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, thanks

@nkongenelly
Copy link
Contributor Author

Hej @matrulda , a kind reminder on this PR so that we can use the master branch when this is merged in projman_filler to install checkqc

@nkongenelly
Copy link
Contributor Author

I have just pushed some changes to enable projman_filler to;

  • get flowcell_id (i.e from checkqc interop run_info ) and

  • also moved some test data inside the checkQC modue so that it can be used by projman_filler tests too ( i tried to include checkqc tests/ test_qc_data.py to be among the files avaliable to import from projman_filler by editing local manifest.in file and setup.py and also using the #egg in the install command but it seemed not to work or i might have implemented them wrongly.....that's why i ended up moving the test_data inside the module folder) ... here is a stackoverflow link where i got the basic information from

@nkongenelly
Copy link
Contributor Author

I also tried to move the qc_data_utils file outside the checkQC folder, and adding it is setup.py (under the package_data list) but i was unsuccessful in importing it in the projman_filler side

And i have now added the OverrideCycles in the bclconvert samplesheet (i had forgotten about it in my previous commit)

Copy link
Collaborator

@matrulda matrulda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I had one question.

import numpy as np


def bclconvert_test_runfolder(qc_data):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only issue I have with this function is that it looks like it is a general utility tool, but it expects qc_data to be based on "200624_A00834_0183_BHMTFYTINY" to match the "expected_*" values.
Could we move this part to the function as well?

qc_data = QCData.from_bclconvert(
        Path(__file__).parent / "resources/bclconvert/200624_A00834_0183_BHMTFYTINY",
        parser_config,
    )

I guess we won't be able to parse the CheckQC test resource when using this function in projman_filler, but in that case maybe the input to this function could be the path to 200624_A00834_0183_BHMTFYTINY? It should state clearly in the doc string that this runfolder is expected as input.
Let me know what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right, this function is only tied to 200624_A00834_0183_BHMTFYTINY. thought of using qc_data to make the function dynamic but then realised that would be comparing 2 qc_data that are generated a fresh each time which may always match.

Yes, i think the function can take runfolder as input

Copy link
Contributor Author

@nkongenelly nkongenelly Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe check if the flowcell_id for the TINY runfolder is "HMTFYDRXX" because the bclconvert runfolder in projman_filler has a slightly different name i.e 200624_A00834_0184_BHMTFYTINY.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now updated the code passing the runfolder Path and checking for the flowcell_id. I have also updated projman_filler to pass the runfolder Path

But this is change is also open for discussion 😃

@nkongenelly nkongenelly force-pushed the DATAOPS_1178_update_checkqc_for_projman branch 2 times, most recently from bfde2bb to a946b2a Compare October 20, 2025 14:23
def bclconvert_test_runfolder(qc_data, runfolder_path):
_, _, run_info = _read_interop_summary(runfolder_path)
flowcell_id = run_info.flowcell_id()
if "HMTFYDRXX" in flowcell_id:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Just one minor thing, I think it would be nice to throw an exception if the flowcell ID does not match. Explaining that the ouytput of this funtion is adapated for a specific run

Copy link
Contributor Author

@nkongenelly nkongenelly Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, thanks

@nkongenelly
Copy link
Contributor Author

Thanks for the reviews, I have now updated the code.

Comment on lines 200 to 201
raise Exception("Excpected flowcell_id value as 'HMTFYDRXX' only for "
f"this fuction but got {flowcell_id}"
Copy link
Collaborator

@matrulda matrulda Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, there is a typo in the exception. Also, I thought the exception could be rephrased to something like this.

Suggested change
raise Exception("Excpected flowcell_id value as 'HMTFYDRXX' only for "
f"this fuction but got {flowcell_id}"
raise Exception("This function is only compatible with the run with flowcell_id: 'HMTFYDRXX', "
f"the supplied runfolder has flowcell_id: {flowcell_id}"

@matrulda
Copy link
Collaborator

Just one super minor thing, then we can merge this! :)

@nkongenelly
Copy link
Contributor Author

Thanks for the reviews. I have now updated the code

@nkongenelly nkongenelly merged commit b7abf00 into Molmed:master Oct 28, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants