Implementing total_clusters_raw similar to as total_clusters_pf #17

nkongenelly · 2025-05-16T14:39:50Z

Fixes the bug found in this Confluence report, which showed that raw reads should not be summed: total_clusters_raw is now handled in the same way as total_clusters_pf.
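To illustrate the bug, here is a minimal sketch. It assumes a lane-level summary in which each row is one read of the lane and the 'Reads' and 'Reads Pf' columns repeat the same lane-wide counts on every row. The 'Read' column name and the numbers are hypothetical. Summing over reads multiplies the lane's cluster count by the number of reads, while reading the value once gives the correct total.

```python
import pandas as pd

# Hypothetical lane summary: one row per read, lane-wide counts repeated on every row.
lane_rows = pd.DataFrame(
    {
        "Read": [1, 2, 3, 4],
        "Reads": [1_000_000] * 4,
        "Reads Pf": [800_000] * 4,
    }
)

# Buggy approach: summing counts the lane total once per read.
summed_raw = int(lane_rows["Reads"].sum())

# Fixed approach: the value is identical on every row, so read it once.
total_clusters_raw = int(lane_rows["Reads"].iloc[0])
total_clusters_pf = int(lane_rows["Reads Pf"].iloc[0])

print(summed_raw, total_clusters_raw, total_clusters_pf)  # 4000000 1000000 800000
```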

matrulda · 2025-05-19T05:14:12Z

projman_filler/interop_run_stats_parser.py

                if index in self._non_index_reads:
                    # These are the same for the lane across all reads
                    total_clusters_pf = row['Reads Pf']
                    # These must be summed


Please remove this comment, as well as the one that references the issue on line 124.

matrulda · 2025-05-19T05:23:29Z

projman_filler/interop_run_stats_parser.py

            total_clusters_pf = 0
            for index, row in rows.iterrows():
                if index in self._non_index_reads:
                    # These are the same for the lane across all reads


Since this value is the same for all reads, do we need to loop through all of them? Could we just use the first read? I also don't think we necessarily need to look only at non-index reads. I think a lot of lines can be removed from this function.
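The following is a minimal sketch of this suggestion. It assumes the lane summary has been flattened into one DataFrame with a hypothetical 'Lane' column. `lane_cluster_totals` is a hypothetical helper and is not the parser's actual method. Because the counts are identical on every read of a lane, taking the first row per lane can replace both the loop and the non-index-read filter.

```python
import pandas as pd


def lane_cluster_totals(df: pd.DataFrame) -> dict[int, tuple[int, int]]:
    """Return {lane: (total_clusters_raw, total_clusters_pf)} from the first row of each lane.

    Assumes 'Reads' and 'Reads Pf' are lane-wide values repeated on every read row,
    so no summing and no filtering on read type is needed.
    """
    first_rows = df.groupby("Lane", sort=True)[["Reads", "Reads Pf"]].first()
    return {
        int(lane): (int(row["Reads"]), int(row["Reads Pf"]))
        for lane, row in first_rows.iterrows()
    }


# Hypothetical two-lane summary with three reads per lane.
summary = pd.DataFrame(
    {
        "Lane": [1, 1, 1, 2, 2, 2],
        "Reads": [1000, 1000, 1000, 1200, 1200, 1200],
        "Reads Pf": [900, 900, 900, 1100, 1100, 1100],
    }
)
print(lane_cluster_totals(summary))  # {1: (1000, 900), 2: (1200, 1100)}
```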

Could you also add a unit test to verify that Reads and Reads Pf are consistently the same across all reads within the same lane?
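Here is one possible shape for such a test. It is a sketch that assumes the summary is available as a DataFrame with a hypothetical 'Lane' column. The check asserts that each lane has exactly one distinct 'Reads' value and one distinct 'Reads Pf' value. The helper and test class names are hypothetical.

```python
import unittest

import pandas as pd


def assert_counts_consistent_per_lane(df: pd.DataFrame) -> None:
    """Raise AssertionError if 'Reads' or 'Reads Pf' differs between reads of the same lane."""
    distinct = df.groupby("Lane")[["Reads", "Reads Pf"]].nunique()
    # Lanes where either column has more than one distinct value.
    inconsistent_lanes = distinct[(distinct > 1).any(axis=1)].index.tolist()
    if inconsistent_lanes:
        raise AssertionError(f"Inconsistent 'Reads'/'Reads Pf' values in lanes: {inconsistent_lanes}")


class TestLaneCountConsistency(unittest.TestCase):
    def test_consistent_counts_pass(self):
        df = pd.DataFrame(
            {"Lane": [1, 1, 2, 2], "Reads": [10, 10, 20, 20], "Reads Pf": [8, 8, 15, 15]}
        )
        assert_counts_consistent_per_lane(df)  # should not raise

    def test_inconsistent_counts_fail(self):
        df = pd.DataFrame({"Lane": [1, 1], "Reads": [10, 11], "Reads Pf": [8, 8]})
        with self.assertRaises(AssertionError):
            assert_counts_consistent_per_lane(df)
```

The negative case guards against the check passing vacuously. The explicit `raise` is used instead of a bare `assert` so that the check still runs under `python -O`.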

nkongenelly · 2025-05-21T07:14:55Z

@matrulda Thanks for the review. I have now updated the code

matrulda

Looks great! Left some comments regarding the test and a few minor formatting suggestions.

matrulda · 2025-05-27T07:42:28Z

projman_filler/interop_run_stats_parser.py

        ar = iop.summary(self._run_metrics, 'Lane')
        df = pd.DataFrame(ar)
-
+ 


Please remove the space here :)

matrulda · 2025-05-27T07:43:11Z

tests/test_interop_run_stats_parser.py

+        iop = InteropRunStatsParser(runfolder, non_index_reads)
+
+
+        data = {


I think the first two assertions (Assert all 'Reads' and 'Reads Pf' values are consistent within the lane) would be more meaningful if we used the test data instead of a mock (where we explicitly define that Reads and Reads Pf should be the same on all rows).
It could be achieved by doing something like this:

interop_lane_summary = interop.summary(iop._run_metrics, 'Lane')
df = pd.DataFrame(interop_lane_summary)

I think the resulting data frame will be compatible with the rest of the test (from line 51).

matrulda · 2025-05-27T07:44:00Z

tests/test_interop_run_stats_parser.py

+                    f"Mismatch in total_clusters_raw for lane {lane}"
+                    )
+
+


I think one empty line is enough here.

matrulda · 2025-05-27T07:49:57Z

tests/test_interop_run_stats_parser.py

+        """
+        Verify that 'Reads' and 'Reads Pf' are consistently the same across all reads within the same lane
+        """
+        non_index_reads = [0, 2, 3]


A side note: I just realized that the test runfolder actually has only one non-index read, at index 0. I don't think it affects the test, but it could be changed to avoid confusion in the future.

If you think this makes sense, I would suggest fixing it in all tests except test_interop_standardize_read_numbers, where I think it would be good to verify that multiple non-index reads are handled correctly.
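This small sketch shows why the extra indices may not change the result. It assumes a hypothetical per-lane frame indexed by read number, where read 0 is the only non-index read and reads 1 and 2 are index reads. Because the cluster counts repeat on every row, taking the first selected row gives the same value whether `non_index_reads` is `[0]` or `[0, 2, 3]`. Index 3 simply matches nothing. The helper `first_non_index_pf` is hypothetical.

```python
import pandas as pd

# Hypothetical lane summary indexed by read number: read 0 is the only non-index read,
# reads 1 and 2 are index reads. Cluster counts repeat on every row.
rows = pd.DataFrame(
    {"Reads": [5000, 5000, 5000], "Reads Pf": [4200, 4200, 4200]},
    index=[0, 1, 2],
)


def first_non_index_pf(rows: pd.DataFrame, non_index_reads: list[int]) -> int:
    """Return 'Reads Pf' from the first row whose read number is listed as non-index."""
    selected = rows[rows.index.isin(non_index_reads)]
    return int(selected["Reads Pf"].iloc[0])


# Matches the test data: only read 0 is a non-index read.
accurate = first_non_index_pf(rows, [0])
# List used in the tests: also selects read 2 (an index read here); read 3 does not exist.
current = first_non_index_pf(rows, [0, 2, 3])
print(accurate, current)  # 4200 4200
```

The result only stays the same because every row carries identical counts. This is why `[0]` reads less confusingly in tests that use this runfolder.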

nkongenelly · 2025-05-27T10:47:56Z

Thanks @matrulda for the help. I have now pushed the updated code.

matrulda

Great, thank you!

Implementing total_clusters_raw similar to as total_clusters_pf

b4f3b10

nkongenelly requested a review from matrulda May 16, 2025 14:41

matrulda reviewed May 19, 2025


DATAOPS-832: Optimize code and add unit test

906386f

matrulda reviewed May 27, 2025


Edited test_interop file and formatting changes

9cb6f19

matrulda approved these changes May 27, 2025


nkongenelly merged commit 18e7feb into Molmed:master Jun 4, 2025
1 check passed

2 participants
