-
Notifications
You must be signed in to change notification settings - Fork 2
Implementing total_clusters_raw similar to as total_clusters_pf #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing total_clusters_raw similar to as total_clusters_pf #17
Conversation
| if index in self._non_index_reads: | ||
| # These are the same for the lane across all reads | ||
| total_clusters_pf = row['Reads Pf'] | ||
| # These must be summed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this comment, as well as the one that reference the issue on line 124.
| total_clusters_pf = 0 | ||
| for index, row in rows.iterrows(): | ||
| if index in self._non_index_reads: | ||
| # These are the same for the lane across all reads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this value is the same for all reads, we do not need to loop through all of them? We could just use the first read? I also think we do not necessarily need to look at only non-index reads. I think a lot of lines can be removed from this function.
Could you also add a unit test to verify that Reads and Reads Pf are consistently the same across all reads within the same lane?
|
@matrulda Thanks for the review. I have now updated the code |
matrulda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Left some comments regarding the test and a few minor formatting suggestions.
| ar = iop.summary(self._run_metrics, 'Lane') | ||
| df = pd.DataFrame(ar) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the space here :)
| iop = InteropRunStatsParser(runfolder, non_index_reads) | ||
|
|
||
|
|
||
| data = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the two first assertions (Assert all 'Reads' and 'Reads Pf' values are consistent within the lane) would be more meaningful if we used the test data instead of a mock (where we explicitly define that Reads and Reads Pf should be the same on all rows).
It could be achieved by doing something like this:
interop_lane_summary = interop.summary(iop._run_metrics, 'Lane')
df = pd.DataFrame(interop_lane_summary)
I think the resulting data frame will be compatible with the rest of the test (from line 51).
| f"Mismatch in total_clusters_raw for lane {lane}" | ||
| ) | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one empty line is enough here.
| """ | ||
| Verify that 'Reads' and 'Reads Pf' are consistently the same across all reads within the same lane | ||
| """ | ||
| non_index_reads = [0, 2, 3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here: I realized now that the test runfolder actually only have one non-index read and it is at index 0. I don't think it affects the test, but it could be change to avoid confusion in the future.
If you think this makes sense I would suggest fixing it in all tests except test_interop_standardize_read_numbers where I think it would be good to see that multiple non index reads are handled correctly.
|
Thanks @matrulda for the help. I have now pushed the updated code. |
matrulda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thank you!
Solves the bug found in this confluence report where it was found that raw_reads shouldn't be summed up i.e total_clusters_raw is handled in the same way as total_clusters_pf