feat: minor speedups #56
base: main
Conversation
src/bin/commands/demux.rs
Outdated
@@ -93,6 +93,7 @@ impl ReadSet {
     const SPACE: u8 = b' ';
     const COLON: u8 = b':';
     const PLUS: u8 = b'+';
+    const READ_NUMBERS: &[u8] = &[b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8'];
Profiling showed that converting the read number (`usize`) to a string was costing a non-trivial amount of time. Here we have a fast path for read numbers up to 8.
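A minimal sketch of the idea, assuming the read name is built up in a byte buffer; the function and buffer names are illustrative, not the actual fqtk code:

```rust
// Write the read number as a single pre-computed ASCII byte when it is in
// 1..=8, and only fall back to the general integer-to-string conversion
// otherwise.
const READ_NUMBERS: &[u8] = &[b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8'];

fn push_read_number(name: &mut Vec<u8>, read_number: usize) {
    if (1..=READ_NUMBERS.len()).contains(&read_number) {
        // Fast path: a single table lookup, no allocation or formatting.
        name.push(READ_NUMBERS[read_number - 1]);
    } else {
        // Slow path: rarely taken for typical FASTQ read numbers.
        name.extend_from_slice(read_number.to_string().as_bytes());
    }
}
```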
                output_dir.join(format!("{}.{}{}.fq.gz", prefix, file_type_code, idx)),
            )?));
            output_type_writers.push(BufWriter::with_capacity(
                65_536usize,
I was seeing `write_all_cold` in the flame graph, which means the write buffer was too small (the `BufWriter` default is `8_192` bytes). Increasing it to `65_536` bytes (8x the default) sped things up, and the `write_all_cold` call no longer shows up.
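A hedged sketch of the buffer-size change; the helper name and plain-file output are assumptions for illustration (fqtk's writers wrap gzip compression as well):

```rust
use std::fs::File;
use std::io::{BufWriter, Result};

/// Open an output file with a 64 KiB write buffer instead of BufWriter's
/// 8 KiB default, so typical record-sized writes stay on the buffered fast
/// path instead of spilling into the underlying writer.
fn open_output(path: &str) -> Result<BufWriter<File>> {
    let file = File::create(path)?;
    Ok(BufWriter::with_capacity(65_536, file))
}
```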
src/lib/barcode_matching.rs
Outdated
            expected_bases.len(),
            sample.sample_id
        );
        if sample.barcode_bytes.len() != observed_bases.len() {
Two things:

- The creation of `observed_string` was making this function slower than it needed to be, since `observed_string` is only needed if we are going to panic.
- Keeping a copy of the sample barcode as a vector of bytes is faster than calling `as_bytes` every time we compare the observed barcode to a sample (which happens a lot). A sketch of both changes follows below.
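A minimal sketch of both changes under assumed names (`Sample`, `barcode_bytes`, and `byte_is_nocall` follow the discussion but are not necessarily the exact fqtk definitions):

```rust
struct Sample {
    sample_id: String,
    /// The sample barcode, cached once as bytes at construction so the hot
    /// matching loop never re-derives it via `as_bytes()`.
    barcode_bytes: Vec<u8>,
}

/// Assumed helper: treat 'N'/'n'/'.' as no-call bases.
fn byte_is_nocall(base: u8) -> bool {
    matches!(base, b'N' | b'n' | b'.')
}

fn count_mismatches(sample: &Sample, observed_bases: &[u8]) -> usize {
    // The formatted message (the role of the old `observed_string`) is only
    // built if the assertion actually fails, i.e. right before panicking.
    assert_eq!(
        sample.barcode_bytes.len(),
        observed_bases.len(),
        "observed barcode length != expected for sample {}: {}",
        sample.sample_id,
        String::from_utf8_lossy(observed_bases)
    );
    sample
        .barcode_bytes
        .iter()
        .zip(observed_bases.iter())
        .filter(|&(&expected, &observed)| expected != observed && !byte_is_nocall(expected))
        .count()
}
```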
src/lib/barcode_matching.rs
Outdated
-        for (&expected_base, &observed_base) in expected_bases.iter().zip(observed_bases.iter()) {
-            if !byte_is_nocall(expected_base) && expected_base != observed_base {
+        for (&expected_base, &observed_base) in sample.barcode_bytes.iter().zip(observed_bases.iter()) {
+            if expected_base != observed_base && !byte_is_nocall(expected_base) {
Swapped the order of the conditionals, since it is much more likely that we have a mismatch than a no-call in the sample barcode; with the new order, `byte_is_nocall` is only evaluated for mismatching bases.
Force-pushed from f76d08b to 2e7e419.
Force-pushed from 10776b3 to 803f453.
* replace ahash with rustc-hash. rustc-hash was empirically faster than both ahash and fxhash.
* optimize how the read name, specifically the read number, is written.
* increase the buffer size for the output writers 8x, to 65K.
* delay creating the read bases string from bytes for display in a panic until it is needed. This occurred during every comparison of the read sample barcode with the expected sample barcode, so while not individually expensive, it was expensive enough in aggregate.
* create the necessary bytes for each sample barcode only once, since this was being done every time we compared barcodes.

Miscellaneous changes:

* fix the usage for threads, which incorrectly said the minimum number of threads was three when it should be five. This also changes the README.
* update the Rust toolchain to 1.85 (from 1.65.0). This requires minor code changes throughout based on clippy requirements and formatting.
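A brief illustration of the rustc-hash swap from the first bullet; the function and its inputs are hypothetical, only the `FxHashMap` usage reflects the crate's API:

```rust
use rustc_hash::FxHashMap;

// FxHashMap is a drop-in replacement for std::collections::HashMap that uses
// the fast (non-DoS-resistant) Fx hash function, which suits short keys such
// as barcodes well.
fn tally_barcodes(barcodes: &[Vec<u8>]) -> FxHashMap<Vec<u8>, u64> {
    let mut counts: FxHashMap<Vec<u8>, u64> = FxHashMap::default();
    for barcode in barcodes {
        *counts.entry(barcode.clone()).or_insert(0) += 1;
    }
    counts
}
```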
Force-pushed from 803f453 to 4c6a22d.
This speeds up fqtk ~10% in my hands.