feat: minor speedups #56

Open: nh13 wants to merge 1 commit into main from feat/minor-speedups


nh13 (Member) commented Feb 26, 2025

This speeds up fqtk by ~10% in my hands.

nh13 requested a review from tfenne as a code owner February 26, 2025 05:16
```diff
@@ -93,6 +93,7 @@ impl ReadSet {
const SPACE: u8 = b' ';
const COLON: u8 = b':';
const PLUS: u8 = b'+';
const READ_NUMBERS: &[u8] = &[b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8'];
```
nh13 (Member Author) commented:

Profiling showed that converting the read number (a usize) to a string was costing a non-trivial amount of time. This adds a fast path for read numbers 1 through 8.
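A minimal sketch of such a fast path, assuming the read number is written into a byte-oriented std::io::Write sink. The helper name write_read_number is hypothetical, and falling back to write! for read numbers outside 1..=8 is an assumption; only the READ_NUMBERS table comes from the diff above.

```rust
use std::io::{self, Write};

/// ASCII digits for read numbers 1 through 8 (as in the diff above).
const READ_NUMBERS: &[u8] = &[b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8'];

/// Hypothetical helper: writes `read_number` as ASCII digits.
/// Fast path: 1..=8 is a single-byte table lookup, with no formatting machinery.
/// Slow path (assumed): any other value goes through `write!`.
fn write_read_number<W: Write>(out: &mut W, read_number: usize) -> io::Result<()> {
    if (1..=READ_NUMBERS.len()).contains(&read_number) {
        out.write_all(&READ_NUMBERS[read_number - 1..read_number])
    } else {
        write!(out, "{}", read_number)
    }
}

fn main() -> io::Result<()> {
    let mut out: Vec<u8> = Vec::new();
    for read_number in [1usize, 2, 8, 12] {
        write_read_number(&mut out, read_number)?;
        out.push(b',');
    }
    println!("{}", String::from_utf8_lossy(&out)); // prints "1,2,8,12,"
    Ok(())
}
```

The table lookup avoids the general integer-formatting path for the common case of paired or small multi-read layouts, while larger values still produce correct output.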

```rust
output_dir.join(format!("{}.{}{}.fq.gz", prefix, file_type_code, idx)),
)?));
output_type_writers.push(BufWriter::with_capacity(
65_536usize,
```
nh13 (Member Author) commented:

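I was seeing write_all_cold (BufWriter's slow path, taken when a write does not fit in the buffer's spare capacity) in the flame graph, which indicates the buffer was too small (the default is 8_192 bytes). Increasing it to 65_536 bytes (8x the default) sped things up, and write_all_cold no longer shows up.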
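A minimal sketch of the change, assuming the writers are plain std::io::BufWriter wrappers. The helper make_output_writer and the Vec<u8> inner writer are hypothetical stand-ins for the gzip file writers; the sketch shows that a record smaller than the buffer stays in memory until the buffer fills or is flushed, so fewer, larger writes reach the inner writer.

```rust
use std::io::{self, BufWriter, Write};

/// Capacity used for the output writers in this PR: 64 KiB,
/// 8x the standard library's current 8 KiB default.
const OUTPUT_BUFFER_CAPACITY: usize = 65_536;

/// Hypothetical helper: wraps an inner writer (a gzip file writer in
/// practice; a Vec<u8> here) in a BufWriter with the larger capacity.
fn make_output_writer<W: Write>(inner: W) -> BufWriter<W> {
    BufWriter::with_capacity(OUTPUT_BUFFER_CAPACITY, inner)
}

fn main() -> io::Result<()> {
    let mut writer = make_output_writer(Vec::<u8>::new());
    // A small FASTQ record fits in the buffer, so nothing reaches the
    // inner writer until the buffer fills or is flushed explicitly.
    let record = b"@read1 1:N:0:ACGT\nACGT\n+\nIIII\n";
    writer.write_all(record)?;
    assert!(writer.get_ref().is_empty());
    writer.flush()?;
    assert_eq!(writer.get_ref().len(), record.len());
    println!("capacity = {}", writer.capacity()); // prints "capacity = 65536"
    Ok(())
}
```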

```rust
expected_bases.len(),
sample.sample_id
);
if sample.barcode_bytes.len() != observed_bases.len() {
```
nh13 (Member Author) commented:

Two things:

  1. Creating observed_string made this function slower than it needed to be, since observed_string is only needed if we are going to panic.
  2. Keeping a copy of the sample barcode as a vector of bytes is faster than calling as_bytes every time we compare the observed barcode to a sample (which happens a lot).
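A minimal sketch of both ideas, assuming a simplified, hypothetical Sample type: the barcode bytes are computed once when the sample is constructed, and the observed bases are converted into a String only on the panic path. The byte_is_nocall definition (N, n, or . counts as a no-call) and the panic message wording are assumptions, not fqtk's actual code.

```rust
/// Hypothetical stand-in for the sample type: `barcode_bytes` is computed
/// once at construction instead of on every comparison.
struct Sample {
    sample_id: String,
    barcode: String,
    barcode_bytes: Vec<u8>,
}

impl Sample {
    fn new(sample_id: &str, barcode: &str) -> Self {
        Sample {
            sample_id: sample_id.to_string(),
            barcode: barcode.to_string(),
            barcode_bytes: barcode.as_bytes().to_vec(),
        }
    }
}

/// Assumed definition: treats N, n, and '.' as no-calls.
fn byte_is_nocall(b: u8) -> bool {
    matches!(b, b'N' | b'n' | b'.')
}

/// Counts mismatches between an observed barcode and a sample's barcode,
/// ignoring no-calls in the sample barcode.
fn count_mismatches(observed_bases: &[u8], sample: &Sample) -> usize {
    if sample.barcode_bytes.len() != observed_bases.len() {
        // Only build the human-readable string on the (rare) panic path.
        let observed_string = String::from_utf8_lossy(observed_bases);
        panic!(
            "Observed barcode {} (length {}) differs in length from barcode {} (length {}) for sample {}",
            observed_string,
            observed_bases.len(),
            sample.barcode,
            sample.barcode_bytes.len(),
            sample.sample_id
        );
    }
    sample
        .barcode_bytes
        .iter()
        .zip(observed_bases)
        .filter(|&(&expected_base, &observed_base)| {
            expected_base != observed_base && !byte_is_nocall(expected_base)
        })
        .count()
}

fn main() {
    let sample = Sample::new("s1", "ACGTN");
    println!("{}", count_mismatches(b"ACGAA", &sample)); // prints "1"
    println!("{}", count_mismatches(b"ACGTA", &sample)); // prints "0"
}
```

On the hot path the function touches only precomputed bytes; the String allocation and formatting happen only when the lengths disagree.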

```diff
-for (&expected_base, &observed_base) in expected_bases.iter().zip(observed_bases.iter()) {
-    if !byte_is_nocall(expected_base) && expected_base != observed_base {
+for (&expected_base, &observed_base) in sample.barcode_bytes.iter().zip(observed_bases.iter()) {
+    if expected_base != observed_base && !byte_is_nocall(expected_base) {
```
nh13 (Member Author) commented:

Swapped the order of the conditionals, since a mismatch is much more likely than a no-call in the sample barcode.
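Because && short-circuits, the right-hand check runs only when the left-hand one is true. The sketch below, assuming a hypothetical byte_is_nocall that also counts its own calls, checks that both orders give the same mismatch count while the new order skips the no-call check for every base that matches.

```rust
use std::cell::Cell;

/// Hypothetical no-call test (assumed: N, n, or '.'); it records how many
/// times it is called so the two orderings can be compared.
fn byte_is_nocall(b: u8, calls: &Cell<usize>) -> bool {
    calls.set(calls.get() + 1);
    matches!(b, b'N' | b'n' | b'.')
}

/// Old order: the no-call check runs for every base.
fn mismatches_old(expected: &[u8], observed: &[u8], calls: &Cell<usize>) -> usize {
    expected
        .iter()
        .zip(observed)
        .filter(|&(&e, &o)| !byte_is_nocall(e, calls) && e != o)
        .count()
}

/// New order: the no-call check runs only where the bases differ.
fn mismatches_new(expected: &[u8], observed: &[u8], calls: &Cell<usize>) -> usize {
    expected
        .iter()
        .zip(observed)
        .filter(|&(&e, &o)| e != o && !byte_is_nocall(e, calls))
        .count()
}

fn main() {
    let (expected, observed) = (b"ACGTN", b"ACGAA");
    let (old_calls, new_calls) = (Cell::new(0), Cell::new(0));
    let old = mismatches_old(expected, observed, &old_calls);
    let new = mismatches_new(expected, observed, &new_calls);
    assert_eq!(old, new);
    // prints "mismatches = 1, no-call checks: old = 5, new = 2"
    println!(
        "mismatches = {}, no-call checks: old = {}, new = {}",
        new,
        old_calls.get(),
        new_calls.get()
    );
}
```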

nh13 requested a review from jdidion March 7, 2025 21:40
nh13 force-pushed the feat/minor-speedups branch from f76d08b to 2e7e419 March 13, 2025 17:59
nh13 force-pushed the feat/minor-speedups branch 2 times, most recently from 10776b3 to 803f453 March 14, 2025 03:59
* replace ahash with rustc-hash.  Empirically, rustc-hash was faster
  than both ahash and fxhash.
* optimize how the read name, specifically the read number, is
  written.
* increase the buffer size for the output writers from the 8,192-byte
  default to 65,536 bytes.
* delay building the string of read bases used for display in a
  panic until it is needed.  Previously it was built on every
  comparison of the read's sample barcode with an expected sample
  barcode; not expensive individually, but expensive enough in
  aggregate.
* create the bytes for each sample barcode only once, rather than
  every time barcodes are compared.

Miscellaneous changes:

* fix the usage text for threads, which incorrectly said the minimum
  number of threads was three instead of five.  The README is updated
  to match.
* update the Rust toolchain to 1.85 (from 1.65.0).  This requires
  minor code changes throughout to satisfy clippy and formatting
  requirements.
nh13 force-pushed the feat/minor-speedups branch from 803f453 to 4c6a22d March 14, 2025 04:09
2 participants