feat: minor speedups #56

nh13 · 2025-02-26T05:16:25Z

This speeds up fqtk ~10% in my hands.

nh13 · 2025-02-26T05:17:15Z

src/bin/commands/demux.rs

@@ -93,6 +93,7 @@ impl ReadSet {
    const SPACE: u8 = b' ';
    const COLON: u8 = b':';
    const PLUS: u8 = b'+';
+    const READ_NUMBERS: &[u8] = &[b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8'];


profiling showed that converting the read number (usize) to a string was costing a non-trivial amount of time. Here we have a fast path for read numbers up to 8.

nh13 · 2025-02-26T05:18:25Z

src/bin/commands/demux.rs

-                    output_dir.join(format!("{}.{}{}.fq.gz", prefix, file_type_code, idx)),
-                )?));
+                output_type_writers.push(BufWriter::with_capacity(
+                    65_536usize,


I was seeing write_all_cold in the flame graph, which means that our buffer is improperly set (default 8_192usize). Increasing this by 4x sped things up and the write_all_cold call no longer shows up.

nh13 · 2025-02-26T05:20:09Z

src/lib/barcode_matching.rs

-            expected_bases.len(),
-            sample.sample_id
-        );
+        if sample.barcode_bytes.len() != observed_bases.len() {


two things:

The creation of observed_string was causing this function to be slower than it needed to be, since observed_string only is needed if we are going to panic.

Keeping a copy of the sample barcode as a vector of bytes is faster than calling as_bytes every time we compared the observed barcode to a sample (which is a lot)

nh13 · 2025-02-26T05:20:51Z

src/lib/barcode_matching.rs

-        for (&expected_base, &observed_base) in expected_bases.iter().zip(observed_bases.iter()) {
-            if !byte_is_nocall(expected_base) && expected_base != observed_base {
+        for (&expected_base, &observed_base) in sample.barcode_bytes.iter().zip(observed_bases.iter()) {
+            if expected_base != observed_base && !byte_is_nocall(expected_base) {


swapped the order of the conditionals since it is much more likely that we have a mismatch than a no call in the sample barcode.

* replace ahash with rustc-hash. rustc-hash was faster than ahash and fxhash empirically. * optimize how the read name, specifically the read number, is written. * increase the buffer size by 4xfor the output writers to 65K. * delay creating the read bases from bytes for the display in an panic until it is needed. This occurred during every comparison of the read sample barcode with the expected sample barcode, so while not individually expenseive, expensive enough in aggregate. * create the necessary bytes for each sample barcode only once, since this was being done every time we compared barcodes. Miscellaneous changes: * fix the usage for threads, where it incorrectly said the minimum number of threads was three, and should be five. This also changes the README. * update rust toolchain to 1.85 (from 1.65.0). This requires minor code changes throughout based on clippy requirements and formatting.

theJasonFan · 2025-05-20T20:08:10Z

src/lib/samples.rs

@@ -19,6 +19,9 @@ pub struct Sample {
    pub sample_id: String,
    /// DNA barcode associated with the sample
    pub barcode: String,
+    /// DNA barcode as a byte
+    #[serde(skip_deserializing)]
+    pub barcode_bytes: Vec<u8>,


Question: Why add a barcode_bytes field?

String implements Clone and into_bytes so if a user needs bytes they can still call sample.barcode.clone().into_bytes() if they need a Vec<u8>. And if they need a slice they can use sample.barcode.as_bytes().

I'm getting confused about parts of a previous implementation that has since been removed.

theJasonFan · 2025-05-20T20:09:23Z

src/bin/commands/demux.rs

@@ -239,7 +245,11 @@ impl ReadSet {
                        &chars[first_colon_idx + 1..chars.len()]
                    };

-                    write!(writer, "{}:", read_num)?;
+                    if read_num < Self::READ_NUMBERS.len() {


If you don't take the suggestion above... what if read_num == 8?

Suggested change

if read_num < Self::READ_NUMBERS.len() {

if read_num <= Self::READ_NUMBERS.len() {

theJasonFan · 2025-05-20T20:13:41Z

src/bin/commands/demux.rs

@@ -93,6 +93,7 @@ impl ReadSet {
    const SPACE: u8 = b' ';
    const COLON: u8 = b':';
    const PLUS: u8 = b'+';
+    const READ_NUMBERS: &[u8] = b"12345678";


minor (non-blocking) suggestion:

Start at 0 and add 9.

Then, below, you can just write:

writer.write_all(&[Self::READ_NUMBERS[read_num]])?;

Suggested change

const READ_NUMBERS: &[u8] = b"12345678";

const READ_NUMBERS: &[u8] = b"0123456789";

theJasonFan · 2025-05-20T20:14:44Z

src/bin/commands/demux.rs

@@ -213,7 +214,12 @@ impl ReadSet {
            None => {
                // If no pre-existing comment, assume the read is a passing filter, non-control
                // read and generate a comment for it (sample barcode is added below).
-                write!(writer, "{}:N:0:", read_num)?;
+                if read_num < Self::READ_NUMBERS.len() {


Same comment as below about < -> <= if you don't start at 0.

Probably worth leaving a comment about performance and why we want an optimization here.

theJasonFan

@nh13 some minor comments. One additional must-do request below.

Could you add a screenshot of the flamegraph to this PR and share the steps you took to benchmark fqtk in the PR description so others can just reproduce or reference this for future performance testing?

Thanks

nh13 requested a review from tfenne as a code owner February 26, 2025 05:16

nh13 commented Feb 26, 2025

View reviewed changes

nh13 requested a review from jdidion March 7, 2025 21:40

nh13 force-pushed the feat/minor-speedups branch from f76d08b to 2e7e419 Compare March 13, 2025 17:59

nh13 assigned jdidion Mar 13, 2025

nh13 force-pushed the feat/minor-speedups branch 2 times, most recently from 10776b3 to 803f453 Compare March 14, 2025 03:59

nh13 force-pushed the feat/minor-speedups branch from 803f453 to 4c6a22d Compare March 14, 2025 04:09

nh13 removed the request for review from jdidion May 20, 2025 18:01

nh13 assigned theJasonFan and unassigned jdidion May 20, 2025

theJasonFan reviewed May 20, 2025

View reviewed changes

theJasonFan requested changes May 20, 2025

View reviewed changes

nh13 assigned nh13 and unassigned theJasonFan Jun 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: minor speedups #56

feat: minor speedups #56

Uh oh!

nh13 commented Feb 26, 2025

Uh oh!

nh13 Feb 26, 2025

Uh oh!

nh13 Feb 26, 2025

Uh oh!

nh13 Feb 26, 2025

Uh oh!

nh13 Feb 26, 2025

Uh oh!

theJasonFan May 20, 2025

Uh oh!

theJasonFan May 20, 2025

Uh oh!

theJasonFan May 20, 2025 •

edited

Loading

Uh oh!

theJasonFan May 20, 2025

Uh oh!

theJasonFan May 20, 2025

Uh oh!

theJasonFan left a comment •

edited

Loading

Uh oh!

Uh oh!

	if read_num < Self::READ_NUMBERS.len() {
	if read_num <= Self::READ_NUMBERS.len() {

	const READ_NUMBERS: &[u8] = b"12345678";
	const READ_NUMBERS: &[u8] = b"0123456789";

feat: minor speedups #56

Are you sure you want to change the base?

feat: minor speedups #56

Uh oh!

Conversation

nh13 commented Feb 26, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

theJasonFan May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

theJasonFan left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

theJasonFan May 20, 2025 •

edited

Loading

theJasonFan left a comment •

edited

Loading