Skip to content

feat: minor speedups #56

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

feat: minor speedups #56

wants to merge 1 commit into from

Conversation

nh13
Copy link
Member

@nh13 nh13 commented Feb 26, 2025

This speeds up fqtk ~10% in my hands.

@nh13 nh13 requested a review from tfenne as a code owner February 26, 2025 05:16
@@ -93,6 +93,7 @@ impl ReadSet {
const SPACE: u8 = b' ';
const COLON: u8 = b':';
const PLUS: u8 = b'+';
const READ_NUMBERS: &[u8] = &[b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8'];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

profiling showed that converting the read number (usize) to a string was costing a non-trivial amount of time. Here we have a fast path for read numbers up to 8.

output_dir.join(format!("{}.{}{}.fq.gz", prefix, file_type_code, idx)),
)?));
output_type_writers.push(BufWriter::with_capacity(
65_536usize,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was seeing write_all_cold in the flame graph, which means that our buffer is improperly set (default 8_192usize). Increasing this by 4x sped things up and the write_all_cold call no longer shows up.

expected_bases.len(),
sample.sample_id
);
if sample.barcode_bytes.len() != observed_bases.len() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two things:

  1. The creation of observed_string was causing this function to be slower than it needed to be, since observed_string only is needed if we are going to panic.
  2. Keeping a copy of the sample barcode as a vector of bytes is faster than calling as_bytes every time we compared the observed barcode to a sample (which is a lot)

for (&expected_base, &observed_base) in expected_bases.iter().zip(observed_bases.iter()) {
if !byte_is_nocall(expected_base) && expected_base != observed_base {
for (&expected_base, &observed_base) in sample.barcode_bytes.iter().zip(observed_bases.iter()) {
if expected_base != observed_base && !byte_is_nocall(expected_base) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swapped the order of the conditionals since it is much more likely that we have a mismatch than a no call in the sample barcode.

@nh13 nh13 requested a review from jdidion March 7, 2025 21:40
@nh13 nh13 force-pushed the feat/minor-speedups branch from f76d08b to 2e7e419 Compare March 13, 2025 17:59
@nh13 nh13 force-pushed the feat/minor-speedups branch 2 times, most recently from 10776b3 to 803f453 Compare March 14, 2025 03:59
* replace ahash with rustc-hash.  rustc-hash was faster than
  ahash and fxhash empirically.
* optimize how the read name, specifically the read number, is
  written.
* increase the buffer size by 4xfor the output writers to 65K.
* delay creating the read bases from bytes for the display in an
  panic until it is needed.  This occurred during every comparison
  of the read sample barcode with the expected sample barcode, so
  while not individually expenseive, expensive enough in aggregate.
* create the necessary bytes for each sample barcode only once, since
  this was being done every time we compared barcodes.

Miscellaneous changes:

* fix the usage for threads, where it incorrectly said the minimum
  number of threads was three, and should be five.  This also
  changes the README.
* update rust toolchain to 1.85 (from 1.65.0).  This requires minor
  code changes throughout based on clippy requirements and
  formatting.
@nh13 nh13 force-pushed the feat/minor-speedups branch from 803f453 to 4c6a22d Compare March 14, 2025 04:09
@nh13 nh13 removed the request for review from jdidion May 20, 2025 18:01
@nh13 nh13 assigned theJasonFan and unassigned jdidion May 20, 2025
@@ -19,6 +19,9 @@ pub struct Sample {
pub sample_id: String,
/// DNA barcode associated with the sample
pub barcode: String,
/// DNA barcode as a byte
#[serde(skip_deserializing)]
pub barcode_bytes: Vec<u8>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why add a barcode_bytes field?

String implements Clone and into_bytes so if a user needs bytes they can still call sample.barcode.clone().into_bytes() if they need a Vec<u8>. And if they need a slice they can use sample.barcode.as_bytes().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting confused about parts of a previous implementation that has since been removed.

@@ -239,7 +245,11 @@ impl ReadSet {
&chars[first_colon_idx + 1..chars.len()]
};

write!(writer, "{}:", read_num)?;
if read_num < Self::READ_NUMBERS.len() {
Copy link
Member

@theJasonFan theJasonFan May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't take the suggestion above... what if read_num == 8?

Suggested change
if read_num < Self::READ_NUMBERS.len() {
if read_num <= Self::READ_NUMBERS.len() {

@@ -93,6 +93,7 @@ impl ReadSet {
const SPACE: u8 = b' ';
const COLON: u8 = b':';
const PLUS: u8 = b'+';
const READ_NUMBERS: &[u8] = b"12345678";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor (non-blocking) suggestion:

Start at 0 and add 9.

Then, below, you can just write:

writer.write_all(&[Self::READ_NUMBERS[read_num]])?;
Suggested change
const READ_NUMBERS: &[u8] = b"12345678";
const READ_NUMBERS: &[u8] = b"0123456789";

@@ -213,7 +214,12 @@ impl ReadSet {
None => {
// If no pre-existing comment, assume the read is a passing filter, non-control
// read and generate a comment for it (sample barcode is added below).
write!(writer, "{}:N:0:", read_num)?;
if read_num < Self::READ_NUMBERS.len() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Same comment as below about < -> <= if you don't start at 0.
  2. Probably worth leaving a comment about performance and why we want an optimization here.

Copy link
Member

@theJasonFan theJasonFan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nh13 some minor comments. One additional must-do request below.

Could you add a screenshot of the flamegraph to this PR and share the steps you took to benchmark fqtk in the PR description so others can just reproduce or reference this for future performance testing?

Thanks

@nh13 nh13 assigned nh13 and unassigned theJasonFan Jun 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants