feature: support IUPAC bases - attempt 2 #54
@@ -0,0 +1,571 @@
// Copyright 2014-2016 Johannes Köster.
Pulled directly from rust-bio so we don't have to have it as a full dependency.
self.len == 0
}

/// Calculate the Hamming distance between this and another bitencoded sequence.
This is new!
}

#[test]
fn test_hamming() {
This is new!
I didn't review `bitenc.rs`; please let me know if you want me to come back and look at that as well.
4dc2274 to 877f21c
src/lib/mod.rs (outdated)
pub mod samples;

use crate::bitenc::BitEnc;
use lazy_static::lazy_static;
It is no longer necessary to use `lazy_static` thanks to the addition of `std::sync::LazyLock`. If this were legacy code I'd say just add it as a TODO, but since it's new, I'd do it in this PR to avoid introducing a new, unnecessary dependency.
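For reference, a minimal sketch of the `std::sync::LazyLock` pattern (stable since Rust 1.80). The original `lazy_static!` block isn't shown here, so the table name `IUPAC_MASKS` and its contents are hypothetical; it assumes a one-bit-per-nucleotide encoding (A = 1, C = 2, G = 4, T = 8) where each IUPAC code is the OR of the bases it allows.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

/// Hypothetical lookup table from IUPAC code to a 4-bit nucleotide mask
/// (A = 0b0001, C = 0b0010, G = 0b0100, T = 0b1000).
/// Built on first access, with no `lazy_static` dependency.
static IUPAC_MASKS: LazyLock<HashMap<u8, u8>> = LazyLock::new(|| {
    let pairs: [(u8, u8); 15] = [
        (b'A', 0b0001), (b'C', 0b0010), (b'G', 0b0100), (b'T', 0b1000),
        (b'R', 0b0101), // A or G
        (b'Y', 0b1010), // C or T
        (b'S', 0b0110), // G or C
        (b'W', 0b1001), // A or T
        (b'K', 0b1100), // G or T
        (b'M', 0b0011), // A or C
        (b'B', 0b1110), // C, G or T
        (b'D', 0b1101), // A, G or T
        (b'H', 0b1011), // A, C or T
        (b'V', 0b0111), // A, C or G
        (b'N', 0b1111), // any base
    ];
    pairs.into_iter().collect()
});

fn main() {
    // The first access runs the closure; later accesses reuse the same map.
    assert_eq!(IUPAC_MASKS[&b'R'], 0b0101);
    println!("{} IUPAC codes loaded", IUPAC_MASKS.len());
}
```

The static derefs to the inner `HashMap`, so call sites that used the `lazy_static!` version should not need to change.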
@@ -1,14 +1,97 @@
pub mod barcode_matching;
pub mod bitenc;
Is there a reason this needs to be exposed externally?
/// N, it will not match anything but an N, and if the other base is an R, it
/// will match R, V, D, and N, since the latter IUPAC codes allow both A and G.
pub fn hamming(&self, other: &BitEnc, max_mismatches: u32) -> u32 {
assert!(self.len == other.len, "Both bitenc sequences must have the same length");
If this module were internal-only, I wouldn't care as much about this, but you're exporting it, so panicking here might lead to unpleasant surprises. At a minimum, you should document this potential panic. However, I would strongly recommend instead returning a `Result` with a custom error type.
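A minimal sketch of that shape, under stated assumptions: `BitEnc` here is a stripped-down stand-in (one value per `u32` slot instead of packed blocks), `BitEncError` is a hypothetical name, and `max_mismatches` is omitted for brevity. The mismatch rule (a base matches when its bits are a subset of the other base's bits) follows the `intersection != self.storage[...]` check in the diff.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type for comparing incompatible encodings.
#[derive(Debug, PartialEq, Eq)]
pub enum BitEncError {
    LengthMismatch { left: usize, right: usize },
    WidthMismatch { left: usize, right: usize },
}

impl fmt::Display for BitEncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BitEncError::LengthMismatch { left, right } => {
                write!(f, "sequence lengths differ: {left} vs {right}")
            }
            BitEncError::WidthMismatch { left, right } => {
                write!(f, "value widths differ: {left} vs {right}")
            }
        }
    }
}

impl Error for BitEncError {}

/// Simplified stand-in for `BitEnc`: one value per u32 slot, no packing.
pub struct BitEnc {
    len: usize,
    width: usize,
    storage: Vec<u32>,
}

impl BitEnc {
    pub fn new(width: usize, values: &[u32]) -> Self {
        BitEnc { len: values.len(), width, storage: values.to_vec() }
    }

    /// Counts positions where `self` has a bit that `other` lacks.
    /// Returns an error instead of panicking when the inputs are incompatible.
    pub fn hamming(&self, other: &BitEnc) -> Result<u32, BitEncError> {
        if self.len != other.len {
            return Err(BitEncError::LengthMismatch { left: self.len, right: other.len });
        }
        if self.width != other.width {
            return Err(BitEncError::WidthMismatch { left: self.width, right: other.width });
        }
        let count = self
            .storage
            .iter()
            .zip(&other.storage)
            .filter(|&(&a, &b)| a & !b != 0)
            .count();
        Ok(count as u32)
    }
}

fn main() {
    let a = BitEnc::new(4, &[0b0001, 0b0101]); // A, R
    let b = BitEnc::new(4, &[0b0001, 0b0111]); // A, V
    let c = BitEnc::new(4, &[0b0001]); // A
    println!("{:?}", a.hamming(&b)); // prints Ok(0)
    println!("{}", a.hamming(&c).unwrap_err()); // prints "sequence lengths differ: 2 vs 1"
}
```

Implementing `std::error::Error` lets callers propagate the failure with `?` (for example into `Box<dyn Error>` or an `anyhow::Error`) instead of having the process abort.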
/// will match R, V, D, and N, since the latter IUPAC codes allow both A and G.
pub fn hamming(&self, other: &BitEnc, max_mismatches: u32) -> u32 {
assert!(self.len == other.len, "Both bitenc sequences must have the same length");
assert!(self.width == other.width, "Both bitenc sequences must have the same width");
See previous comment.
src/lib/bitenc.rs (outdated)
let values_per_block = self.usable_bits_per_block / self.width;
for block_index in 0..self.nr_blocks() {
let intersection = self.storage[block_index] & other.storage[block_index];
if intersection != self.storage[block_index] {
This seems overly complicated. I think what you want is `self.storage[block_index] & !other.storage[block_index]`.
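The two checks are equivalent: `a & b != a` means some bit of `a` is missing from `b`, which is exactly `a & !b != 0`. A quick sketch confirming this exhaustively over 4-bit values, and checking it against the matching rules in the doc comment, assuming the one-bit-per-nucleotide masks (A = 1, C = 2, G = 4, T = 8). The function names are hypothetical.

```rust
/// Current form: the intersection differs from `a`, i.e. `a` has a bit `b` lacks.
fn mismatch_via_intersection(a: u64, b: u64) -> bool {
    let intersection = a & b;
    intersection != a
}

/// Suggested form: clear every bit that `b` has and see whether anything is left.
/// (In Rust, `&` binds tighter than `!=`, so no extra parentheses are needed.)
fn mismatch_via_complement(a: u64, b: u64) -> bool {
    a & !b != 0
}

fn main() {
    // Exhaustive check over every pair of 4-bit masks.
    for a in 0u64..16 {
        for b in 0u64..16 {
            assert_eq!(mismatch_via_intersection(a, b), mismatch_via_complement(a, b));
        }
    }

    // Assumed one-bit-per-nucleotide encoding.
    const R: u64 = 0b0101; // A or G
    const V: u64 = 0b0111; // A, C or G
    const D: u64 = 0b1101; // A, G or T
    const N: u64 = 0b1111; // any base
    const Y: u64 = 0b1010; // C or T

    // R matches R, V, D and N: its bits are a subset of each.
    for other in [R, V, D, N] {
        assert!(!mismatch_via_complement(R, other));
    }
    assert!(mismatch_via_complement(R, Y));
    // N only matches N.
    assert!(mismatch_via_complement(N, R));
    println!("both forms agree");
}
```

Applied to a whole packed block, the expression is nonzero exactly when at least one value in the block mismatches, so it can replace the `intersection` temporary directly.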
assert!(self.width == other.width, "Both bitenc sequences must have the same width");
let mut count: u32 = 0;
let values_per_block = self.usable_bits_per_block / self.width;
for block_index in 0..self.nr_blocks() {
Nit: this could be written using `(0..self.nr_blocks()).into_iter().scan(..)` to avoid the need for `let mut`.
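A minimal sketch of the `scan` approach, under assumptions: free functions over plain `&[u64]` block slices instead of `BitEnc` methods, values packed from the low bits, and a hypothetical cut-off that stops once the running total exceeds `max_mismatches` (the original loop body isn't shown, so the early-exit behavior is a guess). The running total lives in `scan`'s state rather than a `let mut` counter. Since `0..n` is already an iterator, `.into_iter()` can be dropped.

```rust
/// Counts values in one block where `a` has a bit that `b` lacks.
/// Assumes 1 <= `width` < 64 and `values_per_block * width` <= 64.
fn block_mismatches(a: u64, b: u64, width: u32, values_per_block: u32) -> u32 {
    let value_mask = (1u64 << width) - 1;
    let diff = a & !b; // bits present in `a` but missing from `b`
    (0..values_per_block)
        .filter(|&i| (diff >> (i * width)) & value_mask != 0)
        .count() as u32
}

/// Running total kept in `scan`'s state; iteration stops once the total
/// has exceeded `max_mismatches`. Extra blocks in the longer slice are ignored.
fn hamming_blocks(a: &[u64], b: &[u64], width: u32, max_mismatches: u32) -> u32 {
    let values_per_block = 64 / width;
    (0..a.len().min(b.len()))
        .scan(0u32, |total, i| {
            if *total > max_mismatches {
                return None; // already over the limit: stop early
            }
            *total += block_mismatches(a[i], b[i], width, values_per_block);
            Some(*total)
        })
        .last()
        .unwrap_or(0)
}

fn main() {
    // Block 0: A, R, N in the three lowest 4-bit slots; block 1: T.
    let query = [0xF51u64, 0x8];
    // Block 0: A, V, N; block 1: A.
    let target = [0xF71u64, 0x1];
    // Only T vs A mismatches (R is a subset of V).
    println!("{}", hamming_blocks(&query, &target, 4, 5)); // prints 1
}
```

Because the cut-off is checked per block, the returned count can overshoot `max_mismatches` by up to one block's worth of mismatches, which is usually fine when the caller only needs to know that the limit was exceeded.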
baefa86 to eb401f2
eb401f2 to 4c0de2a
Alternative to #53

I found that `bio-seq` was significantly slower than using `bitenc` from `bio` when there were a lot of barcodes that didn't match. Rather than pulling in all of rust-bio, I copied `bitenc` from rust-bio, then added the `hamming` method (which is slightly faster than our current method) and associated tests.