Commit c0de006

feature: support IUPAC bases - attempt 2 (#54)
* feature: support IUPAC bases
1 parent 715735e commit c0de006

9 files changed, +1018 -152 lines changed

Cargo.lock

+80 -36 (generated file; diff not rendered)

Cargo.toml

+2 -2
@@ -6,7 +6,7 @@ authors = [
 ]
 name = "fqtk"
 version = "0.3.2-rc.1"
-edition = "2021"
+edition = "2024"
 license = "MIT"
 readme = "README.md"
 homepage = "https://github.com/fulcrumgenomics/fqtk"
@@ -30,7 +30,7 @@ path = "src/bin/main.rs"
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 
 [dependencies]
-ahash = "0.8.2"
+ahash = "0.8.11"
 anyhow = "1.0.38"
 bstr = "1.0.1"
 clap = { version = "4.0.25", features = ["derive"] }

README.md

+10 -3
@@ -62,15 +62,21 @@ be concatenated using the `-` delimiter and placed in the given SAM record tag (
 default). Similarly, the sample barcode bases from the given read will be placed in the `BC`
 tag.
 
-Metadata about the samples should be given as a headered metadata TSV file with at least the
+Metadata about the samples should be given as a headered metadata TSV file with at least the
 following two columns present:
 
-1. `sample_id` - the id of the sample or library.
+1. `sample_id` - the id of the sample or library.
 2. `barcode` - the expected barcode sequence associated with the `sample_id`.
 
-For reads containing multiple barcodes (such as dual-indexed reads), all barcodes should be
+For reads containing multiple barcodes (such as dual-indexed reads), all barcodes should be
 concatenated together in the order they are read and stored in the `barcode` field.
 
+IUPAC bases are supported in the (expected) `barcode` column. An observed IUPAC base must be
+at least as specific as the corresponding base in the expected sample barcode. E.g., if the
+observed base is an N, it will only match expected sample barcodes with an N. And if the
+observed base is an R, it will match R, V, D, and N, since the latter IUPAC codes allow both
+A and G (R/V/D/N are a superset of the bases compared to R).
+
 The read structures will be used to extract the observed sample barcode, template bases, and
 molecular identifiers from each read. The observed sample barcode will be matched to the
 sample barcodes extracted from the bases in the sample metadata and associated read structures.
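
The "at least as specific" rule in the added README paragraph can be sketched as a subset test on bit masks. This is only an illustration under stated assumptions: each IUPAC code is encoded as a 4-bit mask over A, C, G, and T, and the names `iupac_mask` and `base_matches` are hypothetical. The implementation in this commit is not shown in this excerpt and may differ.

```rust
/// Hypothetical helper (not from the fqtk source): maps an IUPAC code to a
/// 4-bit mask with A = 1, C = 2, G = 4, T = 8. Returns None for other bytes.
fn iupac_mask(base: u8) -> Option<u8> {
    const A: u8 = 1;
    const C: u8 = 2;
    const G: u8 = 4;
    const T: u8 = 8;
    let mask = match base.to_ascii_uppercase() {
        b'A' => A,
        b'C' => C,
        b'G' => G,
        b'T' | b'U' => T,
        b'R' => A | G,
        b'Y' => C | T,
        b'S' => C | G,
        b'W' => A | T,
        b'K' => G | T,
        b'M' => A | C,
        b'B' => C | G | T,
        b'D' => A | G | T,
        b'H' => A | C | T,
        b'V' => A | C | G,
        b'N' => A | C | G | T,
        _ => return None,
    };
    Some(mask)
}

/// Hypothetical helper: true when every base allowed by `observed` is also
/// allowed by `expected`, i.e. the observed code is at least as specific as
/// the expected one. Unknown bytes never match in this sketch.
fn base_matches(observed: u8, expected: u8) -> bool {
    match (iupac_mask(observed), iupac_mask(expected)) {
        (Some(o), Some(e)) => (o & e) == o,
        _ => false,
    }
}
```

Under this encoding, an observed `R` matches an expected `R`, `V`, `D`, or `N` but not an expected `A`, and an observed `N` matches only an expected `N`, which agrees with the examples in the README text.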
@@ -80,6 +86,7 @@ An observed barcode matches an expected barcode if all the following are true:
 mismatches (see `--max-mismatches`).
 2. The difference between number of mismatches in the best and second best barcodes is greater
 than or equal to the minimum mismatch delta (`--min-mismatch-delta`).
+
 The expected barcode sequence may contain Ns, which are not counted as mismatches regardless
 of the observed base (e.g. the expected barcode `AAN` will have zero mismatches relative to
 both the observed barcodes `AAA` and `AAN`).
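
The two acceptance rules in this hunk can be sketched as a single pass over per-barcode mismatch counts. This is a hedged illustration, not the fqtk implementation: the function name `assign` and its parameters are hypothetical, and the first rule is cut off in the hunk, so the sketch assumes it requires the best barcode to have no more than `--max-mismatches` mismatches. It also assumes that a lone expected barcode always passes the delta rule.

```rust
/// Hypothetical sketch (not from the fqtk source). `mismatches[i]` is the
/// mismatch count of the observed barcode against expected barcode `i`.
/// Returns the index of the assigned barcode, or None if either rule fails.
fn assign(mismatches: &[usize], max_mismatches: usize, min_delta: usize) -> Option<usize> {
    // Track the best (index, count) and the second-best count seen so far.
    let mut best: Option<(usize, usize)> = None;
    let mut second = usize::MAX;
    for (i, &m) in mismatches.iter().enumerate() {
        match best {
            // Not strictly better than the best: candidate for second best.
            Some((_, b)) if m >= b => second = second.min(m),
            // New best: the previous best (if any) becomes second best.
            _ => {
                if let Some((_, b)) = best {
                    second = b;
                }
                best = Some((i, m));
            }
        }
    }
    let (idx, best_mm) = best?;
    // Assumed rule 1: the best barcode is within the mismatch budget.
    if best_mm > max_mismatches {
        return None;
    }
    // Rule 2: the best barcode beats the runner-up by at least `min_delta`.
    // `second` stays at usize::MAX when there is only one expected barcode.
    if second - best_mm < min_delta {
        return None;
    }
    Some(idx)
}
```

For example, with counts `[3, 0, 2]`, a budget of 1, and a delta of 2, the second barcode is assigned; with counts `[1, 0]` and the same settings, the read is left unassigned because the runner-up is only one mismatch behind.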

rust-toolchain.toml

+1 -1
@@ -1,3 +1,3 @@
 [toolchain]
-channel = "1.65.0"
+channel = "1.85"
 components = ["rustfmt", "clippy"]