@@ -62,15 +62,21 @@ be concatenated using the `-` delimiter and placed in the given SAM record tag (
default). Similarly, the sample barcode bases from the given read will be placed in the `BC`
tag.
- Metadata about the samples should be given as a headered metadata TSV file with at least the
+ Metadata about the samples should be given as a headered metadata TSV file with at least the
following two columns present:
- 1. `sample_id` - the id of the sample or library.
+ 1. `sample_id` - the id of the sample or library.
2. `barcode` - the expected barcode sequence associated with the `sample_id`.
- For reads containing multiple barcodes (such as dual-indexed reads), all barcodes should be
+ For reads containing multiple barcodes (such as dual-indexed reads), all barcodes should be
concatenated together in the order they are read and stored in the `barcode` field.
+ IUPAC bases are supported in the (expected) `barcode` column. An observed IUPAC base must be
+ at least as specific as the corresponding base in the expected sample barcode. E.g., if the
+ observed base is an N, it will only match expected sample barcodes with an N. And if the
+ observed base is an R, it will match R, V, D, and N, since the latter IUPAC codes allow both
+ A and G (R, V, D, and N each cover a superset of the bases covered by R).
+
The read structures will be used to extract the observed sample barcode, template bases, and
molecular identifiers from each read. The observed sample barcode will be matched to the
sample barcodes extracted from the bases in the sample metadata and associated read structures.
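
The IUPAC rule added in the hunk above can be read as a subset test: an observed base matches an expected base when every nucleotide the observed code allows is also allowed by the expected code. The following is a minimal Python sketch of that reading, not the tool's implementation. The table is the standard IUPAC nucleotide code table; the function names `base_matches` and `expected_codes_matching` are hypothetical and exist only for illustration.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def base_matches(observed: str, expected: str) -> bool:
    """Hypothetical helper: True if the observed code is at least as
    specific as the expected code, i.e. its allowed bases are a subset
    of the expected code's allowed bases."""
    return set(IUPAC[observed.upper()]) <= set(IUPAC[expected.upper()])


def expected_codes_matching(observed: str) -> list[str]:
    """Hypothetical helper: all expected codes that the observed code matches."""
    return sorted(code for code in IUPAC if base_matches(observed, code))


if __name__ == "__main__":
    print(expected_codes_matching("R"))  # codes whose base set contains both A and G
    print(expected_codes_matching("N"))  # only N allows all four bases
```

Under this reading an observed `A` matches an expected `N`, but an observed `N` does not match an expected `A`, which is the asymmetry the new paragraph describes.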
@@ -80,6 +86,7 @@ An observed barcode matches an expected barcode if all the following are true:
mismatches (see `--max-mismatches`).
2. The difference between the number of mismatches in the best and second best barcodes is greater
than or equal to the minimum mismatch delta (`--min-mismatch-delta`).
+
The expected barcode sequence may contain Ns, which are not counted as mismatches regardless
of the observed base (e.g. the expected barcode `AAN` will have zero mismatches relative to
both the observed barcodes `AAA` and `AAN`).
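
The two matching conditions and the handling of expected Ns can be sketched as follows. This is a simplified illustration, not the tool's code. It assumes the first condition (cut off by the hunk boundary) is that the best barcode has at most `--max-mismatches` mismatches. It also assumes a lone expected barcode passes the delta check, and it compares bases by equality rather than by the IUPAC rule. The function names and the sample ids are hypothetical.

```python
def count_mismatches(observed: str, expected: str) -> int:
    """Count mismatching positions; an N in the expected barcode never counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected barcodes must have equal length")
    return sum(1 for o, e in zip(observed, expected) if e != "N" and o != e)


def assign_sample(observed, expected_by_sample, max_mismatches, min_mismatch_delta):
    """Return the sample id of the best barcode, or None if there is no match.

    The two parameters mirror `--max-mismatches` and `--min-mismatch-delta`.
    """
    scored = sorted(
        (count_mismatches(observed, barcode), sample_id)
        for sample_id, barcode in expected_by_sample.items()
    )
    if not scored:
        return None
    best_mismatches, best_id = scored[0]
    # Assumption: with a single expected barcode the delta check always passes.
    second_mismatches = scored[1][0] if len(scored) > 1 else float("inf")
    if (best_mismatches <= max_mismatches
            and second_mismatches - best_mismatches >= min_mismatch_delta):
        return best_id
    return None


if __name__ == "__main__":
    samples = {"s1": "ACGT", "s2": "ACGA"}  # hypothetical sample ids and barcodes
    print(assign_sample("ACGT", samples, max_mismatches=1, min_mismatch_delta=1))
    print(assign_sample("ACGT", samples, max_mismatches=1, min_mismatch_delta=2))
```

In the usage example the two expected barcodes differ at a single position, so the observed barcode `ACGT` is assigned to `s1` when the minimum delta is 1 and left unassigned when the minimum delta is 2.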