Commit 715735e

ci: add a check that the usage is up-to-date (#57)
* ci: add a check that the usage is up-to-date

  Also, update checkout actions to v4.
1 parent 8ccc638 commit 715735e

3 files changed: +119 -64 lines changed

.github/scripts/update-docs.sh

+10
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+set -euo pipefail
+
+echo -e "<!-- start usage -->\n\`\`\`console\n" > usage.txt;
+./target/debug/fqtk demux --help | sed -e 's_^[ ]*$__g' >> usage.txt
+echo -e "\`\`\`\n<!-- end usage -->" >> usage.txt;
+sed -e '/<!-- start usage -->/,/<!-- end usage -->/!b' -e '/<!-- end usage -->/!d;r usage.txt' -e 'd' README.md > README.md.new;
+mv README.md.new README.md;
+rm usage.txt;
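
The `sed` expression in `update-docs.sh` replaces everything from the start marker through the end marker with the contents of `usage.txt`. A minimal sketch of that behaviour on a throwaway file follows; the `replace_usage_block` helper and the sample file contents are hypothetical, and GNU sed is assumed.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print file $1 with the block between the usage markers
# (markers included) replaced by the contents of file $2.
# Same sed expression as update-docs.sh; GNU sed is assumed.
replace_usage_block() {
  local readme=$1 usage=$2
  sed -e '/<!-- start usage -->/,/<!-- end usage -->/!b' \
      -e '/<!-- end usage -->/!d;r '"${usage}" \
      -e 'd' "${readme}"
}

workdir=$(mktemp -d)

# A stand-in README with a stale usage block.
printf '%s\n' '# Title' '<!-- start usage -->' 'stale usage' \
  '<!-- end usage -->' '## Installing' > "${workdir}/README.md"

# The regenerated block, which carries its own markers.
printf '%s\n' '<!-- start usage -->' 'fresh usage' \
  '<!-- end usage -->' > "${workdir}/usage.txt"

result=$(replace_usage_block "${workdir}/README.md" "${workdir}/usage.txt")
printf '%s\n' "${result}"

rm -rf "${workdir}"
```

Lines outside the marker range pass through untouched (`!b`), lines inside it are deleted, and the replacement file is read in when the end marker is reached. This is why the script writes the markers into `usage.txt` itself.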

.github/workflows/build_and_test.yml

+49 -1
@@ -6,6 +6,54 @@ env:
   CARGO_TERM_COLOR: always

 jobs:
+  check:
+    name: Check
+    runs-on: ubuntu-24.04
+    steps:
+      - name: Checkout sources
+        uses: actions/checkout@v4
+
+      - name: Install stable toolchain
+        uses: codota/toolchain@v1
+        with:
+          profile: minimal
+          toolchain: stable
+
+      - name: Cache dependencies
+        uses: Swatinem/rust-cache@v2
+
+      - name: Run cargo check
+        uses: actions-rs/cargo@v1
+        with:
+          command: check
+
+      - name: Run cargo build
+        uses: actions-rs/cargo@v1
+        with:
+          command: build
+
+      - name: Append usage to the README.md
+        shell: bash
+        run: .github/scripts/update-docs.sh
+
+      - name: Verify no unstaged changes
+        shell: bash
+        run: |
+          if [[ "$(git status --porcelain)" != "" ]]; then
+            echo ----------------------------------------
+            echo git status
+            echo ----------------------------------------
+            git status
+            echo ----------------------------------------
+            echo git diff
+            echo ----------------------------------------
+            git diff
+            echo ----------------------------------------
+            echo Troubleshooting
+            echo ----------------------------------------
+            echo '::error::Unstaged changes detected. You probably need to update the usage in the README.md. Use `.github/scripts/update-docs.sh`.'
+            exit 1
+          fi
   precommit:
     name: Pre-commit
     runs-on: ${{ matrix.os }}
@@ -14,7 +62,7 @@ jobs:
         os: [ubuntu-24.04, macOS-latest]
     steps:
       - name: Checkout sources
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4

       - name: Install stable toolchain
         uses: codota/toolchain@v1
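
The final step of the `check` job fails when `git status --porcelain` prints anything, which is how a stale README is detected after the script has regenerated it. A minimal sketch of that check against a scratch repository, assuming `git` is on the `PATH`; the `tree_is_clean` helper is hypothetical:

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: succeed only when the working tree at $1 has no
# modified, staged, or untracked files, mirroring the workflow's test.
tree_is_clean() {
  [[ -z "$(git -C "$1" status --porcelain)" ]]
}

repo=$(mktemp -d)
git init --quiet "${repo}"

if tree_is_clean "${repo}"; then before=clean; else before=dirty; fi

# Simulate the docs script writing a README that is not committed.
echo 'new usage text' > "${repo}/README.md"

if tree_is_clean "${repo}"; then after=clean; else after=dirty; fi

echo "before: ${before}, after: ${after}"
rm -rf "${repo}"
```

Because `--porcelain` also lists staged and untracked files, the check covers more than strictly unstaged changes.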

README.md

+60 -63
@@ -25,96 +25,96 @@ It is highly efficient and multi-threaded for high performance.

 Usage for `fqtk demux` follows:

+<!-- start usage -->
 ```console
+
 Performs sample demultiplexing on FASTQs.

-The sample barcode for each sample in the metadata TSV will be compared against
-the sample barcode bases extracted from the FASTQs, to assign each read to a
-sample. Reads that do not match any sample within the given error tolerance
-will be placed in the ``unmatched_prefix`` file.
+The sample barcode for each sample in the metadata TSV will be compared against the sample
+barcode bases extracted from the FASTQs, to assign each read to a sample. Reads that do not
+match any sample within the given error tolerance will be placed in the ``unmatched_prefix``
+file.

 FASTQs and associated read structures for each sub-read should be given:

-- a single fragment read (with inline index) should have one FASTQ and one read
-structure
-- paired end reads should have two FASTQs and two read structures
-- a dual-index sample with paired end reads should have four FASTQs and four read
-structures given: two for the two index reads, and two for the template reads.
+- a single fragment read (with inline index) should have one FASTQ and one read structure
+- paired end reads should have two FASTQs and two read structures
+- a dual-index sample with paired end reads should have four FASTQs and four read structures
+given: two for the two index reads, and two for the template reads.

-If multiple FASTQs are present for each sub-read, then the FASTQs for each
-sub-read should be concatenated together prior to running this tool (e.g.
-`zcat s_R1_L001.fq.gz s_R1_L002.fq.gz | bgzip -c > s_R1.fq.gz`).
+If multiple FASTQs are present for each sub-read, then the FASTQs for each sub-read should be
+concatenated together prior to running this tool
+(e.g. `zcat s_R1_L001.fq.gz s_R1_L002.fq.gz | bgzip -c > s_R1.fq.gz`).

-Read structures are made up of `<number><operator>` pairs much like the `CIGAR`
-string in BAM files. Four kinds of operators are recognized:
+[Read structures](https://github.com/fulcrumgenomics/fgbio/wiki/Read-Structures) are made up of
+`<number><operator>` pairs much like the `CIGAR` string in BAM files.
+Four kinds of operators are recognized:

 1. `T` identifies a template read
 2. `B` identifies a sample barcode read
 3. `M` identifies a unique molecular index read
 4. `S` identifies a set of bases that should be skipped or ignored

-The last `<number><operator>` pair may be specified using a `+` sign instead of
-number to denote "all remaining bases". This is useful if, e.g., fastqs have
-been trimmed and contain reads of varying length. Both reads must have template
-bases. Any molecular identifiers will be concatenated using the `-` delimiter
-and placed in the given SAM record tag (`RX` by default). Similarly, the sample
-barcode bases from the given read will be placed in the `BC` tag.
+The last `<number><operator>` pair may be specified using a `+` sign instead of number to
+denote "all remaining bases". This is useful if, e.g., fastqs have been trimmed and contain
+reads of varying length. Both reads must have template bases. Any molecular identifiers will
+be concatenated using the `-` delimiter and placed in the given SAM record tag (`RX` by
+default). Similarly, the sample barcode bases from the given read will be placed in the `BC`
+tag.

-Metadata about the samples should be given as a headered metadata TSV file with
-at least the following two columns present:
+Metadata about the samples should be given as a headered metadata TSV file with at least the
+following two columns present:

 1. `sample_id` - the id of the sample or library.
 2. `barcode` - the expected barcode sequence associated with the `sample_id`.

-For reads containing multiple barcodes (such as dual-indexed reads), all barcodes
-should be concatenated together in the order they are read and stored in the
-`barcode` field.
+For reads containing multiple barcodes (such as dual-indexed reads), all barcodes should be
+concatenated together in the order they are read and stored in the `barcode` field.

-The read structures will be used to extract the observed sample barcode, template
-bases, and molecular identifiers from each read. The observed sample barcode
-will be matched to the sample barcodes extracted from the bases in the sample
-metadata and associated read structures.
+The read structures will be used to extract the observed sample barcode, template bases, and
+molecular identifiers from each read. The observed sample barcode will be matched to the
+sample barcodes extracted from the bases in the sample metadata and associated read structures.

 An observed barcode matches an expected barcode if all the following are true:
-
-1. The number of mismatches (edits/substitutions) is less than or equal to the
-maximum mismatches (see --max-mismatches).
-2. The difference between number of mismatches in the best and second best
-barcodes is greater than or equal to the minimum mismatch delta
-(`--min-mismatch-delta`). The expected barcode sequence may contain Ns,
-which are not counted as mismatches regardless of the observed base (e.g.
-the expected barcode `AAN` will have zero mismatches relative to both the
-observed barcodes `AAA` and `AAN`).
+1. The number of mismatches (edits/substitutions) is less than or equal to the maximum
+mismatches (see `--max-mismatches`).
+2. The difference between number of mismatches in the best and second best barcodes is greater
+than or equal to the minimum mismatch delta (`--min-mismatch-delta`).
+The expected barcode sequence may contain Ns, which are not counted as mismatches regardless
+of the observed base (e.g. the expected barcode `AAN` will have zero mismatches relative to
+both the observed barcodes `AAA` and `AAN`).

 ## Outputs

-All outputs are generated in the provided `--output` directory. For each sample
-plus the unmatched reads, FASTQ files are written for each read segment
-(specified in the read structures) of one of the types supplied to
-`--output-types`.
-
-FASTQ files have names of the format:
+All outputs are generated in the provided `--output` directory. For each sample plus the
+unmatched reads, FASTQ files are written for each read segment (specified in the read
+structures) of one of the types supplied to `--output-types`. FASTQ files have names
+of the format:

+```bash
 {sample_id}.{segment_type}{read_num}.fq.gz
+```

-where `segment_type` is one of `R`, `I`, and `U` (for template, barcode/index
-and molecular barcode/UMI reads respectively) and `read_num` is a number starting
-at 1 for each segment type.
+where `segment_type` is one of `R`, `I`, and `U` (for template, barcode/index and molecular
+barcode/UMI reads respectively) and `read_num` is a number starting at 1 for each segment
+type.

-In addition a `demux-metrics.txt` file is written that is a tab-delimited file
-with counts of how many reads were assigned to each sample and derived metrics.
+In addition a `demux-metrics.txt` file is written that is a tab-delimited file with counts
+of how many reads were assigned to each sample and derived metrics.

 ## Example Command Line

-As an example, if the sequencing run was 2x100bp (paired end) with two 8bp index
-reads both reading a sample barcode, as well as an in-line 8bp sample barcode in
-read one, the command line would be:
+As an example, if the sequencing run was 2x100bp (paired end) with two 8bp index reads both
+reading a sample barcode, as well as an in-line 8bp sample barcode in read one, the command
+line would be:

+```bash
 fqtk demux \
---inputs r1.fq.gz i1.fq.gz i2.fq.gz r2.fq.gz \
---read-structures 8B92T 8B 8B 100T \
---sample-metadata metadata.tsv \
---output output_folder
+--inputs r1.fq.gz i1.fq.gz i2.fq.gz r2.fq.gz \
+--read-structures 8B92T 8B 8B 100T \
+--sample-metadata metadata.tsv \
+--output output_folder
+```

 Usage: fqtk demux [OPTIONS] --inputs <INPUTS>... --read-structures <READ_STRUCTURES>... --sample-metadata <SAMPLE_METADATA> --output <OUTPUT>

@@ -126,8 +126,7 @@ Options:
 The read structures, one per input FASTQ in the same order

 -b, --output-types <OUTPUT_TYPES>...
-The read structure types to write to their own files (Must be one of T, B,
-or M for template reads, sample barcode reads, and molecular barcode reads)
+The read structure types to write to their own files (Must be one of T, B, or M for template reads, sample barcode reads, and molecular barcode reads).

 Multiple output types may be specified as a space-delimited list.

@@ -150,8 +149,7 @@ Options:
 [default: 1]

 -d, --min-mismatch-delta <MIN_MISMATCH_DELTA>
-Minimum difference between number of mismatches in the best and second best barcodes
-for a barcode to be considered a match
+Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match

 [default: 2]

@@ -168,16 +166,15 @@ Options:
 -S, --skip-reasons <SKIP_REASONS>
 Skip demultiplexing reads for any of the following reasons, otherwise panic.

-1. `too-few-bases`: there are too few bases or qualities to extract given the
-read structures. For example, if a read is 8bp long but the read structure
-is `10B`, or if a read is empty and the read structure is `+T`.
+1. `too-few-bases`: there are too few bases or qualities to extract given the read structures. For example, if a read is 8bp long but the read structure is `10B`, or if a read is empty and the read structure is `+T`.

 -h, --help
 Print help information (use `-h` for a summary)

 -V, --version
 Print version information
 ```
+<!-- end usage -->

 ## Installing
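
The matching rule in the usage text above can be sketched in a few lines of bash: at most `--max-mismatches` mismatches, a lead of at least `--min-mismatch-delta` over the second-best barcode, and `N` in an expected barcode never counting as a mismatch. This is an illustration only, not fqtk's implementation; the function names are hypothetical and all barcodes are assumed to have the same length.

```bash
#!/bin/bash
set -euo pipefail

# Count mismatches between an expected ($1) and an observed ($2) barcode.
# An N in the expected barcode never counts as a mismatch.
count_mismatches() {
  local expected=$1 observed=$2 mismatches=0 i
  for ((i = 0; i < ${#expected}; i++)); do
    if [[ ${expected:i:1} != N && ${expected:i:1} != "${observed:i:1}" ]]; then
      mismatches=$((mismatches + 1))
    fi
  done
  echo "${mismatches}"
}

# Hypothetical helper: print the best-matching expected barcode, or nothing
# when the match fails either threshold.
#   $1 = observed barcode, $2 = max mismatches, $3 = min mismatch delta,
#   remaining arguments = expected barcodes.
best_match() {
  local observed=$1 max_mismatches=$2 min_delta=$3
  shift 3
  # 999 is a sentinel meaning "no candidate seen yet".
  local best="" best_mm=999 second_mm=999 barcode mm
  for barcode in "$@"; do
    mm=$(count_mismatches "${barcode}" "${observed}")
    if ((mm < best_mm)); then
      second_mm=${best_mm}
      best_mm=${mm}
      best=${barcode}
    elif ((mm < second_mm)); then
      second_mm=${mm}
    fi
  done
  if ((best_mm <= max_mismatches && second_mm - best_mm >= min_delta)); then
    echo "${best}"
  fi
}

# One mismatch to AAAA and four to CCCC: a clear winner.
match=$(best_match AAAT 1 2 AAAA CCCC)
# Zero mismatches to AAAT but only one to AAAA: the delta of 1 is too small.
ambiguous=$(best_match AAAT 1 2 AAAA AAAT)

echo "match: '${match}', ambiguous: '${ambiguous}'"
```

The second call shows why the delta matters: an exact hit is still rejected when another expected barcode is nearly as close.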
