Commit b488d1b

update cxxargs and check & fix usage instructions
1 parent a2e99a1 commit b488d1b

3 files changed, with 57 additions and 15 deletions

README.md

Lines changed: 47 additions & 14 deletions
@@ -1,15 +1,25 @@
 # msweep-assembly
 
-mSWEEP genome assembly plugin code.
+mSWEEP binning + assembly plugin code.
 
 # Installation
+## Dependencies
+To run the binning + assembly pipeline, you will need a program that
+does pseudoalignment and another program that estimates an assignment
+probability matrix for the reads to the alignment targets.
+
+We recommend to use [Themisto](https://github.com/jnalanko/themisto)
+(v0.1.1 or newer) for pseudoalignment and
+[mSWEEP](https://github.com/probic/msweep-assembly) (v1.3.2 or newer)
+for estimating the probability matrix.
+
 ## Compiling from source
 ### Requirements
 - C++11 compliant compiler.
 - cmake
 
 ### Compilation
-Clone the repository (note the --recursive option in git clone)
+Clone the repository (note the *--recursive* option in git clone)
 ```
 git clone --recursive https://github.com/PROBIC/msweep-assembly.git
 ```
@@ -20,42 +30,62 @@ enter the directory and run
 > cmake ..
 > make
 ```
-This will compile the read_alignment, assign_reads, and build_sample executables in the build/bin/ directory.
-
+This will compile the read_alignment, assign_reads, build_sample, and telescope executables in the build/bin/ directory.
 
 # Usage
-Align paired-end reads 'reads_1.fastq.gz' and 'reads_2.fastq.gz' with [Themisto]()
+## Indexing
+Build a [Themisto](https://github.com/jnalanko/themisto) index to
+align against.
 ```
-pseudoalign --index-dir themisto_index --query-file reads_1.fastq.gz --outfile pseudoalignments_1.txt --rc --temp-dir tmp --n-threads 16 --mem-megas 8192
-pseudoalign --index-dir themisto_index --query-file reads_2.fastq.gz --outfile pseudoalignments_2.txt --rc --temp-dir tmp --n-threads 16 --mem-megas 8192
+mkdir themisto_index
+mkdir themisto_index/tmp
+build_index --k 31 --input-file example.fasta --auto-colors --index-dir themisto_index --temp-dir themisto_index/tmp
 ```
 
-Convert the pseudoalignment to [kallisto]() format using [telescope]()
+Align paired-end reads 'reads_1.fastq.gz' and 'reads_2.fastq.gz' with Themisto
+```
+pseudoalign --index-dir themisto_index --query-file reads_1.fastq.gz --outfile pseudoalignments_1.txt --rc --temp-dir themisto_index/tmp --n-threads 16 --mem-megas 8192
+pseudoalign --index-dir themisto_index --query-file reads_2.fastq.gz --outfile pseudoalignments_2.txt --rc --temp-dir themisto_index/tmp --n-threads 16 --mem-megas 8192
 ```
+
+Convert the pseudoalignment to
+[kallisto](https://github.com/pachterlab/kallisto) format using
+[telescope](https://github.com/tmaklin/telescope) (supplied with the msweep-assembly installation).
+```
+mkdir outfolder
+
 ntargets=$(sort themisto_index/coloring-names.txt | uniq | wc -l)
 telescope --n-refs $ntargets -r pseudoalignments_1.txt,pseudoalignments_2.txt -o outfolder --mode intersection
 ```
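
The `ntargets` value passed to `--n-refs` is the number of distinct names in `themisto_index/coloring-names.txt`. The following is a minimal sketch of that counting step on a toy file, assuming the file holds one name per line. The file contents are hypothetical, and the repeated names are included only to show the effect of `uniq`.

```shell
# Toy stand-in for themisto_index/coloring-names.txt (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'ref_A\nref_B\nref_A\nref_C\nref_B\n' > "$tmpdir/coloring-names.txt"

# Same pipeline as in the README: sort, drop adjacent duplicates, count lines.
ntargets=$(sort "$tmpdir/coloring-names.txt" | uniq | wc -l)

# Some wc implementations pad the count with spaces; arithmetic
# expansion normalizes it to a bare integer.
ntargets=$((ntargets))
echo "$ntargets"
```

`uniq` only collapses adjacent duplicates, which is why the names are sorted first.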
 
-Create a fake kallisto-style run_info.json file
+Create a fake kallisto-style run_info.json file using the
+Themisto_run_info.sh script in the root directory of this project
 ```
-Themisto_run_info.sh $(wc -l outfolder_1.txt) $ntargets > outfolder/run_info.json
+Themisto_run_info.sh $(wc -l < pseudoalignments_1.txt) $ntargets > outfolder/run_info.json
 ```
 
 Determine read assignments to equivalence classes from the kallisto
 format files
 ```
-read_alignment -e outfolder/outfolder.ec -s outfolder/read-to-ref.txt -o outfolder --write-ecs --themisto --n-refs $ntargets --gzip-output
+read_alignment -e outfolder/pseudoalignments.ec -s outfolder/read-to-ref.txt -o outfolder --write-ecs --themisto --n-refs $ntargets --gzip-output
 ```
 
-Estimate the relative abundances with mSWEEP
+Estimate the relative abundances with mSWEEP (reference_grouping.txt
+should contain the groups the sequences in 'example.fasta' are
+assigned to. See the [mSWEEP](https://github.com/probic/msweep-assembly) usage instructions for details).
 ```
 mSWEEP -f outfolder -i reference_grouping.txt -o msweep-out --write-probs --gzip-probs
 ```
 
-Extract the names of the 3 most abundant reference groups
+(Optional) Extract the names of the 3 most abundant reference
+groups.
 ```
 grep -v "^[#]" msweep-out_abundances.txt | sort -rgk2 | cut -f1 | head -n3 > most_abundant_groups.txt
 ```
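
The extraction pipeline above can be tried on a toy abundance table. This is a sketch only: it assumes the abundances file has `#`-prefixed comment lines followed by tab-separated `group<TAB>abundance` rows, and the group names and values are made up for illustration.

```shell
# Toy stand-in for msweep-out_abundances.txt (hypothetical contents).
tmpdir=$(mktemp -d)
printf '#toy header line\ngroup_A\t0.10\ngroup_B\t0.55\ngroup_C\t5e-05\ngroup_D\t0.30\n' \
    > "$tmpdir/msweep-out_abundances.txt"

# Drop comment lines, sort by the second column in descending numeric
# order (-g also understands scientific notation such as 5e-05), keep
# the group names, and take the first three.
grep -v "^[#]" "$tmpdir/msweep-out_abundances.txt" \
    | sort -rgk2 | cut -f1 | head -n3 > "$tmpdir/most_abundant_groups.txt"

cat "$tmpdir/most_abundant_groups.txt"
```

With these toy values the file lists group_B, group_D, and group_A, in that order.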
+If you use a more refined method or know which reference groups (as
+specified in the reference_grouping.txt file) you want to assemble,
+put their names in a .txt file where each line corresponds to a
+cluster name instead.
 
 Assign reads to the 3 most abundant reference groups based on the estimated probabilities
 ```
@@ -66,6 +96,9 @@ Construct the binned samples from the original files
 
 ```
 while read -r sample; do
-build_sample -a outfolder/$sample\"\"_reads.txt.gz -o outfolder/$sample -1 reads_1.fastq.gz -2 reads_2.fastq.gz --gzip-output
+build_sample -a outfolder/$sample""_reads.txt.gz -o outfolder/$sample -1 reads_1.fastq.gz -2 reads_2.fastq.gz --gzip-output
 done < most_abundant_groups.txt
 ```
+This will create the <group name>_1.fastq.gz and <group
+name>_2.fastq.gz files in the outfolder, which you can assemble with
+your assembler of choice.
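
Before assembling, it may be worth checking that both mate files exist for each group and hold the same number of reads. The sketch below is one way to do that. The helper names `count_reads` and `check_bins` are hypothetical and not part of msweep-assembly, and the read count assumes four lines per FASTQ record.

```shell
# Count reads in a gzipped FASTQ file, assuming four lines per record.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Report read counts of both mate files for every group in a list.
# Arguments: <file with one group name per line> <output folder>
check_bins() {
    while read -r sample; do
        r1="$2/${sample}_1.fastq.gz"
        r2="$2/${sample}_2.fastq.gz"
        if [ ! -f "$r1" ] || [ ! -f "$r2" ]; then
            echo "$sample: missing output"
            continue
        fi
        echo "$sample: $(count_reads "$r1") / $(count_reads "$r2") reads"
    done < "$1"
}

# Toy data (hypothetical): group_A has both files, group_B has none.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > "$demo/group_A_1.fastq.gz"
printf '@r1\nACGT\n+\nIIII\n@r2\nTCAA\n+\nIIII\n' | gzip -c > "$demo/group_A_2.fastq.gz"
printf 'group_A\ngroup_B\n' > "$demo/groups.txt"

check_bins "$demo/groups.txt" "$demo"
```

On real data the call would be `check_bins most_abundant_groups.txt outfolder`.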

Themisto_run_info.sh

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+echo "{
+"n_targets": $2,
+"n_bootstraps": 0,
+"n_processed": $1,
+"kallisto_version": "0.43.1",
+"index_version": 10,
+"start_time": "Tue Nov 5 16:19:25 2019",
+"call": "/proj/temaklin/kallisto/kallisto pseudo -i /wrk/users/temaklin/reference_msweep_preprint_all_removed -o /wrk/users/temaklin/splits/ERR434699 /wrk/users/temaklin/msweep_reads/reads/ERR434699_1.fastq.gz /wrk/users/temaklin/msweep_reads/reads/ERR434699_2.fastq.gz"
+}"
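
As committed, the inner double quotes in this script close and reopen the shell string instead of being printed, so the keys and string values come out unquoted and the output is not valid JSON. Whether the downstream tools need strictly valid JSON is not stated in the commit. If they do, a here-document is one way to keep the quotes. The sketch below wraps it in a hypothetical function, `run_info_json`, with the field values copied from the script above.

```shell
# Hypothetical helper mirroring Themisto_run_info.sh:
#   $1 = number of processed reads, $2 = number of alignment targets.
run_info_json() {
    # An unquoted here-document expands $1 and $2 but leaves the
    # double quotes untouched, so the JSON keys stay quoted.
    cat <<EOF
{
"n_targets": $2,
"n_bootstraps": 0,
"n_processed": $1,
"kallisto_version": "0.43.1",
"index_version": 10,
"start_time": "Tue Nov 5 16:19:25 2019",
"call": "/proj/temaklin/kallisto/kallisto pseudo -i /wrk/users/temaklin/reference_msweep_preprint_all_removed -o /wrk/users/temaklin/splits/ERR434699 /wrk/users/temaklin/msweep_reads/reads/ERR434699_1.fastq.gz /wrk/users/temaklin/msweep_reads/reads/ERR434699_2.fastq.gz"
}
EOF
}

# Example call with made-up counts: 1000 processed reads, 3 targets.
run_info_json 1000 3
```

In the pipeline, the arguments would be the read count and `$ntargets`, exactly as in the README's `Themisto_run_info.sh` call.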

external/cxxargs
