Commit 6380bff

Fix readme
1 parent f0c0c4f commit 6380bff

1 file changed: +86 -39 lines changed

README.md

Lines changed: 86 additions & 39 deletions
````diff
@@ -6,6 +6,13 @@
 
 BinSPreader is a novel tool that attempts to refine metagenome-assembled genomes (MAGs) obtained from existing tools. BinSPreader exploits the assembly graph topology and other connectivity information, such as paired-end and Hi-C reads, to refine the existing binning, correct binning errors, propagate binning from longer contigs to shorter contigs and infer contigs belonging to multiple bins.
 
+### Dependencies
+
+- g++ (version 5.3.1 or higher)
+- cmake (version 3.12 or higher)
+- zlib
+- libbz2
+
 ### Installation
 
 ```
````
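The dependency list added in this hunk gives minimum versions for g++ and cmake. Below is a minimal bash sketch for checking them before running `make`, assuming GNU `sort -V` is available for version comparison. The helper names `version_ge` and `check_tool` are hypothetical (not part of BinSPreader), and zlib and libbz2, being libraries rather than executables, are not covered.

```bash
#!/usr/bin/env bash
# Hypothetical pre-build check for the minimum versions listed in the README.
# Assumes GNU coreutils `sort -V` (version-aware sort) is available.

# version_ge A B: succeeds if version A >= version B
version_ge() {
  # The smallest of the two versions must be B for A >= B to hold
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_tool NAME MIN_VERSION: report whether NAME is installed and new enough
check_tool() {
  local name=$1 required=$2 found
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "missing: $name" >&2
    return 1
  fi
  # Heuristic: take the first dotted number from the tool's --version output
  found=$("$name" --version | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1)
  if version_ge "$found" "$required"; then
    echo "ok: $name $found (>= $required)"
  else
    echo "too old: $name $found (need >= $required)" >&2
    return 1
  fi
}
```

For example, `check_tool g++ 5.3.1 && check_tool cmake 3.12`. Picking the first dotted number from `--version` output is only a heuristic and may select the wrong token on unusual vendor builds.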
````diff
@@ -15,50 +22,89 @@ make bin-refine
 ```
 Now to run BinSPreader move to folder `assembler/` and execute
 
-`build/bin/hicspades-binner`
+`build/bin/bin-refine`
 
 ### Input
 
-The tool has two mandatory options:
-- Assembly graph file in [GFA 1.0 format](https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md), with scaffolds included as path lines. Alternatively, scaffolds can be provided separately using `--path` option.
--Initial
+The tool requires initial binning to refine, as well as assembly graph as a source of information for refining. Optionally, BinSPreader can be provided with multiple Hi-C and/or paired-end libraries.
+
+Required positional arguments:
+
+- Assembly graph file in [GFA 1.0 format](https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md), with scaffolds included as path lines. Alternatively, scaffold paths can be provided separately using `--path` option in the `.paths` format accepted by Bandage (see [Bandage wiki](https://github.com/rrwick/Bandage/wiki/Graph-paths) for details).
+- Binning output from an existing tool (in `.tsv` format)
+
+Synopsis: `bin-refine <graph (in GFA)> <binning (in .tsv)> <output directory> [OPTION...]`
+
+Main options:
 
-Synopsis: `hicspades-binner <graph (in GFA)> <dataset description (in YAML)> <output directory> [OPTION...]`
+- `--paths` provide contigs paths from file separately from GFA
+- `--dataset` Dataset in YAML format (see #yaml) describing Hi-C and paired-end reads
 
-The options are:
+- `-l` L Library index (0-based, default: 0). Only the library specified by this index will be used.
+- `-t` T # of threads to use (default: 1/2 of available threads)
+- `-e` E convergence relative tolerance threshold (default: 1e-5)
+- `-n` ITERATIONS maximum number of iterations (default: 5000)
+- `-m` allow multiple bin assignment (defalut: false)
+- `-Smax|-Smle` simple maximum or maximum likelihood binning assignment strategy (default: max likelihood)
+- `-Rcorr|-Rprop` Select propagation or correction mode (default: correction)
+- `--cami` use CAMI bioboxes binning format
+- `--zero-bin` emit zero bin for unbinned sequences
+- `--tall-multi` use tall table for multiple binning result
+- `--bin-dist` estimate pairwise bin distance (could be slow on large graphs!)
+- `-la` LA labels correction regularization parameter for labeled data (default: 0.6)
 
-`-t, --threads <int> `
-# of threads to use
+Sparse propagation options:
+- `--sparse-propagation` Gradually reduce regularization parameter from binned to unbinned edges. Recommended for sparse binnings with low assembly fraction.
+- `--no-unbinned-bin` Do not create a special bin for unbinned contigs. More agressive strategy.
+- `-ma, --metaalpha` Regularization parameter for sparse propagation procedure. Increase/decrease for more agressive/conservative refining (default: 0.6)
+- `-lt, --length-threshold` LENGTH_THRESHOLD Binning will not be propagated to edges longer than threshold
+- `-db' --distance-bound` DISTANCE_BOUND Binning will not be propagated further than bound from initially binned edges
 
-`-e, --enzymes <string> `
-Comma-separated string of restriction enzyme recognition sites
+Read splitting options:
+- `-r, --reads` Split reads according to binning. Can be used for reassembly.
+- `-b, --bin-weight` BIN_WEIGHT Reads bin weight threshold (default: 0.1).
 
-`--tmp-dir <dir name> `
-scratch directory to use
+Developer options:
+- `--bin-load` Load binary-converted reads from tmpdir
+- `--debug` produce lots of debug data
+- `--tmp-dir` TMP_DIR scratch directory to use
+- `-h, --help ` print help message
 
-`--min-ctg-len <int> `
-Minimum contig length for binning
+### BinSPreader output
 
-`--path-links-thr <int> `
-Minimum total number of links between contigs
+BinSPreader stores all output files in output directory `<output_dir> `, which is set by the user.
 
-`--edge-links-thr <int>`
-Minimum number of links between long edges
+- `<output_dir>/binning.tsv` contains refined binning in `.tsv` format
+- `<output_dir>/bin_stats.tsv` contains various per-bin statistics
+- `<output_dir>/bin_weights.tsv` contains soft bin weights per contig
+- `<output_dir>/edge_weights.tsv` contains soft bin weights per edge
 
-`-h, --help `
-print help message
+In addition
+
+- `<output_dir>/bin_dist.tsv` contains refined bin distance matrix (if `--bin-dist` was used)
+- `<output_dir>/bin_label_1.fastq, <output_dir>/bin_label_2.fastq` read set for bin labeled by `bin_label` (if `--reads` was used)
+- `<output_dir>/pe_links.tsv` list of paired-end links between assembly graph edges with weights (if `--debug` was used)
+- `<output_dir>/graph_links.tsv` list of graph links between assembly graph edges with weights (if `--debug` was used)
 
 <a name="yaml"></a>
 **_Specifying input data with YAML data set file_**
 
-hicSPAdes-binner currently supports a single Hi-C library described in a YAML file. For example, if your Hi-C library is split into two pairs of files
+BinSPreader currently supports multiple paired-end or Hi-C libraries described in a YAML file. For example, if you have one paired-end library split into two sets of files
 
 ``` bash
 
-lib_hic_left_1.fastq
-lib_hic_right_1.fastq
-lib_hic_left_2.fastq
-lib_hic_right_2.fastq
+lib_pe1_left_1.fastq
+lib_pe1_right_1.fastq
+lib_pe1_left_2.fastq
+lib_pe1_right_2.fastq
+```
+
+and one Hi-C library
+
+``` bash
+
+lib_hic1_left.fastq
+lib_hic1_right.fastq
 ```
 
 YAML file should look like this:
````
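The new synopsis takes the assembly graph, the initial binning and the output directory as positional arguments, followed by options. Below is a sketch of a hypothetical bash wrapper that assembles such a command line in that order. The function name `binspreader_cmd`, the `BIN_REFINE` and `RUN` variables, and all file names are illustrative assumptions; only flags from the option lists in this diff are used in the example.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the README synopsis
#   bin-refine <graph (in GFA)> <binning (in .tsv)> <output directory> [OPTION...]
# By default it only prints the command; set RUN=1 to execute it.

# Binary location relative to assembler/ after building (override via BIN_REFINE)
BIN_REFINE=${BIN_REFINE:-build/bin/bin-refine}

# binspreader_cmd GRAPH BINNING OUTDIR [OPTION...]
binspreader_cmd() {
  if [ "$#" -lt 3 ]; then
    echo "usage: binspreader_cmd GRAPH BINNING OUTDIR [OPTION...]" >&2
    return 2
  fi
  # Positional arguments first, then options passed through unchanged
  local -a cmd=("$BIN_REFINE" "$1" "$2" "$3")
  shift 3
  cmd+=("$@")
  if [ "${RUN:-0}" = 1 ]; then
    "${cmd[@]}"
  else
    # Space-joined for display only (not shell-escaped)
    echo "${cmd[*]}"
  fi
}
```

For example, `binspreader_cmd assembly_graph.gfa initial_binning.tsv out --dataset dataset.yaml -t 8 -Rprop -m` prints `build/bin/bin-refine assembly_graph.gfa initial_binning.tsv out --dataset dataset.yaml -t 8 -Rprop -m`, i.e. propagation mode with multiple bin assignment on 8 threads. Keeping the arguments in an array preserves paths with spaces when the command is actually executed.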
````diff
@@ -68,24 +114,25 @@ YAML file should look like this:
 [
 {
 orientation: "fr",
-type: "hic",
+type: "paired-end",
+right reads: [
+"/FULL_PATH_TO_DATASET/lib_pe1_right_1.fastq",
+"/FULL_PATH_TO_DATASET/lib_pe1_right_2.fastq"
+],
+left reads: [
+"/FULL_PATH_TO_DATASET/lib_pe1_left_1.fastq",
+"/FULL_PATH_TO_DATASET/lib_pe1_left_2.fastq"
+]
+},
+{
+orientation: "fr",
+type: "paired-end",
 right reads: [
-"/FULL_PATH_TO_DATASET/lib_hic_right_1.fastq",
-"/FULL_PATH_TO_DATASET/lib_hic_right_2.fastq"
+"/FULL_PATH_TO_DATASET/lib_hic1_right.fastq"
 ],
 left reads: [
-"/FULL_PATH_TO_DATASET/lib_hic_left_1.fastq",
-"/FULL_PATH_TO_DATASET/lib_hic_left_2.fastq"
+"/FULL_PATH_TO_DATASET/lib_hic1_left.fastq"
 ]
 }
 ]
 ```
-
-### Output
-
-hicSPAdes-binner stores all output files in `<output_dir> `, which is set by the user.
-
-- `<output_dir>/clustering.mcl` contains resulting scaffold clustering in MCL format
-- `<output_dir>/clustering.tsv` contains resulting scaffold clustering in TSV format
-- `<output_dir>/basic_stats.tsv` contains various per-cluster statistics
-- `<output_dir>/contact_map.tsv` contains hicSPAdes scores between input scaffolds, as well as other scaffold statistics
````
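The YAML example in this hunk describes one paired-end library split across two file pairs and one Hi-C library. Below is a sketch of bash functions that emit a file of that shape. It assumes only the field names shown in the example (`orientation`, `type`, `right reads`, `left reads`), and the function names are hypothetical. The sketch differs from the added example in one respect. The added example gives the Hi-C library `type: "paired-end"`, whereas the removed lines used `type: "hic"`. The sketch writes `"hic"` for that entry, on the assumption that the added line is an oversight.

```bash
#!/usr/bin/env bash
# Hypothetical generator for a BinSPreader dataset YAML (passed via --dataset).
# Field names follow the README example. Paths should be absolute and must not
# contain whitespace or glob characters (path lists are split on spaces).

# yaml_list PATH...: one quoted path per line, comma-separated
yaml_list() {
  local sep='' f
  for f in "$@"; do
    printf '%s      "%s"' "$sep" "$f"
    sep=$',\n'
  done
  printf '\n'
}

# yaml_lib TYPE "LEFT..." "RIGHT...": one library entry, without trailing comma
yaml_lib() {
  local kind=$1 left=$2 right=$3
  printf '  {\n    orientation: "fr",\n    type: "%s",\n' "$kind"
  printf '    right reads: [\n'
  yaml_list $right   # intentional word splitting into separate paths
  printf '    ],\n    left reads: [\n'
  yaml_list $left
  printf '    ]\n  }'
}

# The README's example: one paired-end library (two file pairs), one Hi-C library
example_dataset() {
  local d=/FULL_PATH_TO_DATASET
  printf '[\n'
  yaml_lib paired-end "$d/lib_pe1_left_1.fastq $d/lib_pe1_left_2.fastq" \
    "$d/lib_pe1_right_1.fastq $d/lib_pe1_right_2.fastq"
  printf ',\n'
  yaml_lib hic "$d/lib_hic1_left.fastq" "$d/lib_hic1_right.fastq"
  printf '\n]\n'
}
```

Running `example_dataset > dataset.yaml` writes the file, which could then be passed to `bin-refine` with `--dataset dataset.yaml`.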
