README.md
+34 -57 (34 additions & 57 deletions)
@@ -1,12 +1,12 @@
- # `RabbitTClust v.2.1.0`
+ # `RabbitTClust v.2.2.0`
  RabbitTClust is a fast and memory-efficient genome clustering tool based on sketch-based distance estimations.
  It enables processing of large-scale datasets by combining dimensionality reduction techniques with streaming and parallelization on modern multi-core platforms.
  RabbitTClust supports classical single-linkage hierarchical (clust-mst) and greedy incremental clustering (clust-greedy) algorithms for different scenarios.

  ## Installation
- RabbitTClust version 2.1.0 can only support 64-bit Linux Systems.
+ `RabbitTClust v.2.2.0` can only support 64-bit Linux Systems.

  The detailed update information for this version, as well as the version history, can be found in the [`version_history`](version_history/history.md) document.
@@ -15,45 +15,12 @@ The detailed update information for this version, as well as the version history
# clust-mst, minimum-spanning-tree-based module for RabbitTClust
@@ -62,65 +29,75 @@ Options:
  -h,--help Print this help message and exit
  -t,--threads INT set the thread number, default all CPUs of the platform
  -m,--min-length UINT set the filter minimum length (minLen), genome length less than minLen will be ignore, default 10,000
- -c,--containment INT use AAF distance with containment coefficient, set the containCompress, the sketch size is in proportion with 1/containCompress
- -k,--kmer-size INT set the kmer size
+ -c,--containment INT use AAF distance with containment coefficient, set the containCompress, the sketch size is in proportion with 1/containCompress -k,--kmer-size INT set the kmer size
  -s,--sketch-size INT set the sketch size for Jaccard Index and Mash distance, default 1000
- -l,--inputlist input is genome list, one genome per line
+ -l,--list input is genome list, one genome per line
  -e,--no-save not save the intermediate files, such as sketches or MST
  -d,--threshold FLOAT set the distance threshold for clustering
  -F,--function TEXT set the sketch function, such as MinHash, KSSD, default MinHash
  -o,--output TEXT REQUIRED set the output name of cluster result
- -i,--input TEXT set the input file
+ -i,--input TEXT Excludes: --append
+ set the input file, single FASTA genome file (without -l option) or genome list file (with -l option)
  --presketched TEXT clustering by the pre-generated sketch files rather than genomes
  --premsted TEXT clustering by the pre-generated mst files rather than genomes for clust-mst
+ --append TEXT Excludes: --input
+ append genome file or file list with the pre-generated sketch or MST files

  # clust-greedy, greedy incremental clustering module for RabbitTClust
  Usage: ./clust-greedy [OPTIONS]
  Options:
  -h,--help Print this help message and exit
  -t,--threads INT set the thread number, default all CPUs of the platform
  -m,--min-length UINT set the filter minimum length (minLen), genome length less than minLen will be ignore, default 10,000
- -c,--containment INT use AAF distance with containment coefficient, set the containCompress, the sketch size is in proportion with 1/containCompress
- -k,--kmer-size INT set the kmer size
+ -c,--containment INT use AAF distance with containment coefficient, set the containCompress, the sketch size is in proportion with 1/containCompress -k,--kmer-size INT set the kmer size
  -s,--sketch-size INT set the sketch size for Jaccard Index and Mash distance, default 1000
- -l,--inputlist input is genome list, one genome per line
+ -l,--list input is genome list, one genome per line
  -e,--no-save not save the intermediate files, such as sketches or MST
  -d,--threshold FLOAT set the distance threshold for clustering
  -F,--function TEXT set the sketch function, such as MinHash, KSSD, default MinHash
  -o,--output TEXT REQUIRED set the output name of cluster result
- -i,--input TEXT set the input file
+ -i,--input TEXT Excludes: --append
+ set the input file, single FASTA genome file (without -l option) or genome list file (with -l option)
  --presketched TEXT clustering by the pre-generated sketch files rather than genomes
+ --append TEXT Excludes: --input
+ append genome file or file list with the pre-generated sketch or MST files
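
The `Excludes:` annotations in the updated help text indicate that `-i,--input` and `--append` cannot be given together. As a rough illustration only, the check below hand-rolls that rule. `InputOptions` and `validateInputOptions` are hypothetical names that do not come from the RabbitTClust sources, which may well enforce the rule through their option parser instead.

```cpp
#include <optional>
#include <string>

// Hypothetical container for the two mutually exclusive options.
struct InputOptions {
    std::optional<std::string> input;   // value of -i,--input, if given
    std::optional<std::string> append;  // value of --append, if given
};

// Returns an empty string when the combination is acceptable,
// otherwise a human-readable error message.
std::string validateInputOptions(const InputOptions& opts) {
    if (opts.input && opts.append) {
        return "--input excludes --append";
    }
    return "";
}
```

Under this sketch, passing only `--append` (together with `--presketched` or `--premsted`, per the help text) validates cleanly, while passing both `--input` and `--append` yields an error message.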
src/MST.cpp
+6 -4 (6 additions & 4 deletions)
@@ -171,11 +171,12 @@ vector<int> getNoiseNode(vector<PairInt> densePairArr, int alpha){
  return noiseArr;
  }

- vector<EdgeInfo> modifyMST(vector<SketchInfo>& sketches, int sketch_func_id, int threads, int** &denseArr, int denseSpan, uint64_t* &aniArr, string prefixName, double threshold){
+
+
+ vector<EdgeInfo> modifyMST(vector<SketchInfo>& sketches, int start_index, int sketch_func_id, int threads, int** &denseArr, int denseSpan, uint64_t* &aniArr){
  //int denseSpan = 10;
  double step = 1.0 / denseSpan;

-
  //double step = threshold / denseSpan;
  //cerr << "the threshold is: " << threshold << endl;
  //cerr << "the step is: " << step << endl;
@@ -211,13 +212,14 @@ vector<EdgeInfo> modifyMST(vector<SketchInfo>& sketches, int sketch_func_id, int
src/MST.h
+3 -1 (3 additions & 1 deletion)
@@ -44,7 +44,9 @@ std::vector<EdgeInfo> kruskalAlgorithm(std::vector<EdgeInfo>graph, int vertices)

  vector<EdgeInfo> generateMST(vector<SketchInfo>& sketches, string sketchFunc, int threads);

- vector<EdgeInfo> modifyMST(vector<SketchInfo>& sketches, int sketch_func_id, int threads, int** &denseArr, int denseSpan, uint64_t* &aniArr, string prefixName, double threshold);
+ vector<EdgeInfo> append_MST(vector<SketchInfo>& pre_sketches, vector<SketchInfo>& append_sketches, int sketch_func_id, int threads, int ** &denseArr, int denseSpan, uint64_t* &aniArr);
+
+ vector<EdgeInfo> modifyMST(vector<SketchInfo>& sketches, int start_index, int sketch_func_id, int threads, int** &denseArr, int denseSpan, uint64_t* &aniArr);
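
The new `append_MST` declaration takes previously sketched genomes plus newly appended ones, and `modifyMST` gains a `start_index` parameter. One way an append step of this kind can avoid recomputing every pairwise distance is to reuse the old MST edges and measure only the pairs that involve a new genome. By the cycle property, the MST of that reduced edge set has the same total weight as the MST of the full graph. The sketch below illustrates the idea under stated assumptions: `Edge`, `DisjointSet`, `kruskal`, `appendMst`, and the `dist` callback are hypothetical, and nothing here is taken from the actual bodies of `append_MST` or `modifyMST`, which the diff does not show.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical edge type; the real EdgeInfo layout is not shown in the diff.
struct Edge {
    int u;
    int v;
    double dist;
};

// Minimal union-find with path halving.
struct DisjointSet {
    std::vector<int> parent;
    explicit DisjointSet(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }
    // Returns true if a and b were in different sets and have now been merged.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Kruskal's algorithm over an explicit edge list.
std::vector<Edge> kruskal(std::vector<Edge> edges, int vertices) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.dist < b.dist; });
    DisjointSet sets(vertices);
    std::vector<Edge> mst;
    for (const Edge& e : edges) {
        if (sets.unite(e.u, e.v)) mst.push_back(e);
    }
    return mst;
}

// Assumed append step: vertices [0, startIndex) are already covered by oldMst,
// vertices [startIndex, total) are new. Only pairs that involve at least one
// new vertex are measured; old-old pairs are represented by oldMst alone.
std::vector<Edge> appendMst(const std::vector<Edge>& oldMst, int startIndex, int total,
                            const std::function<double(int, int)>& dist) {
    std::vector<Edge> candidates = oldMst;
    for (int j = startIndex; j < total; ++j) {
        for (int i = 0; i < j; ++i) {
            candidates.push_back({i, j, dist(i, j)});
        }
    }
    return kruskal(std::move(candidates), total);
}
```

With `n` existing genomes and `m` appended ones, this sketch evaluates roughly `n*m + m*(m-1)/2` distances instead of all `(n+m)*(n+m-1)/2` pairs, which is the kind of saving an append mode would be expected to target.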
bool sketchSequences(string inputFile, int kmerSize, int sketchSize, int minLen, string sketchFunc, bool isContainment, int containCompress, vector<SketchInfo>& sketches, int threads);