Commit bcbb191

v0.2.1
1 parent ba71a02 commit bcbb191

61 files changed, +1730 -220 lines changed

README.md

+6 -5
@@ -12,7 +12,7 @@ format. And the name was remained after adding ***seamless support for FASTA/Q f
 ## Introduction

 FASTA and FASTQ are basic formats for storing nucleotide and protein sequences.
-The manipulations of FASTA/Q file includes converting, clipping, searching,
+The manipulations of FASTA/Q file include converting, clipping, searching,
 filtering, deduplication, splitting, shuffling, sampling and so on.
 Existed tools only implemented parts of the functions,
 and some of them are only available for specific operating systems.
@@ -22,7 +22,8 @@ running environment also make them less friendly to common users.
 fakit is a cross-platform, efficient, and practical FASTA/Q manipulations tool
 that is friendly for researchers to complete wide ranges of FASTA file processing.
 The suite supports plain or gzip-compressed input and output
-from either standard stream or files, therefore, it could be easily used in pipelines.
+from either standard stream or files,
+therefore, it could be easily used in command-line pipe.

 ## Features

@@ -224,10 +225,10 @@ Most of the subcommands do not read whole FASTA/Q records in to memory,
 including `stat`, `fq2fa`, `fx2tab`, `tab2fx`, `grep`, `locate`, `replace`,
 `seq`, `sliding`, `subseq`. They just temporarily buffer chunks of records.

-However when handling big sequences, e.g. human genome, the memory is high
+However when handling big sequences, e.g. Human genome, the memory is high
 (2-3 GB) even the buffer size is 1.
-This is due to the limitation of Go programming language, it may be solved
-in the future.
+This is due to the limitation of garbage collection mechanism in
+Go programming language, it may be solved in the future.

 Note that when using `subseq --gtf | --bed`, if the GTF/BED files are too
 big, the memory usage will increase.

benchmark/.Rhistory

Whitespace-only changes.

benchmark/README.md

+20 -10
@@ -9,22 +9,18 @@ Datasets and results are described at [http://shenwei356.github.io/fakit/benchma
 Softwares

 1. [fakit](https://github.com/shenwei356/fakit). (Go).
-Version [v0.1.9](https://github.com/shenwei356/fakit/releases/tag/v0.1.9).
+Version [v0.2.1](https://github.com/shenwei356/fakit/releases/tag/v0.2.1).
 1. [fasta_utilities](https://github.com/jimhester/fasta_utilities). (Perl).
 Version [3dcc0bc](https://github.com/jimhester/fasta_utilities/tree/3dcc0bc6bf1e97839476221c26984b1789482579).
 Lots of dependencies to install_.
 1. [fastx_toolkit](http://hannonlab.cshl.edu/fastx_toolkit/). (Perl).
 Version [0.0.13](http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2).
 Can't handle multi-line FASTA files_.
-1. [seqmagick](http://seqmagick.readthedocs.org/en/latest/index.html). (Python).
+1. [seqmagick](http://seqmagick.readthedocs.io/en/latest/index.html#installation). (Python).
 Version 0.6.1
 1. [seqtk](https://github.com/lh3/seqtk). (C).
-Version [1.0-r82-dirty](https://github.com/lh3/seqtk/commit/4feb6e81444ab6bc44139dd3a125068f81ae4ad8).
+Version [1.1-r92-dirty](https://github.com/lh3/seqtk/tree/fb85aad4ce1fc7b3d4543623418a1ae88fe1cea6).

-Not used:
-
-1. [pyfaidx](https://github.com/mdshw5/pyfaidx). (Python).
-Version [0.4.7.1](https://pypi.python.org/packages/source/p/pyfaidx/pyfaidx-0.4.7.1.tar.gz#md5=f33604a3550c2fa115ac7d33b952127d). *Not used, because it

 A Python script [memusg](https://github.com/shenwei356/memusg) was used
 to computate running time and peak memory usage of a process.
@@ -45,10 +41,22 @@ The edited code is
     if $config{bar_width} < 1;
 }

+## Clone this repository
+
+    git clone https://github.com/shenwei356/fakit
+    cd fakit/benchmark
+
 ## Data preparation

 [http://shenwei356.github.io/fakit/benchmark/#datasets](http://shenwei356.github.io/fakit/benchmark/#datasets)

+Or download all test data [fakit-benchmark-data.tar.gz](http://bioinf.shenwei.me/fakit-benchmark-data.tar.gz)
+(1.7G) and uncompress it, and then move them into directory `fakit/benchmark`
+
+    wget ***
+    tar -zxvf fakit-benchmark-data.tar.gz
+    mv fakit-benchmark-data/* fakit/benchmark
+
 ## Run tests

 A Perl scripts
@@ -76,6 +84,8 @@ To compare performance between different softwares, run:

     ./run.pl run_benchmark*.sh -n 3 -o benchmark.5tests.csv

+It costed ~50min for me.
+
 To test performance of other functions in fakit, run:

     ./run.pl run_test*.sh -n 1 -o benchmark.fakit.csv
@@ -86,8 +96,8 @@ R libraries `dplyr`, `ggplot2`, `scales`, `ggthemes`, `ggrepel` are needed.

 Plot for result of the five tests:

-    ./plot2.R -i benchmark.5tests.csv
+    ./plot.R -i benchmark.5tests.csv

-Plot for result of the stest of other functions in fakit:
+Plot for result of the tests of other functions in fakit:

-    ./plot2.R -i benchmark.fakit.csv --width 5 --height 3
+    ./plot.R -i benchmark.fakit.csv --width 5 --height 3
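
The benchmark README above says that running time and peak memory were collected with `memusg`. As a rough illustration of what such a measurement involves, here is a minimal Python sketch. It is not how `memusg` is implemented; the `measure` helper is a hypothetical name, and the approach (wall-clock timing plus `resource.getrusage` on child processes) is an assumption chosen for brevity. It works on Unix-like systems only.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd and return (elapsed_seconds, peak_rss_of_children, exit_code)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    # ru_maxrss is the largest peak RSS among all children waited for so far
    # (kilobytes on Linux, bytes on macOS), so measure one command per process
    # if the numbers need to be attributed to a single run.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak, proc.returncode


if __name__ == "__main__":
    # Stand-in workload: a child Python process that allocates about 50 MB.
    elapsed, peak, code = measure(
        [sys.executable, "-c", "x = bytearray(50_000_000)"]
    )
    print(f"time: {elapsed:.2f}s  peak RSS: {peak}  exit: {code}")
```

A real benchmark would replace the stand-in workload with the command under test and repeat the measurement several times, as the `-n 3` option of `run.pl` above suggests.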

benchmark/benchmark.5tests.csv

+31
@@ -0,0 +1,31 @@
+test dataset app time_mean time_stdev mem_mean mem_stdev
+Reverse complement dataset_A.fa biogo 100.78 1.33 35836 1226
+Reverse complement dataset_A.fa fakit 30.53 1.03 47684 1994
+Reverse complement dataset_A.fa fasta_utilities 18.87 1.36 58298 859
+Reverse complement dataset_A.fa seqmagick 59.68 1.56 49754 1493
+Reverse complement dataset_A.fa seqtk 10.06 0.53 7629 18
+Reverse complement dataset_B.fa biogo 93.65 2.18 1315030 53727
+Reverse complement dataset_B.fa fakit 26.36 1.29 1793784 75222
+Reverse complement dataset_B.fa fasta_utilities 27.79 3.25 1256350 92
+Reverse complement dataset_B.fa seqmagick 65.42 1.91 1422250 73040
+Reverse complement dataset_B.fa seqtk 12.25 0.95 244870 57
+Searching by ID list dataset_A.fa fakit 12.23 0.21 53065 2711
+Searching by ID list dataset_A.fa fasta_utilities 9.96 0.01 54824 3379
+Searching by ID list dataset_A.fa seqmagick 46.22 0.19 42166 932
+Searching by ID list dataset_A.fa seqtk 11.72 0.44 9954 60
+Searching by ID list dataset_B.fa fakit 12.02 0.06 1695814 1128
+Searching by ID list dataset_B.fa fasta_utilities 12.60 0.07 1256420 67
+Searching by ID list dataset_B.fa seqmagick 53.86 1.12 973556 36867
+Searching by ID list dataset_B.fa seqtk 14.22 0.27 244886 64
+Sampling by number dataset_A.fa fakit 28.30 0.39 44261 3344
+Sampling by number dataset_A.fa seqmagick 40.88 1.07 541172 912
+Sampling by number dataset_A.fa seqtk 4.25 0.09 1081468 1295
+Sampling by number dataset_B.fa fakit 31.31 0.66 1558248 149
+Sampling by number dataset_B.fa seqmagick 42.20 2.10 3036372 108690
+Sampling by number dataset_B.fa seqtk 4.99 0.11 2817700 3
+Removing duplicates by seq dataset_A.fa fakit 19.86 0.82 61324 2426
+Removing duplicates by seq dataset_A.fa seqmagick 79.47 1.87 60590 576
+Removing duplicates by seq dataset_B.fa fakit 16.50 0.78 1927258 133969
+Removing duplicates by seq dataset_B.fa seqmagick 90.29 0.67 1123858 28
+Subsequence with BED file dataset_B.fa fakit 9.70 0.23 2081481 163038
+Subsequence with BED file dataset_B.fa seqtk 6.72 0.06 246277 33
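
To compare tools within one test, it can help to express each mean time relative to the fastest tool on the same test and dataset. The sketch below does that for a few rows copied from the table above. It assumes the fields are whitespace-separated as displayed here; if the real file is tab-delimited, Python's `csv` module with `delimiter="\t"` would be the simpler choice. `parse_row` and `relative_times` are illustrative names, not part of the repository.

```python
from collections import defaultdict

# A few rows copied from benchmark.5tests.csv above.
ROWS = """\
Reverse complement dataset_A.fa fakit 30.53 1.03 47684 1994
Reverse complement dataset_A.fa seqtk 10.06 0.53 7629 18
Searching by ID list dataset_A.fa fakit 12.23 0.21 53065 2711
Searching by ID list dataset_A.fa seqtk 11.72 0.44 9954 60
"""


def parse_row(line):
    # Test names contain spaces, so peel the six trailing fields off from the right.
    test, dataset, app, t_mean, _t_sd, _m_mean, _m_sd = line.rsplit(None, 6)
    return test, dataset, app, float(t_mean)


def relative_times(text):
    """Map (test, dataset) -> {app: mean time divided by the fastest mean time}."""
    groups = defaultdict(dict)
    for line in text.splitlines():
        if line.strip():
            test, dataset, app, t = parse_row(line)
            groups[(test, dataset)][app] = t
    return {
        key: {app: round(t / min(times.values()), 2) for app, t in times.items()}
        for key, times in groups.items()
    }


if __name__ == "__main__":
    for key, ratios in relative_times(ROWS).items():
        print(key, ratios)
```

A ratio of 1.0 marks the fastest tool in that group; the same grouping works on `mem_mean` if memory is the quantity of interest.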

benchmark/benchmark.5tests.csv.png

-64 Bytes

benchmark/benchmark.fakit.csv

+5
@@ -0,0 +1,5 @@
+test dataset app time_mean time_stdev mem_mean mem_stdev
+Shuffling dataset_A.fa fakit 17.70 0.00 2843664 0
+Shuffling dataset_B.fa fakit 14.44 0.00 3127368 0
+Sorting by length dataset_A.fa fakit 18.77 0.00 2888056 0
+Sorting by length dataset_B.fa fakit 14.23 0.00 3254916 0

benchmark/benchmark.fakit.csv.png

-327 Bytes
@@ -0,0 +1,37 @@
+test dataset app time_mean time_stdev mem_mean mem_stdev
+Reverse complement dataset_A.fa 1 38.27 0.00 45544 0
+Reverse complement dataset_A.fa 2 31.23 0.00 45876 0
+Reverse complement dataset_A.fa 3 30.22 0.00 42036 0
+Reverse complement dataset_A.fa 4 32.62 0.00 48744 0
+Reverse complement dataset_B.fa 1 38.92 0.00 1925872 0
+Reverse complement dataset_B.fa 2 25.62 0.00 1833360 0
+Reverse complement dataset_B.fa 3 25.02 0.00 1813708 0
+Reverse complement dataset_B.fa 4 24.99 0.00 1925272 0
+Searching by ID list dataset_A.fa 1 13.28 0.00 53604 0
+Searching by ID list dataset_A.fa 2 13.01 0.00 52312 0
+Searching by ID list dataset_A.fa 3 13.02 0.00 52572 0
+Searching by ID list dataset_A.fa 4 13.41 0.00 56628 0
+Searching by ID list dataset_B.fa 1 12.37 0.00 1665156 0
+Searching by ID list dataset_B.fa 2 12.06 0.00 1585708 0
+Searching by ID list dataset_B.fa 3 12.14 0.00 1912948 0
+Searching by ID list dataset_B.fa 4 11.83 0.00 1912032 0
+Sampling by number dataset_A.fa 1 33.23 0.00 47588 0
+Sampling by number dataset_A.fa 2 28.75 0.00 44032 0
+Sampling by number dataset_A.fa 3 29.75 0.00 42520 0
+Sampling by number dataset_A.fa 4 30.27 0.00 47688 0
+Sampling by number dataset_B.fa 1 36.81 0.00 1869772 0
+Sampling by number dataset_B.fa 2 31.51 0.00 1558392 0
+Sampling by number dataset_B.fa 3 32.99 0.00 1536872 0
+Sampling by number dataset_B.fa 4 31.83 0.00 1604788 0
+Removing duplicates by seq content dataset_A.fa 1 21.93 0.00 65416 0
+Removing duplicates by seq content dataset_A.fa 2 19.45 0.00 59204 0
+Removing duplicates by seq content dataset_A.fa 3 20.12 0.00 59924 0
+Removing duplicates by seq content dataset_A.fa 4 19.34 0.00 59940 0
+Removing duplicates by seq content dataset_B.fa 1 20.16 0.00 1703012 0
+Removing duplicates by seq content dataset_B.fa 2 16.22 0.00 1780832 0
+Removing duplicates by seq content dataset_B.fa 3 18.64 0.00 2011904 0
+Removing duplicates by seq content dataset_B.fa 4 16.33 0.00 2150284 0
+Subsequence with BED file dataset_B.fa 1 14.63 0.22 2105462 1846
+Subsequence with BED file dataset_B.fa 2 9.87 0.05 2046796 105814
+Subsequence with BED file dataset_B.fa 3 9.61 0.29 2158104 146143
+Subsequence with BED file dataset_B.fa 4 8.85 0.27 2124036 177248
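
If the `app` column in this table is the number of threads (an assumption, suggested by the `fakit_multi_threads` directory name and the `for i in $(seq 1 $NCPUs)` loop in the scripts, but not stated in the commit), the scaling can be summarised as speedup relative to the single-thread run. A minimal sketch, using the reverse-complement times for `dataset_B.fa` copied from the rows above; `speedups` is an illustrative helper name:

```python
def speedups(times_by_threads):
    """Return {threads: time with 1 thread / time with that many threads}."""
    base = times_by_threads[1]
    return {n: round(base / t, 2) for n, t in sorted(times_by_threads.items())}


# Reverse complement, dataset_B.fa: mean time per value of the `app` column.
revcom_b = {1: 38.92, 2: 25.62, 3: 25.02, 4: 24.99}

if __name__ == "__main__":
    for n, s in speedups(revcom_b).items():
        # Parallel efficiency is the speedup divided by the number of threads.
        print(f"threads={n} speedup={s} efficiency={s / n:.2f}")
```

Under that reading, most of the gain comes from the second thread, and further threads change the time only slightly.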
@@ -0,0 +1,17 @@
+test dataset app time_mean time_stdev mem_mean mem_stdev
+Shuffling dataset_A.fa 1 17.47 0.00 2804292 0
+Shuffling dataset_A.fa 2 16.92 0.00 2853488 0
+Shuffling dataset_A.fa 3 16.88 0.00 2840376 0
+Shuffling dataset_A.fa 4 17.67 0.00 2814828 0
+Shuffling dataset_B.fa 1 13.78 0.00 3246272 0
+Shuffling dataset_B.fa 2 13.61 0.00 3266400 0
+Shuffling dataset_B.fa 3 16.51 0.00 3278048 0
+Shuffling dataset_B.fa 4 13.74 0.00 3232840 0
+Sorting by length dataset_A.fa 1 18.27 0.00 2853268 0
+Sorting by length dataset_A.fa 2 21.88 0.00 2856084 0
+Sorting by length dataset_A.fa 3 19.02 0.00 2859320 0
+Sorting by length dataset_A.fa 4 19.27 0.00 2880252 0
+Sorting by length dataset_B.fa 1 14.72 0.00 3155324 0
+Sorting by length dataset_B.fa 2 17.35 0.00 3141520 0
+Sorting by length dataset_B.fa 3 14.43 0.00 3202260 0
+Sorting by length dataset_B.fa 4 17.66 0.00 3250852 0

benchmark/fakit_multi_threads/plot.R

+1
@@ -0,0 +1 @@
+../plot.R

benchmark/fakit_multi_threads/plot2.R

-1
This file was deleted.
@@ -0,0 +1 @@
+../revcom_biogo

benchmark/fakit_multi_threads/run.pl

+1
@@ -0,0 +1 @@
+../run.pl

benchmark/fakit_multi_threads/run_benchmark_00_all.pl

-1
This file was deleted.

benchmark/fakit_multi_threads/run_benchmark_00_all.pl.benchmark.csv

-1
This file was deleted.
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+echo Test: Shuffling
+
+echo warm-up
+for f in dataset_{A,B}.fa; do echo data: $f; cat $f > /dev/null; done
+
+
+NCPUs=$(grep -c processor /proc/cpuinfo)
+for i in $(seq 1 $NCPUs); do
+    echo == $i
+    echo recreate FASTA index file
+    for f in dataset_{A,B}.fa; do
+        if [[ -f $f.fakit.fai ]]; then
+            /bin/rm $f.fakit.fai
+            # fakit faidx $f --id-regexp "^(.+)$" -o $f.fakit.fai;
+        fi;
+    done
+
+    for f in dataset_{A,B}.fa; do
+        echo data: $f;
+        memusg -t -H fakit shuffle -2 $f > $f.fakit.shuffle;
+        # fakit stat $f.fakit.rc;
+        /bin/rm $f.fakit.shuffle;
+    done
+done
+
+
+
@@ -0,0 +1,34 @@
+#!/bin/sh
+
+echo Test: Sorting by length
+
+echo warm-up
+for f in dataset_{A,B}.fa; do echo data: $f; cat $f > /dev/null; done
+
+
+NCPUs=$(grep -c processor /proc/cpuinfo)
+for i in $(seq 1 $NCPUs); do
+    echo == $i
+    echo delete old FASTA index file
+    for f in dataset_{A,B}.fa; do
+        if [[ -f $f.fakit.fai ]]; then
+            /bin/rm $f.fakit.fai
+            # fakit faidx $f --id-regexp "^(.+)$" -o $f.fakit.fai;
+        fi;
+    done
+
+    for f in dataset_{A,B}.fa; do
+        echo data: $f;
+        memusg -t -H fakit sort -l -2 $f > $f.fakit.sort;
+        # fakit stat $f.fakit.rc;
+        /bin/rm $f.fakit.sort;
+    done
+done
+
+
+
+
+
+
+
+
@@ -0,0 +1,37 @@
+test dataset app time_mean time_stdev mem_mean mem_stdev
+Reverse complement dataset_A.fa 1 38.27 0.00 45544 0
+Reverse complement dataset_A.fa 2 31.23 0.00 45876 0
+Reverse complement dataset_A.fa 3 30.22 0.00 42036 0
+Reverse complement dataset_A.fa 4 32.62 0.00 48744 0
+Reverse complement dataset_B.fa 1 38.92 0.00 1925872 0
+Reverse complement dataset_B.fa 2 25.62 0.00 1833360 0
+Reverse complement dataset_B.fa 3 25.02 0.00 1813708 0
+Reverse complement dataset_B.fa 4 24.99 0.00 1925272 0
+Searching by ID list dataset_A.fa 1 13.28 0.00 53604 0
+Searching by ID list dataset_A.fa 2 13.01 0.00 52312 0
+Searching by ID list dataset_A.fa 3 13.02 0.00 52572 0
+Searching by ID list dataset_A.fa 4 13.41 0.00 56628 0
+Searching by ID list dataset_B.fa 1 12.37 0.00 1665156 0
+Searching by ID list dataset_B.fa 2 12.06 0.00 1585708 0
+Searching by ID list dataset_B.fa 3 12.14 0.00 1912948 0
+Searching by ID list dataset_B.fa 4 11.83 0.00 1912032 0
+Sampling by number dataset_A.fa 1 33.23 0.00 47588 0
+Sampling by number dataset_A.fa 2 28.75 0.00 44032 0
+Sampling by number dataset_A.fa 3 29.75 0.00 42520 0
+Sampling by number dataset_A.fa 4 30.27 0.00 47688 0
+Sampling by number dataset_B.fa 1 36.81 0.00 1869772 0
+Sampling by number dataset_B.fa 2 31.51 0.00 1558392 0
+Sampling by number dataset_B.fa 3 32.99 0.00 1536872 0
+Sampling by number dataset_B.fa 4 31.83 0.00 1604788 0
+Removing duplicates by seq content dataset_A.fa 1 21.93 0.00 65416 0
+Removing duplicates by seq content dataset_A.fa 2 19.45 0.00 59204 0
+Removing duplicates by seq content dataset_A.fa 3 20.12 0.00 59924 0
+Removing duplicates by seq content dataset_A.fa 4 19.34 0.00 59940 0
+Removing duplicates by seq content dataset_B.fa 1 20.16 0.00 1703012 0
+Removing duplicates by seq content dataset_B.fa 2 16.22 0.00 1780832 0
+Removing duplicates by seq content dataset_B.fa 3 18.64 0.00 2011904 0
+Removing duplicates by seq content dataset_B.fa 4 16.33 0.00 2150284 0
+Subsequence with BED file dataset_B.fa 1 14.63 0.22 2105462 1846
+Subsequence with BED file dataset_B.fa 2 9.87 0.05 2046796 105814
+Subsequence with BED file dataset_B.fa 3 9.61 0.29 2158104 146143
+Subsequence with BED file dataset_B.fa 4 8.85 0.27 2124036 177248
@@ -0,0 +1,17 @@
+test dataset app time_mean time_stdev mem_mean mem_stdev
+Shuffling dataset_A.fa 1 17.47 0.00 2804292 0
+Shuffling dataset_A.fa 2 16.92 0.00 2853488 0
+Shuffling dataset_A.fa 3 16.88 0.00 2840376 0
+Shuffling dataset_A.fa 4 17.67 0.00 2814828 0
+Shuffling dataset_B.fa 1 13.78 0.00 3246272 0
+Shuffling dataset_B.fa 2 13.61 0.00 3266400 0
+Shuffling dataset_B.fa 3 16.51 0.00 3278048 0
+Shuffling dataset_B.fa 4 13.74 0.00 3232840 0
+Sorting by length dataset_A.fa 1 18.27 0.00 2853268 0
+Sorting by length dataset_A.fa 2 21.88 0.00 2856084 0
+Sorting by length dataset_A.fa 3 19.02 0.00 2859320 0
+Sorting by length dataset_A.fa 4 19.27 0.00 2880252 0
+Sorting by length dataset_B.fa 1 14.72 0.00 3155324 0
+Sorting by length dataset_B.fa 2 17.35 0.00 3141520 0
+Sorting by length dataset_B.fa 3 14.43 0.00 3202260 0
+Sorting by length dataset_B.fa 4 17.66 0.00 3250852 0
@@ -0,0 +1,5 @@
+test dataset app time_mean time_stdev mem_mean mem_stdev
+Subsequence with BED file dataset_B.fa 1 14.63 0.22 2105462 1846
+Subsequence with BED file dataset_B.fa 2 9.87 0.05 2046796 105814
+Subsequence with BED file dataset_B.fa 3 9.61 0.29 2158104 146143
+Subsequence with BED file dataset_B.fa 4 8.85 0.27 2124036 177248
@@ -0,0 +1,37 @@
+test dataset app time_mean time_stdev mem_mean mem_stdev
+Reverse complement dataset_A.fa 1 113.83 0.00 11740 0
+Reverse complement dataset_A.fa 2 124.43 0.00 12232 0
+Reverse complement dataset_A.fa 3 125.14 0.00 12088 0
+Reverse complement dataset_A.fa 4 126.08 0.00 12364 0
+Reverse complement dataset_B.fa 1 47.49 0.00 2026540 0
+Reverse complement dataset_B.fa 2 30.09 0.00 2002564 0
+Reverse complement dataset_B.fa 3 31.55 0.00 2254176 0
+Reverse complement dataset_B.fa 4 29.23 0.00 2462680 0
+Searching by ID list dataset_A.fa 1 174.38 0.00 1017628 0
+Searching by ID list dataset_A.fa 2 168.26 0.00 979172 0
+Searching by ID list dataset_A.fa 3 167.96 0.00 941308 0
+Searching by ID list dataset_A.fa 4 170.26 0.00 990276 0
+Searching by ID list dataset_B.fa 1 13.48 0.00 2250796 0
+Searching by ID list dataset_B.fa 2 11.50 0.00 2075996 0
+Searching by ID list dataset_B.fa 3 11.89 0.00 2445820 0
+Searching by ID list dataset_B.fa 4 11.72 0.00 2306508 0
+Sampling by number dataset_A.fa 1 92.26 0.00 12152 0
+Sampling by number dataset_A.fa 2 91.47 0.00 12248 0
+Sampling by number dataset_A.fa 3 95.17 0.00 12252 0
+Sampling by number dataset_A.fa 4 96.31 0.00 12132 0
+Sampling by number dataset_B.fa 1 34.73 0.00 2075620 0
+Sampling by number dataset_B.fa 2 29.57 0.00 2076784 0
+Sampling by number dataset_B.fa 3 31.20 0.00 1804840 0
+Sampling by number dataset_B.fa 4 31.60 0.00 2076920 0
+Removing duplicates by seq content dataset_A.fa 1 231.65 0.00 3428340 0
+Removing duplicates by seq content dataset_A.fa 2 229.12 0.00 3646984 0
+Removing duplicates by seq content dataset_A.fa 3 235.32 0.00 3451840 0
+Removing duplicates by seq content dataset_A.fa 4 241.94 0.00 2990240 0
+Removing duplicates by seq content dataset_B.fa 1 26.83 0.00 2322172 0
+Removing duplicates by seq content dataset_B.fa 2 18.62 0.00 2692244 0
+Removing duplicates by seq content dataset_B.fa 3 20.54 0.00 2836324 0
+Removing duplicates by seq content dataset_B.fa 4 19.50 0.00 2567764 0
+Subsequence with BED file dataset_B.fa 1 17.09 0.00 1931700 0
+Subsequence with BED file dataset_B.fa 2 11.56 0.00 1919372 0
+Subsequence with BED file dataset_B.fa 3 11.97 0.00 2027240 0
+Subsequence with BED file dataset_B.fa 4 10.55 0.00 1910492 0
@@ -0,0 +1,9 @@
+test dataset app time_mean time_stdev mem_mean mem_stdev
+Reverse complement dataset_A.fa 1 38.27 0.00 45544 0
+Reverse complement dataset_A.fa 2 31.23 0.00 45876 0
+Reverse complement dataset_A.fa 3 30.22 0.00 42036 0
+Reverse complement dataset_A.fa 4 32.62 0.00 48744 0
+Reverse complement dataset_B.fa 1 38.92 0.00 1925872 0
+Reverse complement dataset_B.fa 2 25.62 0.00 1833360 0
+Reverse complement dataset_B.fa 3 25.02 0.00 1813708 0
+Reverse complement dataset_B.fa 4 24.99 0.00 1925272 0
@@ -0,0 +1,9 @@
+test dataset app time_mean time_stdev mem_mean mem_stdev
+Searching by ID list dataset_A.fa 1 13.28 0.00 53604 0
+Searching by ID list dataset_A.fa 2 13.01 0.00 52312 0
+Searching by ID list dataset_A.fa 3 13.02 0.00 52572 0
+Searching by ID list dataset_A.fa 4 13.41 0.00 56628 0
+Searching by ID list dataset_B.fa 1 12.37 0.00 1665156 0
+Searching by ID list dataset_B.fa 2 12.06 0.00 1585708 0
+Searching by ID list dataset_B.fa 3 12.14 0.00 1912948 0
+Searching by ID list dataset_B.fa 4 11.83 0.00 1912032 0
