Releases: shenwei356/seqkit
SeqKit v2.4.0
Changes
- SeqKit v2.4.0 - 2023-03-17
    - seqkit:
    - seqkit locate:
        - do not remove embedded regions when searching with regular expressions. #368
    - seqkit amplicon:
        - fix BED coordinates for amplicons found in the minus strand. #367
    - seqkit split:
        - fix forgetting to add extension for --two-pass. #332
    - seqkit stats:
        - fix computing Q1 and Q3 of sequence length for one record. #353
    - seqkit grep:
        - fix count number (-C) for matching with mismatch (-m > 0). #370
    - seqkit replace:
        - add some flags to select a subset of records to edit; these flags are transplanted from seqkit grep. #348
    - seqkit faidx:
        - allow empty lines at the end of sequences.
    - seqkit faidx/sort/shuffle/split/subseq:
    - seqkit seq:
        - allow filtering sequences of length zero. Thanks to @penglbio.
    - seqkit rename:
        - new flag -s/--separator for setting separator between original ID/name and the counter (default "_"). #360
        - new flag -N/--start-num for setting starting count number for duplicated IDs/names (default 2). #360
        - new flag -1/--rename-1st-rec for renaming the first record as well. #360
        - do not append space if there's no description after the sequence ID.
    - seqkit sliding:
        - new flag -S/--suffix for changing the suffix added to the sequence ID (default: "_sliding").
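The three seqkit rename flags above interact, so a small sketch may help. This is an illustrative Python model, not SeqKit's code: the function name and parameters are hypothetical, and the numbering given to the first record when it is renamed too (start number minus one) is an assumption.

```python
from collections import defaultdict


def rename_duplicates(ids, separator="_", start_num=2, rename_first=False):
    """Append a counter to repeated IDs (illustrative sketch, not SeqKit's code)."""
    seen = defaultdict(int)  # number of times each ID has appeared so far
    renamed = []
    for seq_id in ids:
        seen[seq_id] += 1
        n = seen[seq_id]
        if n == 1 and not rename_first:
            # first occurrence is kept unchanged by default
            renamed.append(seq_id)
            continue
        # ASSUMPTION: the first duplicate gets start_num, the next one
        # start_num + 1, and a renamed first record gets start_num - 1
        counter = start_num + n - 2
        renamed.append(f"{seq_id}{separator}{counter}")
    return renamed


print(rename_duplicates(["a", "b", "a", "a"]))
print(rename_duplicates(["a", "b", "a", "a"], separator="|", start_num=5))
```

With the defaults, only the second and later copies of an ID change; the separator and start number only affect how the suffix is built.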
SeqKit v2.3.1
Changes
- SeqKit v2.3.1 - 2022-09-22
SeqKit v2.3.0
Changes
- SeqKit v2.3.0 - 2022-08-12
SeqKit v2.2.0
Changes
- SeqKit v2.2.0 - 2022-03-14
    - seqkit:
        - add support of xz and zstd input/output formats. #274
        - fix panic when reading records with header of ID + blanks.
    - new command seqkit sum: computing message digest for all sequences in FASTA/Q files. The idea comes from @photocyte and the format borrows from seqhash. #262
    - new command seqkit fa2fq: retrieving corresponding FASTQ records by a FASTA file.
    - seqkit split2:
    - seqkit concat:
    - seqkit locate:
        - parallelizing -F/--use-fmi and -m for large number of search patterns.
    - seqkit amplicon:
        - new flag -M/--output-mismatches to append the total mismatches and mismatches of 5' end and 3' end. #286
    - seqkit grep:
        - detect FASTA/Q symbol @ and > in the searching patterns and show warnings.
        - add new flag -C/--count, like grep -c in GNU grep. #267
    - seqkit range:
        - support removing leading 100 seqs (seqkit range -r 101:-1 == tail -n +101). #279
    - seqkit subseq:
        - report error when no options were given.
    - update doc:
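The equivalence between seqkit range -r 101:-1 and tail -n +101 suggests 1-based, inclusive positions in which negative numbers count from the end. A minimal Python sketch of that reading follows; select_range is a hypothetical helper, and clamping out-of-range positions is an assumption rather than documented behavior.

```python
def select_range(records, start, end):
    """Return records in a 1-based, inclusive range.

    Negative positions count from the end (-1 is the last record).
    Illustrative sketch only; edge cases such as position 0 are not modeled.
    """
    n = len(records)
    if start < 0:
        start = n + start + 1
    if end < 0:
        end = n + end + 1
    # ASSUMPTION: positions outside the data are clamped instead of rejected
    start = max(start, 1)
    end = min(end, n)
    return records[start - 1:end]


records = list(range(1, 151))            # stand-in for 150 sequences
kept = select_range(records, 101, -1)    # drop the leading 100 records
print(kept[0], kept[-1], len(kept))
```

An input with 100 records or fewer yields an empty result under this model, which matches what tail -n +101 would print for a short file.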
SeqKit v2.1.0
Changelog
- SeqKit v2.1.0 - 2021-11-15
    - seqkit seq:
        - fix filtering by average quality -Q/-R. #257
    - seqkit convert:
    - seqkit split:
        - fix writing an extra empty file when using --two-pass. #244
    - seqkit subseq:
        - fix --bed, which failed to recognize strand ".".
    - seqkit fq2fa:
        - faster, and do not wrap sequences.
    - seqkit grep/locate/mutate:
        - detect unquoted comma and show warning message, e.g., -p 'A{2,}'. #250
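Why an unquoted comma deserves a warning can be shown with a short sketch. It assumes that the value of -p is split as a comma-separated list with CSV-style double quoting; that is an assumption about the flag parsing, not something these notes state. split_patterns is a hypothetical helper that uses Python's csv module as a stand-in.

```python
import csv


def split_patterns(flag_value):
    """Split a flag value on commas the way a CSV parser would.

    ASSUMPTION: this models the pattern flag as a CSV-style list; it is
    not SeqKit's actual parsing code.
    """
    return next(csv.reader([flag_value]))


# The regular expression A{2,} is silently cut into two patterns ...
print(split_patterns("A{2,}"))
# ... while wrapping it in double quotes keeps it in one piece.
print(split_patterns('"A{2,}"'))
```

Under this assumption, a pattern such as A{2,} has to be passed with an inner pair of double quotes to survive as a single regular expression.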
SeqKit v2.0.0
Changelogs
- SeqKit v2.0.0 - 2021-08-27
- Performance improvements
    - seqkit:
        - faster FASTA/Q reading and writing, especially on FASTQ, see the benchmark.
            - reading (plain text): 4X faster. Command: seqkit stats dataset_C.fq
            - reading (gzip files): 45% faster. Command: seqkit stats dataset_C.fq.gz
            - reading + writing (plain text): 3.5X faster. Command: seqkit grep -p . -v dataset_C.fq -o t
            - reading + writing (gzip files): 2.2X faster. Command: seqkit grep -p . -v dataset_C.fq.gz -o t.gz
        - change default value of -j/--threads from 2 to 4, which is faster for writing gzip files.
    - seqkit seq:
        - fix writing speed, which was slowed down in v0.12.1.
- Breaking changes
    - seqkit grep/rmdup/common:
        - consider reverse complement sequence by default for comparing by sequence, add flag -P/--only-positive-strand. #215
    - seqkit rename:
        - rename ID only, do not append original header to new ID. #236
    - seqkit fx2tab:
        - for -s/--seq-hash: outputting MD5 instead of hash value (integers) of xxhash. #219
- Bugfixes
- New features/enhancements
    - seqkit grep:
        - allow empty pattern files.
    - seqkit faidx:
        - support region with begin > end, i.e., returning reverse complement sequence.
        - add new flag -l/--region-file: file containing a list of regions.
    - seqkit fx2tab:
        - new flag -Q/--no-qual for disabling outputting quality even for FASTQ file. #221
    - seqkit amplicon:
        - new flag -u/--save-unmatched for saving records that do not match any primer.
    - seqkit sort:
        - new flag -b/--by-bases for sorting by non-gap bases, for multiple sequence alignment files. #216
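Two v2.0.0 items rely on the reverse complement: comparing by sequence on both strands (grep/rmdup/common) and faidx regions with begin > end. The Python sketch below illustrates both ideas under stated assumptions: only plain A/C/G/T bases are handled, comparison is case-sensitive, regions are 1-based and inclusive, and all function names are hypothetical.

```python
# Complement table for plain DNA bases only (IUPAC ambiguity codes are not handled)
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def same_sequence(a, b, only_positive_strand=False):
    """Compare two sequences, optionally treating both strands as equal."""
    if a == b:
        return True
    return (not only_positive_strand) and a == reverse_complement(b)


def fetch_region(seq, begin, end):
    """Return a 1-based, inclusive region.

    ASSUMPTION: begin > end means "take end..begin, then reverse complement".
    """
    if begin <= end:
        return seq[begin - 1:end]
    return reverse_complement(seq[end - 1:begin])


print(same_sequence("AACG", "CGTT"))                             # both strands
print(same_sequence("AACG", "CGTT", only_positive_strand=True))  # forward only
print(fetch_region("AACGT", 4, 2))
```

The only_positive_strand parameter mirrors the role of -P/--only-positive-strand: it restores the comparison that ignores the reverse complement.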
SeqKit v0.16.1
Changelog
SeqKit v0.16.0
Changes
- SeqKit v0.16.0
    - new command seqkit head-genome:
        - print sequences of the first genome with common prefixes in name
    - seqkit grep/locate/amplicon -m:
        - much faster (300-400x) searching with mismatch allowed by optimizing FM-indexing and parallelization.
        - new flag -I/--immediate-output.
    - seqkit grep/locate:
    - seqkit locate:
    - seqkit amplicon:
        - fix bug of -m, when mismatch is allowed.
    - seqkit fx2tab:
        - new flag -C/--base-count for counting bases. #183
    - seqkit tab2fx:
        - fix a rare bug. #197
    - seqkit subseq:
        - fix bug for BED with empty columns. #195
    - seqkit genautocomplete:
        - support bash|zsh|fish|powershell.
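To make concrete what searching with mismatch allowed means, here is a deliberately naive Python scan. It is not the FM-index approach mentioned above; it only shows the matching criterion. It assumes that mismatches are substitutions only (no insertions or deletions), and find_with_mismatches is a hypothetical name.

```python
def find_with_mismatches(seq, pattern, max_mismatch=0):
    """Return (1-based start, mismatch count) for every window of seq that
    differs from pattern in at most max_mismatch positions.

    Naive O(len(seq) * len(pattern)) scan, for illustration only.
    """
    hits = []
    m = len(pattern)
    for start in range(len(seq) - m + 1):
        mismatches = 0
        for a, b in zip(seq[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatch:
                    break  # too many substitutions, give up on this window
        else:
            # inner loop finished without exceeding the limit
            hits.append((start + 1, mismatches))
    return hits


print(find_with_mismatches("ACGTACGA", "ACGT", max_mismatch=1))
```

An indexed search reports the same kind of hits without comparing every window base by base, which is where a large speedup can come from.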
SeqKit v0.15.0
Changes
- SeqKit v0.15.0
    - seqkit grep/locate: update help message.
    - seqkit grep: search on both strands when searching by sequence.
    - seqkit split2: fix redundant log when using -s.
    - seqkit bam: new field RightSoftClipSeq. #172
    - seqkit sample -2: remove extra \n. #173
    - seqkit split2 -l: fix bug for splitting by accumulative length; this bug occurs when the first record is longer than -l. No sequences are lost.
SeqKit v0.14.0
Changes
- SeqKit v0.14.0
    - new command seqkit pair: match up paired-end reads from two fastq files, faster than fastq-pair.
    - seqkit translate: new flag -F/--append-frame for optionally adding frame info to ID. #159
    - seqkit stats: reduce memory usage when using -a for calculating N50. #153
    - seqkit mutate: fix inserting sequence with -i/--insertion; this bug occurs in some cases when the insert site is big, don't worry if no error was reported.
    - seqkit replace:
        - new flag -U/--keep-untouched: do not change anything when no value found for the key (only for sequence name).
        - do not support editing FASTQ sequences.
    - seqkit grep/locate: new flag --circular for supporting circular genome. #158
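Supporting a circular genome means a pattern may span the point where the end of the sequence joins its start. One common way to model this, sketched below in Python, is to append the first len(pattern) - 1 bases to the end before searching. This is a hypothetical illustration of the idea, not necessarily how --circular is implemented.

```python
def find_circular(genome, pattern):
    """Return 1-based start positions of pattern in a circular genome.

    Matches may wrap around the end of the sequence. Illustrative sketch only.
    """
    n, m = len(genome), len(pattern)
    if m == 0 or m > n:
        return []
    # Extending by m - 1 bases exposes every window that crosses the origin
    # without reporting any start position twice.
    extended = genome + genome[:m - 1]
    hits = []
    start = extended.find(pattern)
    while start != -1:
        hits.append(start + 1)
        start = extended.find(pattern, start + 1)
    return hits


print("GTTAAC".find("ACGT"))            # -1: no match on the linear sequence
print(find_circular("GTTAAC", "ACGT"))  # match that wraps around the origin
```

The linear search misses the hit because the pattern starts at position 5 and continues past the last base into the first two.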