Releases: shenwei356/seqkit

SeqKit v2.4.0

17 Mar 09:05

Changes

  • SeqKit v2.4.0 - 2023-03-17
    • seqkit:
      • support bzip2 format. #361
      • support setting compression level for gzip, zstd, and bzip2 format via --compress-level. #320
      • the global flag --infile-list accepts stdin (-) now.
      • wrap the help message of flags.
    • seqkit locate:
      • do not remove embedded regions when searching with regular expressions. #368
    • seqkit amplicon:
      • fix BED coordinates for amplicons found in the minus strand. #367
    • seqkit split:
      • fix forgetting to add extension for --two-pass. #332
    • seqkit stats:
      • fix computing Q1 and Q3 of sequence length when there is only one record. #353
    • seqkit grep:
      • fix count number (-C) for matching with mismatch (-m > 0). #370
    • seqkit replace:
      • add some flags for selecting a subset of records to edit; these flags are ported from seqkit grep. #348
    • seqkit faidx:
      • allow empty lines at the end of sequences.
    • seqkit faidx/sort/shuffle/split/subseq:
      • new flag -U/--update-faidx: update the FASTA index file if it exists, to guarantee the index file matches the FASTA files. #364
      • improve log info and update help message. #365
    • seqkit seq:
      • allow filtering sequences of length zero. thanks to @penglbio.
    • seqkit rename:
      • new flag -s/--separator for setting separator between original ID/name and the counter (default "_"). #360
      • new flag -N/--start-num for setting starting count number for duplicated IDs/names (default 2). #360
      • new flag -1/--rename-1st-rec for renaming the first record as well. #360
      • do not append a space if there's no description after the sequence ID.
    • seqkit sliding:
      • new flag -S/--suffix for changing the suffix added to the sequence ID (default: "_sliding").
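
The new seqkit rename flags above (-s/--separator and -N/--start-num) can be pictured with a small sketch. This is not SeqKit's implementation: the function name rename_duplicates is hypothetical, and the assumption that the first occurrence stays unchanged while later duplicates count up from the start number is inferred from the flag descriptions only. The -1/--rename-1st-rec behavior is not modeled.

```python
from collections import Counter


def rename_duplicates(ids, separator="_", start_num=2):
    """Append a counter to repeated IDs.

    Assumption (not confirmed by the release notes): the first occurrence
    keeps its ID, and the n-th occurrence gets start_num + (n - 2).
    """
    seen = Counter()
    renamed = []
    for seq_id in ids:
        seen[seq_id] += 1
        if seen[seq_id] == 1:
            renamed.append(seq_id)  # first occurrence is left as-is
        else:
            counter = start_num + seen[seq_id] - 2
            renamed.append(f"{seq_id}{separator}{counter}")
    return renamed


print(rename_duplicates(["a", "b", "a", "a"]))  # ['a', 'b', 'a_2', 'a_3']
print(rename_duplicates(["a", "a"], separator="-", start_num=1))  # ['a', 'a-1']
```

Checking the result against the real tool on a small file is advisable before relying on a particular naming scheme.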

SeqKit v2.3.1

22 Sep 09:25

Changes

  • SeqKit v2.3.1 - 2022-09-22
    • seqkit grep/locate: fix bug of FMIndex building for empty sequences. #321
    • seqkit split2: fix bug of splitting two FASTA files. #325
    • seqkit faidx: --id-regexp works now.

SeqKit v2.3.0

12 Aug 15:19

Changes

  • SeqKit v2.3.0 - 2022-08-12
    • seqkit grep/rename:
      • reduce memory consumption when searching with a large number of patterns; it is also faster. #305
      • 2X faster -s/--by-seq.
    • seqkit split:
      • fix outputting an empty file when the number of sequences equals the split size. #293
      • add options to set output file prefix and extension. #296
    • seqkit split2:
      • reduce memory consumption. #304
      • add options to set output file prefix.
    • seqkit stats:
      • add GC content. #294
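
As a rough illustration of the GC content figure added to seqkit stats, the sketch below computes the percentage of G and C bases in a sequence. The function name gc_content is hypothetical, and two choices are assumptions rather than documented SeqKit behavior: the denominator is the full sequence length (ambiguous bases and gaps included), and case is ignored.

```python
def gc_content(seq):
    """Percentage of G and C bases in seq (case-insensitive).

    Assumption: the denominator is the full sequence length.
    """
    if not seq:
        return 0.0
    upper = seq.upper()
    gc = upper.count("G") + upper.count("C")
    return 100.0 * gc / len(seq)


print(gc_content("ACGT"))  # 50.0
```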

SeqKit v2.2.0

14 Mar 11:42

Changes

  • SeqKit v2.2.0 - 2022-03-14
    • seqkit:
      • add support of xz and zstd input/output formats. #274
      • fix panic when reading records with header of ID + blanks.
    • new command seqkit sum: computing message digest for all sequences in FASTA/Q files.
      The idea comes from @photocyte and the format borrows from seqhash #262
    • new command seqkit fa2fq: retrieving corresponding FASTQ records by a FASTA file.
    • seqkit split2:
      • new flag -e/--extension for forcing compression or changing compression format. #276
      • support changing output prefix via -o/--out-file. #275
    • seqkit concat:
      • fix handling of multiple seqs with the same ID in one file. #269
      • support performing outer/full join. #270
      • preserve the comments. #271
    • seqkit locate:
      • parallelizing -F/--use-fmi and -m for large number of search patterns.
    • seqkit amplicon:
      • new flag -M/--output-mismatches to append the total mismatches and mismatches of 5' end and 3' end. #286
    • seqkit grep:
      • detect FASTA/Q symbol @ and > in the searching patterns and show warnings.
      • add new flag -C/--count, like grep -c in GNU grep. #267
    • seqkit range:
      • support removing leading 100 seqs (seqkit range -r 101:-1 == tail -n +101). #279
    • seqkit subseq:
      • report error when no options were given.
    • update doc:
      • seqkit head: add doc for "seqkit tail": seqkit range -N:-1 seqs.fasta. #272
      • seqkit rmdup: add the note of only the first record being saved for duplicates. #265
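
The range expressions mentioned above (seqkit range -r 101:-1 to drop the leading 100 sequences, and -N:-1 to mimic tail) follow a 1-based, inclusive convention in which negative numbers count from the end. A minimal sketch of that convention, using a hypothetical select_range helper on an in-memory list rather than SeqKit's own code:

```python
def select_range(records, start, end):
    """Select records with a 1-based, inclusive range.

    Negative values count from the end (-1 is the last record).
    Zero is rejected because it has no meaning in this convention.
    """
    if start == 0 or end == 0:
        raise ValueError("start and end must be non-zero")
    n = len(records)
    # Translate both ends into a 0-based, half-open slice.
    lo = start - 1 if start > 0 else n + start
    hi = end if end > 0 else n + end + 1
    return records[max(lo, 0):max(hi, 0)]


records = list(range(1, 151))              # stand-ins for 150 sequences
print(select_range(records, 101, -1)[0])   # 101: the leading 100 are dropped
print(len(select_range(records, -10, -1))) # 10: the last ten records
```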

SeqKit v2.1.0

15 Nov 11:37

Changelog

  • SeqKit v2.1.0 - 2021-11-15
    • seqkit seq:
      • fix filtering by average quality -Q/-R. #257
    • seqkit convert:
      • fix quality encoding checking, change default value of -N/--thresh-B-in-n-most-common from 4 to 2.
        #254 and #239
    • seqkit split:
      • fix writing an extra empty file when using --two-pass. #244
    • seqkit subseq:
      • fix --bed failing to recognize the strand ".".
    • seqkit fq2fa:
      • faster, and do not wrap sequences.
    • seqkit grep/locate/mutate:
      • detect unquoted comma and show warning message, e.g., -p 'A{2,}'. #250
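
The unquoted-comma warning above makes sense if the -p value is read as a comma-separated list with CSV-style quoting, so that a regular expression such as A{2,} is split in two unless it is wrapped in double quotes. That parsing model is an assumption here, not something the release note states. The sketch uses Python's csv module to show the effect; split_flag_value is a hypothetical helper.

```python
import csv


def split_flag_value(value):
    """Split one flag value the way a CSV reader splits one line."""
    return next(csv.reader([value]))


print(split_flag_value("A{2,}"))    # ['A{2', '}']: the pattern is cut at the comma
print(split_flag_value('"A{2,}"'))  # ['A{2,}']: inner double quotes keep it whole
```

Under this model, the shell-level single quotes in -p 'A{2,}' only protect the value from the shell; the inner double quotes are what would protect the comma.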

SeqKit v2.0.0

28 Aug 09:00

Changelogs

  • SeqKit v2.0.0 - 2021-08-27
    • Performance improvements
      • seqkit:
        • faster FASTA/Q reading and writing, especially on FASTQ, see the benchmark.
          • reading (plain text): 4X faster. seqkit stats dataset_C.fq
          • reading (gzip files): 45% faster. seqkit stats dataset_C.fq.gz
          • reading + writing (plain text): 3.5X faster. seqkit grep -p . -v dataset_C.fq -o t
          • reading + writing (gzip files): 2.2X faster. seqkit grep -p . -v dataset_C.fq.gz -o t.gz
        • change default value of -j/--threads from 2 to 4, which is faster for writing gzip files.
      • seqkit seq:
        • fix writing speed, which was slowed down in v0.12.1.
    • Breaking changes
      • seqkit grep/rmdup/common:
        • consider reverse complement sequence by default for comparing by sequence, add flag -P/--only-positive-strand. #215
      • seqkit rename:
        • rename ID only, do not append original header to new ID. #236
      • seqkit fx2tab:
        • for -s/--seq-hash: outputting MD5 instead of hash value (integers) of xxhash. #219
    • Bugfixes
      • seqkit seq:
        • fix failing to output gzipped format for file name with extension of .gz since v0.12.1.
      • seqkit tab2fx:
        • fix bug for very long sequences. #214
      • seqkit fish:
        • fix range check. #213
      • seqkit grep:
        • it's not exactly a bug: forgot to use multi-threads for -m > 0.
    • New features/enhancements
      • seqkit grep:
        • allow empty pattern files.
      • seqkit faidx:
        • support region with begin > end, i.e., returning reverse complement sequence.
        • add new flag -l/--region-file: file containing a list of regions.
      • seqkit fx2tab:
        • new flag -Q/--no-qual for disabling outputting quality even for FASTQ files. #221
      • seqkit amplicon:
        • new flag -u/--save-unmatched for saving records that do not match any primer.
      • seqkit sort:
        • new flag -b/--by-bases for sorting by non-gap bases, for multiple sequence alignment files. #216
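
The breaking change for seqkit grep/rmdup/common (reverse complements considered by default when comparing by sequence, with -P/--only-positive-strand to opt out) can be sketched with a common technique: compare a canonical key, here the lexicographically smaller of a sequence and its reverse complement. This is one plausible way to get the described behavior, not necessarily how SeqKit does it. The function names and the only_positive_strand parameter are illustrative, and only plain A/C/G/T bases are handled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def remove_duplicates(seqs, only_positive_strand=False):
    """Keep the first occurrence of each sequence.

    Unless only_positive_strand is set, a sequence and its reverse
    complement share one key and count as duplicates.
    """
    seen = set()
    kept = []
    for seq in seqs:
        key = seq if only_positive_strand else min(seq, reverse_complement(seq))
        if key not in seen:
            seen.add(key)
            kept.append(seq)
    return kept


print(remove_duplicates(["AAGT", "ACTT", "AAGT"]))  # ['AAGT']
print(remove_duplicates(["AAGT", "ACTT", "AAGT"], only_positive_strand=True))  # ['AAGT', 'ACTT']
```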

SeqKit v0.16.1

20 May 00:40

Changelog

  • SeqKit v0.16.1
    • seqkit shuffle --two-pass: fix bug introduced in #173. #209
    • seqkit pair: fix a dangerous bug: when input files were not in the current directory, they were overwritten.

SeqKit v0.16.0

16 Apr 05:41

Changes

  • SeqKit v0.16.0
    • new command seqkit head-genome:
      • print sequences of the first genome with common prefixes in name
    • seqkit grep/locate/amplicon -m
      • much faster (300-400x) searching with mismatch allowed by optimizing FM-indexing and parallelization.
      • new flag -I/--immediate-output.
    • seqkit grep/locate:
      • fix bug of -m when the query contains letters not in the alphabet, usually for protein sequences. #178, #179
      • only search on the positive strand when searching unlimited or protein sequences.
    • seqkit locate:
      • removing debug info for -r introduced in a0f6b6e. #180
    • seqkit amplicon:
      • fix bug of -m, when mismatch is allowed.
    • seqkit fx2tab:
      • new flag -C/--base-count for counting bases. #183
    • seqkit tab2fx:
      • fix a rare bug. #197
    • seqkit subseq:
      • fix bug for BED with empty columns. #195
    • seqkit genautocomplete:
      • support bash|zsh|fish|powershell.
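
For readers unfamiliar with what searching with mismatches (-m) means, the sketch below is a naive baseline: it slides the pattern along the sequence and counts substitutions at each offset. It is explicitly not the FM-index approach the release notes credit for the speedup, and find_with_mismatches is a hypothetical name; it only shows the matching criterion, assuming mismatches are substitutions with no insertions or deletions.

```python
def find_with_mismatches(seq, pattern, max_mismatch):
    """Return 1-based start positions where pattern matches seq
    with at most max_mismatch substitutions (naive scan)."""
    m = len(pattern)
    if m == 0:
        return []
    hits = []
    for i in range(len(seq) - m + 1):
        mismatches = 0
        for a, b in zip(seq[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatch:
                    break  # too many substitutions at this offset
        else:
            hits.append(i + 1)  # loop finished without exceeding the limit
    return hits


print(find_with_mismatches("ACGTACGA", "ACGT", 0))  # [1]
print(find_with_mismatches("ACGTACGA", "ACGT", 1))  # [1, 5]
```

The scan costs time proportional to sequence length times pattern length for every pattern, which is the kind of cost an index-based search is meant to avoid.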

SeqKit v0.15.0

12 Jan 14:39

Changes

  • SeqKit v0.15.0
    • seqkit grep/locate: update help message.
    • seqkit grep: search on both strands when searching by sequence.
    • seqkit split2: fix redundant log when using -s.
    • seqkit bam: new field RightSoftClipSeq. #172
    • seqkit sample -2: remove extra \n. #173
    • seqkit split2 -l: fix a bug in splitting by accumulative length; it occurs when the first record is longer than -l. No sequences are lost.
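
Splitting by accumulative length, as in the seqkit split2 -l fix above, can be sketched as follows. The rule used here is an assumption: a chunk is closed as soon as its accumulated length reaches the threshold, so a first record longer than the threshold forms a chunk on its own. The helper split_by_length and its (name, sequence) tuples are illustrative, not SeqKit's data model.

```python
def split_by_length(records, min_bases):
    """Group (name, seq) records into chunks of at least min_bases bases.

    Assumption: a chunk is closed once its accumulated length reaches
    min_bases; any remaining records form a final, shorter chunk.
    """
    chunks, current, total = [], [], 0
    for name, seq in records:
        current.append((name, seq))
        total += len(seq)
        if total >= min_bases:
            chunks.append(current)
            current, total = [], 0
    if current:
        chunks.append(current)  # leftover records are kept, not dropped
    return chunks


records = [("a", "AAAAAA"), ("b", "CC"), ("c", "GG"), ("d", "T")]
for chunk in split_by_length(records, 4):
    print([name for name, _ in chunk])
```

With a threshold of 4, record a is longer than the threshold and ends up alone in the first chunk, which is the situation the fixed bug concerned.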

SeqKit v0.14.0

30 Oct 01:17

Changes

  • SeqKit v0.14.0
    • new command seqkit pair: match up paired-end reads from two fastq files, faster than fastq-pair.
    • seqkit translate: new flag -F/--append-frame for optionally adding frame info to the ID. #159
    • seqkit stats: reduce memory usage when using -a for calculating N50. #153
    • seqkit mutate: fix inserting sequences with -i/--insertion.
      The bug occurs in some cases when the insertion site is large; results are fine if no error was reported.
    • seqkit replace:
      • new flag -U/--keep-untouched: do not change anything when no value found for the key (only for sequence name).
      • do not support editing FASTQ sequences.
    • seqkit grep/locate: new flag --circular for supporting circular genome. #158
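
Supporting circular genomes, as the --circular flag above does, means a match may span the point where the linear record ends and wraps to its start. One standard way to handle this, shown here as an assumption rather than as SeqKit's method, is to append the first len(pattern) - 1 bases to the end of the sequence before searching. The name find_circular is hypothetical, and only exact matches on the given strand are covered.

```python
def find_circular(genome, pattern):
    """Return 1-based start positions of exact matches of pattern
    in a circular genome, including matches that wrap around the origin."""
    if not pattern or len(pattern) > len(genome):
        return []
    # Extend the sequence so wrap-around matches become ordinary matches.
    extended = genome + genome[:len(pattern) - 1]
    hits = []
    start = extended.find(pattern)
    while start != -1:
        hits.append(start + 1)
        start = extended.find(pattern, start + 1)
    return hits


print(find_circular("GTACCA", "CAGT"))  # [5]: the match wraps around the origin
print(find_circular("GTACCA", "TACC"))  # [2]: an ordinary internal match
```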