Commit 5c238b9

committed
v0.3.3
1 parent 1680862 commit 5c238b9

File tree

5 files changed

+131
-29
lines changed

doc/docs/download.md

+18-14
Original file line number | Diff line number | Diff line change
@@ -6,31 +6,30 @@ SeqKit is implemented in [Golang](https://golang.org/) programming language,
66

77
## Latest Version
88

9-
[SeqKit v0.3.2](https://github.com/shenwei356/seqkit/releases/tag/v0.3.2)
9+
[SeqKit v0.3.3](https://github.com/shenwei356/seqkit/releases/tag/v0.3.3)
1010

1111

1212
***64-bit versions are highly recommended.***
1313

1414
### Links
1515

16-
1716
- **Linux**
18-
- [seqkit_linux_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_linux_386.tar.gz)
19-
- [seqkit_linux_amd64.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_linux_amd64.tar.gz)
20-
- [seqkit_linux_arm.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_linux_arm.tar.gz)
17+
- [seqkit_linux_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_linux_386.tar.gz)
18+
- [seqkit_linux_amd64.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_linux_amd64.tar.gz)
19+
- [seqkit_linux_arm.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_linux_arm.tar.gz)
2120
- **Mac OS X**
22-
- [seqkit_darwin_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_darwin_386.tar.gz)
23-
- [seqkit_darwin_amd64.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_darwin_amd64.tar.gz)
21+
- [seqkit_darwin_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_darwin_386.tar.gz)
22+
- [seqkit_darwin_amd64.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_darwin_amd64.tar.gz)
2423
- **Windows**
25-
- [seqkit_windows_386.exe.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_windows_386.exe.tar.gz)
26-
- [seqkit_windows_amd64.exe.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_windows_amd64.exe.tar.gz)
24+
- [seqkit_windows_386.exe.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_windows_386.exe.tar.gz)
25+
- [seqkit_windows_amd64.exe.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_windows_amd64.exe.tar.gz)
2726
- **FreeBSD**
28-
- [seqkit_freebsd_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_freebsd_386.tar.gz)
29-
- [seqkit_freebsd_amd64.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_freebsd_amd64.tar.gz)
30-
- [seqkit_freebsd_arm.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_freebsd_arm.tar.gz)
27+
- [seqkit_freebsd_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_freebsd_386.tar.gz)
28+
- [seqkit_freebsd_amd64.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_freebsd_amd64.tar.gz)
29+
- [seqkit_freebsd_arm.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_freebsd_arm.tar.gz)
3130
- **OpenBSD**
32-
- [seqkit_openbsd_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_openbsd_386.tar.gz)
33-
- [seqkit_openbsd_amd64.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.2/seqkit_openbsd_amd64.tar.gz)
31+
- [seqkit_openbsd_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_openbsd_386.tar.gz)
32+
- [seqkit_openbsd_amd64.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.3.3/seqkit_openbsd_amd64.tar.gz)
3433

3534
### Mirror site for Chinese users
3635

@@ -78,6 +77,11 @@ For Go developer, just one command:
7877

7978
## Release History
8079

80+
- [SeqKit v0.3.3](https://github.com/shenwei356/seqkit/releases/tag/v0.3.3)
81+
- fix bug in `seqkit replace`: record numbering wrongly started from 2 when using `{nr}`
82+
in `-r` (`--replacement`)
83+
- new feature: `seqkit replace` supports replacement symbols `{nr}` (record number)
84+
and `{kv}` (corresponding value of the key ($1) by key-value file)
8185
- [SeqKit v0.3.2](https://github.com/shenwei356/seqkit/releases/tag/v0.3.2)
8286
- fix bug of `seqkit split`, error when target file is in a directory.
8387
- improve performance of `seqkit sliding` for big sequences, and output

doc/docs/usage.md

+9-3
@@ -99,7 +99,7 @@ Usage
9999
```
100100
SeqKit -- a cross-platform and ultrafast toolkit for FASTA/Q file manipulation
101101
102-
Version: 0.3.1.1
102+
Version: 0.3.3
103103
104104
Author: Wei Shen <[email protected]>
105105
@@ -1022,7 +1022,7 @@ Examples
10221022
Usage
10231023

10241024
```
1025-
replace name/sequence/by regular expression.
1025+
replace name/sequence by regular expression.
10261026
10271027
Note that the replacement supports capture variables.
10281028
e.g. $1 represents the text of the first submatch.
@@ -1038,14 +1038,20 @@ Or use the \ escape character.
10381038
10391039
more on: http://shenwei356.github.io/seqkit/usage/#replace
10401040
1041+
Special replacement symbols:
1042+
1043+
{nr} Record number, starting from 1
1044+
{kv} Corresponding value of the key ($1) by key-value file
1045+
10411046
Usage:
10421047
seqkit replace [flags]
10431048
10441049
Flags:
10451050
-s, --by-seq replace seq
10461051
-i, --ignore-case ignore case
1052+
-k, --kv-file string tab-delimited key-value file for replacing key with value when using "{kv}" in -r (--replacement)
10471053
-p, --pattern string search regular expression
1048-
-r, --replacement string replacement. supporting capture variables. e.g. $1 represents the text of the first submatch. ATTENTION: use SINGLE quote NOT double quotes in *nix OS or use the \ escape character. record number is also supported by "{NR}"
1054+
-r, --replacement string replacement. supporting capture variables. e.g. $1 represents the text of the first submatch. ATTENTION: use SINGLE quote NOT double quotes in *nix OS or use the \ escape character. Record number is also supported by "{nr}"
10491055
10501056
```
10511057

doc/site

+1-1
@@ -1 +1 @@
1-
Subproject commit 782503376f2ab54df795036d02c9dadfab6155f7
1+
Subproject commit 463f2ef8617e0d1ebfcc764124b8e70bf5f76ebf

seqkit/cmd/helper.go

+32-1
@@ -43,7 +43,7 @@ import (
4343
)
4444

4545
// VERSION of seqkit
46-
const VERSION = "0.3.2"
46+
const VERSION = "0.3.3"
4747

4848
func checkError(err error) {
4949
if err != nil {
@@ -378,3 +378,34 @@ var bufferedByteSliceWrapper *byteutil.BufferedByteSliceWrapper
378378
// func init() {
379379
// bufferedByteSliceWrapper = byteutil.NewBufferedByteSliceWrapper(1, defaultBytesBufferSize)
380380
// }
381+
382+
func readKVs(file string) (map[string]string, error) {
383+
type KV [2]string
384+
fn := func(line string) (interface{}, bool, error) {
385+
if len(line) == 0 {
386+
return nil, false, nil
387+
}
388+
items := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
389+
if len(items) < 2 {
390+
return nil, false, nil
391+
}
392+
393+
return KV([2]string{items[0], items[1]}), true, nil
394+
}
395+
kvs := make(map[string]string)
396+
reader, err := breader.NewBufferedReader(file, 2, 10, fn)
397+
if err != nil {
398+
return kvs, err
399+
}
400+
var items KV
401+
for chunk := range reader.Ch {
402+
if chunk.Err != nil {
403+
return kvs, chunk.Err // err is nil here; the failure is carried by the chunk
404+
}
405+
for _, data := range chunk.Data {
406+
items = data.(KV)
407+
kvs[items[0]] = items[1]
408+
}
409+
}
410+
return kvs, nil
411+
}

seqkit/cmd/replace.go

+71-10
@@ -26,6 +26,7 @@ import (
2626
"regexp"
2727
"runtime"
2828
"strconv"
29+
"strings"
2930

3031
"github.com/shenwei356/bio/seq"
3132
"github.com/shenwei356/bio/seqio/fastx"
@@ -53,6 +54,11 @@ Or use the \ escape character.
5354
5455
more on: http://shenwei356.github.io/seqkit/usage/#replace
5556
57+
Special replacement symbols:
58+
59+
{nr} Record number, starting from 1
60+
{kv} Corresponding value of the key ($1) by key-value file
61+
5662
`,
5763
Run: func(cmd *cobra.Command, args []string) {
5864
config := getConfigs(cmd)
@@ -66,10 +72,7 @@ more on: http://shenwei356.github.io/seqkit/usage/#replace
6672

6773
pattern := getFlagString(cmd, "pattern")
6874
replacement := []byte(getFlagString(cmd, "replacement"))
69-
var replaceeWithNR bool
70-
if reNR.Match(replacement) {
71-
replaceeWithNR = true
72-
}
75+
kvFile := getFlagString(cmd, "kv-file")
7376

7477
bySeq := getFlagBool(cmd, "by-seq")
7578
// byName := getFlagBool(cmd, "by-name")
@@ -78,26 +81,63 @@ more on: http://shenwei356.github.io/seqkit/usage/#replace
7881
if pattern == "" {
7982
checkError(fmt.Errorf("flags -p (--pattern) needed"))
8083
}
81-
8284
p := pattern
8385
if ignoreCase {
8486
p = "(?i)" + p
8587
}
8688
patternRegexp, err := regexp.Compile(p)
8789
checkError(err)
8890

91+
var replaceWithNR bool
92+
if reNR.Match(replacement) {
93+
replaceWithNR = true
94+
}
95+
96+
var replaceWithKV bool
97+
var kvs map[string]string
98+
if reKV.Match(replacement) {
99+
replaceWithKV = true
100+
if !regexp.MustCompile(`\(.+\)`).MatchString(pattern) {
101+
checkError(fmt.Errorf(`value of -p (--pattern) must contain "(" and ")" to capture the data that is used to specify the KEY`))
102+
}
103+
if kvFile == "" {
104+
checkError(fmt.Errorf(`since replacement symbol "{kv}"/"{KV}" found in value of flag -r (--replacement), tab-delimited key-value file should be given by flag -k (--kv-file)`))
105+
}
106+
log.Infof("read key-value file: %s", kvFile)
107+
kvs, err = readKVs(kvFile)
108+
if err != nil {
109+
checkError(fmt.Errorf("read key-value file: %s", err))
110+
}
111+
if len(kvs) == 0 {
112+
checkError(fmt.Errorf("no valid data in key-value file: %s", kvFile))
113+
}
114+
115+
if ignoreCase {
116+
kvs2 := make(map[string]string, len(kvs))
117+
for k, v := range kvs {
118+
kvs2[strings.ToLower(k)] = v
119+
}
120+
kvs = kvs2
121+
}
122+
123+
log.Infof("%d pairs of key-value loaded", len(kvs))
124+
}
125+
89126
files := getFileList(args)
90127

91128
outfh, err := xopen.Wopen(outFile)
92129
checkError(err)
93130
defer outfh.Close()
94131

95132
var r []byte
133+
var found [][]byte
134+
var k string
135+
var ok bool
96136
for _, file := range files {
97137

98138
fastxReader, err := fastx.NewReader(alphabet, file, idRegexp)
99139
checkError(err)
100-
nr := 1
140+
nr := 0
101141
for {
102142
record, err := fastxReader.Read()
103143
if err != nil {
@@ -113,12 +153,30 @@ more on: http://shenwei356.github.io/seqkit/usage/#replace
113153
record.Seq.Seq = patternRegexp.ReplaceAll(record.Seq.Seq, replacement)
114154
} else {
115155
r = replacement
116-
if replaceeWithNR {
117-
r = reNR.ReplaceAll(replacement, []byte(strconv.Itoa(nr)))
156+
157+
if replaceWithNR {
158+
r = reNR.ReplaceAll(r, []byte(strconv.Itoa(nr)))
159+
}
160+
161+
if replaceWithKV {
162+
found = patternRegexp.FindSubmatch(record.Name)
163+
if len(found) > 1 { // found[0] is the whole match; found[1] exists only with a capturing group
164+
k = string(found[1])
165+
if ignoreCase {
166+
k = strings.ToLower(k)
167+
}
168+
if _, ok = kvs[k]; ok {
169+
r = reKV.ReplaceAll(r, []byte(kvs[k]))
170+
} else {
171+
r = reKV.ReplaceAll(r, found[1])
172+
}
173+
}
118174
}
119-
record.Name = patternRegexp.ReplaceAll(record.Name, r)
175+
120176
}
121177

178+
record.Name = patternRegexp.ReplaceAll(record.Name, r)
179+
122180
record.FormatToWriter(outfh, lineWidth)
123181
}
124182

@@ -133,10 +191,13 @@ func init() {
133191
"replacement. supporting capture variables. "+
134192
" e.g. $1 represents the text of the first submatch. "+
135193
"ATTENTION: use SINGLE quote NOT double quotes in *nix OS or "+
136-
`use the \ escape character. Record number is also supported by "{NR}"`)
194+
`use the \ escape character. Record number is also supported by "{nr}"`)
137195
// replaceCmd.Flags().BoolP("by-name", "n", false, "replace full name instead of just id")
138196
replaceCmd.Flags().BoolP("by-seq", "s", false, "replace seq")
139197
replaceCmd.Flags().BoolP("ignore-case", "i", false, "ignore case")
198+
replaceCmd.Flags().StringP("kv-file", "k", "",
199+
`tab-delimited key-value file for replacing key with value when using "{kv}" in -r (--replacement)`)
140200
}
141201

142202
var reNR = regexp.MustCompile(`\{(NR|nr)\}`)
203+
var reKV = regexp.MustCompile(`\{(KV|kv)\}`)
