Commit 5273dc8

trunc_seq v0.2 with README
merged functionality of 'trunc_seq' and 'run_trunc_seq' in one script, output for single file input via STDOUT, results dir for fof input '-r', POD and pod2usage, autodie, Getopt::Long, '-o' output format, version switch, remove filepaths and skip empty/comment lines from fof input, check and warn if multi-seq file as input
1 parent 7698dd2 commit 5273dc8

3 files changed

+466
-89
lines changed

trunc_seq/README.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
1+
trunc_seq
2+
=========
3+
4+
`trunc_seq.pl` is a script to truncate sequence files.
5+
6+
* [Synopsis](#synopsis)
7+
* [Description](#description)
8+
* [Usage](#usage)
9+
* [Options](#options)
10+
* [Output](#output)
11+
* [Run environment](#run-environment)
12+
* [Dependencies](#dependencies)
13+
* [Author - contact](#author---contact)
14+
* [Citation, installation, and license](#citation-installation-and-license)
15+
* [Changelog](#changelog)
16+
17+
## Synopsis
18+
19+
perl trunc_seq.pl 20 3500 seq-file.embl > seq-file_trunc_20_3500.embl
20+
21+
**or**
22+
23+
perl trunc_seq.pl file_of_filenames_and_coords.tsv
24+
25+
## Description
26+
27+
This script truncates sequence files according to the given
28+
coordinates. The features/annotations in RichSeq files (e.g. EMBL or
29+
GENBANK format) will also be adapted accordingly. Use option **-o** to
30+
specify a different output sequence format. Input can be given directly
31+
to the script as truncation coordinates plus a sequence file, with the
32+
start position as the first argument, the stop position as the second,
33+
and (the path to) the sequence file as the third. In this case the
34+
truncated sequence entry is printed to *STDOUT*. Input sequence files
35+
should contain only one sequence entry; if a multi-sequence file is
36+
used as input, only the **first** sequence entry is truncated.
37+
38+
Alternatively, a file of filenames (fof) with respective coordinates
39+
and sequence files in the following **tab-separated** format can be
40+
given to the script (the header is optional):
41+
42+
#start stop seq-file
43+
300 9000 (path/to/)seq-file
44+
50 1300 (path/to/)seq-file2
45+
46+
With a fof the resulting truncated sequence files are printed into a
47+
results directory. Use option **-r** to specify a different results
48+
directory than the default.
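
As a minimal sketch of preparing such a fof, the following shell snippet writes the three example lines with real tab characters and checks that every data line has exactly three tab-separated fields. The file name, the `.embl`/`.gbk` extensions, and the `awk` sanity check are assumptions for illustration; they are not part of `trunc_seq.pl`.

```shell
#!/bin/sh
# Hypothetical file name; any name can be passed to the script as the fof.
fof=file_of_filenames_and_coords.tsv

# printf emits real tab characters (\t) between the three columns.
{
  printf '#start\tstop\tseq-file\n'
  printf '%s\t%s\t%s\n' 300 9000 path/to/seq-file.embl
  printf '%s\t%s\t%s\n' 50 1300 path/to/seq-file2.gbk
} > "$fof"

# Sanity check (assumption, not done by this snippet's caller):
# count non-comment lines that do not have exactly three tab-separated fields.
bad_lines=$(awk -F '\t' '!/^#/ && NF != 3' "$fof" | wc -l | tr -d ' ')
echo "malformed lines: $bad_lines"

# With BioPerl installed and the sequence files in place, the fof could then
# be passed to the script, for example:
# perl trunc_seq.pl -r path/to/trunc_embl_dir -o embl "$fof"
```

Writing the file with `printf` rather than typing it in an editor avoids the common problem of tabs being silently converted to spaces.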
49+
50+
It is also possible to truncate a RichSeq sequence file loaded into the
51+
[Artemis](http://www.sanger.ac.uk/science/tools/artemis) genome browser
52+
from the Sanger Institute: Select a subsequence and then go to Edit ->
53+
Subsequence (and Features).
54+
55+
## Usage
56+
57+
perl trunc_seq.pl -o gbk 120 30000 seq-file.embl > seq-file_trunc_120_30000.gbk
58+
59+
**or**
60+
61+
perl trunc_seq.pl -o fasta 5300 18500 seq-file.gbk | perl revcom_seq.pl -i fasta > seq-file_trunc_revcom.fasta
62+
63+
**or**
64+
65+
perl trunc_seq.pl -r path/to/trunc_embl_dir -o embl file_of_filenames_and_coords.tsv
66+
67+
## Options
68+
69+
- **-h**, **-help**
70+
71+
Help (perldoc POD)
72+
73+
- **-o**=*str*, **-outformat**=*str*
74+
75+
Specify different sequence format for the output (files) [fasta, embl, or gbk]
76+
77+
- **-r**=*str*, **-result\_dir**=*str*
78+
79+
Path to result folder for fof input \[default = './trunc\_seq\_results'\]
80+
81+
- **-v**, **-version**
82+
83+
Print version number to *STDOUT*
84+
85+
## Output
86+
87+
- *STDOUT*
88+
89+
If a single sequence file is given to the script the truncated sequence
90+
file is printed to *STDOUT*. Redirect or pipe into another tool as
91+
needed.
92+
93+
**or**
94+
95+
- ./trunc_seq_results
96+
97+
If a fof is given to the script, all output files are stored in a
98+
results folder
99+
100+
- ./trunc_seq_results/seq-file_trunc_start_stop.format
101+
102+
Truncated output sequence files are named after the input file, with
103+
'trunc' and the corresponding start and stop positions appended
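
The naming scheme can be sketched in shell as follows. This is an illustration only, not the script's actual code: `predict_name` is a hypothetical helper, and it assumes that the directory part and the last file extension of the input are dropped before `_trunc_start_stop.format` is appended (consistent with the synopsis example and the changelog note on removing filepaths). Empty lines and `#` comment lines are skipped, as described in the changelog.

```shell
#!/bin/sh
# Hypothetical helper, not part of trunc_seq.pl: predict the output file
# name for one fof entry under the assumptions stated above.
predict_name() {
  s=$1; e=$2; p=$3; f=$4
  base=$(basename "$p")   # drop the directory part
  stem=${base%.*}         # drop the last extension, if there is one
  printf '%s_trunc_%s_%s.%s\n' "$stem" "$s" "$e" "$f"
}

tab=$(printf '\t')
names=$(
  printf '#start\tstop\tseq-file\n\n300\t9000\tpath/to/seq-file.embl\n50\t1300\tseq-file2.gbk\n' |
  while IFS="$tab" read -r start stop path; do
    # Skip empty lines and comment lines starting with '#'.
    case $start in ''|'#'*) continue ;; esac
    predict_name "$start" "$stop" "$path" embl
  done
)
echo "$names"
```

Under these assumptions the two entries map to `seq-file_trunc_300_9000.embl` and `seq-file2_trunc_50_1300.embl` inside the results directory.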
104+
105+
## Run environment
106+
107+
The Perl script runs under Windows and UNIX flavors.
108+
109+
## Dependencies
110+
111+
- [**BioPerl**](http://www.bioperl.org) (tested version 1.007001)
112+
113+
## Author - contact
114+
115+
Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)
116+
117+
## Citation, installation, and license
118+
119+
For [citation](https://github.com/aleimba/bac-genomics-scripts#citation), [installation](https://github.com/aleimba/bac-genomics-scripts#installation-recommendations), and [license](https://github.com/aleimba/bac-genomics-scripts#license) information please see the repository main [*README.md*](https://github.com/aleimba/bac-genomics-scripts/blob/master/README.md).
120+
121+
## Changelog
122+
123+
* v0.2 (2015-12-07)
124+
* Merged functionality of `trunc_seq.pl` and `run_trunc_seq.pl` in one single script
125+
* Now allows single file input as well as file of filenames (fof) input with coordinates
126+
* output for single file input printed to *STDOUT* now
127+
* output for fof input printed into files in a result directory, new option **-r** to specify result directory
128+
* included a POD instead of a simple usage text
129+
* included `pod2usage` with Pod::Usage
130+
* included 'use autodie' pragma
131+
* options with Getopt::Long
132+
* output format now specified with option **-o**
133+
* included version switch, **-v**
134+
* fixed bug to remove input filepaths from fof input for output files
135+
* skip empty or comment lines (/^#/) in fof input
136+
* check and warn if the input sequence file has more than one sequence entry
137+
* v0.1 (2013-02-08)
138+
* `trunc_seq.pl` handled only single sequence file input; an additional wrapper script `run_trunc_seq.pl` handled fof input

trunc_seq/run_trunc_seq.pl

Lines changed: 0 additions & 46 deletions
This file was deleted.
