Update main README and trunc_seq README

aleimba · aleimba · commit 643607b322d1 · 2016-12-21T15:22:35.000+01:00
new revcom_seq and trunc_seq info,
updated links and author info,
minor update to trunc_seq README
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 bac-genomics-scripts
 ====================
 
-A collection of scripts intended for bacterial genomics (some might also be useful for eukaryotes).
+A collection of scripts intended for **bacterial genomics** (some might also be useful for eukaryotes) from **high-throughput sequencing** (aka next-generation sequencing).
 
 * [Summary](#summary)
 * [Introduction](#introduction)
@@ -11,6 +11,7 @@ A collection of scripts intended for bacterial genomics (some might also be usef
 * [Windows - UNIX linebreak problems](#windows---unix-linebreak-problems)
 * [Citation](#citation)
 * [License](#license)
+* [Author - contact](#author---contact)
 
 ## Summary
 
@@ -20,17 +21,19 @@ A collection of scripts intended for bacterial genomics (some might also be usef
 * Extraction of protein/nucleotide sequences from CDSs: [`cds_extractor`](/cds_extractor)
 * MLST (multilocus sequence typing) assignment and allele extraction for *Escherichia coli* ([Achtman scheme](http://mlst.warwick.ac.uk/mlst/)): [`ecoli_mlst`](/ecoli_mlst)
 * Create a feature table for all annotated primary features in RichSeq (EMBL or GENBANK format) files: [`genomes_feature_table`](/genomes_feature_table)
-* Batch downloading of sequences from NCBI's FTP server: [`ncbi_ftp_download`](/ncbi_ftp_download) and `ncbi_e-utilities`
+* **Deprecated!** Batch downloading of sequences from NCBI's FTP server: [`ncbi_ftp_download`](/ncbi_ftp_download)
 * Order sequence entries in FASTA/FASTQ files according to an ID list: [`order_fastx`](/order_fastx)
 * Create an ortholog/paralog annotation comparison matrix from [*Proteinortho5*](http://www.bioinf.uni-leipzig.de/Software/proteinortho/) output: [`po2anno`](/po2anno)
 * Calculate stats and plot venn diagrams for genome groups according to orthologs/paralogs from [*Proteinortho5*](http://www.bioinf.uni-leipzig.de/Software/proteinortho/) output, i.e. overall presence/absence statistics for groups of genomes and not simply single genomes: [`po2group_stats`](/po2group_stats)
 * Strain panel query protein search with **BLASTP** plus concise hit summary, optional alignment, and presence/absence matrix. Also included, scripts to transpose the matrix and calculate overall presence/absence statistics for groups of columns in the matrix: [`prot_finder`](/prot_finder)
 * Rename FASTA ID lines and optionally numerate them: [`rename_fasta_id`](/rename_fasta_id)
+* Reverse complement (multi-)sequence files (RichSeq EMBL or GENBANK format, or FASTA format): [`revcom_seq`](/revcom_seq)
 * Regions of difference (ROD) detection in genomes with **BLASTN**: [`rod_finder`](/rod_finder)
 * NGS paired-end library insert size estimation from BAM/SAM: [`sam_insert-size`](/sam_insert-size)
 * Randomly subsample FASTA, FASTQ, or TEXT files with [*reservoir sampling*](https://en.wikipedia.org/wiki/Reservoir_sampling): [`sample_fastx-txt`](/sample_fastx-txt)
 * Convert a sequence file to another format with [BioPerl](http://www.bioperl.org): [`seq_format-converter`](/seq_format-converter)
 * Manual curation of annotation in NCBI's TBL format (e.g. from [Prokka](http://www.vicbioinformatics.com/software.prokka.shtml) automatic annotation) in a spreadsheet software: [`tbl2tab`](/tbl2tab)
+* Truncate sequence files (RichSeq EMBL or GENBANK format, or FASTA format) according to given coordinates: [`trunc_seq`](/trunc_seq)
 * And an assortment of smaller scripts for tasks like (not yet uploaded to GitHub): alignment format converters, dnadiff, GC% calculation etc.
 
 ## Introduction
@@ -43,7 +46,7 @@ The scripts are only tested under UNIX, some won't run in a Windows environment
 
 ## Installation recommendations
 
-To download the repository, use either the ['Download ZIP'](https://github.com/aleimba/bac-genomics-scripts/archive/master.zip) button on the right hand side or clone the repository with `git`:
+To download the repository, use either the '[Download ZIP](https://github.com/aleimba/bac-genomics-scripts/archive/master.zip)' link after clicking the green 'Clone or download' button at the top or clone the repository with `git`:
 
     git clone https://github.com/aleimba/bac-genomics-scripts.git
 
@@ -56,7 +59,7 @@ To install the scripts, copy them e.g. to a home */bin* folder in your *PATH* an
     $ find . \( -name '*.pl' -o -name '*.sh' -o -name '*.fas' -o -name '*.txt' \) -exec cp {} ~/bin \;
     $ chmod u+x ~/bin/*.pl
 
-the scripts and can then be run everywhere on your system. Of course you can just call them directly by prefexing `perl` to the command or a './' for bash wrappers:
+the scripts can then be run everywhere on your system. Of course you can just call them directly by prefixing `perl` to the command or a './' for bash wrappers:
 
     $ perl /path/to/script/script.pl <options>
 
@@ -68,9 +71,9 @@ or
 
 ## Dependencies
 
-All scripts are tested with Perl v5.18.2.
+All scripts are tested with Perl v5.22.1.
 
-Most of the Perl scripts include modules from [BioPerl](http://www.bioperl.org) as stated in their respective *README.md*, which as a consequence has to be installed on your system. For BioPerl installation instructions see the website ([**How Do I...?...install BioPerl?**](http://www.bioperl.org/wiki/Installing_BioPerl)).
+Most of the Perl scripts include modules from [BioPerl](http://www.bioperl.org) as stated in their respective *README.md* or POD, which as a consequence has to be installed on your system. For BioPerl installation instructions see the website ([**Installation**](http://bioperl.org/INSTALL.html)).
 
 Some scripts need additional Perl modules, which will be stated in the associated *README.md* or POD. If they're not installed yet on your system get them from [CPAN](http://www.cpan.org/) (installation instructions can be found on the website, see e.g. [**Getting Started...Installing Perl Modules**](http://www.cpan.org/modules/INSTALL.html) or [**FAQ**](http://www.cpan.org/misc/cpan-faq.html#How_install_Perl_modules)).
 
@@ -80,7 +83,7 @@ Furthermore, some scripts call upon statistical computing language [**R**](http:
 
 A very handy tip, if you want to run a script on all files in the current working directory you can use a **loop** in UNIX, e.g.:
 
-    $ for i in *.fasta; do perl script.pl -i $i; done
+    $ for file in *.fasta; do perl script.pl "$file"; done
 
 ## Windows - UNIX linebreak problems
 
@@ -94,6 +97,9 @@ For now you can cite this repository by using this URL (https://github.com/aleim
 
 ## License
 
-All scripts are licensed under GPLv3 which is contained in the file *LICENSE*.
+All scripts are licensed under GPLv3 which is contained in the file [*LICENSE*](./LICENSE).
 
+## Author - contact
 For help, suggestions, bugs etc. use the GitHub issues or write an email to aleimba [at] gmx [dot] de.
+
+Andreas Leimbach (Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)
diff --git a/trunc_seq/README.md b/trunc_seq/README.md
@@ -39,9 +39,9 @@ Alternatively, a file of filenames (fof) with respective coordinates
 and sequence files in the following **tab-separated** format can be
 given to the script (the header is optional):
 
-    #start&emsp;stop&emsp;seq-file
-    300&emsp;9000&emsp;(path/to/)seq-file
-    50&emsp;1300&emsp;(path/to/)seq-file2
+\#start&emsp;stop&emsp;seq-file<br>
+300&emsp;9000&emsp;(path/to/)seq-file<br>
+50&emsp;1300&emsp;(path/to/)seq-file2<br>
 
 With a fof the resulting truncated sequence files are printed into a
 results directory. Use option **-r** to specify a different results