README.md: 14 additions & 8 deletions
@@ -1,7 +1,7 @@
 bac-genomics-scripts
 ====================
 
-A collection of scripts intended for bacterial genomics (some might also be useful for eukaryotes).
+A collection of scripts intended for **bacterial genomics** (some might also be useful for eukaryotes) from **high-throughput sequencing** (aka next-generation sequencing).
 
 * [Summary](#summary)
 * [Introduction](#introduction)
@@ -11,6 +11,7 @@ A collection of scripts intended for bacterial genomics (some might also be usef
@@ -20,17 +21,19 @@ A collection of scripts intended for bacterial genomics (some might also be usef
 * Extraction of protein/nucleotide sequences from CDSs: [`cds_extractor`](/cds_extractor)
 * MLST (multilocus sequence typing) assignment and allele extraction for *Escherichia coli* ([Achtman scheme](http://mlst.warwick.ac.uk/mlst/)): [`ecoli_mlst`](/ecoli_mlst)
 * Create a feature table for all annotated primary features in RichSeq (EMBL or GENBANK format) files: [`genomes_feature_table`](/genomes_feature_table)
-* Batch downloading of sequences from NCBI's FTP server: [`ncbi_ftp_download`](/ncbi_ftp_download) and `ncbi_e-utilities`
+* **Deprecated!** Batch downloading of sequences from NCBI's FTP server: [`ncbi_ftp_download`](/ncbi_ftp_download)
 * Order sequence entries in FASTA/FASTQ files according to an ID list: [`order_fastx`](/order_fastx)
 * Create an ortholog/paralog annotation comparison matrix from [*Proteinortho5*](http://www.bioinf.uni-leipzig.de/Software/proteinortho/) output: [`po2anno`](/po2anno)
 * Calculate stats and plot venn diagrams for genome groups according to orthologs/paralogs from [*Proteinortho5*](http://www.bioinf.uni-leipzig.de/Software/proteinortho/) output, i.e. overall presence/absence statistics for groups of genomes and not simply single genomes: [`po2group_stats`](/po2group_stats)
 * Strain panel query protein search with **BLASTP** plus concise hit summary, optional alignment, and presence/absence matrix. Also included, scripts to transpose the matrix and calculate overall presence/absence statistics for groups of columns in the matrix: [`prot_finder`](/prot_finder)
 * Rename FASTA ID lines and optionally numerate them: [`rename_fasta_id`](/rename_fasta_id)
+* Reverse complement (multi-)sequence files (RichSeq EMBL or GENBANK format, or FASTA format): [`revcom_seq`](/revcom_seq)
 * Regions of difference (ROD) detection in genomes with **BLASTN**: [`rod_finder`](/rod_finder)
 * NGS paired-end library insert size estimation from BAM/SAM: [`sam_insert-size`](/sam_insert-size)
 * Randomly subsample FASTA, FASTQ, or TEXT files with [*reservoir sampling*](https://en.wikipedia.org/wiki/Reservoir_sampling): [`sample_fastx-txt`](/sample_fastx-txt)
 * Convert a sequence file to another format with [BioPerl](http://www.bioperl.org): [`seq_format-converter`](/seq_format-converter)
 * Manual curation of annotation in NCBI's TBL format (e.g. from [Prokka](http://www.vicbioinformatics.com/software.prokka.shtml) automatic annotation) in a spreadsheet software: [`tbl2tab`](/tbl2tab)
+* Truncate sequence files (RichSeq EMBL or GENBANK format, or FASTA format) according to given coordinates: [`trunc_seq`](/trunc_seq)
 * And an assortment of smaller scripts for tasks like (not yet uploaded to GitHub): alignment format converters, dnadiff, GC% calculation etc.
 
 ## Introduction
@@ -43,7 +46,7 @@ The scripts are only tested under UNIX, some won't run in a Windows environment
 
 ## Installation recommendations
 
-To download the repository, use either the ['Download ZIP'](https://github.com/aleimba/bac-genomics-scripts/archive/master.zip) button on the right hand side or clone the repository with `git`:
+To download the repository, use either the '[Download ZIP](https://github.com/aleimba/bac-genomics-scripts/archive/master.zip)' link after clicking the green 'Clone or download' button at the top or clone the repository with `git`:
-the scripts and can then be run everywhere on your system. Of course you can just call them directly by prefixing `perl` to the command or a './' for bash wrappers:
+the scripts can then be run everywhere on your system. Of course you can just call them directly by prefixing `perl` to the command or a './' for bash wrappers:
 
 $ perl /path/to/script/script.pl <options>
 
@@ -68,9 +71,9 @@ or
 
 ## Dependencies
 
-All scripts are tested with Perl v5.18.2.
+All scripts are tested with Perl v5.22.1.
 
-Most of the Perl scripts include modules from [BioPerl](http://www.bioperl.org) as stated in their respective *README.md*, which as a consequence has to be installed on your system. For BioPerl installation instructions see the website ([**How Do I...?...install BioPerl?**](http://www.bioperl.org/wiki/Installing_BioPerl)).
+Most of the Perl scripts include modules from [BioPerl](http://www.bioperl.org) as stated in their respective *README.md* or POD, which as a consequence has to be installed on your system. For BioPerl installation instructions see the website ([**Installation**](http://bioperl.org/INSTALL.html)).
 
 Some scripts need additional Perl modules, which will be stated in the associated *README.md* or POD. If they're not installed yet on your system get them from [CPAN](http://www.cpan.org/) (installation instructions can be found on the website, see e.g. [**Getting Started...Installing Perl Modules**](http://www.cpan.org/modules/INSTALL.html) or [**FAQ**](http://www.cpan.org/misc/cpan-faq.html#How_install_Perl_modules)).
 
@@ -80,7 +83,7 @@ Furthermore, some scripts call upon statistical computing language [**R**](http:
 
 A very handy tip, if you want to run a script on all files in the current working directory you can use a **loop** in UNIX, e.g.:
 
-$ for i in *.fasta; do perl script.pl -i $i; done
+$ for file in *.fasta; do perl script.pl "$file"; done
 
 ## Windows - UNIX linebreak problems
 
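The loop tip changed in the hunk above can be sketched in a slightly more defensive form. This is only an illustration under stated assumptions: `run_on_fasta` is a hypothetical helper name that does not exist in the repository, and `script.pl` stands in for any of the scripts. It quotes the file name as the new README line does, and it additionally skips the case where no `*.fasta` file exists, in which the shell would otherwise pass the unexpanded pattern to the script.

```shell
#!/usr/bin/env bash
# Sketch: run a command once per FASTA file in a directory.
# "run_on_fasta" is a hypothetical helper, not part of bac-genomics-scripts.
run_on_fasta() {
    local dir=$1   # directory to search for *.fasta files
    shift          # the remaining arguments form the command to run
    local file
    for file in "$dir"/*.fasta; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$file" ] || continue
        # Quoting keeps file names with spaces intact.
        "$@" "$file"
    done
}

# Example call (script.pl is a placeholder for one of the scripts):
# run_on_fasta . perl script.pl
```

Replacing the command after the directory argument is enough to reuse the helper for a different script; options that must precede the file name can simply be appended to that command.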
@@ -94,6 +97,9 @@ For now you can cite this repository by using this URL (https://github.com/aleim
 
 ## License
 
-All scripts are licensed under GPLv3 which is contained in the file *LICENSE*.
+All scripts are licensed under GPLv3 which is contained in the file [*LICENSE*](./LICENSE).
 
+## Author - contact
 For help, suggestions, bugs etc. use the GitHub issues or write an email to aleimba [at] gmx [dot] de.
+
+Andreas Leimbach (Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)