Commit 3f17b03

committed
more changes to workflow
1 parent f35547f commit 3f17b03

8 files changed (+118, -53 lines changed)

README.md

+13-4
@@ -5,9 +5,9 @@
 
 <img align="right" src=snpgenie/logo.png width=180px>
 
-_snpgenie_ is a tool for microbial variant calling and phylogenetic analysis from raw read data. It was primarily written to be used with bacterial isolates of M. Bovis but can be applied to other species. This is in early stages of development. Anyone interested in using the software is encouraged to make sugggestions on improving or adding features.
+_snpgenie_ is a tool for microbial variant calling and phylogenetic analysis from raw read data. It was primarily written to be used with bacterial isolates of M. Bovis but can be applied to other species. Anyone interested in using the software is encouraged to make suggestions on improving or adding features.
 
-This software is written in Python and is developed with the Qt toolkit using PySide2. It was made on Ubuntu linux but is designed to also run on Windows 10 with a standalone application.
+This software is written in Python. It was developed on Ubuntu linux but is designed to also run on Windows 10 with a standalone application. The GUI is made using the Qt toolkit using PySide2.
 
 ## Documentation
 
@@ -19,7 +19,7 @@ Note for Windows users: a standalone installer will be available.
 
 `pip install -e git+https://github.com/dmnfarrell/snpgenie.git#egg=snpgenie`
 
-(You may need to use pip3 on Ubuntu to ensure you use Python 3)
+Notes: You may need to use pip3 on Ubuntu to ensure you use Python 3. Use sudo if installing system-wide.
 
 ## Usage
 
@@ -34,6 +34,7 @@ For Linux installs, you require Python 3 and the following packages. These will
 * matplotlib
 * biopython
 * pyvcf
+* pyfaidx
 * pyside2 (if using GUI)
 
 Other binaries required:
@@ -42,7 +43,15 @@ Other binaries required:
 * samtools
 * bcftools
 
-The binaries can be installed with apt in Ubuntu. They are downloaded automatically in Windows.
+These binaries can be installed with apt in Ubuntu:
+
+`sudo apt install bwa samtools bcftools`
+
+If you want a tree to be built you should install RaXML:
+
+`sudo apt install raxml`
+
+The binaries are downloaded automatically in Windows.
 
 ## Screenshots
 
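
Review note: the README now lists the external binaries explicitly, so a quick pre-flight check could report which ones are missing before a run starts. The sketch below is not part of this commit. The helper name `find_missing` is hypothetical, and the RAxML executable name is taken from the notebook output in this same diff (`raxmlHPC-PTHREADS`), which may differ on other installs.

```python
import shutil

# Names taken from the README in this diff; the RAxML executable name is the
# one printed in the notebook output and is an assumption for other systems.
REQUIRED = ["bwa", "samtools", "bcftools"]
OPTIONAL = ["raxmlHPC-PTHREADS"]


def find_missing(names, which=shutil.which):
    """Return the executables from `names` that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = find_missing(names)
        if missing:
            print("missing %s binaries: %s" % (label, ", ".join(missing)))
        else:
            print("all %s binaries found" % label)
```

On Ubuntu the reported names map onto the `sudo apt install` lines added above.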

doc/source/description.rst

+10-14
@@ -5,16 +5,13 @@ Introduction
 
 This software is written in Python and is developed with the Qt toolkit using PySide2. It was made on Ubuntu linux but is designed to also run on Windows 10 with a standalone application.
 
-The Desktop application
-=======================
+Command line tool
+-----------------
 
-Unlike many other SNP calling pipelines, this tool is designed around a graphical user interface. Though it will also work from the command line and via Python scripts.
-
-.. image:: scr1.png
-    :scale: 45%
+This tool works from the command line and via Python scripts. Unlike many other SNP calling pipelines, it is also designed to have a graphical user interface, which is in development.
 
 Current Features
-================
+----------------
 
 * load multiple fastq files and process together
 * view fastq quality statistics
@@ -27,7 +24,7 @@ Current Features
 * create phylogenetic tree
 
 Links
-=====
+-----
 
 http://dmnfarrell.github.io/snpgenie
 
@@ -37,14 +34,13 @@ Installation
 Linux
 -----
 
-Install dependencies::
+With pip::
 
-    pip install pandas matplotlib biopython pyside2
-    sudo apt install bcftools samtools bwa
+    pip install -e git+https://github.com/dmnfarrell/snpgenie.git#egg=snpgenie
 
-Get the current version from github::
+Install binary dependencies::
 
-    git clone https://github.com/dmnfarrell/btbgenie.git
+    sudo apt install bcftools samtools bwa
 
 Windows
 -------
@@ -54,4 +50,4 @@ A standalone installer will be used to deploy on windows.
 Mac OSX
 -------
 
-Use the Linux instructions.
+Not tested. You can try the Linux instructions possibly with bioconda for the binaries.

doc/source/usage.rst

+18-1
@@ -12,7 +12,7 @@ This will run the entire process based on a set of options given at the terminal
   -i FILE, --input FILE
                         input folder(s)
   -l FILE, --labels FILE
-                        sample labels file
+                        sample labels file, optional
   -r FILE, --reference FILE
                         reference genome filename
   -w, --overwrite       overwrite intermediate files
@@ -26,6 +26,23 @@ This will run the entire process based on a set of options given at the terminal
   -v, --version         Get version
   -s, --test            Do test run
 
+Example::
+
+    snpgenie -r reference.fa -g reference.gff -i data_files -t 8 -o results
+
+From Python
+-----------
+
+You can run a workflow from within Python::
+
+    from sngenie import app
+    args = {'threads':8, 'outdir': 'results', 'labelsep':'-',
+            'input':['/my/folder/',
+                     '/my/other/folder'],
+            'reference': None, 'overwrite':False}
+    W = app.WorkFlow(**args)
+    st = W.setup()
+    W.run()
 
 Desktop Application
 -------------------
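
Review note: the documented Python example imports `sngenie`, which looks like a typo for `snpgenie` (the name used in `setup.py` and the pip URL). It also builds the argument dict by hand, while `app.py` in this commit gains a `filters` entry in its `defaults` dict. A minimal sketch of how a caller might start from those defaults and override only what differs is shown below. The helper `make_args` is hypothetical, and whether `WorkFlow` requires every key to be present is an assumption not confirmed by the diff.

```python
# Values copied from the `defaults` dict shown in the app.py hunk of this commit.
default_filter = 'QUAL>=40 && INFO/DP>=10 && MQ>40'
defaults = {'threads': 4, 'labelsep': '_', 'quality': 25, 'filters': default_filter,
            'reference': None, 'gff_file': None, 'overwrite': False}


def make_args(**overrides):
    """Hypothetical helper: copy the defaults, then apply keyword overrides."""
    allowed = set(defaults) | {'input', 'outdir'}
    unknown = set(overrides) - allowed
    if unknown:
        raise KeyError('unknown option(s): %s' % ', '.join(sorted(unknown)))
    args = dict(defaults)  # copy so the module-level defaults are not mutated
    args.update(overrides)
    return args


args = make_args(threads=8, outdir='results', labelsep='-',
                 input=['/my/folder/', '/my/other/folder'])
print(args)

# With the package installed, the dict would then be passed on as in the docs
# (module name assumed to be `snpgenie`):
#   from snpgenie import app
#   W = app.WorkFlow(**args)
#   W.setup()
#   W.run()
```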

notebooks/albania.ipynb

+10-8
@@ -12,7 +12,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 1,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -31,7 +31,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -54,6 +54,8 @@
 "reference None\n",
 "overwrite False\n",
 "quality 25\n",
+"filters 'QUAL>=40 && INFO/DP>=10 && MQ>40'\n",
+"gff_file None\n",
 "14 samples were loaded:\n",
 "-------------\n",
 " name sample filename pair read_length\n",
@@ -108,11 +110,6 @@
 "14 samples\n",
 "using 4020 sites\n",
 "\n",
-"building tree\n",
-"-------------\n",
-"raxmlHPC-PTHREADS -f a -N 10 -T 8 -m GTRCAT -V -p 77901206 -x 18311491 -n variants -w /home/damien/gitprojects/snpgenie/test_results -s ../test_results/core.fa\n",
-"/home/damien/gitprojects/snpgenie/test_results/RAxML_bipartitions.variants\n",
-"\n",
 "Done. Sample summary:\n",
 "---------------------\n",
 " sample name bam_file read_length\n",
@@ -129,7 +126,12 @@
 "10 SRR5486071 SRR5486071_1 ../test_results/mapped/SRR5486071.bam 227\n",
 "11 SRR8063654 SRR8063654_1 ../test_results/mapped/SRR8063654.bam 217\n",
 "12 SRR8063665 SRR8063665_1 ../test_results/mapped/SRR8063665.bam 231\n",
-"13 SRR8065079 SRR8065079_1 ../test_results/mapped/SRR8065079.bam 233\n"
+"13 SRR8065079 SRR8065079_1 ../test_results/mapped/SRR8065079.bam 233\n",
+"\n",
+"building tree\n",
+"-------------\n",
+"raxmlHPC-PTHREADS -f a -N 10 -T 8 -m GTRCAT -V -p 72177716 -x 81946655 -n variants -w /home/damien/gitprojects/snpgenie/test_results -s ../test_results/core.fa\n",
+"/home/damien/gitprojects/snpgenie/test_results/RAxML_bipartitions.variants\n"
 ]
 }
 ],

setup.py

+1
@@ -22,6 +22,7 @@
 'matplotlib>=3.0',
 'biopython>=1.5',
 'pyvcf>=0.6',
+'pyfaidx',
 'pyside2>=5.1',
 'future'],
 entry_points = {

snpgenie/app.py

+27-16
@@ -42,19 +42,20 @@
 module_path = os.path.dirname(os.path.abspath(__file__)) #path to module
 datadir = os.path.join(module_path, 'data')
 sequence_path = os.path.join(config_path, 'genome')
+annotation_path = os.path.join(config_path, 'annotation')
 ref_genome = os.path.join(sequence_path, 'Mbovis_AF212297.fa')
 ref_gff = os.path.join(datadir, 'Mbovis_csq_format.gff')
 #windows only path to binaries
 bin_path = os.path.join(config_path, 'binaries')
-default_filter = "'QUAL>=40 && INFO/DP>=10 && MQ>40'"
+default_filter = 'QUAL>=40 && INFO/DP>=10 && MQ>40'
 
 if not os.path.exists(config_path):
     try:
         os.makedirs(config_path, exist_ok=True)
     except:
         os.makedirs(config_path)
 
-defaults = {'threads':4, 'labelsep':'_','quality':25,
+defaults = {'threads':4, 'labelsep':'_','quality':25, 'filters': default_filter,
             'reference': None, 'gff_file': None, 'overwrite':False}
 
 def check_platform():
@@ -285,11 +286,11 @@ def mpileup_gnuparallel(bam_files, ref, outpath, threads=4, callback=None):
     return rawbcf
 
 def variant_calling(bam_files, ref, outpath, relabel=True, threads=4,
-                    callback=None, overwrite=False, filter=None, gff_file=None, **kwargs):
+                    callback=None, overwrite=False, filters=None, gff_file=None, **kwargs):
     """Call variants with bcftools"""
 
-    if filter == None:
-        filter = default_filter
+    if filters == None:
+        filters = default_filter
     rawbcf = os.path.join(outpath,'raw.bcf')
     bcftoolscmd = tools.get_cmd('bcftools')
     if not os.path.exists(rawbcf) or overwrite == True:
@@ -321,11 +322,11 @@ def variant_calling(bam_files, ref, outpath, relabel=True, threads=4,
 
     #filter variants
     final = os.path.join(outpath,'filtered.vcf.gz')
-    cmd = '{bc} filter -i {f} -o {o} -O z {i}'.format(bc=bcftoolscmd,i=vcfout,o=final,f=filter)
+    cmd = '{bc} filter -i "{f}" -o {o} -O z {i}'.format(bc=bcftoolscmd,i=vcfout,o=final,f=filters)
     print (cmd)
     tmp = subprocess.check_output(cmd,shell=True)
     if callback != None:
-        callback(tmp)
+        callback(cmd)
 
     #consequence calling
     if gff_file != None:
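
Review note: this hunk moves the quoting of the filter expression out of `default_filter` and into the command template, which matters because the expression contains `&&` and `>` and would otherwise be interpreted by the shell. One alternative, sketched below under assumptions, is to build the call as an argument list so no shell quoting is needed at all. The function name `filter_command` and the file names are hypothetical; only the `bcftools filter -i ... -o ... -O z` flags are taken from the diff.

```python
import shlex

default_filter = 'QUAL>=40 && INFO/DP>=10 && MQ>40'


def filter_command(vcf_in, vcf_out, filters=default_filter, bcftools='bcftools'):
    """Build the `bcftools filter` call as an argument list.

    Passing a list to subprocess (without shell=True) hands the filter
    expression to bcftools as one argument, whatever characters it contains.
    """
    return [bcftools, 'filter', '-i', filters, '-o', vcf_out, '-O', 'z', vcf_in]


cmd = filter_command('calls.vcf.gz', 'filtered.vcf.gz')

# For logging (or a callback), a shell-safe string can still be produced.
printable = ' '.join(shlex.quote(part) for part in cmd)
print(printable)

# To actually run it (requires bcftools on PATH):
#   import subprocess
#   subprocess.check_output(cmd)
```

The trade-off is that user-supplied `-f/--filters` values no longer need to avoid double quotes, at the cost of departing from the string-template style used elsewhere in `app.py`.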
@@ -399,6 +400,9 @@ def setup(self):
         self.filenames = get_files_from_paths(self.input)
         self.threads = int(self.threads)
         df = get_samples(self.filenames, sep=self.labelsep)
+        if len(df) == 0:
+            print ('no samples provided')
+            return
         df['read_length'] = df.filename.apply(tools.get_fastq_info)
         self.fastq_table = df
         sample_size = len(df['sample'].unique())
@@ -444,7 +448,7 @@ def run(self):
         print ('----------------')
         bam_files = list(samples.bam_file.unique())
         self.vcf_file = variant_calling(bam_files, self.reference, self.outdir, threads=self.threads,
-                                        gff_file=self.gff_file,
+                                        gff_file=self.gff_file, filters=self.filters,
                                         overwrite=self.overwrite)
         print (self.vcf_file)
         print ()
@@ -468,6 +472,8 @@ def run(self):
         print ('building tree')
         print ('-------------')
         treefile = trees.run_RAXML(outfasta, outpath=self.outdir)
+        if treefile == None:
+            return
         print (treefile)
         #labelmap = dict(zip(sra.filename,sra.geo_loc_name_country))
         t,ts = trees.create_tree(treefile)#, labelmap)
@@ -491,25 +497,25 @@ def main():
 
     import sys, os
     from argparse import ArgumentParser
-    parser = ArgumentParser(description='snpgenie CLI tool. https://github.com/dmnfarrell/btbgenie')
-    #parser.add_argument("-f", "--fasta", dest="filenames",default=[],
-    #                     help="input fasta file", metavar="FILE")
+    parser = ArgumentParser(description='snpgenie CLI tool. https://github.com/dmnfarrell/snpgenie')
     parser.add_argument("-i", "--input", action='append', dest="input", default=[],
                         help="input folder(s)", metavar="FILE")
-    parser.add_argument("-l", "--labels", dest="labels", default=[],
-                        help="sample labels file", metavar="FILE")
+    #parser.add_argument("-l", "--labels", dest="labels", default=[],
+    #                    help="sample labels file, optional", metavar="FILE")
     parser.add_argument("-e", "--labelsep", dest="labelsep", default=',',
-                        help="symbol to split sample labels on")
+                        help="symbol to split the sample labels on")
     parser.add_argument("-r", "--reference", dest="reference", default=None,
                         help="reference genome filename", metavar="FILE")
     parser.add_argument("-g", "--gff", dest="gff_file", default=None,
                         help="reference gff, optional", metavar="FILE")
     parser.add_argument("-w", "--overwrite", dest="overwrite", action="store_true", default=False,
-                        help="overwrite intermediate files" )
+                        help="overwrite intermediate files")
     parser.add_argument("-m", "--trim", dest="trim", action="store_true", default=False,
-                        help="trim fastq files" )
+                        help="whether to trim fastq files" )
     parser.add_argument("-q", "--quality", dest="quality", default=25,
                         help="trim quality" )
+    parser.add_argument("-f", "--filters", dest="filters", default=default_filter,
+                        help="variant calling post-filters" )
     parser.add_argument("-t", "--threads", dest="threads",default=4,
                         help="cpu threads to use")
     parser.add_argument("-o", "--outdir", dest="outdir",
@@ -527,6 +533,11 @@ def main():
         from . import __version__
         print ('snpgenie version %s' %__version__)
         print ('https://github.com/dmnfarrell/btbgenie')
+    elif args['outdir'] == None:
+        print ('No input or output folders provided. These are required.')
+        print ('Example:')
+        print ('snpgenie -r <reference> -i <input folder with fastq.gz files> -o <output folder>')
+        print ('Use -h for more help on options.')
     else:
         W = WorkFlow(**args)
         st = W.setup()
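
Review note: the new `elif` branch prints "No input or output folders provided" but only tests `args['outdir']`, so a run with `-o` and no `-i` would still reach `WorkFlow`. A minimal sketch of checking both options is given below. It reuses the `-i` and `-o` option definitions visible in the diff; the `default=None` for `--outdir` and the function name `parse_args` are assumptions, since the hunk cuts off before the rest of that argument.

```python
from argparse import ArgumentParser


def parse_args(argv):
    """Parse a reduced set of options and report which required ones are missing."""
    parser = ArgumentParser(description='sketch of the input/output check')
    parser.add_argument("-i", "--input", action='append', dest="input", default=[],
                        help="input folder(s)", metavar="FILE")
    # default=None is assumed here; the diff does not show the full definition.
    parser.add_argument("-o", "--outdir", dest="outdir", default=None,
                        help="output folder", metavar="FILE")
    args = vars(parser.parse_args(argv))

    missing = []
    if not args['input']:          # empty list when -i was never given
        missing.append('input')
    if args['outdir'] is None:
        missing.append('outdir')
    return args, missing


args, missing = parse_args(['-o', 'results'])
if missing:
    print('Missing required option(s): %s' % ', '.join(missing))
    print('Use -h for more help on options.')
```

Marking the options with `required=True` would be shorter, but it would also make `-v/--version` fail without them, which may be why the commit checks by hand.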
