Commit 88c1962

Jon Palmer authored and committed
bug fix goatools change header output again
1 parent 3ac1a10 commit 88c1962

4 files changed: +45 -12 lines changed

bin/funannotate-compare.py

+5 -4
@@ -578,11 +578,12 @@ def __init__(self,prog):
 file = os.path.join(args.out, 'go_enrichment', f)
 base = file.split('.go_enrichment.txt')[0]
 name = base.split('/')[-1]
-#check output, > 3 lines means there is some data, otherwise nothing.
-num_lines = sum(1 for line in open(file))
+#check goatools output, return is a tuple with True/False and header line #
+goresult = lib.checkgoatools(file)
 output.write('<h4 class="sub-header" align="left">GO Enrichment: '+name+'</h4>')
-if num_lines > 9: #goatools changed output, empty files now have 9 lines instead of 3...
-    df = pd.read_csv(file, sep='\t', skiprows=8) #the 9th row is the header
+#goatools keeps changing output - which really sucks....trying now to parse the header, hopefully that doesnt change
+if goresult[0]:#goatools changed output, empty files now have 9 lines instead of 3...
+    df = pd.read_csv(file, sep='\t', skiprows=goresult[1]) #the get header row from tuple
     df['enrichment'].replace('p', 'under', inplace=True)
     df['enrichment'].replace('e', 'over', inplace=True)
     df2 = df.loc[df['p_fdr'] < args.go_fdr]
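
Note on the hunk above: instead of assuming a fixed line count, the new code skips however many lines precede the header, then relabels the `enrichment` column and filters on `p_fdr`. The sketch below walks through that flow using only the standard library. `csv.DictReader` stands in for `pd.read_csv(..., skiprows=...)`, and the function name `read_enrichment`, the sample text, and the reduced column set are assumptions made for illustration, not real goatools output.

```python
import csv

# Made-up sample: a few preamble lines, then a tab-separated table.
# Only the 'GO\tNS' header prefix and the 'enrichment' / 'p_fdr' column
# names come from the commit; everything else is illustrative.
SAMPLE = (
    "# preamble line 1\n"
    "# preamble line 2\n"
    "GO\tNS\tenrichment\tname\tp_fdr\n"
    "GO:0000001\tBP\te\texample term A\t0.01\n"
    "GO:0000002\tMF\tp\texample term B\t0.20\n"
)

# Same relabelling as the two df['enrichment'].replace(...) calls.
LABELS = {"e": "over", "p": "under"}


def read_enrichment(text, header_row, fdr_cutoff):
    """Skip header_row lines, parse the table, relabel, and filter by FDR."""
    lines = text.splitlines()[header_row:]
    reader = csv.DictReader(lines, delimiter="\t")
    kept = []
    for row in reader:
        row["enrichment"] = LABELS.get(row["enrichment"], row["enrichment"])
        if float(row["p_fdr"]) < fdr_cutoff:
            kept.append(row)
    return kept


# header_row=2 because two preamble lines precede the header in SAMPLE.
rows = read_enrichment(SAMPLE, header_row=2, fdr_cutoff=0.05)
```

Because the header index is passed in rather than hard-coded, a change in the number of preamble lines only changes the argument, not the parsing logic.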

bin/funannotate-functional.py

+3 -1
@@ -218,9 +218,11 @@ def runIPRpython(Input):
 lib.gb2output(genbank, Proteins, Transcripts, Scaffolds)
 
 #get absolute path for all input so there are no problems later
+Scaffolds, Proteins, Transcripts, GFF = [os.path.abspath(i) for i in [Scaffolds, Proteins, Transcripts, GFF]] #suggestion via GitHub
+'''
 for i in Scaffolds, Proteins, Transcripts, GFF:
     i = os.path.abspath(i)
-
+'''
 
 #get organism and isolate from GBK file
 if not args.species:
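
Note on the hunk above: the commit does not say why the loop was commented out, but one likely reason is that assigning to the loop variable only rebinds `i` and never updates `Scaffolds`, `Proteins`, `Transcripts`, or `GFF`. A small sketch of the difference follows; the helper names and file names are hypothetical.

```python
import os


def abspath_by_loop(paths):
    """Mimic the old loop: rebinding `i` leaves the list elements untouched."""
    for i in paths:
        i = os.path.abspath(i)  # only the loop variable changes
    return paths


def abspath_by_comprehension(paths):
    """Mimic the new line: build a fresh list of absolute paths."""
    return [os.path.abspath(i) for i in paths]


# Hypothetical relative file names, used only for illustration.
inputs = ["scaffolds.fa", "proteins.fa"]
unchanged = abspath_by_loop(list(inputs))
fixed = abspath_by_comprehension(inputs)
```

Unpacking the comprehension back into the original names, as the added line does, is what makes the new values visible to the rest of the script.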

docs/mac_install.md

+23 -7
@@ -25,14 +25,14 @@ brew tap nextgenusfs/tap
 brew install python
 
 #then setup pip and install modules to local python
-pip install -U biopython natsort psutil goatools numpy pandas matplotlib seaborn scikit-learn
+pip install -U biopython natsort psutil goatools fisher numpy pandas matplotlib seaborn scikit-learn
 ```
 
 4) Install Perl modules via cpanm (`brew install cpanm`)
 ```
-cpanm BioPerl Getopt::Long Pod::Usage File::Basename threads threads::shared \
+cpanm Bio::Perl Getopt::Long Pod::Usage File::Basename threads threads::shared \
 Thread::Queue Carp Data::Dumper YAML Hash::Merge Logger::Simple Parallel::ForkManager \
-DBI Text::Soundex Scalar::Util::Numeric
+DBI Text::Soundex Scalar::Util::Numeric Tie::File POSIX Storable
 ```
 5) Install funannotation via homebrew
 ```
@@ -49,12 +49,12 @@ brew install freetype
 6) Get RepBase data and reconfigure RepeatMasker/RepeatModeler. Register for [RepBase](http://www.girinst.org/repbase/)
 ```
 #download RepeatMasker libraries and install
-wget --user name --password pass http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20150807.tar.gz
-tar zxvf repeatmaskerlibraries-20150807.tar.gz -C /usr/local/opt/repeatmasker/libexec
+wget --user name --password pass http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20160829.tar.gz
+tar zxvf repeatmaskerlibraries-20160829.tar.gz -C /usr/local/opt/repeatmasker/libexec
 
 #now setup RepeatMasker
-cd /usr/local/opt/repeatmasker/libexec
-./configure <config.txt
+cd /usr/local/Cellar/repeatmasker/4.0.5/libexec
+./configure
 
 #softlink GFF script to bin in path
 ln /usr/local/opt/repeatmasker/libexec/util/rmOutToGFF3.pl /usr/local/bin
@@ -90,6 +90,22 @@ cd /usr/local/opt/funannotate/libexec
 #run setup script, might need sudo here
 ./setup.sh
 ```
+
+10) Troubleshooting. There are a number of installation problems with a lot of these software packages that really bother me. One common problem is that many of the programs written in perl ship with a shebang line of `#!/usr/bin/perl` - this can cause lots of problems if you are not using the system perl (which many people do to avoid messing with system perl as it is needed for lots of system maintenance). I like to install perl using homebrew and install modules to this version of Perl, i.e. BioPerl, etc. The better shebang line for portability is `#!/usr/bin/env perl` - which says to use whatever perl is currently in the environment, i.e. your homebrewed perl. The same thing happens in python, i.e. the most portable is `#!/usr/bin/env python` - but that is not always the case. There are several programs here that are by default installed to use system perl - if this is not what you have, you will have to do a little bit of extra work, here is the list of software I currently know has this problem.
+1) GeneMark-ES
+2) ProteinOrtho5
+3) RepeatMasker
+4) RepeatModeler
+
+One solution is to manually change the shebang line, for example you can do this in the GeneMark folder as follows:
+```
+#move into folder
+cd /usr/local/gmes_petap
+
+#find all perl files, and change shebang line for each file inplace, using GNU-Sed here, Mac-sed may not work for this
+find . -name "*.pl" | xargs gsed -i 's,#!/usr/bin/perl,#!/usr/bin/env perl,g'
+```
+
 The script will download and format necessary databases and then check all of the dependencies of funannotate - any tool not properly installed will be flagged by the script.
 
 ####Python Dependencies:
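
Note on the troubleshooting step added above: the `gsed` one-liner requires GNU sed. Where that is not installed, the same shebang change could be done from Python. The sketch below is an assumed alternative, not part of the commit. The function name `fix_shebangs` is hypothetical, and unlike the `sed` expression it only rewrites a shebang at the very start of each `*.pl` file rather than every occurrence in the file.

```python
import os
import tempfile

OLD = "#!/usr/bin/perl"
NEW = "#!/usr/bin/env perl"


def fix_shebangs(root):
    """Rewrite a leading '#!/usr/bin/perl' to '#!/usr/bin/env perl' in *.pl files.

    Returns the list of files that were changed.
    """
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if not fname.endswith(".pl"):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, "rb") as handle:
                data = handle.read()
            if data.startswith(OLD.encode()):
                # Keep whatever follows the interpreter path (e.g. ' -w').
                with open(path, "wb") as handle:
                    handle.write(NEW.encode() + data[len(OLD):])
                changed.append(path)
    return changed


# Self-contained demo on a throwaway directory rather than a real install.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "example.pl")
with open(demo_file, "w") as handle:
    handle.write("#!/usr/bin/perl -w\nprint \"hello\\n\";\n")

changed_files = fix_shebangs(demo_dir)
with open(demo_file) as handle:
    first_line = handle.readline().rstrip("\n")
```

To apply it to a real tree, one would call `fix_shebangs` on the folder from the step above (for example `/usr/local/gmes_petap`), ideally after backing it up.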

lib/library.py

+14 -0
@@ -1779,6 +1779,20 @@ def trainAugustus(AUGUSTUS_BASE, train_species, trainingset, genome, outdir, cpu
     if float(train_results[4]) < 0.50:
         log.info("Accuracy seems low, you can try to improve by passing the --optimize_augustus option.")
 
+def checkgoatools(input):
+    with open(input, 'rU') as goatools:
+        count = -1
+        result = False
+        headercount = 0
+        for line in goatools:
+            count += 1
+            if line.startswith('GO\tNS'):
+                header = line.replace('\n', '')
+                headercount = count
+            if line.startswith('GO:'):
+                result = True
+    return (result, headercount)
+
 HEADER = '''
 <!DOCTYPE html>
 <html lang="en">
