Skip to content

Latest commit

 

History

History
1123 lines (856 loc) · 33.8 KB

File metadata and controls

1123 lines (856 loc) · 33.8 KB

VEP

Property value
prog_name VEP
publication https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0974-4
citations_num 839 (2019.05.13)
first_release_year 2016?
www https://www.ensembl.org/info/docs/tools/vep/index.html
repo https://github.com/Ensembl/ensembl-vep
docker docker pull ensemblorg/ensembl-vep
lang perl
obtained_from docker install
version 96.3
version_date 2019.06.21
last_ver_check 2019.06.24
activity_main active (release 2019.04+)
activity_dev (next relase) active
issues_github active
requirements_1 ??
requirements_2 ??
documentation https://www.ensembl.org/info/docs/tools/vep/script/index.html
test_data ??
install_1 bar-server
install_1_dir docker
install_2 foo-server
install_2_dir docker

Docker install on bar-server. /DANE_SYNOLOGY/users/darked89/vep_data

## commands
#1 (docker image)
sudo docker pull ensemblorg/ensembl-vep 
#2 (genome -> selected human)
docker run -t -i -v  /DANE_SYNOLOGY/users/darked89/vep_data:/opt/vep/.vep ensemblorg/ensembl-vep perl INSTALL.pl
#3 (VEP pluggins)
sudo docker run -t -i -v /DANE_SYNOLOGY/users/darked89/vep_data:/opt/vep/.vep ensemblorg/ensembl-vep perl INSTALL.pl -a cfp -s homo_sapiens -y GRCh38 -g all

darked89@bar-server /D/u/d/vep_data> sudo docker run -t -i -v /DANE_SYNOLOGY/users/darked89/vep_data:/opt/vep/.vep ensemblorg/ensembl-vep perl INSTALL.pl -a cfp -s homo_sapiens -y GRCh38 -g all

[sudo] password for darked89: 
 - getting list of available cache files

WARNING: It looks like you already have the cache for homo_sapiens GRCh38 (v96) installed.

Delete the folder /opt/vep/.vep/homo_sapiens/96_GRCh38 and re-run INSTALL.pl if you want to re-install
 - skipping homo_sapiens
 - downloading Homo_sapiens.GRCh38.dna.toplevel.fa.gz
 - downloading Homo_sapiens.GRCh38.dna.toplevel.fa.gz.fai
 - downloading Homo_sapiens.GRCh38.dna.toplevel.fa.gz.gzi

The FASTA file should be automatically detected by the VEP when using --cache or --offline.
If it is not, use "--fasta /opt/vep/.vep/homo_sapiens/96_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz"


WARNING: The following plugins have not been found: all
Available plugins: AncestralAllele,Blosum62,CADD,CSN,Carol,Condel,Conservation,Downstream,Draw,ExAC,ExACpLI,FATHMM_MKL,G2P,GO,GeneSplicer,Gwava,LD,LOVD,LoF,LoFtool,LocalID,MPC,MTR,MaxEntScan,NearestGene,Phenotypes,ProteinSeqs,REVEL,SameCodon,SpliceRegion,TSSDistance,dbNSFP,dbscSNV,miRNA

 - installing "AncestralAllele"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/AncestralAllele.pm for details
 - OK

 - installing "Blosum62"
 - add "--plugin Blosum62" to your VEP command to use this plugin
 - OK

 - installing "CADD"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/CADD.pm for details
 - OK

 - installing "CSN"
 - add "--plugin CSN" to your VEP command to use this plugin
 - OK

 - installing "Carol"
 - This plugin requires installation
 - See /opt/vep/.vep/Plugins/Carol.pm for details
 - OK

 - installing "Condel"
 - This plugin requires installation
 - See /opt/vep/.vep/Plugins/Condel.pm for details
 - OK

 - installing "Conservation"
 - add "--plugin Conservation,[options]" to your VEP command to use this plugin
 - OK

 - installing "Downstream"
 - add "--plugin Downstream" to your VEP command to use this plugin
 - OK

 - installing "Draw"
 - This plugin requires installation
 - See /opt/vep/.vep/Plugins/Draw.pm for details
 - OK

 - installing "ExAC"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/ExAC.pm for details
 - OK

 - installing "ExACpLI"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/ExACpLI.pm for details
 - OK

 - installing "FATHMM_MKL"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/FATHMM_MKL.pm for details
 - OK

 - installing "G2P"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/G2P.pm for details
 - OK

 - installing "GO"
 - add "--plugin GO" to your VEP command to use this plugin
 - OK

 - installing "GeneSplicer"
 - This plugin requires installation
 - See /opt/vep/.vep/Plugins/GeneSplicer.pm for details
 - OK

 - installing "Gwava"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/Gwava.pm for details
 - OK

 - installing "LD"
 - add "--plugin LD,[options]" to your VEP command to use this plugin
 - OK

 - installing "LOVD"
 - add "--plugin LOVD" to your VEP command to use this plugin
 - OK

 - installing "LoF"
 - This plugin requires installation
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/LoF.pm for details
 - OK

 - installing "LoFtool"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/LoFtool.pm for details
 - OK

 - installing "LocalID"
 - add "--plugin LocalID" to your VEP command to use this plugin
 - OK

 - installing "MPC"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/MPC.pm for details
 - OK

 - installing "MTR"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/MTR.pm for details
 - OK

 - installing "MaxEntScan"
 - This plugin requires installation
 - See /opt/vep/.vep/Plugins/MaxEntScan.pm for details
 - OK

 - installing "NearestGene"
 - add "--plugin NearestGene" to your VEP command to use this plugin
 - OK

 - installing "Phenotypes"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/Phenotypes.pm for details
 - OK

 - installing "ProteinSeqs"
 - add "--plugin ProteinSeqs" to your VEP command to use this plugin
 - OK

 - installing "REVEL"
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/REVEL.pm for details
 - OK

 - installing "SameCodon"
 - add "--plugin SameCodon" to your VEP command to use this plugin
 - OK

 - installing "SpliceRegion"
 - add "--plugin SpliceRegion" to your VEP command to use this plugin
 - OK

 - installing "TSSDistance"
 - add "--plugin TSSDistance" to your VEP command to use this plugin
 - OK

 - installing "dbNSFP"
 - This plugin requires installation
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/dbNSFP.pm for details
 - OK

 - installing "dbscSNV"
 - This plugin requires installation
 - This plugin requires data
 - See /opt/vep/.vep/Plugins/dbscSNV.pm for details
 - OK

 - installing "miRNA"
 - add "--plugin miRNA" to your VEP command to use this plugin
 - OK

NB: One or more plugins that you have installed will not work without installation or downloading data; see logs above

All done

mounting cache data(?) and running inside of the container

sudo docker run -t -i -v /DANE_SYNOLOGY/users/darked89/vep_data:/opt/vep/.vep:Z ensemblorg/ensembl-vep

Annovar

Property value
prog_name annovar
publication https://academic.oup.com/nar/article/38/16/e164/1749458
citations_num 5157 (2019.05.31)
first_release_year 2010?
www http://annovar.openbioinformatics.org/en/latest/
repo no repo
lang perl
obtained_from autor after registering ( http://download.openbioinformatics.org/annovar_download_form.php)
version 2018.04.16
version_date 2018.04.16
last_ver_check 2019.05.31
activity_main active (release 2018.04+)
activity_dev ??
issues_github no gitgub
requirements_1 ??
requirements_2 ??
documentation http://annovar.openbioinformatics.org/en/latest/
test_data ??
install_1 foo-server
install_1_dir /opt/soft/annovar_20180416

databases to download

  • hg38 refGene
  • hg38 knownGene
  • hg38 ensGene
  • hg38 dbnsfp35a
  • hg38 dbscsnv11
  • hg38 intervar_20180118
Note: newer version exists here: https://github.com/WGLab/InterVar/releases

Version 2.1.2
  • hg38 esp6500siv2_all
  • hg38 exac03
  • hg38 exac03nontcga
  • hg38 exac03nonpsych
  • hg38 gnomad211_exome
  • hg38 gnomad211_genome
  • hg38 kaviar_20150923
Info: newer VCF version exist: http://db.systemsbiology.net/kaviar/
  • hg38 hrcr1
  • hg38 1000g2015aug ??? (6 data sets) ???
  • hg38 mcap
  • hg38 revel
  • hg38 avsnp150
  • hg38 nci60
  • hg38 clinvar_20190305
  • hg38 regsnpintron (from annovar, the web site is here https://regsnps-intron.ccbb.iupui.edu/)
  • hg38?? LoFtool_scores
  • hg38?? LoFtool_scores
wget http://www.openbioinformatics.org/annovar/download/LoFtool_scores.txt.gz
pigz -d LoFtool_scores.txt.gz
  • COSMIC (manual download and preparation)
# besed on the procedure: http://annovar.openbioinformatics.org/en/latest/user-guide/filter/#cosmic-annotations

# web: https://cancer.sanger.ac.uk/cosmic/download (after registration, confirming account then re-logging)
# COSMIC v89, released 15-MAY-19

CosmicCodingMuts.vcf.gz
CosmicNonCodingVariants.vcf.gz
CosmicMutantExport.tsv.gz
CosmicNCV.tsv.gz 


# get the script (! error in the link !):
wget http://www.openbioinformatics.org/annovar/download/prepare_annovar_user.txt
mv prepare_annovar_user.txt prepare_annovar_user.pl
chmod +x prepare_annovar_user.pl

pigz -d Cosmic*gz

# coding cosmic:
../prepare_annovar_user.pl -dbtype cosmic CosmicMutantExport.tsv  -vcf CosmicCodingMuts.vcf > hg38_cosmic89_coding.txt

## messages:
# NOTICE: Finished reading 4787561 mutation ID from the VCF file CosmicCodingMuts.vcf
# NOTICE: Finished reading 3378630 COSMIC records in DB file CosmicMutantExport.tsv
# WARNING: 14027 COSMIC ID from MutantExport file cannot be found in VCF file (this may be normal if the VCF file only contains coding or noncoding variants

# non-coding cosmic:
../prepare_annovar_user.pl -dbtype cosmic CosmicNCV.tsv  -vcf CosmicNonCodingVariants.vcf > hg38_cosmic89_noncoding.txt 

## messages:
# NOTICE: Finished reading 21373936 mutation ID from the VCF file CosmicNonCodingVariants.vcf
# NOTICE: Finished reading 19106570 COSMIC records in DB file CosmicNCV.tsv
# WARNING: 460 COSMIC ID from MutantExport file cannot be found in VCF file (this may be normal if the VCF file only contains coding or noncoding variants

#!/bin/bash

./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 refGene humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 knownGene humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 ensGene humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 dbnsfp35a humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 dbscsnv11 humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 intervar_20180118 humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 esp6500siv2_all humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 esp6500siv2_all humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 exac03nontcga  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 exac03nonpsych  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 gnomad211_exome  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 gnomad211_genome  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 kaviar_20150923  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 hrcr1  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 1000g2015aug  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 mcap  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 revel  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 avsnp150  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 nci60  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 clinvar_20190305  humandb/
./annotate_variation.pl -downdb --webfrom annovar -buildver hg38 regsnpintron  humandb/

MutSigCV

Property value
prog_name mutsig
publication https://www.nature.com/articles/nature12912
citations_num 1667 20190613
first_release_year 2014
www https://software.broadinstitute.org/cancer/cga/mutsig
repo none
lang Matlab
obtained_from https://software.broadinstitute.org/cancer/cga/mutsig_download
version 1.41
version_date ??
last_ver_check 2019.06.13
requirements_1 libncursesw.so.5
requirements_2 ??
documentation http://software.broadinstitute.org/cancer/software/genepattern/modules/docs/MutSigCV
test_data https://software.broadinstitute.org/cancer/cga/mutsig_run (part Example data) !not working!
install_1 manjaro-linux-dk
install_1_dir /home/darked89/proj_soft/mutsigcv_1.41

inputs

  • MAF
One common source for MAF files that have been used in MutSigCV during the algorithm's development 
was the MuTect tool, followed by annotation of its output using Oncotator.  
More information can be found on MuTect here: http://www.broadinstitute.org/cancer/cga/mutect.    
Information about Oncotator can be found here: http://www.broadinstitute.org/oncotator.

## from the Supplement of the paper:
Mutation table
This file contains information about the mutations detected in the sequencing
project. It lists one mutation per row, and the columns (named in the header
row) report several pieces of information for each mutation. The five columns
required by MutSigCV are
• Hugo Symbol = name of the gene that the mutation was in
• Tumor Sample Barcode = name of the patient that the mutation was
in
• categ = number of the category that the mutation was in (categories
must match those in the coverage table)
• is coding = 1 (the mutation in a coding region or splice-site) or 0 (the
mutation is in a noncoding flanking region)
• is silent = 1 (the mutation is a synonymous change) or 0 (the mutation is a coding change or is noncoding)
For the specific data file used in the present manuscript, the category
numbers in categ are
1. transition mutations at CpG dinucleotides
2. transversion mutations at CpG dinucleotides
3. transition mutations at C:G basepairs not in CpG dinucleotides
4. transversion mutations at C:G basepairs not in CpG dinucleotides
5. transition mutations at A:T basepairs
6. transversion mutations at A:T basepairs
7. null+indel mutations, including nonsense, splice-site, and indel mut



  • coverage file
Coverage file
This file contains information about the sequencing coverage achieved for each gene and patient/tumor. Within each gene-patient bin, the coverage is broken down further according to the mutation category (e.g., A:T basepairs, C:G basepairs), and also according to the effect (silent/nonsilent/noncoding). This tab-delimited file can be produced by processing the sample-level coverage files in WIG (wiggle) format output by the MuTect tool.  More information on MuTect can be found here: http://www.broadinstitute.org/cancer/cga/mutect.  If detailed coverage information is not available, the user can use a “full coverage” file that is available on the GenePattern server.

The columns of the file are:

gene: name of the gene for which this line reports coverage (corresponds to the MAF file's Hugo_Symbol)
effect: silent, nonsilent, or noncoding
categ: number of the category that this line reports coverage for (must match the categ in the mutation table)
<patient_name_1>: number of covered bases for this gene, effect, and category
<patient_name_2>: number of covered bases for this gene, effect, and category
<patient_name_ ...>: number of covered bases for this gene, effect, and category

## from the supplement:

Coverage table
This file contains information about the sequencing coverage achieved for
each gene and patient. Within each gene-patient bin, the coverage is broken
down further according to the category (e.g. A:T basepairs, C:G basepairs),
and also according to the zone (silent/nonsilent/noncoding). The columns
of the file are
• gene = name of the gene that this line reports coverage for
• zone = either silent, nonsilent, or noncoding
• categ = number of the category that this line reports coverage for
(must match the categories in the mutation table)
• PATIENT1 NAME = number of covered bases for PATIENT1 in this
gene, zone, and category
• PATIENT2 NAME = number of covered bases for PATIENT2 in this
gene, zone, and category
• ...
• PATIENTnp NAME = number of covered bases for PATIENTnp in
this gene, zone, and category
Note, covered bases will typically contribute fractionally to more than one
zone depending on the consequences of mutating to each of three different
possible alternate bases. For example, a particular covered C base may count
2
3 toward the nonsilent zone and 1
3 toward the silent zone, if mutation
to A or G causes an amino acid change whereas mutation to T is silent
(synonymous).

  • Covariate file
Covariate file
This file contains the genomic covariate data for each gene, for example, expression levels and DNA replication times, that will be used in MutSigCV to judge which genes are close to each other in mathematical "covariate space."

In general, the columns of this file are:

gene: name of the gene for which this line reports coverage
COVARIATE1 NAME: value of COVARIATE1 for this gene
COVARIATE2 NAME: value of COVARIATE2 for this gene
COVARIATEnv NAME: value of COVARIATEnv for this gene
For the specific data file supplied in GenePattern, the columns are:

gene: name of the gene for which this line reports coverage
expr: expression level of this gene, averaged across 91 cell lines in the Cancer Cell Line Encylcopedia (CCLE)
reptime: DNA replication time of this gene (measured in HeLa cells), ranging from 100 (very early) to 1000 (very late)
hic: chromatin state of this gene (measured from HiC experments in K562 cells) ranging approximately from -50 (very closed) to +50 (very open)

## from the Supplement

This file contains the genomic covariate data for each gene, for example
expression levels and DNA replication times, that will be used in MutSigCV
to judge which genes are near to each other in covariate space. In general,
the columns of this file are
• gene = name of the gene that this line reports coverage for
21
W W W. N A T U R E . C O M / N A T U R E | 2 1
SUPPLEMENTARY INFORMATION RESEARCH
• COVARIATE1 NAME = value of COVARIATE1 for this gene
• COVARIATE2 NAME = value of COVARIATE2 for this gene
• ...
• COVARIATEnv NAME = value of COVARIATEnv for this gene
For the specific data file used in the present manuscript, the columns are
• gene = name of the gene that this line reports coverage for
• expr = expression level of this gene, averaged across many cell lines in
the Cancer Cell Line Encylcopedia
• reptime = DNA replication time of this gene, ranging approximately
from 100 (very early) to 1000 (very late)
• hic = chromatin compartment of this gene, measured from HiC experment, ranging approximately from -50 (very closed) to +50 (very
open)
Note, gene and patient names must agree across these three tables. Similarly, the categ category numbers must agree between the mutation table
and the coverage table.

Web module in GenePattern (registration required)

www: https://cloud.genepattern.org/gp/pages/index.jsf

version of Mutsig: 1.3.4

local linux install

# on manjaro-linux-dk

# matlab LD_LIBRARY_PATH
/opt/soft/matlab_R2016a/runtime/v901/runtime/glnxa64:/opt/soft/matlab_R2016a/runtime/v901/bin/glnxa64:/opt/soft/matlab_R2016a/runtime/v901/sys/os/glnxa64:

# running requires unst
ncurses5-compat-libs

# set up env (bash)
export LD_LIBRARY_PATH=/opt/soft/matlab_R2016a/runtime/v901/runtime/glnxa64:/opt/soft/matlab_R2016a/runtime/v901/bin/glnxa64:/opt/soft/matlab_R2016a/runtime/v901/sys/os/glnxa64:/opt/soft/matlab_R2016a/runtime/v901/sys/opengl/lib/glnxa64

# download example data and unzip
cd data

https://software.broadinstitute.org/cancer/cga/sites/default/files/data/tools/mutsig/MutSigCV_example_data.1.0.1.zip

unzip MutSigCV_example_data.1.0.1.zip

# run it:
../../MutSigCV  LUSC.mutations.maf LUSC.coverage.txt gene.covariates.txt  output.txt

======================================
  MutSigCV
  v1.4

  (c) Mike Lawrence and Gaddy Getz
  Broad Institute of MIT and Harvard
======================================


MutSigCV: PREPROCESS
--------------------
Loading mutation_file...
Loading coverage file...
Processing mutation "effect"...
NOTE:  This version now ignores "is_coding" and "is_silent".
       Requires Variant_Classification/type column and mutation_type_dictionary so we can assign nulls.
Error using MutSigCV>MutSig_preprocess (line 291)
missing mutation_type_dictionary_file

Error in MutSigCV (line 184)

# missing mutation_type_dictionary_file.txt
# discovered putative mutation_type_dictionary file:

wget https://raw.githubusercontent.com/tgen/CovGen/master/snpEff_ANN_mutation_type_dictionary_file.txt
mv -i snpEff_ANN_mutation_type_dictionary_file.txt mutation_type_dictionary_file.txt

# missing chromosome files
wget http://www.broadinstitute.org/cancer/cga/sites/default/files/data/tools/mutsig/reference_files/chr_files_hg19.zip

unzip chr_files_hg19.zip
ln -s chr_files_hg19 chr_files

# wrong mutation_names in the LUSC.mutations.maf
Frame_Shift_Ins   => inframe_insertion
Frame_Shift_Del   => inframe_deletion
Intron            => intron_variant
Silent            => synonymous_variant
Missense_Mutation => missense_variant
3'UTR             => 3_prime_UTR_variant #6
5'UTR             => 5_prime_UTR_variant #7
# the types left:

5'Flank 
Nonsense_Mutation
Nonstop_Mutation
RNA
Splice_Site
Translation_Start_Site
Variant_Classification


# running again: (with 3_prime_UTR_variant #6 corrections)
./MutSigCV data/LUSC.maf data/LUSC.coverage.txt data/gene.covariates.txt ./new_lusk

<snip>
MutSigCV: PREPROCESS
--------------------
Loading mutation_file...
Loading coverage file...
Processing mutation "effect"...
NOTE:  This version now ignores "is_coding" and "is_silent".
       Requires Variant_Classification/type column and mutation_type_dictionary so we can assign nulls.
WARNING:  8286/137343 mutations could not be mapped to effect using mutation_type_dictionary_file:

                       RNA: [33]
          Nonstop_Mutation: [58]
    Translation_Start_Site: [103]
                   5'Flank: [1255]
               Splice_Site: [1307]
                     5'UTR: [1634]
         Nonsense_Mutation: [3896]
                 ----TOTAL: [8286]
          They will be removed from the bar-server.
<snip>
MutSigCV: RUN
-------------
Loading mutation_file...
NOTE:  Both "gene" and "Hugo_Symbol" are present in mutation_file.  Using "gene".
NOTE:  Both "patient" and "Tumor_Sample_Barcode" are present in mutation_file.  Using "patient".
Loading coverage file...
Loading covariate file...
NOTE:  Trimming "-Tumor" from patient names.
NOTE:  Converting "-" to "_" in patient names.
Building n and N tables...
Processing covariates...
Finding bagels...  1000/18267 2000/18267 3000/18267 4000/18267 5000/18267 6000/18267 7000/18267 8000/18267 9000/18267 10000/18267 11000/18267 12000/18267 13000/18267 14000/18267 15000/18267 16000/18267 17000/18267 18000/18267 
Expanding to (x,X)_gcp...
Calculating p-value using 2D Projection method...  1000/18267 2000/18267 3000/18267 4000/18267 5000/18267 6000/18267 7000/18267 8000/18267 9000/18267 10000/18267 11000/18267 12000/18267 13000/18267 14000/18267 15000/18267 16000/18267 17000/18267 18000/18267 
Done.  Wrote results to ./new_lusk.sig_genes.txt

# running with 5_prime_UTR_variant #7:

./MutSigCV data/LUSC.maf data/LUSC.coverage.txt data/gene.covariates.txt ./7_muttype_corrected

# WARNING:  6652/137343 mutations could not be mapped

checking if it is possible to run the Matlab code using Octave

# manjaro linux
# GNU Octave, version 5.1.0

In Octave GUI:
MutSigCV('data/LUSC.maf','data/LUSC.coverage.txt','data/gene.covariates.txt','LUSC.output_dk01.txt')

# warning: the 'verLessThan' function is not yet implemented in Octave

# fixing it (hack):
%%% dk 20190614a
function v = verLessThan()
    v = true;
end
%%% dk 20190614a end

# running again:
MutSigCV('data/LUSC.maf','data/LUSC.coverage.txt','data/gene.covariates.txt','LUSC.output_dk01.txt')

# error: 'fields' undefined near line 2103 column 11
# error: called from
#    MutSigCV>slength at line 2103 column 9
#    MutSigCV>MutSig_preprocess at line 303 column 7
#    MutSigCV at line 184 column 3

# checking if octave_python may be used to make more sense of the errors
# in Octave GUI Command window:

pkg install https://gitlab.com/mtmiller/octave-pythonic/-/archive/v0.0.1/octave-pythonic-v0.0.1.tar.gz
pkg load pythonic

# status:
# while simple things like py.print(some_variable) do work, the pythonic way of printing i.e all used parameters 
# is not working when using some_variable from Octave/Matlab out of the box. 
# abandoned for time being 20190617

generating inputs for MutSigCV

  • oncotator generates a file
test_Patient0.snp.maf.txt

#9 column values counts:

     15 3'UTR
      7 5'Flank
      7 5'UTR
     16 IGR
     20 Intron
      1 lincRNA
    414 Missense_Mutation
     19 Nonsense_Mutation
      3 RNA
    212 Silent
     15 Splice_Site
      1 Start_Codon_SNP
      1 Variant_Classification

info about preprocessing

# newer MAF files look like being processed by VEP with some effect prediction
Run online VEP at:
https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=UUNNMKIL4SaYItNz-5404366

2 check

Nirvana

NImble and Robust VAriant aNnotAtor

Property value
prog_name nirvana
publication https://www.pnas.org/content/113/50/14330
citations_num 99 (2019.06.17)
first_release_year 2016?
www ??
repo https://github.com/Illumina/Nirvana
docker ??
lang C#
obtained_from https://github.com/Illumina/Nirvana/archive/v2.0.9.tar.gz
version 2.0.9
version_date 2018.04.26
last_ver_check 2019.06.18
activity_main on hold last year
activity_dev on hold last year
issues_github active
requirements_1 Microsoft .Net runtime (see below)
requirements_2 ??
documentation https://github.com/Illumina/Nirvana/wiki
test_data ??
install_1 foo-server
install_1_dir /mnt/vdb1/soft/nirvana_2.0.9
status test script passed

TestNirvana.sh

https://github.com/Illumina/Nirvana

to get .NET runtime

wget -qO- https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.asc.gpg
sudo mv microsoft.asc.gpg /etc/apt/trusted.gpg.d/
wget -q https://packages.microsoft.com/config/debian/9/prod.list
sudo mv prod.list /etc/apt/sources.list.d/microsoft-prod.list
sudo chown root:root /etc/apt/trusted.gpg.d/microsoft.asc.gpg
sudo chown root:root /etc/apt/sources.list.d/microsoft-prod.list

# installation 
sudo apt-get install apt-transport-https

sudo apt-get update

# this is not needed:
# sudo apt-get install aspnetcore-runtime-2.2

#use:
sudo apt-get install dotnet-runtime-2.2

install from source

https://github.com/Illumina/Nirvana/archive/v2.0.9.tar.gz
tar xfvz v2.0.9.tar.gz
mv -i Nirvana-2.0.9/ nirvana_2.0.9/

##  testing (edit script)
https://raw.githubusercontent.com/wiki/Illumina/Nirvana/scripts/TestNirvana.sh

databases listed

  • dbSNP
  • 1000 Genomes Project
  • EVS
  • ExAC
  • phyloP conservation score
  • ClinVar
  • COSMIC
  • DGV

Multiple other annotations included (see the wiki page)

running it

input: VCF

output: json

2020plus

  • for cancer
Property value
prog_name 2020plus
publication https://www.pnas.org/content/113/50/14330
citations_num 99 (2019.06.17)
first_release_year 2016?
www ??
repo https://github.com/KarchinLab/2020plus
docker ??
lang Python
obtained_from https://github.com/KarchinLab/2020plus/archive/v1.2.2.tar.gz
version 1.2.3 (version num problem)
version_date 2019.05.30?
last_ver_check 2019.06.17
activity_main active
activity_dev active
issues_github active
requirements_1 conda
requirements_2 ??
documentation https://2020plus.readthedocs.io/en/latest/
test_data ??
install_1 foo-server
install_1_dir /opt/soft/2020plus_1.2.3/
status not tested

TUSON

https://www.ncbi.nlm.nih.gov/pubmed/24183448

Speedseq??

https://github.com/hall-lab/speedseq

oncotator

  • for MutSigCv pipeline
lftp gsapubftp-anonymous@ftp.broadinstitute.org/bundle/oncotator/
# empty passwd, press enter
get oncotator_v1_ds_April052016.tar.gz
# 15GB download, ~2hrs on foo-server 

wget https://github.com/broadinstitute/oncotator/archive/v1.9.9.0.tar.gz
tar xfvz v1.9.9.0.tar.gz
mv -i oncotator-1.9.9.0/ /opt/soft/oncotator_1.9.9.0/
cd /opt/soft/oncotator_1.9.9.0/
wget https://personal.broadinstitute.org/lichtens/oncobeta/tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt 

# Oncotator requires Python 2.7.x; 
# DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020.



sudo pip install virtualenv
bash scripts/create_oncotator_venv.sh -e /opt/soft/oncotator_1.9.9.0/ 
source ./bin/activate.fish

python setup.py install

# checking if it works

oncotator -h

#success!


running tests

bash run_ci_tests_no_activate.sh  MY_DB_DIR/

# problems with bigWig (to be ignored since we do not use them)

# problems with COSMIC indexing and retrieval (4 solving)

vcfanno

Property value
prog_name vcfanno
publication https://www.ncbi.nlm.nih.gov/pubmed/19505943
citations_num 18873 (2019.05.07)
first_release_year 2009?
www http://www.htslib.org/
repo https://github.com/brentp/vcfanno
lang Go
obtained_from https://github.com/brentp/vcfanno/releases/download/v0.3.1/vcfanno_linux64
version 0.3.1
version_date 2018.10.29
last_ver_check 2019.06.17
requirements_1 ?? Lua ??
install_1 foo-server
install_1_dir /opt/soft/vcfanno_0.3.1
  • primary use:

vcfanno allows you to quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files.

  • status: not tested

gemini

Property value
prog_name gemini
publication https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003153
citations_num 211 (2019.06.24)
first_release_year 2013
www https://gemini.readthedocs.io/en/latest/
repo https://github.com/arq5x/gemini
lang python
obtained_from https://github.com/arq5x/gemini
version 0.30.1
version_date 2019.02.07
last_ver_check 2019.06.24
requirements_1 python
install_1 foo-server
install_1_dir /opt/soft/gemini/

installation

# caveat: few links/mv missing
mkdir gemini_20190618
cd gemini_20190618/

wget https://github.com/arq5x/gemini/raw/master/gemini/scripts/gemini_install.py
python gemini_install.py /opt/soft/gemini/  /opt/soft/gemini/
PATH=$tools/bin:$data/anaconda/bin:$PATH

usage

  • hg19 only
  • python2

CVE in R

https://bioconductor.org/packages/release/bioc/vignettes/CVE/inst/doc/CVE_tutorial.html

annotate VCF with coverage if not already done

crossmap -> lift the annotations between genomes

https://sourceforge.net/projects/crossmap/files/

AnnotSV

Property value
prog_name AnnotSV
publication
citations_num
first_release_year
www https://lbgi.fr/AnnotSV/
repo https://github.com/lgmgeo/AnnotSV
docker
lang Tcl
obtained_from
version 2.2
version_date 2019/07/09
last_ver_check 2019/08/25
activity_main
activity_dev (next relase)
issues_github
requirements_1 ??
requirements_2 ??
documentation
test_data ??
install_1 bar-server
install_1_dir /opt/soft/AnnotSV_2.2

Test installation commands:

## commands
export ANNOTSV=/opt/soft/AnnotSV_2.2
/opt/soft/AnnotSV_2.2/bin/AnnotSV/AnnotSV.tcl -SVinputFile /opt/soft/AnnotSV_2.2/share/example/HG00096.SV.bed -vcfFiles "/opt/soft/AnnotSV_2.2/share/example/input/HG00096.chr*phase3.vcf" -outputFile ./HG00096.SV.annotated.tsv


Genome STRiP

Property value
prog_name Genome STRiP (svtoolkit) > SVAnnotator
publication
citations_num
first_release_year
www http://software.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_main_SVAnnotator.html
repo
docker
lang Java
obtained_from
version 2.00.1918
version_date 2019/05/27
last_ver_check 2019/08/25
activity_main
activity_dev (next relase)
issues_github
requirements_1 ??
requirements_2 ??
documentation
test_data ??
install_1 bar-server
install_1_dir /opt/soft/svtoolkit_2.00.1918

Test installation commands:

## commands
export SV_DIR=/opt/soft/svtoolkit_current
export JAVA_HOME=/opt/soft/graalvm-ee_current
export PATH=/opt/soft/graalvm-ee_current/bin:$PATH
/opt/soft/svtoolkit_current/installtest/discovery.sh
/opt/soft/svtoolkit_current/installtest/genotyping.sh

SURVIVOR_ant

Property value
prog_name SURVIVOR_ant (svcompare)
publication https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5668921/
citations_num
first_release_year
www https://github.com/NCBI-Hackathons/svcompare
repo https://github.com/NCBI-Hackathons/svcompare
docker
lang Perl, R
obtained_from
version
version_date 2019/08/20
last_ver_check 2019/08/25
activity_main
activity_dev (next relase)
issues_github
requirements_1 ??
requirements_2 ??
documentation
test_data ??
install_1 bar-server
install_1_dir /opt/soft/svcompare

Test installation commands:

## commands
/opt/soft/svcompare/scripts/bar-server//run_all.sh