
Commit f35d825

automatic decomposition of multiallelics

1 parent 8bc41f1, commit f35d825

26 files changed, 230 additions & 251 deletions

README.md

Lines changed: 20 additions & 16 deletions
@@ -2,20 +2,23 @@
 
 ### Overview
 
-The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package intended for analysis and clinical interpretation of individual cancer genomes. It interprets both somatic SNVs/InDels and copy number aberrations. The software extends basic gene and variant annotations from the [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html) with oncology-relevant, up-to-date annotations retrieved flexibly through [vcfanno](https://github.com/brentp/vcfanno), and produces HTML reports that can be navigated by clinical oncologists (Figure 1).
+The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package for functional annotation and translation of individual cancer genomes for precision oncology. It interprets both somatic SNVs/InDels and copy number aberrations. The software extends basic gene and variant annotations from the [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html) with oncology-relevant, up-to-date annotations retrieved flexibly through [vcfanno](https://github.com/brentp/vcfanno), and produces interactive HTML reports intended for clinical interpretation (Figure 1).
 
 ![PCGR overview](PCGR_workflow.png)
 
 ### Example reports
-* <a href="http://folk.uio.no/sigven/tumor_sample.COAD.pcgr.html" target="_blank">View an example report for a colorectal tumor sample (TCGA)</a>
-* <a href="http://folk.uio.no/sigven/tumor_sample.BRCA.pcgr.html" target="_blank">View an example report for a breast tumor sample (TCGA)</a>
+* <a href="http://folk.uio.no/sigven/tumor_sample.COAD.pcgr.html" target="_blank">Report for a colorectal tumor sample (TCGA)</a>
+* <a href="http://folk.uio.no/sigven/tumor_sample.BRCA.pcgr.html" target="_blank">Report for a breast tumor sample (TCGA)</a>
 
 ### PCGR documentation
 
 [![Documentation Status](https://readthedocs.org/projects/pcgr/badge/?version=latest)](http://pcgr.readthedocs.io/en/latest/?badge=latest)
 
+If you use PCGR, please cite our paper:
 
-### Annotation resources included in PCGR (v0.3.2)
+Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aaasheim, and Eivind Hovig. __Personal Cancer Genome Reporter: Variant Interpretation Report For Precision Oncology__ (2017). bioRxiv. doi:[10.1101/122366](https://doi.org/10.1101/122366)
+
+### Annotation resources included in PCGR (v0.3.3)
 
 * [VEP v85](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor release 85 (GENCODE v19 as the gene reference dataset)
 * [COSMIC v80](http://cancer.sanger.ac.uk/cosmic/) - Catalogue of somatic mutations in cancer (February 2017)
@@ -53,16 +56,16 @@ A local installation of Python (it has been tested with [version 2.7.13](https:/
 
 #### STEP 2: Download PCGR
 
-<font color="red"><b>April 19th 2017</b>: New release (0.3.2)</font>
+<font color="red"><b>April 20th 2017</b>: New release (0.3.3)</font>
 
-1. Download and unpack the [latest release (0.3.2)](https://github.com/sigven/pcgr/releases/latest)
+1. Download and unpack the [latest release (0.3.3)](https://github.com/sigven/pcgr/releases/latest)
 2. Download and unpack the data bundle (approx. 17Gb) in the PCGR directory
-* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g `~/pcgr-0.3.2`)
+* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g `~/pcgr-0.3.3`)
 * Unpack the data bundle, e.g. through the following Unix command: `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`
 
 A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced
-3. Pull the [PCGR Docker image (0.3.2)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb):
-* `docker pull sigven/pcgr:0.3.2` (PCGR annotation engine)
+3. Pull the [PCGR Docker image (0.3.3)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.2Gb):
+* `docker pull sigven/pcgr:0.3.3` (PCGR annotation engine)
 
 #### STEP 3: Input preprocessing
 
@@ -73,12 +76,9 @@ The PCGR workflow accepts two types of input files:
 
 PCGR can be run with either or both of the two input files present.
 
-The following requirements __MUST__ be met by the input VCF for PCGR to work properly:
+* We __strongly__ recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html)
+* If the input VCF contains multi-allelic sites, these will be subject to [decomposition](http://genome.sph.umich.edu/wiki/Vt#Decompose)
 
-1. Variants in the raw VCF that contain multiple alternative alleles (e.g. "multiple ALTs") must be split into variants with a single alternative allele. This can be done with the help of either [vt decompose](http://genome.sph.umich.edu/wiki/Vt#Decompose) or [vcflib's vcfbreakmulti](https://github.com/vcflib/vcflib#vcflib). We will add integrated support for this in an upcoming release
-2. The contents of the VCF must be sorted correctly (i.e. according to chromosomal order and chromosomal position). This can be obtained by [vcftools](https://vcftools.github.io/perl_module.html#vcf-sort).
-* We strongly recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html)
-* 'chr' must be stripped from the chromosome names
 
 The tab-separated values file with copy number aberrations __MUST__ contain the following four columns:
 * Chromosome
@@ -112,7 +112,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__ in t
 
 positional arguments:
 pcgr_dir PCGR base directory with accompanying data directory,
-e.g. ~/pcgr-0.3.2
+e.g. ~/pcgr-0.3.3
 output_dir Output directory
 sample_id Tumor sample/cancer genome identifier - prefix for
 output files
@@ -146,7 +146,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__ in t
 
 The _examples_ folder contain sample files from TCGA. A report for a colorectal tumor case can be generated through the following command:
 
-`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD`
+`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3.3 ~/pcgr-0.3.3/examples tumor_sample.COAD`
 
 This command will run the Docker-based PCGR workflow and produce the following output files in the _examples_ folder:
 
@@ -157,3 +157,7 @@ This command will run the Docker-based PCGR workflow and produce the following o
 5. __tumor_sample.COAD.pcgr.mutational_signatures.tsv__ - Tab-separated values file with estimated contributions by known mutational signatures and associated underlying etiologies
 6. __tumor_sample.COAD.pcgr.snvs_indels.biomarkers.tsv__ - Tab-separated values file with clinical evidence items associated with biomarkers for diagnosis, prognosis or drug sensitivity/resistance
 7. __tumor_sample.COAD.pcgr.cna_segments.tsv.gz__ - Tab-separated values file with annotations of gene transcripts that overlap with somatic copy number aberrations
+
+## Contact
+
+

docs/_build/doctrees/about.doctree

3.02 KB
Binary file not shown.
751 Bytes
Binary file not shown.
0 Bytes
Binary file not shown.
-3.54 KB
Binary file not shown.

docs/_build/html/_sources/about.rst.txt

Lines changed: 25 additions & 6 deletions
@@ -5,14 +5,15 @@ What is the Personal Cancer Genome Reporter (PCGR)?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The Personal Cancer Genome Reporter (PCGR) is a stand-alone software
-package intended for analysis and clinical interpretation of individual
-cancer genomes. It interprets both somatic SNVs/InDels and copy number
-aberrations. The software extends basic gene and variant annotations
-from the `Ensembl’s Variant Effect Predictor
+package for functional annotation and translation of individual cancer
+genomes for precision oncology. It interprets both somatic SNVs/InDels
+and copy number aberrations. The software extends basic gene and variant
+annotations from the `Ensembl’s Variant Effect Predictor
 (VEP) <http://www.ensembl.org/info/docs/tools/vep/index.html>`__ with
 oncology-relevant, up-to-date annotations retrieved flexibly through
-`vcfanno <https://github.com/brentp/vcfanno>`__, and produces HTML
-reports that can be navigated by clinical oncologists (Figure 1).
+`vcfanno <https://github.com/brentp/vcfanno>`__, and produces
+interactive HTML reports intended for clinical interpretation (Figure
+1).
 
 .. figure:: PCGR_workflow.png
 :alt:
@@ -22,6 +23,12 @@ affiliated with the `Norwegian Cancer Genomics
 Consortium <http://cancergenomics.no>`__, at the `Institute for Cancer
 Research/Oslo University Hospital <http://radium.no>`__.
 
+Example reports
+^^^^^^^^^^^^^^^
+
+- Report for a colorectal tumor sample (TCGA)
+- Report for a breast tumor sample (TCGA)
+
 Why use PCGR?
 ~~~~~~~~~~~~~
 
@@ -37,6 +44,13 @@ and variant level. The application generates a tiered report that will
 aid the interpretation of individual cancer genomes in a clinical
 setting.
 
+If you use PCGR, please cite our paper:
+
+Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aaasheim, and
+Eivind Hovig. **Personal Cancer Genome Reporter: Variant Interpretation
+Report For Precision Oncology** (2017). bioRxiv.
+doi:\ `10.1101/122366 <https://doi.org/10.1101/122366>`__
+
 Docker-based technology
 ~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -50,3 +64,8 @@ for precision oncology <annotation_resources.html>`__.
 
 .. figure:: docker-logo50.png
 :alt:
+
+Contact
+~~~~~~~
+
+

docs/_build/html/_sources/getting_started.rst.txt

Lines changed: 8 additions & 8 deletions
@@ -42,18 +42,18 @@ terminal window.
 Download PCGR
 ^^^^^^^^^^^^^
 
-**April 19th 2017**: New release (0.3.2)
+**April 20th 2017**: New release (0.3.3)
 
 - Download and unpack the `latest release
-(0.3.2) <https://github.com/sigven/pcgr/releases/latest>`__
+(0.3.3) <https://github.com/sigven/pcgr/releases/latest>`__
 
 - Download and unpack the data bundle (approx. 17Gb) in the PCGR
 directory
 
 - Download `the latest data
 bundle <https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/>`__
 from Google Drive to ``~/pcgr-X.X`` (replace *X.X* with the
-version number, e.g. ``~/pcgr-0.3.2``)
+version number, e.g. ``~/pcgr-0.3.3``)
 - Decompress and untar the bundle, e.g. through the following Unix
 command:
 ``gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -``
@@ -62,10 +62,10 @@ Download PCGR
 have been produced
 
 - Pull the `PCGR Docker image -
-0.3.2 <https://hub.docker.com/r/sigven/pcgr/>`__ from DockerHub
-(3.1Gb) :
+0.3.3 <https://hub.docker.com/r/sigven/pcgr/>`__ from DockerHub
+(3.2Gb) :
 
-- ``docker pull sigven/pcgr:0.3.2`` (PCGR annotation engine)
+- ``docker pull sigven/pcgr:0.3.3`` (PCGR annotation engine)
 
 Run test - generation of clinical report for a cancer genome
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -89,7 +89,7 @@ A tumor sample report is generated by calling the Python script
 
 positional arguments:
 pcgr_dir PCGR base directory with accompanying data directory,
-e.g. ~/pcgr-0.3.2
+e.g. ~/pcgr-0.3.3
 output_dir Output directory
 sample_id Tumor sample/cancer genome identifier - prefix for
 output files
@@ -125,7 +125,7 @@ sequenced within TCGA. A report for a colorectal tumor case can be
 generated by running the following command in your terminal window:
 
 ``python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments``
-``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD``
+``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.3 ~/pcgr-0.3.3/examples tumor_sample.COAD``
 
 This command will run the Docker-based PCGR workflow and produce the
 following output files in the *examples* folder:
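Most of the getting-started changes in this commit are version bumps from 0.3.2 to 0.3.3 in paths, image tags and the example command. A minimal sketch of assembling the `pcgr.py` invocation from a single version string, assuming the data bundle was unpacked to `~/pcgr-<version>` as the instructions describe. `pcgr_command` is a hypothetical helper, not part of PCGR; it only uses the options and positional arguments listed in the usage text (`--input_vcf`, `--input_cna_segments`, `pcgr_dir`, `output_dir`, `sample_id`) and builds the argument list without running anything.

```python
import os
from typing import List, Optional


def pcgr_command(version: str, output_dir: str, sample_id: str,
                 input_vcf: Optional[str] = None,
                 input_cna_segments: Optional[str] = None,
                 base_dir: str = "~") -> List[str]:
    """Build the argument list for a pcgr.py run against <base_dir>/pcgr-<version>."""
    # PCGR can run with either or both input files, but needs at least one
    if input_vcf is None and input_cna_segments is None:
        raise ValueError("provide input_vcf, input_cna_segments, or both")
    pcgr_dir = os.path.join(os.path.expanduser(base_dir), f"pcgr-{version}")
    cmd = ["python", "pcgr.py"]
    if input_vcf is not None:
        cmd += ["--input_vcf", input_vcf]
    if input_cna_segments is not None:
        cmd += ["--input_cna_segments", input_cna_segments]
    # Positional arguments in usage-text order: pcgr_dir output_dir sample_id
    cmd += [pcgr_dir, output_dir, sample_id]
    return cmd
```

To launch the Docker-based workflow, the returned list could be passed to `subprocess.run(cmd, check=True)` from the directory that contains `pcgr.py`; the next release would then only need a new `version` argument.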

docs/_build/html/_sources/output.rst.txt

Lines changed: 5 additions & 17 deletions
@@ -18,23 +18,11 @@ currently supported.
 VCF
 ^^^
 
-The following requirements **MUST** be met by the input VCF for PCGR to
-work properly:
-
-1. Variants in the raw VCF that contain multiple alternative alleles
-(e.g. "multiple ALTs") must be split into variants with a single
-alternative allele. This can be done with the help of either `vt
-decompose <http://genome.sph.umich.edu/wiki/Vt#Decompose>`__ or
-`vcflib's vcfbreakmulti <https://github.com/vcflib/vcflib#vcflib>`__.
-We will add integrated support for this in an upcoming release
-2. The contents of the VCF must be sorted correctly (i.e. according to
-chromosomal order and chromosomal position). This can be obtained by
-`vcftools <https://vcftools.github.io/perl_module.html#vcf-sort>`__.
-
-- We **strongly** recommend that the input VCF is compressed and
-indexed using `bgzip <http://www.htslib.org/doc/tabix.html>`__ and
-`tabix <http://www.htslib.org/doc/tabix.html>`__
-- 'chr' must be stripped from the chromosome names
+- We **strongly** recommend that the input VCF is compressed and
+indexed using `bgzip <http://www.htslib.org/doc/tabix.html>`__ and
+`tabix <http://www.htslib.org/doc/tabix.html>`__
+- If the input VCF contains multi-allelic sites, these will be subject
+to `decomposition <http://genome.sph.umich.edu/wiki/Vt#Decompose>`__
 
 **IMPORTANT NOTE 1**: Considering the VCF output for the `numerous
 somatic SNV/InDel callers <https://www.biostars.org/p/19104/>`__ that
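The revised input notes still strongly recommend bgzip compression plus a tabix index. A minimal pre-flight check, assuming only the Python standard library: a BGZF file is a gzip file whose header sets only the FEXTRA flag and carries a `BC` extra subfield (per the BGZF section of the SAM/BAM format specification), and tabix writes its index next to the input as `<file>.tbi` by default. `is_bgzf` and `vcf_input_status` are hypothetical helper names; PCGR may validate its input differently, and this sketch does not look for `.csi` indexes.

```python
import os
import struct

# gzip magic (ID1, ID2), CM=8 (deflate) and FLG=4 (FEXTRA only), as written by bgzip
BGZF_PREFIX = b"\x1f\x8b\x08\x04"


def is_bgzf(path: str) -> bool:
    """Return True if the first block of `path` looks like a BGZF block."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        if len(header) < 12 or header[:4] != BGZF_PREFIX:
            return False  # plain gzip (no FEXTRA) or not gzip at all
        # XLEN: total length of the gzip extra field (little-endian uint16)
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    if len(extra) < xlen:
        return False
    # Walk the extra subfields looking for SI1='B' (66), SI2='C' (67), SLEN=2
    offset = 0
    while offset + 4 <= len(extra):
        si1, si2 = extra[offset], extra[offset + 1]
        (slen,) = struct.unpack("<H", extra[offset + 2:offset + 4])
        if (si1, si2, slen) == (66, 67, 2):
            return True
        offset += 4 + slen
    return False


def vcf_input_status(path: str) -> dict:
    """Report whether `path` is BGZF-compressed and has a default tabix index."""
    return {
        "bgzf": is_bgzf(path),
        "tabix_index": os.path.exists(path + ".tbi"),
    }
```

A file produced by plain `gzip` fails the first check because its header does not set FEXTRA, which is the usual reason `tabix` refuses to index such a file.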

docs/_build/html/about.html

Lines changed: 28 additions & 7 deletions
@@ -89,9 +89,13 @@
 <p class="caption"><span class="caption-text">Table of Contents</span></p>
 <ul class="current">
 <li class="toctree-l1 current"><a class="current reference internal" href="#">About</a><ul>
-<li class="toctree-l2"><a class="reference internal" href="#what-is-the-personal-cancer-genome-reporter-pcgr">What is the Personal Cancer Genome Reporter (PCGR)?</a></li>
+<li class="toctree-l2"><a class="reference internal" href="#what-is-the-personal-cancer-genome-reporter-pcgr">What is the Personal Cancer Genome Reporter (PCGR)?</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="#example-reports">Example reports</a></li>
+</ul>
+</li>
 <li class="toctree-l2"><a class="reference internal" href="#why-use-pcgr">Why use PCGR?</a></li>
 <li class="toctree-l2"><a class="reference internal" href="#docker-based-technology">Docker-based technology</a></li>
+<li class="toctree-l2"><a class="reference internal" href="#contact">Contact</a></li>
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="getting_started.html">Getting started</a></li>
@@ -147,21 +151,29 @@ <h1>About<a class="headerlink" href="#about" title="Permalink to this headline">
 <div class="section" id="what-is-the-personal-cancer-genome-reporter-pcgr">
 <h2>What is the Personal Cancer Genome Reporter (PCGR)?<a class="headerlink" href="#what-is-the-personal-cancer-genome-reporter-pcgr" title="Permalink to this headline"></a></h2>
 <p>The Personal Cancer Genome Reporter (PCGR) is a stand-alone software
-package intended for analysis and clinical interpretation of individual
-cancer genomes. It interprets both somatic SNVs/InDels and copy number
-aberrations. The software extends basic gene and variant annotations
-from the <a class="reference external" href="http://www.ensembl.org/info/docs/tools/vep/index.html">Ensembl’s Variant Effect Predictor
+package for functional annotation and translation of individual cancer
+genomes for precision oncology. It interprets both somatic SNVs/InDels
+and copy number aberrations. The software extends basic gene and variant
+annotations from the <a class="reference external" href="http://www.ensembl.org/info/docs/tools/vep/index.html">Ensembl’s Variant Effect Predictor
 (VEP)</a> with
 oncology-relevant, up-to-date annotations retrieved flexibly through
-<a class="reference external" href="https://github.com/brentp/vcfanno">vcfanno</a>, and produces HTML
-reports that can be navigated by clinical oncologists (Figure 1).</p>
+<a class="reference external" href="https://github.com/brentp/vcfanno">vcfanno</a>, and produces
+interactive HTML reports intended for clinical interpretation (Figure
+1).</p>
 <div class="figure">
 <img alt="" src="_images/PCGR_workflow.png" />
 </div>
 <p>The Personal Cancer Genome Reporter has been developed by scientists
 affiliated with the <a class="reference external" href="http://cancergenomics.no">Norwegian Cancer Genomics
 Consortium</a>, at the <a class="reference external" href="http://radium.no">Institute for Cancer
 Research/Oslo University Hospital</a>.</p>
+<div class="section" id="example-reports">
+<h3>Example reports<a class="headerlink" href="#example-reports" title="Permalink to this headline"></a></h3>
+<ul class="simple">
+<li>Report for a colorectal tumor sample (TCGA)</li>
+<li>Report for a breast tumor sample (TCGA)</li>
+</ul>
+</div>
 </div>
 <div class="section" id="why-use-pcgr">
 <h2>Why use PCGR?<a class="headerlink" href="#why-use-pcgr" title="Permalink to this headline"></a></h2>
@@ -176,6 +188,11 @@ <h2>Why use PCGR?<a class="headerlink" href="#why-use-pcgr" title="Permalink to
 and variant level. The application generates a tiered report that will
 aid the interpretation of individual cancer genomes in a clinical
 setting.</p>
+<p>If you use PCGR, please cite our paper:</p>
+<p>Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aaasheim, and
+Eivind Hovig. <strong>Personal Cancer Genome Reporter: Variant Interpretation
+Report For Precision Oncology</strong> (2017). bioRxiv.
+doi:<a class="reference external" href="https://doi.org/10.1101/122366">10.1101/122366</a></p>
 </div>
 <div class="section" id="docker-based-technology">
 <h2>Docker-based technology<a class="headerlink" href="#docker-based-technology" title="Permalink to this headline"></a></h2>
@@ -190,6 +207,10 @@ <h2>Docker-based technology<a class="headerlink" href="#docker-based-technology"
 <img alt="" src="_images/docker-logo50.png" />
 </div>
 </div>
+<div class="section" id="contact">
+<h2>Contact<a class="headerlink" href="#contact" title="Permalink to this headline"></a></h2>
+<p><a class="reference external" href="mailto:sigven&#37;&#52;&#48;ifi&#46;uio&#46;no">sigven<span>&#64;</span>ifi<span>&#46;</span>uio<span>&#46;</span>no</a></p>
+</div>
 </div>
 
 