Commit 8d70d8d

committed
v0.3
1 parent 2153f7c commit 8d70d8d

27 files changed

+915
-49364
lines changed

README.md

Lines changed: 53 additions & 50 deletions
Original file line number | Diff line number | Diff line change
@@ -15,7 +15,7 @@ The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package int
1515
[![Documentation Status](https://readthedocs.org/projects/pcgr/badge/?version=latest)](http://pcgr.readthedocs.io/en/latest/?badge=latest)
1616

1717

18-
### Annotation resources included in PCGR (v0.2)
18+
### Annotation resources included in PCGR (v0.3)
1919

2020
* [VEP v85](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor release 85 (GENCODE v19 as the gene reference dataset)
2121
* [COSMIC v80](http://cancer.sanger.ac.uk/cosmic/) - Catalogue of somatic mutations in cancer (February 2017)
@@ -53,14 +53,15 @@ A local installation of Python (it has been tested with [version 2.7.13](https:/
5353

5454
#### STEP 2: Download PCGR
5555

56-
1. Download and unpack the [latest release](https://github.com/sigven/pcgr/releases/latest)
56+
<font color="red"><b>April 12th 2017</b>: New release (v0.3)</font>
57+
1. Download and unpack the [latest release (v0.3)](https://github.com/sigven/pcgr/releases/latest)
5758
2. Download and unpack the data bundle (approx. 17Gb) in the PCGR directory
58-
* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number)
59+
* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g `~/pcgr-0.3`)
5960
* Unpack the data bundle, e.g. through the following Unix command: `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`
6061

6162
A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced
62-
3. Pull the [PCGR Docker image](https://hub.docker.com/r/sigven/pcgr/) from DockerHub:
63-
* `docker pull sigven/pcgr` (PCGR annotation engine)
63+
3. Pull the [PCGR Docker image (v0.3)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb):
64+
* `docker pull sigven/pcgr:0.3` (PCGR annotation engine)
6465

6566
#### STEP 3: Input preprocessing
6667

@@ -94,55 +95,57 @@ Here, _Chromosome_, _Start_, and _End_ denote the chromosomal segment (GRCh37),
9495

9596
#### STEP 4: Run example
9697

97-
A tumor sample report is generated by calling the Python script __run_pcgr.py__ in the PCGR software folder, which takes the following arguments and options:
98-
99-
usage: run_pcgr.py [-h] [--input_vcf INPUT_VCF]
100-
[--input_cna_segments INPUT_CNA_SEGMENTS]
101-
[--logR_threshold_amplification LOGR_THRESHOLD_AMPLIFICATION]
102-
[--logR_threshold_homozygous_deletion LOGR_THRESHOLD_HOMOZYGOUS_DELETION]
103-
[--num_vcfanno_processes NUM_VCFANNO_PROCESSES]
104-
[--num_vep_forks NUM_VEP_FORKS] [--force_overwrite]
105-
pcgr_directory working_directory sample_id
106-
107-
Personal Cancer Genome Reporter (PCGR) workflow for clinical interpretation of
108-
somatic nucleotide variants and copy number aberration segments
109-
110-
positional arguments:
111-
pcgr_directory PCGR base directory
112-
working_directory Working directory - directory with input/output files
113-
sample_id Tumor sample/cancer genome identifier - prefix for
114-
output files
115-
116-
optional arguments:
117-
-h, --help show this help message and exit
118-
--input_vcf INPUT_VCF
119-
VCF input file with somatic query variants
120-
(SNVs/InDels) (default: None)
121-
--input_cna_segments INPUT_CNA_SEGMENTS
122-
Somatic copy number alteration segments (tab-separated
123-
values) (default: None)
124-
--logR_threshold_amplification LOGR_THRESHOLD_AMPLIFICATION
125-
Log(2) ratio treshold for calling copy number
126-
amplifications in HTML report (default: 0.8)
127-
--logR_threshold_homozygous_deletion LOGR_THRESHOLD_HOMOZYGOUS_DELETION
128-
Log(2) ratio treshold for calling homozygous deletions
129-
in HTML report (default: -0.8)
130-
--num_vcfanno_processes NUM_VCFANNO_PROCESSES
131-
Number of processes used during vcfanno annotation
132-
(default: 4)
133-
--num_vep_forks NUM_VEP_FORKS
134-
Number of forks (--forks option in VEP) used during
135-
VEP annotation (default: 4)
136-
--force_overwrite By default, the script will fail with an error if any
137-
output file already exists. You can force the
138-
overwrite of existing result files by using this flag
139-
(default: False)
140-
98+
A tumor sample report is generated by calling the Python script __pcgr.py__ in the PCGR software folder, which takes the following arguments and options:
99+
100+
usage: pcgr.py [-h] [--input_vcf INPUT_VCF]
101+
[--input_cna_segments INPUT_CNA_SEGMENTS]
102+
[--logR_threshold_amplification LOGR_THRESHOLD_AMPLIFICATION]
103+
[--logR_threshold_homozygous_deletion LOGR_THRESHOLD_HOMOZYGOUS_DELETION]
104+
[--num_vcfanno_processes NUM_VCFANNO_PROCESSES]
105+
[--num_vep_forks NUM_VEP_FORKS] [--force_overwrite]
106+
[--version]
107+
pcgr_dir output_dir sample_id
108+
109+
Personal Cancer Genome Reporter (PCGR) workflow for clinical interpretation of
110+
somatic nucleotide variants and copy number aberration segments
111+
112+
positional arguments:
113+
pcgr_dir PCGR base directory with accompanying data directory,
114+
e.g. ~/pcgr-0.3
115+
output_dir Output directory
116+
sample_id Tumor sample/cancer genome identifier - prefix for
117+
output files
118+
119+
optional arguments:
120+
-h, --help show this help message and exit
121+
--input_vcf INPUT_VCF
122+
VCF input file with somatic query variants
123+
(SNVs/InDels) (default: None)
124+
--input_cna_segments INPUT_CNA_SEGMENTS
125+
Somatic copy number alteration segments (tab-separated
126+
values) (default: None)
127+
--logR_threshold_amplification LOGR_THRESHOLD_AMPLIFICATION
128+
Log(2) ratio treshold for calling copy number
129+
amplifications in HTML report (default: 0.8)
130+
--logR_threshold_homozygous_deletion LOGR_THRESHOLD_HOMOZYGOUS_DELETION
131+
Log(2) ratio treshold for calling homozygous deletions
132+
in HTML report (default: -0.8)
133+
--num_vcfanno_processes NUM_VCFANNO_PROCESSES
134+
Number of processes used during vcfanno annotation
135+
(default: 4)
136+
--num_vep_forks NUM_VEP_FORKS
137+
Number of forks (--forks option in VEP) used during
138+
VEP annotation (default: 4)
139+
--force_overwrite By default, the script will fail with an error if any
140+
output file already exists. You can force the
141+
overwrite of existing result files by using this flag
142+
(default: False)
143+
--version show program's version number and exit
141144

142145

143146
The _examples_ folder contain sample files from TCGA. A report for a colorectal tumor case can be generated through the following command:
144147

145-
`python run_pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-X.X ~/pcgr-X.X/examples tumor_sample.COAD`
148+
`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3 ~/pcgr-0.3/examples tumor_sample.COAD`
146149

147150
This command will run the Docker-based PCGR workflow and produce the following output files in the _examples_ folder:
148151
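
The v0.3 usage text in this diff renames the script to `pcgr.py` and the positional arguments to `pcgr_dir`, `output_dir` and `sample_id`. As a minimal sketch that is not part of the commit, the argument list for such a call could be assembled in Python as shown here. The helper name `build_pcgr_command` is hypothetical, and only flags that appear in the usage text above are used.

```python
def build_pcgr_command(pcgr_dir, output_dir, sample_id,
                       input_vcf=None, input_cna_segments=None,
                       force_overwrite=False):
    """Hypothetical helper: assemble the argument list for pcgr.py (v0.3 usage)."""
    cmd = ["python", "pcgr.py"]
    # Optional inputs come first, mirroring the example command in the README.
    if input_vcf is not None:
        cmd += ["--input_vcf", input_vcf]
    if input_cna_segments is not None:
        cmd += ["--input_cna_segments", input_cna_segments]
    if force_overwrite:
        cmd.append("--force_overwrite")
    # Positional arguments, in the order given by the usage text.
    cmd += [pcgr_dir, output_dir, sample_id]
    return cmd


cmd = build_pcgr_command(
    "~/pcgr-0.3", "~/pcgr-0.3/examples", "tumor_sample.COAD",
    input_vcf="tumor_sample.COAD.vcf.gz",
    input_cna_segments="tumor_sample.COAD.cna.tsv",
)
print(" ".join(cmd))
```

Joined with spaces, the list reproduces the colorectal example command from the README. If the list were handed to `subprocess.run` directly, `~` would need to be expanded first (for instance with `os.path.expanduser`), because no shell is involved to do it.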

1.41 KB
Binary file not shown.
0 Bytes
Binary file not shown.
513 Bytes
Binary file not shown.

docs/_build/html/_sources/annotation_resources.rst.txt

Lines changed: 21 additions & 16 deletions
@@ -92,19 +92,24 @@ corresponds to variants reported with other HGVS nomenclature (e.g.
9292
Other data quality concerns
9393
~~~~~~~~~~~~~~~~~~~~~~~~~~~
9494

95-
**Clinical biomarkers** Clinical biomarkers included in PCGR is limited
96-
to the following: \* Markers reported at the variant level (e.g. **BRAF
97-
p.V600E**) \* Markers reported at the codon level (e.g. **KRAS p.G12**)
98-
\* Markers reported at the exon level (e.g. **KIT exon 11 mutation**) \*
99-
Within CBMDB, only markers collected from FDA/NCCN guidelines,
100-
scientific literature and clinical trials are included (markers
101-
collected from conference abstracts are not included)
102-
103-
**COSMIC variants** The COSMIC dataset that is part of the PCGR
104-
annotation bundle is the subset of variants that satisfy the following
105-
criteria: \* **Mutation somatic status** is either
106-
'*confirmed\_somatic*' or
107-
'*reported\_in\_another\_cancer\_sample\_as\_somatic*'. \*
108-
**Site/histology** must be known and the sample must come from a
109-
malignant tumor (i.e. not polyps/adenomas, which are also found in
110-
COSMIC)
95+
**Clinical biomarkers**
96+
97+
Clinical biomarkers included in PCGR is limited to the following:
98+
99+
- Markers reported at the variant level (e.g. **BRAF p.V600E**)
100+
- Markers reported at the codon level (e.g. **KRAS p.G12**)
101+
- Markers reported at the exon level (e.g. **KIT exon 11 mutation**)
102+
- Within CBMDB, only markers collected from FDA/NCCN guidelines,
103+
scientific literature and clinical trials are included (markers
104+
collected from conference abstracts are not included)
105+
106+
**COSMIC variants**
107+
108+
The COSMIC dataset that is part of the PCGR annotation bundle is the
109+
subset of variants that satisfy the following criteria:
110+
111+
- **Mutation somatic status** is either '*confirmed\_somatic*' or
112+
'*reported\_in\_another\_cancer\_sample\_as\_somatic*'.
113+
- **Site/histology** must be known and the sample must come from a
114+
malignant tumor (i.e. not polyps/adenomas, which are also found in
115+
COSMIC)
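
The two COSMIC inclusion criteria listed above amount to a simple record filter. The sketch below illustrates that logic only; it is not the code PCGR uses to build its data bundle. The dictionary keys (`somatic_status`, `site`, `histology`, `is_malignant`) are hypothetical stand-ins for whatever columns the COSMIC export provides, the example records are made up, and reading "Site/histology must be known" as "both fields are non-empty" is an assumption.

```python
# Status values quoted in the documentation above.
SOMATIC_STATUSES = {
    "confirmed_somatic",
    "reported_in_another_cancer_sample_as_somatic",
}


def keep_cosmic_variant(record):
    """Return True if a variant record passes both inclusion criteria.

    All keys read here are hypothetical; a real COSMIC export uses its
    own column names.
    """
    status_ok = record.get("somatic_status") in SOMATIC_STATUSES
    # Assumption: "known" means both site and histology are non-empty.
    site_known = bool(record.get("site")) and bool(record.get("histology"))
    return status_ok and site_known and bool(record.get("is_malignant"))


# Made-up records, for illustration only.
records = [
    {"somatic_status": "confirmed_somatic", "site": "large_intestine",
     "histology": "carcinoma", "is_malignant": True},
    {"somatic_status": "confirmed_somatic", "site": "large_intestine",
     "histology": "adenoma", "is_malignant": False},
]
kept = [r for r in records if keep_cosmic_variant(r)]
```

Here only the first record is kept; the second is dropped because the sample is not flagged as coming from a malignant tumor.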

docs/_build/html/_sources/getting_started.rst.txt

Lines changed: 17 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -42,51 +42,53 @@ terminal window.
4242
Download PCGR
4343
^^^^^^^^^^^^^
4444

45-
- Download and unpack the `latest
46-
release <https://github.com/sigven/pcgr/releases/latest>`__
45+
April 12th 2017: New release (v0.3) \* Download and unpack the `latest
46+
release (v0.3) <https://github.com/sigven/pcgr/releases/latest>`__
4747

4848
- Download and unpack the data bundle (approx. 17Gb) in the PCGR
4949
directory
5050

5151
- Download `the latest data
5252
bundle <https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/>`__
5353
from Google Drive to ``~/pcgr-X.X`` (replace *X.X* with the
54-
version number)
54+
version number, e.g. ``~/pcgr-0.3``)
5555
- Decompress and untar the bundle, e.g. through the following Unix
5656
command:
5757
``gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -``
5858

5959
A *data/* folder within the *pcgr-X.X* software folder should now
6060
have been produced
6161

62-
- Pull the `PCGR Docker
63-
image <https://hub.docker.com/r/sigven/pcgr/>`__ (3.5Gb) from
64-
DockerHub):
62+
- Pull the `PCGR Docker image -
63+
v0.3 <https://hub.docker.com/r/sigven/pcgr/>`__ from DockerHub
64+
(3.1Gb) :
6565

66-
- ``docker pull sigven/pcgr`` (PCGR annotation engine)
66+
- ``docker pull sigven/pcgr:0.3`` (PCGR annotation engine)
6767

6868
Run test - generation of clinical report for a cancer genome
6969
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
7070

7171
A tumor sample report is generated by calling the Python script
72-
**run\_pcgr.py**, which takes the following arguments and options:
72+
**pcgr.py**, which takes the following arguments and options:
7373

7474
::
7575

76-
usage: run_pcgr.py [-h] [--input_vcf INPUT_VCF]
76+
usage: pcgr.py [-h] [--input_vcf INPUT_VCF]
7777
[--input_cna_segments INPUT_CNA_SEGMENTS]
7878
[--logR_threshold_amplification LOGR_THRESHOLD_AMPLIFICATION]
7979
[--logR_threshold_homozygous_deletion LOGR_THRESHOLD_HOMOZYGOUS_DELETION]
8080
[--num_vcfanno_processes NUM_VCFANNO_PROCESSES]
8181
[--num_vep_forks NUM_VEP_FORKS] [--force_overwrite]
82-
pcgr_directory working_directory sample_id
82+
[--version]
83+
pcgr_dir output_dir sample_id
8384

8485
Personal Cancer Genome Reporter (PCGR) workflow for clinical interpretation of
8586
somatic nucleotide variants and copy number aberration segments
8687

8788
positional arguments:
88-
pcgr_directory PCGR base directory
89-
working_directory Working directory - directory with input/output files
89+
pcgr_dir PCGR base directory with accompanying data directory,
90+
e.g. ~/pcgr-0.3
91+
output_dir Output directory
9092
sample_id Tumor sample/cancer genome identifier - prefix for
9193
output files
9294

@@ -114,13 +116,14 @@ A tumor sample report is generated by calling the Python script
114116
output file already exists. You can force the
115117
overwrite of existing result files by using this flag
116118
(default: False)
119+
--version show program's version number and exit
117120

118121
The *examples* folder contain input files from two tumor samples
119122
sequenced within TCGA. A report for a colorectal tumor case can be
120123
generated by running the following command in your terminal window:
121124

122-
``python run_pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments``
123-
``tumor_sample.COAD.cna.tsv ~/pcgr-X.X ~/pcgr-X.X/examples tumor_sample.COAD``
125+
``python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments``
126+
``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3 ~/pcgr-0.3/examples tumor_sample.COAD``
124127

125128
This command will run the Docker-based PCGR workflow and produce the
126129
following output files in the *examples* folder:
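
The download step earlier in this file unpacks the data bundle with `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`. Where that Unix pipeline is unavailable, a rough equivalent using Python's standard `tarfile` module might look like the sketch below. The function name `unpack_data_bundle` is hypothetical, and the sketch assumes a Python 3 interpreter, whereas the README reports testing PCGR itself with Python 2.7.13.

```python
import tarfile


def unpack_data_bundle(bundle_path, pcgr_dir):
    """Extract a gzip-compressed tar bundle into pcgr_dir; return member names."""
    # Mode "r:gz" decompresses and reads the archive in one pass,
    # comparable to piping `gzip -dc` into `tar xf -`.
    with tarfile.open(bundle_path, "r:gz") as bundle:
        names = bundle.getnames()
        bundle.extractall(path=pcgr_dir)
    return names
```

Called with the path to the downloaded `.tgz` file and the `pcgr-X.X` folder, this should leave the same *data/* folder in place that the shell command produces. As with any archive extraction, it is best used only on bundles from a trusted source.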

docs/_build/html/annotation_resources.html

Lines changed: 18 additions & 13 deletions
@@ -246,21 +246,26 @@ <h2>Genome mapping<a class="headerlink" href="#genome-mapping" title="Permalink
246246
</div>
247247
<div class="section" id="other-data-quality-concerns">
248248
<h2>Other data quality concerns<a class="headerlink" href="#other-data-quality-concerns" title="Permalink to this headline"></a></h2>
249-
<p><strong>Clinical biomarkers</strong> Clinical biomarkers included in PCGR is limited
250-
to the following: * Markers reported at the variant level (e.g. <strong>BRAF
251-
p.V600E</strong>) * Markers reported at the codon level (e.g. <strong>KRAS p.G12</strong>)
252-
* Markers reported at the exon level (e.g. <strong>KIT exon 11 mutation</strong>) *
253-
Within CBMDB, only markers collected from FDA/NCCN guidelines,
249+
<p><strong>Clinical biomarkers</strong></p>
250+
<p>Clinical biomarkers included in PCGR is limited to the following:</p>
251+
<ul class="simple">
252+
<li>Markers reported at the variant level (e.g. <strong>BRAF p.V600E</strong>)</li>
253+
<li>Markers reported at the codon level (e.g. <strong>KRAS p.G12</strong>)</li>
254+
<li>Markers reported at the exon level (e.g. <strong>KIT exon 11 mutation</strong>)</li>
255+
<li>Within CBMDB, only markers collected from FDA/NCCN guidelines,
254256
scientific literature and clinical trials are included (markers
255-
collected from conference abstracts are not included)</p>
256-
<p><strong>COSMIC variants</strong> The COSMIC dataset that is part of the PCGR
257-
annotation bundle is the subset of variants that satisfy the following
258-
criteria: * <strong>Mutation somatic status</strong> is either
259-
&#8216;<em>confirmed_somatic</em>&#8216; or
260-
&#8216;<em>reported_in_another_cancer_sample_as_somatic</em>&#8216;. *
261-
<strong>Site/histology</strong> must be known and the sample must come from a
257+
collected from conference abstracts are not included)</li>
258+
</ul>
259+
<p><strong>COSMIC variants</strong></p>
260+
<p>The COSMIC dataset that is part of the PCGR annotation bundle is the
261+
subset of variants that satisfy the following criteria:</p>
262+
<ul class="simple">
263+
<li><strong>Mutation somatic status</strong> is either &#8216;<em>confirmed_somatic</em>&#8216; or
264+
&#8216;<em>reported_in_another_cancer_sample_as_somatic</em>&#8216;.</li>
265+
<li><strong>Site/histology</strong> must be known and the sample must come from a
262266
malignant tumor (i.e. not polyps/adenomas, which are also found in
263-
COSMIC)</p>
267+
COSMIC)</li>
268+
</ul>
264269
</div>
265270
</div>
266271
