Commit 2153f7c

committed
updated docs
1 parent 790db81 commit 2153f7c

27 files changed, +316 -163 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ A local installation of Python (it has been tested with [version 2.7.13](https:/
 
 1. Download and unpack the [latest release](https://github.com/sigven/pcgr/releases/latest)
 2. Download and unpack the data bundle (approx. 17Gb) in the PCGR directory
-* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mRjkxMXVaNm1zQ1U/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number)
+* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number)
 * Unpack the data bundle, e.g. through the following Unix command: `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`
 
 A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced
7.54 KB
Binary file not shown.
1.12 KB
Binary file not shown.
0 Bytes
Binary file not shown.

docs/_build/html/_sources/annotation_resources.rst.txt

Lines changed: 37 additions & 0 deletions
@@ -71,3 +71,40 @@ Cancer gene knowledge bases
 suppressor/oncogene database (November 2015)
 - `Cancer Gene Cencus <http://cancer.sanger.ac.uk/cosmic/>`__ -
 (February 2017)
+
+Notes on variant annotation datasets
+------------------------------------
+
+Genome mapping
+~~~~~~~~~~~~~~
+
+A requirement for all variant annotation datasets used in PCGR is that
+they have been mapped unambiguously to the human genome (GRCh37). For
+most datasets this is already the case (i.e. dbSNP, COSMIC, ClinVar
+etc.). A significant proportion of variants in the annotation datasets
+related to clinical interpretation, CIViC and CBMDB, are however not
+mapped to the genome. Whenever possible, we have utilized
+`TransVar <http://bioinformatics.mdanderson.org/transvarweb/>`__ to
+identify the actual genomic variants (e.g. *g.chr7:140453136A>T*) that
+corresponds to variants reported with other HGVS nomenclature (e.g.
+*p.V600E*).
+
+Other data quality concerns
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Clinical biomarkers** Clinical biomarkers included in PCGR is limited
+to the following: \* Markers reported at the variant level (e.g. **BRAF
+p.V600E**) \* Markers reported at the codon level (e.g. **KRAS p.G12**)
+\* Markers reported at the exon level (e.g. **KIT exon 11 mutation**) \*
+Within CBMDB, only markers collected from FDA/NCCN guidelines,
+scientific literature and clinical trials are included (markers
+collected from conference abstracts are not included)
+
+**COSMIC variants** The COSMIC dataset that is part of the PCGR
+annotation bundle is the subset of variants that satisfy the following
+criteria: \* **Mutation somatic status** is either
+'*confirmed\_somatic*' or
+'*reported\_in\_another\_cancer\_sample\_as\_somatic*'. \*
+**Site/histology** must be known and the sample must come from a
+malignant tumor (i.e. not polyps/adenomas, which are also found in
+COSMIC)

docs/_build/html/_sources/getting_started.rst.txt

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ Download PCGR
 directory
 
 - Download `the latest data
-bundle <https://drive.google.com/file/d/0B8aYD2TJ472mRjkxMXVaNm1zQ1U/>`__
+bundle <https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/>`__
 from Google Drive to ``~/pcgr-X.X`` (replace *X.X* with the
 version number)
 - Decompress and untar the bundle, e.g. through the following Unix

docs/_build/html/about.html

Lines changed: 1 addition & 0 deletions
@@ -96,6 +96,7 @@
 </li>
 <li class="toctree-l1"><a class="reference internal" href="getting_started.html">Getting started</a></li>
 <li class="toctree-l1"><a class="reference internal" href="annotation_resources.html">Annotation resources</a></li>
+<li class="toctree-l1"><a class="reference internal" href="annotation_resources.html#notes-on-variant-annotation-datasets">Notes on variant annotation datasets</a></li>
 <li class="toctree-l1"><a class="reference internal" href="output.html">Input &amp; output</a></li>
 </ul>
 

docs/_build/html/annotation_resources.html

Lines changed: 39 additions & 0 deletions
@@ -99,6 +99,11 @@
 <li class="toctree-l2"><a class="reference internal" href="#cancer-gene-knowledge-bases">Cancer gene knowledge bases</a></li>
 </ul>
 </li>
+<li class="toctree-l1"><a class="reference internal" href="#notes-on-variant-annotation-datasets">Notes on variant annotation datasets</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="#genome-mapping">Genome mapping</a></li>
+<li class="toctree-l2"><a class="reference internal" href="#other-data-quality-concerns">Other data quality concerns</a></li>
+</ul>
+</li>
 <li class="toctree-l1"><a class="reference internal" href="output.html">Input &amp; output</a></li>
 </ul>
 

@@ -223,6 +228,40 @@ <h2>Cancer gene knowledge bases<a class="headerlink" href="#cancer-gene-knowledg
 (February 2017)</li>
 </ul>
 </div>
+</div>
+<div class="section" id="notes-on-variant-annotation-datasets">
+<h1>Notes on variant annotation datasets<a class="headerlink" href="#notes-on-variant-annotation-datasets" title="Permalink to this headline"></a></h1>
+<div class="section" id="genome-mapping">
+<h2>Genome mapping<a class="headerlink" href="#genome-mapping" title="Permalink to this headline"></a></h2>
+<p>A requirement for all variant annotation datasets used in PCGR is that
+they have been mapped unambiguously to the human genome (GRCh37). For
+most datasets this is already the case (i.e. dbSNP, COSMIC, ClinVar
+etc.). A significant proportion of variants in the annotation datasets
+related to clinical interpretation, CIViC and CBMDB, are however not
+mapped to the genome. Whenever possible, we have utilized
+<a class="reference external" href="http://bioinformatics.mdanderson.org/transvarweb/">TransVar</a> to
+identify the actual genomic variants (e.g. <em>g.chr7:140453136A&gt;T</em>) that
+corresponds to variants reported with other HGVS nomenclature (e.g.
+<em>p.V600E</em>).</p>
+</div>
+<div class="section" id="other-data-quality-concerns">
+<h2>Other data quality concerns<a class="headerlink" href="#other-data-quality-concerns" title="Permalink to this headline"></a></h2>
+<p><strong>Clinical biomarkers</strong> Clinical biomarkers included in PCGR is limited
+to the following: * Markers reported at the variant level (e.g. <strong>BRAF
+p.V600E</strong>) * Markers reported at the codon level (e.g. <strong>KRAS p.G12</strong>)
+* Markers reported at the exon level (e.g. <strong>KIT exon 11 mutation</strong>) *
+Within CBMDB, only markers collected from FDA/NCCN guidelines,
+scientific literature and clinical trials are included (markers
+collected from conference abstracts are not included)</p>
+<p><strong>COSMIC variants</strong> The COSMIC dataset that is part of the PCGR
+annotation bundle is the subset of variants that satisfy the following
+criteria: * <strong>Mutation somatic status</strong> is either
+&#8216;<em>confirmed_somatic</em>&#8216; or
+&#8216;<em>reported_in_another_cancer_sample_as_somatic</em>&#8216;. *
+<strong>Site/histology</strong> must be known and the sample must come from a
+malignant tumor (i.e. not polyps/adenomas, which are also found in
+COSMIC)</p>
+</div>
 </div>
 
 

docs/_build/html/genindex.html

Lines changed: 1 addition & 0 deletions
@@ -90,6 +90,7 @@
 <li class="toctree-l1"><a class="reference internal" href="about.html">About</a></li>
 <li class="toctree-l1"><a class="reference internal" href="getting_started.html">Getting started</a></li>
 <li class="toctree-l1"><a class="reference internal" href="annotation_resources.html">Annotation resources</a></li>
+<li class="toctree-l1"><a class="reference internal" href="annotation_resources.html#notes-on-variant-annotation-datasets">Notes on variant annotation datasets</a></li>
 <li class="toctree-l1"><a class="reference internal" href="output.html">Input &amp; output</a></li>
 </ul>
 

docs/_build/html/getting_started.html

Lines changed: 2 additions & 1 deletion
@@ -100,6 +100,7 @@
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="annotation_resources.html">Annotation resources</a></li>
+<li class="toctree-l1"><a class="reference internal" href="annotation_resources.html#notes-on-variant-annotation-datasets">Notes on variant annotation datasets</a></li>
 <li class="toctree-l1"><a class="reference internal" href="output.html">Input &amp; output</a></li>
 </ul>
 

@@ -195,7 +196,7 @@ <h3>Download PCGR<a class="headerlink" href="#download-pcgr" title="Permalink to
 <li><p class="first">Download and unpack the data bundle (approx. 17Gb) in the PCGR
 directory</p>
 <ul class="simple">
-<li>Download <a class="reference external" href="https://drive.google.com/file/d/0B8aYD2TJ472mRjkxMXVaNm1zQ1U/">the latest data
+<li>Download <a class="reference external" href="https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/">the latest data
 bundle</a>
 from Google Drive to <code class="docutils literal"><span class="pre">~/pcgr-X.X</span></code> (replace <em>X.X</em> with the
 version number)</li>
