Commit 1c36e28

committed
doc updates and more TSV output
1 parent b58028f commit 1c36e28

8 files changed (+119 / -24 lines)

R/main.R

Lines changed: 72 additions & 3 deletions
@@ -260,6 +260,25 @@ write_cpsr_output <- function(report,
 ".classification.tsv.gz"
 )
 )
+fnames[["tsv_bm"]] <-
+file.path(
+output_dir,
+paste0(
+sample_fname_pattern,
+".biomarker_evidence.tsv.gz"
+)
+)
+
+fnames[["tsv_pgx"]] <-
+file.path(
+output_dir,
+paste0(
+sample_fname_pattern,
+".pgx_findings.tsv.gz"
+)
+)
+
+
 fnames[["xlsx"]] <-
 file.path(
 output_dir,
@@ -372,12 +391,11 @@ write_cpsr_output <- function(report,
 
 if (output_format == "tsv") {
 if (NROW(
-report[["content"]][["snv_indel"]]$callset$tsv
-) > 0) {
+report[["content"]][["snv_indel"]]$callset$tsv) > 0) {
 pcgrr::log4r_info("------")
 pcgrr::log4r_info(
 paste0(
-"Generating SNV/InDel tab-separated values file (.tsv.gz) ",
+"Generating tab-separated values file (.tsv.gz) ",
 "with variant findings"
 )
 )
@@ -389,6 +407,57 @@ write_cpsr_output <- function(report,
 na = "."
 )
 }
+## Biomarker TSV
+if (NROW(report$content$snv_indel$callset$variant$bm) > 0) {
+biomarker_tsv <- report$content$snv_indel$callset$variant$bm |>
+dplyr::mutate(
+BM_MOLECULAR_PROFILE = pcgrr::strip_html(
+.data$BM_MOLECULAR_PROFILE
+)
+) |>
+dplyr::select(
+dplyr::any_of(
+cpsr::col_format_output[["xlsx_biomarker"]]
+))
+
+pcgrr::log4r_info("------")
+pcgrr::log4r_info(
+paste0(
+"Generating tab-separated values file (.tsv.gz) ",
+"with biomarker evidence"
+)
+)
+readr::write_tsv(
+biomarker_tsv,
+file = fnames[["tsv_bm"]],
+col_names = T,
+quote = "none",
+na = "."
+)
+}
+
+if (NROW(report[["content"]]$snv_indel$callset$variant$pgx) > 0) {
+pgx_tsv <- report[["content"]]$snv_indel$callset$variant$pgx |>
+dplyr::select(
+dplyr::any_of(
+cpsr::col_format_output[["xlsx_pgx"]]
+))
+pcgrr::log4r_info("------")
+pcgrr::log4r_info(
+paste0(
+"Generating tab-separated values file (.tsv.gz) ",
+"with pharmacogenomic evidence"
+)
+)
+readr::write_tsv(
+pgx_tsv,
+file = fnames[["tsv_pgx"]],
+col_names = T,
+quote = "none",
+na = "."
+)
+}
+
 }
 
 if (output_format == "xlsx") {
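The two new blocks in `write_cpsr_output()` follow the same pattern as the existing variant TSV: write a file only when the table has rows, keep a fixed set of columns (skipping absent ones, as `dplyr::any_of()` does), and write an unquoted, gzip-compressed TSV with `.` for missing values. Below is a minimal base-R sketch of that pattern, not the CPSR implementation: `write_optional_tsv()` is a hypothetical helper, `utils::write.table()` on a `gzfile()` connection stands in for `readr::write_tsv()`, and the toy column names are made up.

```r
# Hypothetical helper (not part of pcgrr/cpsr) illustrating the pattern used
# for the biomarker and PGx TSV outputs in write_cpsr_output().
write_optional_tsv <- function(df, path, keep_cols) {
  # Mirror the NROW(...) > 0 guards: empty tables produce no file
  if (NROW(df) == 0) {
    return(invisible(FALSE))
  }
  # Like dplyr::select(dplyr::any_of(keep_cols)): keep requested columns in
  # the requested order and silently skip names that are not present
  df <- df[, intersect(keep_cols, names(df)), drop = FALSE]
  # Gzip-compressed, unquoted TSV with "." for missing values, matching
  # quote = "none" and na = "." in the readr::write_tsv() calls above
  con <- gzfile(path, open = "wt")
  on.exit(close(con))
  utils::write.table(df, con, sep = "\t", quote = FALSE, na = ".",
                     row.names = FALSE, col.names = TRUE)
  invisible(TRUE)
}

# Toy data with hypothetical column names
toy <- data.frame(
  SYMBOL = c("DPYD", "TPMT"),
  EVIDENCE = c("toxicity", NA),
  INTERNAL_ID = 1:2
)
out_file <- file.path(tempdir(), "example.cpsr.grch37.pgx_findings.tsv.gz")
written <- write_optional_tsv(toy, out_file,
                              c("SYMBOL", "EVIDENCE", "NOT_PRESENT"))

# Read it back, treating "." as missing again
back <- utils::read.delim(gzfile(out_file), na.strings = ".")
```

When reading such files, `.` must be declared as the missing-value marker (`na.strings = "."` here, or `na = "."` in `readr::read_tsv()`); otherwise it is read as a literal string.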

inst/templates/quarto/cpsr_gwas.qmd

Lines changed: 2 additions & 2 deletions
@@ -70,8 +70,8 @@ DT::datatable(variants_gwas_cancer,
 "GENOTYPE",
 backgroundColor =
 DT::styleEqual(
-"hom_ref",
-"grey"
+cpsr::color_palette[['genotypes']][['levels']],
+cpsr::color_palette[['genotypes']][['values']]
 )
 )
 
vignettes/CHANGELOG.Rmd

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ output: rmarkdown::html_document
 
 ## v2.2.0
 
-- Date: **2025-03-XX**
+- Date: **2025-03-22**
 - Major data updates
 - ClinVar (2025-03)
 - dbNSFP (v5.0)
@@ -14,7 +14,7 @@ output: rmarkdown::html_document
 - PanelApp (2025-02)
 - UniProt KB (2025-01)
 - Cancer Gene Census (v101)
-- Added more cancer susceptibility genes in panel zero (ATG12, BIK, CHD1L, CMTR2,
+- Added more cancer susceptibility genes to panel zero (ATG12, BIK, CHD1L, CMTR2,
 CPAP, HAVCR2, LLGL2, MYO3A, MYO5B, PAH, TTC7A)
 - Added a pharmacogenetic findings option (`--pgx_findings`), which will include pharmacogenetic findings in the HTML report (within the `Genomic biomarkers` section)
 - For now, this is implemented very simple, not considering star alleles, but merely focusing on pathogenic variants or drug-response related variants in DPYD, TPMT, and NUDT15

vignettes/installation.Rmd

Lines changed: 5 additions & 1 deletion
@@ -4,7 +4,11 @@ output: rmarkdown::html_document
 ---
 
 CPSR is distributed alongside the [Personal Cancer Genome Reporter (PCGR)](https://github.com/sigven/pcgr), so please follow
-the [PCGR installation steps](https://sigven.github.io/pcgr/articles/installation.html) to install CPSR, either through [Docker](https://docs.docker.com/), [Apptainer/Singularity](https://apptainer.org/docs/user/latest/index.html), or [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html).
+the [PCGR installation steps](https://sigven.github.io/pcgr/articles/installation.html) to install CPSR, either through
+
+- [Docker](https://sigven.github.io/pcgr/articles/installation.html#b--docker), or
+- [Apptainer/Singularity](https://sigven.github.io/pcgr/articles/installation.html#c--singularityapptainer), or
+- [Conda](https://sigven.github.io/pcgr/articles/installation.html#a--conda).
 
 We recommend Conda as the simplest framework to install PCGR and CPSR, using either a MacOS or a Linux platform.
 
vignettes/output.Rmd

Lines changed: 19 additions & 3 deletions
@@ -45,16 +45,17 @@ The report is structured in multiple sections, described briefly below:
 * Annotation resources
 * Information on annotation sources utilized by CPSR, including versions and licensing requirements
 * Variant classification
-* Overview of how CPSR performs variant classification of variants not recorded in ClinVar, listing ACMG criteria and associated scores
+* Overview of how CPSR performs variant annotation and classification of variants not recorded in ClinVar,
+listing ACMG criteria and associated scores, calibration of classification thresholds etc.
 
 8. __References__
-* Supporting scientific literature - knowledge resources, guideline references etc.)
+* Supporting scientific literature - knowledge resources, guideline references etc.
 
 <br><br>
 
 ### Variant call format - VCF
 
-A VCF file containing annotated, germline calls (single nucleotide variants and insertions/deletions) is generated with the following naming convention:
+A VCF file containing annotated, germline variant calls (single nucleotide variants and insertions/deletions) is generated with the following naming convention:
 
 - `<sample_id>.cpsr.<genome_assembly>.vcf.gz (.tbi)`
 - The __sample_id__ is provided as input by the user, and reflects a unique identifier of the sample to be analyzed. Following common standards, the annotated VCF file is compressed with [bgzip](http://www.htslib.org/doc/bgzip.html) and indexed with [tabix](http://www.htslib.org/doc/tabix.html). Below follows a description of all annotations/tags present in the VCF INFO column after processing with the CPSR annotation pipeline:
@@ -322,6 +323,8 @@ be present if any data is found):
 
 ### Tab-separated values - TSV
 
+#### _Variant classification_
+
 We provide a compressed tab-separated values file with variant classifications and the most essential variant/gene annotations. The file has the following naming convention:
 
 - `<sample_id>.cpsr.<genome_assembly>.classification.tsv.gz`
@@ -403,6 +406,19 @@ The following variables are included in the tiered TSV file (VCF tags in the que
 
 **NOTE**: The user has the possibility to append the TSV file with data from other INFO tags in the input VCF (i.e. using the *--retained_info_tags* option)
 
+#### _Biomarker evidence_
+
+We provide a compressed tab-separated values file with variants implicated as germline biomarkers. The file has the following naming convention:
+
+- `<sample_id>.cpsr.<genome_assembly>.biomarker_evidence.tsv.gz`
+
+#### _Pharmacogenetic findings_
+
+We provide a compressed tab-separated values file with variants implicated with drug toxicity/dosage effects of cancer chemotherapies. The file has the following naming convention:
+
+- `<sample_id>.cpsr.<genome_assembly>.pgx_findings.tsv.gz`
+
+
 <br><br>
 
 ### Biomarker annotations
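The naming conventions above can be captured in a small helper, for example to locate the TSV outputs of a finished run. This is a base-R sketch under stated assumptions: `cpsr_tsv_paths()` is hypothetical, it assumes the documented `<sample_id>.cpsr.<genome_assembly>` prefix, and it does not reproduce how `sample_fname_pattern` is built in `R/main.R`, which this diff does not show.

```r
# Hypothetical helper (not part of cpsr): build the documented TSV output
# paths for one sample, following the naming conventions in output.Rmd
cpsr_tsv_paths <- function(output_dir, sample_id, genome_assembly) {
  prefix <- paste(sample_id, "cpsr", genome_assembly, sep = ".")
  suffixes <- c(
    classification = "classification.tsv.gz",
    biomarker_evidence = "biomarker_evidence.tsv.gz",
    pgx_findings = "pgx_findings.tsv.gz"
  )
  # file.path()/paste() drop names, so restore them explicitly
  stats::setNames(
    file.path(output_dir, paste(prefix, suffixes, sep = ".")),
    names(suffixes)
  )
}

paths <- cpsr_tsv_paths("out", "example", "grch37")

# Per R/main.R, the biomarker and PGx files are only written when matching
# variants were found, so a missing file may simply mean no such findings
present <- paths[file.exists(paths)]
```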

vignettes/running.Rmd

Lines changed: 13 additions & 7 deletions
@@ -19,7 +19,7 @@ The user can flexibly choose the set of cancer predisposition genes for which in
 
 The user can choose from a range of pre-defined gene panels, selected from the following list of panel identifiers:
 
-- **0**: [Exploratory panel - all cancer predisposition gene](https://sigven.github.io/cpsr/articles/virtual_panels.html)
+- **0**: [Exploratory panel - all cancer predisposition genes](https://sigven.github.io/cpsr/articles/virtual_panels.html)
 - **1**: [Adult solid tumours cancer susceptibility](https://panelapp.genomicsengland.co.uk/panels/245/)
 - **2**: [Adult solid tumours for rare disease](https://panelapp.genomicsengland.co.uk/panels/391/)
 - **3**: [Bladder cancer pertinent cancer susceptibility](https://panelapp.genomicsengland.co.uk/panels/208/)
@@ -65,10 +65,11 @@ The user can choose from a range of pre-defined gene panels, selected from the f
 - **43**: [Upper gastrointestinal cancer pertinent cancer susceptibility](https://panelapp.genomicsengland.co.uk/panels/273/)
 - **44**: [DNA repair genes pertinent cancer susceptibility](https://panelapp.genomicsengland.co.uk/panels/256/)
 
+<br>
 
 #### Custom-made virtual gene panels
 
-CPSR allows users to create custom virtual gene panels for reporting. Any set of genes found in the [CPSR superpanel (panel 0)](virtual_panels.html#panel-0) can be used to design a custom virtual gene panel. Technically, the users need to create a simple one-column text file with Ensembl gene identifiers, and provide a name for the custom panel:
+CPSR allows users to create custom virtual gene panels for reporting. Any set of genes found in the [CPSR superpanel (panel 0)](virtual_panels.html#panel-0) can be used to design a custom virtual gene panel. Technically, the users need to create a simple one-column text (TSV) file with Ensembl gene identifiers, and provide a name for the custom panel, using the following command line options:
 
 * `--custom_list <custom_list_tsv>`
 * `--custom_list_name <custom_list_name`
@@ -104,6 +105,8 @@ By default, CPSR do not report variants in the input sample that are found in ca
 
 * `--clinvar_report_noncancer`
 
+<br>
+
 ### Optional report contents
 
 CPSR allows users to report recommended [incidental findings](https://www.ncbi.nlm.nih.gov/clinvar/docs/acmg/), the occurrence of important variants with respec to chemotherapy toxicity, and also the genotypes of reported cancer risk loci from [genome-wide association studies (GWAS)](https://www.ebi.ac.uk/gwas/):
@@ -144,7 +147,7 @@ Panel options:
 --panel_id VIRTUAL_PANEL_ID
 Comma-separated string with identifier(s) of predefined virtual cancer predisposition gene panels,
 choose any combination of the following identifiers (GEP = Genomics England PanelApp):
-0 = CPSR exploratory cancer predisposition panel (PanelApp genes / TCGA's germline study / Cancer Gene Census / Other)
+0 = CPSR exploratory cancer predisposition panel (PanelApp genes / TCGA's germline study / Cancer Gene Census / Other )
 1 = Adult solid tumours cancer susceptibility (GEP)
 2 = Adult solid tumours for rare disease (GEP)
 3 = Bladder cancer pertinent cancer susceptibility (GEP)
@@ -249,14 +252,15 @@ Report generation with the example VCF, using the [Adult solid tumours cancer su
 $ (base) conda activate pcgr
 $ (pcgr)
 cpsr \
---input_vcf ~/cpsr-2.0.0/inst/examples/example.vcf.gz \
+--input_vcf ~/cpsr-2.2.0/inst/examples/example.vcf.gz \
 --vep_dir ~/.vep \
 --refdata_dir ~/pcgr_ref_data \
---output_dir ~/cpsr-2.0.0/ \
+--output_dir ~/cpsr-2.2.0/ \
 --genome_assembly grch37 \
 --panel_id 1 \
 --sample_id example \
 --secondary_findings \
+--pgx_findings \
 --classify_all \
 --maf_upper_threshold 0.2 \
 --force_overwrite
@@ -273,7 +277,9 @@ This command will produce the following output files in the _output_ folder:
 5. __example.cpsr.grch37.xlsx__ - An Excel workbook that contains
 * _i)_ information on virtual gene panel interrogated for variants
 * _ii)_ classification of clinical significance for variants overlapping with cancer predisposition genes
-* _iii)_ match of variants with existing biomarkers (if any found)
-* _iv)_ secondary findings (if any found)
+* _iii)_ secondary findings (if any found)
+* _iv)_ match of variants with existing biomarkers (if any found)
+* _v)_ overlap with pharmacogenomic variants (if any found)
+
 6. __example.cpsr.grch37.html__ - Interactive HTML report with clinically relevant variants in cancer predisposition genes
 7. __example.cpsr.grch37.snvs_indels.classification.tsv.gz__ - TSV file with key annotations of germline SNVs/InDels classified according to clinical significance
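The custom panel input described in running.Rmd is a plain one-column text file of Ensembl gene identifiers. A minimal R sketch for preparing one, assuming one identifier per line with no header row (the exact layout CPSR expects is not spelled out in this diff); `write_custom_panel()` is hypothetical and only checks that each entry looks like a human Ensembl gene ID (`ENSG` followed by 11 digits), not that the gene is part of panel 0.

```r
# Hypothetical helper (not part of cpsr): write a one-column list of Ensembl
# gene identifiers for use with --custom_list.
# Assumption: one ID per line, no header row.
write_custom_panel <- function(gene_ids, path) {
  gene_ids <- unique(trimws(gene_ids))
  # Human Ensembl gene stable IDs: "ENSG" followed by 11 digits
  bad <- gene_ids[!grepl("^ENSG[0-9]{11}$", gene_ids)]
  if (length(bad) > 0) {
    stop("Not Ensembl gene identifiers: ", paste(bad, collapse = ", "))
  }
  writeLines(gene_ids, path)
  invisible(gene_ids)
}

panel_file <- file.path(tempdir(), "my_panel.tsv")
# BRCA1 and BRCA2
write_custom_panel(c("ENSG00000012048", "ENSG00000139618"), panel_file)
```

The resulting file is what `--custom_list` points to, with a label supplied through `--custom_list_name`.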

vignettes/variant_classification.Rmd

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ while(i <= NROW(cpsr::acmg$evidence_codes)){
 ```
 
 <br>
-Currently (as of 2025-02), based on a calibration against ClinVar-classified variants (minimum two review status stars) in n = 105 core cancer predisposition genes, the clinical significance (**CPSR_CLASSIFICATION**) is determined based on the following ranges of pathogenicity scores:
+Currently (as of March 2025), based on a calibration against ClinVar-classified variants (minimum two review status stars) in n = 105 core cancer predisposition genes, the clinical significance (**CPSR_CLASSIFICATION**) is determined based on the following ranges of pathogenicity scores:
 
 <br>
 
vignettes/virtual_panels.Rmd

Lines changed: 5 additions & 5 deletions
@@ -7,14 +7,14 @@ output: rmarkdown::html_document
 
 The cancer predisposition report can show variants found in a number of well-known cancer predisposition genes, and the specific set of genes can be customized by the user by choosing any of the following __virtual gene panels (0 - 44)__:
 
-* **Panel 0** is a non-conservative, research-based _superpanel_ assembled through multiple sources on cancer predisposition genes:
-* A list of 152 genes that were curated and established within TCGA’s pan-cancer study ([Huang et al., *Cell*, 2018](https://www.ncbi.nlm.nih.gov/pubmed/29625052))
-* A list of 114 protein-coding genes that has been manually curated in COSMIC’s [Cancer Gene Census v100](https://cancer.sanger.ac.uk/census),
+* **Panel 0** is a non-conservative, research-based _superpanel_ assembled through multiple sources on cancer predisposition genes:
+* A list of 151 genes that were curated and established within TCGA’s pan-cancer study ([Huang et al., *Cell*, 2018](https://www.ncbi.nlm.nih.gov/pubmed/29625052))
+* A list of 114 protein-coding genes that has been manually curated in COSMIC’s [Cancer Gene Census v101](https://cancer.sanger.ac.uk/census),
 * Genes from all [Genomics England PanelApp](https://panelapp.genomicsengland.co.uk/) panels for inherited cancers and tumor syndromes, as well as DNA repair genes (detailed below)
-* Additional genes deemed relevant for cancer predisposition (i.e. contributed by CPSR users)
+* Additional genes deemed relevant for cancer predisposition (i.e. contributed by CPSR users etc.)
 
 
-The combination of the above sources resulted in a non-redundant set of **n = 572**
+The combination of the above sources resulted in a non-redundant set of **n = 574**
 genes of relevance for cancer predisposition (see complete details [below](#panel-0))
 
 Data with respect to mechanisms of inheritance (<i>MoI</i> - autosomal recessive (AR) vs. autosomal
