
Commit 5057722

pichuan authored and kishwarshafin committed
Add SBX case study.
PiperOrigin-RevId: 799637820
1 parent e5b6765 commit 5057722

3 files changed: +127 −0 lines changed

README.md
Lines changed: 1 addition & 0 deletions

@@ -23,6 +23,7 @@ DeepVariant supports germline variant-calling in diploid organisms.
 * Complete Genomics
   [T7 case study](docs/deepvariant-complete-t7-case-study.md);
   [G400 case study](docs/deepvariant-complete-g400-case-study.md).
+* Roche [SBX case study](docs/sbx-case-study.md) for SBX-D and SBX-Fast data.
 * Pangenome-mapping-based case-study:
   [vg case study](docs/deepvariant-vg-case-study.md).
 * RNA data for

docs/README.md
Lines changed: 1 addition & 0 deletions

@@ -22,6 +22,7 @@
   [Mapped with VG](pangenome-aware-wgs-vg-case-study.md)
 * Pangenome-aware DeepVariant WES:
   [Mapped with BWA](pangenome-aware-wes-bwa-case-study.md)
+* [SBX case study](sbx-case-study.md)

 ## Visualization and analysis

docs/sbx-case-study.md (new file)
Lines changed: 125 additions & 0 deletions
# SBX case study for SBX-D and SBX-Fast data

## Prepare environment

### Tools

[Docker](https://docs.docker.com/get-docker/) will be used to run DeepVariant
and [hap.py](https://github.com/illumina/hap.py).

### Download Reference

We will be using GRCh38 for this case study.

```bash
mkdir -p reference

FTPDIR=ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids

curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | gunzip > reference/GRCh38_no_alt_analysis_set.fasta
curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai > reference/GRCh38_no_alt_analysis_set.fasta.fai
```
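As a quick sanity check after the download, the reference and its `.fai` index should agree on sequence names and lengths. The sketch below shows what the first two `.fai` columns record, using a tiny mock FASTA (`mock_reference.fasta` is illustrative, not the real GRCh38 file):

```shell
# Mock FASTA standing in for reference/GRCh38_no_alt_analysis_set.fasta.
REF=mock_reference.fasta
printf '>chr20\nACGTACGT\nACGT\n>chrM\nACGT\n' > "$REF"

# Number of sequences: one .fai line is expected per '>' header.
grep -c '^>' "$REF"   # → 2

# Name and length per sequence, i.e. the first two columns of the .fai.
awk '/^>/ {if (n) print n, len; n = substr($1, 2); len = 0; next}
     {len += length($0)}
     END {if (n) print n, len}' "$REF"
```

With the real files, each name/length pair printed this way should match the corresponding row of `reference/GRCh38_no_alt_analysis_set.fasta.fai`.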
### Download T2T v1.1 truth for benchmarking

We will benchmark our variant calls against the T2T v1.1 truth set for HG002.

```bash
mkdir -p benchmark

HTTPDIR=https://storage.googleapis.com/deepvariant/case-study-testdata

curl ${HTTPDIR}/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz > benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz
curl ${HTTPDIR}/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz.tbi > benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz.tbi
curl ${HTTPDIR}/GRCh38_HG2-T2TQ100-V1.1_smvar.benchmark.bed > benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.benchmark.bed
```

### Download GBZ built for GRCh38

```bash
mkdir -p input
HTTPDIR=https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38

curl ${HTTPDIR}/hprc-v1.1-mc-grch38.gbz > input/hprc-v1.1-mc-grch38.gbz
```
### Download HG002 BAM

Please download an SBX-D (or SBX-Fast) HG002 BAM and put it in your input dir.

```bash
# Download your HG002 BAM.
HG002_BAM=/input/your_HG002.bam  # This will be used in the command later.
```
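DeepVariant expects the BAM to be indexed; if your download did not include an index, create one with `samtools index`. A minimal sketch of the check, using mock files in a hypothetical `input_demo` directory (not the container's `/input` mount):

```shell
# Mock files stand in for the real BAM and its index.
mkdir -p input_demo
HG002_BAM=input_demo/your_HG002.bam
printf x > "${HG002_BAM}"
printf x > "${HG002_BAM}.bai"

# samtools writes either <name>.bam.bai or <name>.bai next to the BAM.
if [ -s "${HG002_BAM}.bai" ] || [ -s "${HG002_BAM%.bam}.bai" ]; then
  echo "index found"
else
  echo "no index: run 'samtools index ${HG002_BAM}'"
fi
```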
### Download the model

In this case study, we're calling variants on HG002 chr20 and evaluating
against the T2T v1.1 truth set. We'll use the "leave-out-HG001" model, from
which we also left out all of chromosome 20 during training and tuning. Refer
to the technical white paper for more details on all experiments.

```bash
mkdir -p model

HTTPDIR=https://storage.googleapis.com/brain-genomics-public/research/sbx/2025/models/leave-out-HG001

curl ${HTTPDIR}/model.ckpt.data-00000-of-00001 > model/model.ckpt.data-00000-of-00001
curl ${HTTPDIR}/model.ckpt.index > model/model.ckpt.index
curl ${HTTPDIR}/example_info.json > model/example_info.json
```
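Since `--customized_model` in the run command points at the checkpoint prefix `/model/model.ckpt`, all three downloaded files need to sit next to each other under `model/`. A quick presence check, sketched here against mock files in a hypothetical `model_demo` directory:

```shell
# Mock files stand in for the real downloads under model/.
mkdir -p model_demo
for f in model.ckpt.data-00000-of-00001 model.ckpt.index example_info.json; do
  printf x > "model_demo/$f"
done

# Every file must exist and be non-empty before pointing
# --customized_model at the model.ckpt prefix.
for f in model.ckpt.data-00000-of-00001 model.ckpt.index example_info.json; do
  if [ -s "model_demo/$f" ]; then echo "OK $f"; else echo "MISSING $f"; fi
done
```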
## Running DeepVariant with one command, on a CPU-only machine

```bash
mkdir -p output
mkdir -p output/intermediate_results_dir

BIN_VERSION="pangenome_aware_deepvariant-head784362481"

sudo docker run \
  -v "${PWD}/input":"/input" \
  -v "${PWD}/output":"/output" \
  -v "${PWD}/reference":"/reference" \
  -v "${PWD}/model":"/model" \
  --shm-size 12gb \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_pangenome_aware_deepvariant \
  --model_type WGS \
  --ref /reference/GRCh38_no_alt_analysis_set.fasta \
  --reads "${HG002_BAM}" \
  --pangenome /input/hprc-v1.1-mc-grch38.gbz \
  --output_vcf /output/HG002.chr20.output.vcf.gz \
  --output_gvcf /output/HG002.chr20.output.g.vcf.gz \
  --num_shards $(nproc) \
  --regions chr20 \
  --intermediate_results_dir /output/intermediate_results_dir \
  --customized_model /model/model.ckpt \
  --make_examples_extra_args="alt_aligned_pileup=single_row,create_complex_alleles=true,enable_strict_insertion_filter=true,keep_legacy_allele_counter_behavior=true,keep_only_window_spanning_haplotypes=true,keep_supplementary_alignments=true,min_mapping_quality=0,normalize_reads=true,pileup_image_height_pangenome=100,pileup_image_height_reads=100,pileup_image_width=301,sort_by_haplotypes=true,trim_reads_for_pileup=true,vsc_min_fraction_indels=0.08,ws_min_base_quality=25" \
  --postprocess_variants_extra_args="multiallelic_mode=product"
```
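Once the run finishes, a quick way to see how many chr20 records passed filtering is to count non-header `PASS` rows in the output VCF. Sketched here against a tiny mock VCF; with the real output, point the pipeline at `output/HG002.chr20.output.vcf.gz` instead:

```shell
# Two-record mock VCF: one PASS call and one RefCall.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr20\t100\t.\tA\tT\t50\tPASS\t.\nchr20\t200\t.\tG\tC\t10\tRefCall\t.\n' \
  | gzip > demo.vcf.gz

# Count records whose FILTER column (7th) is PASS.
gzip -dc demo.vcf.gz | awk -F'\t' '!/^#/ && $7 == "PASS"' | wc -l   # → 1
```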
## Benchmark on chr20

```bash
mkdir -p happy

sudo docker pull jmcdani20/hap.py:v0.3.12

sudo docker run \
  -v "${PWD}/benchmark":"/benchmark" \
  -v "${PWD}/input":"/input" \
  -v "${PWD}/output":"/output" \
  -v "${PWD}/reference":"/reference" \
  -v "${PWD}/happy":"/happy" \
  jmcdani20/hap.py:v0.3.12 \
  /opt/hap.py/bin/hap.py \
  /benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.vcf.gz \
  /output/HG002.chr20.output.vcf.gz \
  -f /benchmark/GRCh38_HG2-T2TQ100-V1.1_smvar.benchmark.bed \
  -r /reference/GRCh38_no_alt_analysis_set.fasta \
  -o /happy/happy.output \
  --engine=vcfeval \
  --pass-only \
  -l chr20
```
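hap.py writes its headline numbers to `happy/happy.output.summary.csv`. The sketch below pulls the per-type F1 score from the `PASS` rows, using a simplified mock CSV (the real summary has more columns, and the numbers here are placeholders, not results from this case study):

```shell
# Simplified mock of happy.output.summary.csv (placeholder values only).
cat > summary_demo.csv <<'EOF'
Type,Filter,METRIC.Recall,METRIC.Precision,METRIC.F1_Score
INDEL,PASS,0.9950,0.9940,0.9945
SNP,PASS,0.9990,0.9991,0.9990
EOF

# Print variant type and the last column (F1) for each PASS row.
awk -F, 'NR > 1 && $2 == "PASS" {print $1, $NF}' summary_demo.csv
```

Against the real file, select the `METRIC.F1_Score` column by header name rather than position, since it is not the last column there.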
