Commit 69de578

Merge branch 'kc/update_constraint_docs' of https://github.com/broadinstitute/gnomad-browser into kc/update_constraint_docs
2 parents 15ea7ad + 7bf87fb commit 69de578

1 file changed: +3 -3 lines changed

browser/help/topics/constraint.md

Lines changed: 3 additions & 3 deletions
@@ -30,9 +30,9 @@ The observed variant count is the number of unique single nucleotide variants in
 
 #### Expected variant count
 
-[Coverage](gnomad.broadinstitute.org/help/how-was-coverage-calculated) for gnomAD v4 was calculated from sample [genomic VCFs (gVCFs)](https://gatk.broadinstitute.org/hc/en-us/articles/360035531812-GVCF-Genomic-Variant-Call-Format), which is less granular than coverage information from read data due to the reference block structure within gVCFs. In gnomAD v4.1.1, we use [allele number (AN)](https://gnomad.broadinstitute.org/news/2024-04-gnomad-v4-1/#allele-numbers-across-all-possible-sites) as a higher resolution proxy for coverage in constraint calculations.
+[Coverage](https://gnomad.broadinstitute.org/help/how-was-coverage-calculated) for gnomAD v4 was calculated from sample [genomic VCFs (gVCFs)](https://gatk.broadinstitute.org/hc/en-us/articles/360035531812-GVCF-Genomic-Variant-Call-Format), which is less granular than coverage information from read data due to the reference block structure within gVCFs. In gnomAD v4.1.1, we use [allele number (AN)](https://gnomad.broadinstitute.org/news/2024-04-gnomad-v4-1/#allele-numbers-across-all-possible-sites) as a higher resolution proxy for coverage in constraint calculations.
 
-We calculate the expected number of variants for all bases with median AN percent (percent of total possible allele number observed at a site) ≥ 20 in our exome samples using a mutational model that corrects for local sequence context and CpG methylation levels. Following the methods described in section 4.1 of the supplement in [Karczewski _et al._ Nature 2020](https://www.nature.com/articles/s41586-020-2308-7), we calculate a coverage model for sites with median AN percent between 20% and 90% in the gnomAD exome samples and use this model to adjust expected variant counts at low coverage sites.
+We calculate the expected number of variants for all bases with AN percent (percent of total possible allele number observed at a site) ≥ 20 in our exome samples using a mutational model that corrects for local sequence context and CpG methylation levels. Following the methods described in section 4.1 of the supplement in [Karczewski _et al._ Nature 2020](https://www.nature.com/articles/s41586-020-2308-7), we calculate a coverage model for sites with AN percent between 20% and 90% in the gnomAD exome samples and use this model to adjust expected variant counts at low coverage sites.
 
 #### pLoF Variant types
 
@@ -42,7 +42,7 @@ For pLoF counts, only nonsense, splice donor and acceptor site variants caused b
 
 #### <a id="loeuf"></a>Observed / expected (`oe`) and the Loss-of-function Observed / expected upper bound fraction (`LOEUF`) score
 
-We have calculated the ratio of the observed / expected (`oe`) number of loss-of-function variants for all bases with median AN percent ≥ 20 in the MANE Select (v4 on GRCh38) or canonical (ExAC and v2 on GRCh37) and other non-Select/canonical transcript for each gene. The expected counts are based on a mutational model that takes sequence context and methylation into account.
+We have calculated the ratio of the observed / expected (`oe`) number of loss-of-function variants for all bases with AN percent ≥ 20 in the MANE Select (v4 on GRCh38) or canonical (ExAC and v2 on GRCh37) and other non-Select/canonical transcript for each gene. The expected counts are based on a mutational model that takes sequence context and methylation into account.
 
 In its original formulation, LOEUF was computed using a frequentist approach: the observed and expected LoF counts were modeled as Poisson-distributed, and the score was defined as the upper bound of a central 90% Poisson confidence interval around the observed count, divided by the neutral expectation. While intuitive, this approach treats the true underlying number of LoF variants as a fixed but unknown parameter, and the confidence interval it produces has a strictly frequentist interpretation — one that does not directly quantify uncertainty about it given the data at hand.
 
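Reviewer note: the last context paragraph of the second hunk describes the original LOEUF formulation as the upper bound of a central 90% Poisson confidence interval around the observed count, divided by the expected count. The sketch below illustrates that description only. It is not the gnomAD pipeline's code: the function names are hypothetical, and using the exact Poisson interval (solved by bisection) is an assumption about how the bound might be computed.

```python
import math
from typing import Tuple


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed in log space to avoid underflow."""
    if mu <= 0.0:
        return 1.0
    log_mu = math.log(mu)
    total = sum(math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, total)


def poisson_upper_bound(observed: int, alpha: float = 0.05) -> float:
    """Upper limit of an exact central (1 - 2*alpha) Poisson interval.

    Assumption: the bound is the mean mu at which P(X <= observed | mu) = alpha.
    With alpha = 0.05 this is the upper end of a central 90% interval.
    """
    lo, hi = 0.0, max(10.0, 2.0 * observed + 20.0)
    while poisson_cdf(observed, hi) > alpha:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):  # bisection; the CDF decreases as mu grows
        mid = (lo + hi) / 2.0
        if poisson_cdf(observed, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def oe_and_upper(observed: int, expected: float) -> Tuple[float, float]:
    """Return (oe, upper bound on oe) for one transcript (hypothetical helper)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected, poisson_upper_bound(observed) / expected
```

Under these assumptions, `oe_and_upper(0, 10.0)` gives an `oe` of 0 with an upper bound of about 0.30, because the bound for zero observed variants is -ln(0.05), roughly 3.0.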