update loeuf threshold and add table

ch-kr · ch-kr · commit 47a96962854b · 2026-01-21T15:25:50.000-05:00
diff --git a/browser/help/topics/constraint.md b/browser/help/topics/constraint.md
@@ -52,7 +52,19 @@ When evaluating how constrained a gene is, it is essential to take the 90% CI in
 
 One advantage of `oe` and `LOEUF` compared to `pLI` are that they are more direct measures of biological significance, and can be easily used as continuous values. For example, a doubling of `oe` from 0.2 to 0.4 conveys that 20% vs 40% of the expected number of variants has been observed in gnomAD. By contrast, a doubling of the `pLI` score (e.g., 0.45 to 0.9) is less immediately interpretable as `pLI` is fairly dichotomous with nearly all genes having scores < 0.1 or > 0.9. Intermediate `pLI` scores (0.1-0.9) are typically an indication that the gene was too small to be confidently categorized.
 
-Although `oe` and `LOEUF` are continuous values, we understand that it can be useful to use a threshold for certain applications. In particular, for the interpretation of Mendelian disease cases, we suggest using a `LOEUF` score < 0.5 as a threshold if needed. Again, ideally `oe` and `LOEUF` should be used as a continuous values rather than a cutoff.
+Although `oe` and `LOEUF` are continuous values, we understand that it can be useful to use a threshold for certain applications. We recommend using the following table to guide the interpretation of LOEUF across the pLoF constraint spectrum:
+
+| LOEUF score | LOEUF percentile | Number of genes |
+| :---------- | ---------------: | --------------: |
+| 0.15        |               99 |             177 |
+| 0.27        |               95 |             861 |
+| 0.36        |               90 |           1,711 |
+| 0.45        |               85 |           2,578 |
+| 0.60        |               75 |           3,433 |
+| 0.91        |               50 |           8,552 |
+| 1.19        |               25 |          12,815 |
+
+This table shows the LOEUF score that corresponds to each LOEUF percentile, derived from 17,063 MANE Select transcripts. For the interpretation of Mendelian disease cases, we suggest using a `LOEUF` score < 0.45 (corresponding to a LOEUF percentile of 85) as a threshold if needed. Again, ideally `oe` and `LOEUF` should be used as continuous values rather than as a cutoff.
 
 As mentioned above, `oe` and `LOEUF` are dependent on sample size and we note that these values are slightly higher in v4 compared to v2 for all genes. The major impact of this is that any `LOEUF` thresholds used on v2 will not give an equivalent number of genes when applied to v4. This rise in `oe` is anticipated, particularly as we are now able to sample variants with a much lower population allele frequency than before (e.g., 1 in ~125,000 individuals vs 1 in ~730,000 individuals).
 
@@ -70,4 +82,4 @@ For more information, see [Samocha _et al._ Nature Genetics 2014](https://www.na
 
 #### <a id="loeuf-vs-pli"></a>What is the difference between the oe/LOEUF and pLI score?
 
-It is very important to note that `oe` (and thereby `LOEUF`) score is very different from that of `pLI`; in particular low `oe` values are indicative of strong intolerance, whereas high `pLI` scores indicate intolerance. In addition, while `pLI` incorporated the uncertainty around low counts (i.e a gene with low expected count, due to small size or low coverage, could not have a high `pLI`), `oe` does not. Therefore, the `oe` metric comes with a 90% CI. It is important to consider the confidence interval when using `oe`. The change from `pLI` to `oe` was motivated mainly by its easier interpretation and its continuity across the spectrum of selection. As an example, let’s take a gene with a `pLI` of 0.8: this means that this gene cannot be categorized as a highly likely haploinsufficient gene based on our data. However, it is unclear whether this value was obtained because of small sample or gene size or because there were too many loss-of-function (LoF) variants observed in the gene. In addition, if the cause was the latter, `pLI` doesn’t tell much about the overall selection against loss-of-function in this gene. On the other hand, a gene with an LoF `oe` of 0.4 can clearly be interpreted as a gene where only 40% of the expected loss-of-function variants were observed and therefore is likely under selection against LoF variants. In addition, the 90% CI allows us to clearly distinguish cases where there is a lot of uncertainty about the constraint for that gene due to sample size. Since `pLI` > 0.9 is widely used in research and clinical interpretation of Mendelian cases, we suggest using the upper bound of the `oe` confidence interval (which we term the "loss-of-function observed/expected upper bound fraction" or "`LOEUF`") < 0.5 if a hard threshold is needed.
+It is very important to note that the `oe` (and thereby `LOEUF`) score is very different from `pLI`; in particular, low `oe` values are indicative of strong intolerance, whereas high `pLI` scores indicate intolerance. In addition, while `pLI` incorporated the uncertainty around low counts (i.e., a gene with low expected count, due to small size or low coverage, could not have a high `pLI`), `oe` does not. Therefore, the `oe` metric comes with a 90% CI. It is important to consider the confidence interval when using `oe`. The change from `pLI` to `oe` was motivated mainly by its easier interpretation and its continuity across the spectrum of selection. As an example, let’s take a gene with a `pLI` of 0.8: this means that this gene cannot be categorized as a highly likely haploinsufficient gene based on our data. However, it is unclear whether this value was obtained because of small sample or gene size or because there were too many loss-of-function (LoF) variants observed in the gene. In addition, if the cause was the latter, `pLI` doesn’t tell much about the overall selection against loss-of-function in this gene. On the other hand, a gene with an LoF `oe` of 0.4 can clearly be interpreted as a gene where only 40% of the expected loss-of-function variants were observed and therefore is likely under selection against LoF variants. In addition, the 90% CI allows us to clearly distinguish cases where there is a lot of uncertainty about the constraint for that gene due to sample size. Since `pLI` > 0.9 is widely used in research and clinical interpretation of Mendelian cases, we suggest using the upper bound of the `oe` confidence interval (which we term the "loss-of-function observed/expected upper bound fraction" or "`LOEUF`") < 0.45 if a hard threshold is needed.
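
Note on applying the new table: the lookup it describes can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the commit or of any gnomAD tooling. The names `LOEUF_PERCENTILES`, `MENDELIAN_THRESHOLD`, `loeuf_percentile_floor`, and `passes_mendelian_threshold` are hypothetical, and the strict `<` comparison at each table row is an assumption carried over from the "`LOEUF` score < 0.45" wording in the help text.

```python
# (LOEUF score, LOEUF percentile) pairs copied from the table added in this
# commit, ordered from most to least constrained.
LOEUF_PERCENTILES = [
    (0.15, 99),
    (0.27, 95),
    (0.36, 90),
    (0.45, 85),
    (0.60, 75),
    (0.91, 50),
    (1.19, 25),
]

# Threshold suggested in the help text for Mendelian disease interpretation.
MENDELIAN_THRESHOLD = 0.45


def loeuf_percentile_floor(loeuf):
    """Return the highest tabulated percentile whose LOEUF score is above
    `loeuf`, or None if `loeuf` is not below any tabulated score.

    Assumption: a gene counts toward a row only when its LOEUF is strictly
    below that row's score.
    """
    for score, percentile in LOEUF_PERCENTILES:
        if loeuf < score:
            return percentile
    return None


def passes_mendelian_threshold(loeuf):
    """Apply the hard cutoff; the help text still prefers continuous use."""
    return loeuf < MENDELIAN_THRESHOLD


if __name__ == "__main__":
    for value in (0.10, 0.40, 0.45, 1.50):
        print(value, loeuf_percentile_floor(value), passes_mendelian_threshold(value))
```

Because `LOEUF` depends on sample size, these cutoffs describe v4 values only; a threshold tuned on v2 would need to be re-derived rather than looked up here.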