Merge pull request #349 from USEPA/347-tcpl-top-and-loglikelihood-updates-ahead-of-cran-release

cthunes · web-flow · commit 060dd5099194 · 2025-04-08T12:01:23.000-04:00
347-tcpl-top-and-loglikelihood-updates-ahead-of-cran-release
diff --git a/vignettes/Introduction_Appendices.Rmd b/vignettes/Introduction_Appendices.Rmd
@@ -1557,12 +1557,12 @@ OutputParameters <- c(
   "a (y-scale) </br> b (x-scale)", # quadratic
   "a (y-scale) </br> b (x-scale)", # quadratic
   "a (y-scale) </br> p (power)", # power
-  "tp (top) </br> ga (gain AC50) </br> p (gain-power)", # hill
-  "tp (top) </br> ga (gain AC50) </br> p (gain power) </br> la (loss AC50) </br> q (loss power)", # gain-loss
+  "tp (top parameter) </br> ga (gain AC50) </br> p (gain power)", # hill
+  "tp (top parameter) </br> ga (gain AC50) </br> p (gain power) </br> la (loss AC50) </br> q (loss power)", # gain-loss
   "a (y-scale) </br> b (x-scale)", # exp2
   "a (y-scale) </br> b (x-scale) </br> p (power)", # exp3
-  "tp (top) </br> ga (AC50)", # exp4
-  "tp (top) </br> ga (AC50) </br> p (power)" # exp5
+  "tp (top parameter) </br> ga (AC50)", # exp4
+  "tp (top parameter) </br> ga (AC50) </br> p (power)" # exp5
 )
 # Fifth column - additional model details.
 Details <- c(
@@ -2117,12 +2117,14 @@ In tcpl v2, activity hit calls (hitc) were binary, where 0 was negative, 1 was p
 
 Continuous $hitc$ as defined in [tcplfit2 R package](https://CRAN.R-project.org/package=tcplfit2) is calculated as the product of three proportional weights representing the confidence that: 
 
-* $p1$: “the winning AIC value is less than that of the constant model.” 
-  * Determine whether the constant model – if allowed to win – is a better fit than the winning model – i.e., is the winning model essentially flat or not.The constant model may never be selected as the winning model, but if the constant model has the lowest AIC compared to other models, the calculated continuous hitc will be zero.
-* $p2$: “at least one median response is greater than the cutoff.” 
-  * At least one dose group has a central tendency of the response values “outside” the cutoff band (consider bi-directional). Response is greater than cutoff in “+” direction and less than cutoff in “–” direction.
-* $p3$: “the top of the fitted curve is above the cutoff”
-  * Determine whether the predicted maximal response exceeds the cutoff, i.e. the response corresponding to the effect size of interest.
+* $p_1$: “the winning AIC value is less than that of the constant model” 
+  * Determine whether the constant model – if it were allowed to win – is a better fit to the observed data than the winning model – i.e., is the winning model essentially flat or not. The constant model can never be selected as the winning model, but if the constant model has the lowest AIC compared to other models, the calculated continuous $hitc$ will be zero.
+* $p_2$: “at least one median response is outside the cutoff band” 
+  * At least one dose group has a median response value (the central tendency of observed responses within the dose group) “outside” the cutoff band (when considering bidirectional fitting), i.e. greater than the cutoff in the positive (“+”) direction or less than the cutoff in the negative (“–”) direction.
+* $p_3$: “the top of the fitted curve is outside the cutoff band”
+  * Determine whether the predicted maximal response from baseline (`top`) exceeds the cutoff (the response corresponding to the effect size of interest), i.e. whether `top` is outside the cutoff band (less than the cutoff in the negative direction or greater than the cutoff in the positive direction).
+  
+See the Hitcalling section of the [tcplfit2 R package](https://CRAN.R-project.org/package=tcplfit2) vignette for more information on how the weights are calculated. 
 
 See [Sheffield et al., 2021](https://doi.org/10.1093/bioinformatics/btab779) for more information on tcplfit2.
 
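The product described in this hunk can be written out as a minimal R sketch. This is not the tcplfit2 implementation: `sketch_hitc` and its arguments are hypothetical names, the three weights are assumed to be already computed as values in [0, 1], and the case where the constant model has the lowest AIC is represented here by the assumption `p1 = 0`.

```r
# Hypothetical helper (not part of tcpl or tcplfit2): combine three
# precomputed weights, each assumed to lie in [0, 1], into a continuous hitc.
sketch_hitc <- function(p1, p2, p3) {
  stopifnot(all(c(p1, p2, p3) >= 0), all(c(p1, p2, p3) <= 1))
  p1 * p2 * p3
}

# Assumption: a constant model with the lowest AIC corresponds to p1 = 0,
# which forces the product to zero regardless of p2 and p3.
sketch_hitc(p1 = 0, p2 = 0.99, p3 = 0.97)

# When all three weights are close to 1, the product stays close to 1.
sketch_hitc(p1 = 0.99, p2 = 0.98, p3 = 0.95)
```

Because the weights are multiplied, a low value for any single weight is enough to pull $hitc$ toward zero.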
@@ -2174,11 +2176,11 @@ A 90% confidence interval around the BMD, bounded by the benchmark dose lower bo
 
 <center>![<font style="font-size:15px"><i>Fit Category Tree (<a target="_blank" rel="noopener noreferrer" href="https://github.com/user-attachments/assets/d60cecc2-2aa2-4198-b692-33c611195d71">view in full resolution</a>)</i></font>](img/Fig5_fitc_tree_10jul2023.png)</center>
 
-A hierarchical fit category ($\mathit{fitc}$) decision tree is used to bin each fit as shown in Figure 2. Each fit falls into one leaf of the tree using the described logic with the final $\mathit{fitc}$ indicated with gray boxes. Abbreviations are defined as: $\mathit{conc}$ = concentration; $\mathit{hitc}$ = hit call; $\mathit{|top|}$ = absolute value of the modeled curve top; $\mathit{coff}$ = cutoff; $log_c(min)$ = minimum log~10~ concentration tested; $log_c(max)$ = maximum log~10~ concentration tested; AC~50~ = $50 \%$ activity concentration; AC~95~ = $95 \%$ activity concentration.
+A hierarchical fit category ($\mathit{fitc}$) decision tree is used to bin each fit as shown in Figure 5. Each fit falls into one leaf of the tree using the described logic with the final $\mathit{fitc}$ indicated with gray boxes. Abbreviations are defined as: $\mathit{conc}$ = concentration; $\mathit{hitc}$ = hit call; $\mathit{|top|}$ = absolute value of the maximal predicted change in response from baseline (i.e. $\mathit{y=0}$); $\mathit{coff}$ = cutoff; $log_c(min)$ = minimum log~10~ concentration tested; $log_c(max)$ = maximum log~10~ concentration tested; AC~50~ = $50 \%$ activity concentration; AC~95~ = $95 \%$ activity concentration.
 
 After curve fitting, all concentration series are assigned a fit category ($\mathit{fitc}$) based on similar characteristics and shape. Logic is based on relative activity, efficacy, and potency comparisons as shown in Figure 5. For continuity purposes, $\mathit{fitc}$ numbering has been conserved from past <font face="CMTT10">tcpl</font> versions. Grouping all series into $\mathit{fitc}$ enables quality control and can be useful in data cleaning applications, especially when considered with Level 6 flags. In <font face="CMTT10">invitrodb v3-3.5</font>, a common filtering approach removed the least reproducible curve-fits, i.e. those with very low AC~50~ (below the screened $\mathit{conc}$ range) and low efficacy (within 1.2-fold of the cutoff) as well as 3+ flags. However, preliminary investigation into <font face="CMTT10">invitrodb v4.1-4.2</font> has suggested that removing curve fits with 4 or more flags, or possibly filtering based on specific flags in combination with fitc such as fitc 36, may be a more appropriate filtering approach due to changes in curve fitting and flags in <font face="CMTT10">invitrodb v4</font> and beyond. The stringency of filtering for flags should be explored in a fit-for-purpose way.
 
-Fit category is largely based upon the relative efficacy and, in the case of actives, the location of the AC~50~ and concentration at $95 \%$ activity (an estimate of maximum activity concentration, AC~95~) compared to the tested concentration range. All concentration response curves are first split into active, inactive, or cannot determine. “Cannot determine” is indicative of exceptions that cannot be curve-fit, e.g. a concentration series with fewer than 4 concentrations. Active designations are determined for $\mathit{fitc}$ based on whether the $\mathit{hitc}$ surpasses the 0.90 threshold. For those series that are designated inactive with a $\mathit{hitc}$ less than 0.90, $\mathit{fitc}$ can be used to indicate to what extent the curve represents borderline inactivity via comparison of top modeled efficacy to the cutoff (i.e, the absolute value of the modeled top is less than 0.8 times the cutoff).
+Fit category is largely based upon the relative efficacy and, in the case of actives, the location of the AC~50~ and concentration at $95 \%$ activity (an estimate of maximum activity concentration, AC~95~) compared to the tested concentration range. All concentration response curves are first split into active, inactive, or cannot determine. “Cannot determine” is indicative of exceptions that cannot be curve-fit, e.g. a concentration series with fewer than 4 concentrations. Active designations are determined for $\mathit{fitc}$ based on whether the $\mathit{hitc}$ surpasses the 0.90 threshold. For those series that are designated inactive with a $\mathit{hitc}$ less than 0.90, $\mathit{fitc}$ can be used to indicate to what extent the curve represents borderline inactivity via comparison of top modeled efficacy to the cutoff (i.e., the absolute value of the maximal predicted change from baseline is less than 0.8 times the cutoff).
 
 For active curves, efficacy, as represented by the modeled top, is compared to 1.2 times the cutoff (less than or equal to, or greater than), thereby differentiating curves that may represent borderline activity from moderate activity. Active curves also have potency metrics estimated, e.g., AC~50~ and AC~95~ values, that can be compared to the range of concentrations screened to indicate curves for which potency estimates are more quantitatively informative. Curves for which the AC~50~ is less than or equal to the minimum concentration tested ($\mathit{fitc}$ = 36, 40) may indicate AC~50~ values that are less quantitatively informative than AC~50~ values within the concentration range screened. When the AC~50~ is greater than the minimum concentration tested but the AC~95~ is greater than or equal to the maximum concentration tested ($\mathit{fitc}$ = 38, 42), it is possible the maximum activity was not fully observed in the concentration range screened. $\mathit{Fitc}$ for curves where the AC~50~ and AC~95~ are both within the concentration range screened ($\mathit{fitc}$ = 37, 41) represent the most quantitatively informative AC~50~ values.
 
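The efficacy comparisons in this hunk can be illustrated with a short R sketch. It uses only the thresholds quoted in the text (0.90 for $hitc$, 0.8 and 1.2 times the cutoff for $|top|$) and returns descriptive labels rather than numeric $fitc$ values, since those depend on the full decision tree. `sketch_fit_label` is a hypothetical name, not a tcpl function. Treating a $hitc$ of exactly 0.90 as active and treating missing inputs as "cannot determine" are assumptions made for illustration.

```r
# Hypothetical helper (not part of tcpl): label a fit from hitc, the modeled
# top, and the cutoff, using only the thresholds quoted in the text.
sketch_fit_label <- function(hitc, top, coff) {
  # Assumption: missing inputs stand in for series that could not be fit.
  if (is.na(hitc) || is.na(top)) return("cannot determine")
  stopifnot(coff > 0)
  rel <- abs(top) / coff  # efficacy relative to the cutoff
  if (hitc >= 0.90) {     # assumption: the 0.90 boundary counts as active
    if (rel <= 1.2) "active, |top| <= 1.2 * coff" else "active, |top| > 1.2 * coff"
  } else {
    if (rel < 0.8) "inactive, |top| < 0.8 * coff" else "inactive, |top| >= 0.8 * coff"
  }
}

sketch_fit_label(hitc = 0.95, top = 30, coff = 20)   # |top| / coff = 1.5
sketch_fit_label(hitc = 0.50, top = -10, coff = 20)  # |top| / coff = 0.5
```

Using `abs(top)` keeps the comparison valid for bidirectional fits, where the modeled top may be negative.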
@@ -2223,7 +2225,7 @@ FlagDescription <- c("Flag series if model directionality is questionable, i.e.
                      square error $(rmse)$ for the series is greater than the cutoff $(coff)$; 
                      $rmse > coff$",
                      "Flag series if borderline activity is suspected based on modeled top 
-                     parameter $(top)$ relative to cutoff $(coff)$; $0.8 * coff <= |top| <= 1.2 * coff$",
+                      $(top)$ relative to cutoff $(coff)$; $0.8 * coff <= |top| <= 1.2 * coff$",
                      "Flag series if the average number of replicates per concentration is less than
                      2; $nrep < 2$.",
                      "Flag series if 4 concentrations or less were tested; $nconc <= 4$.",