Commit 060dd50

Merge pull request #349 from USEPA/347-tcpl-top-and-loglikelihood-updates-ahead-of-cran-release
347-tcpl-top-and-loglikelihood-updates-ahead-of-cran-release
2 parents 8846e72 + baf91e8 commit 060dd50

1 file changed, +15 -13 lines changed

Diff for: vignettes/Introduction_Appendices.Rmd

@@ -1557,12 +1557,12 @@ OutputParameters <- c(
 "a (y-scale) </br> b (x-scale)", # quadratic
 "a (y-scale) </br> b (x-scale)", # quadratic
 "a (y-scale) </br> p (power)", # power
-"tp (top) </br> ga (gain AC50) </br> p (gain-power)", # hill
-"tp (top) </br> ga (gain AC50) </br> p (gain power) </br> la (loss AC50) </br> q (loss power)", # gain-loss
+"tp (top parameter) </br> ga (gain AC50) </br> p (gain-power)", # hill
+"tp (top parameter) </br> ga (gain AC50) </br> p (gain power) </br> la (loss AC50) </br> q (loss power)", # gain-loss
 "a (y-scale) </br> b (x-scale)", # exp2
 "a (y-scale) </br> b (x-scale) </br> p (power)", # exp3
-"tp (top) </br> ga (AC50)", # exp4
-"tp (top) </br> ga (AC50) </br> p (power)" # exp5
+"tp (top parameter) </br> ga (AC50)", # exp4
+"tp (top parameter) </br> ga (AC50) </br> p (power)" # exp5
 )
 # Fifth column - additional model details.
 Details <- c(
@@ -2117,12 +2117,14 @@ In tcpl v2, activity hit calls (hitc) were binary, where 0 was negative, 1 was p
 
 Continuous $hitc$ as defined in [tcplfit2 R package](https://CRAN.R-project.org/package=tcplfit2) is calculated as the product of three proportional weights representing the confidence that:
 
-* $p1$: “the winning AIC value is less than that of the constant model.”
-* Determine whether the constant model – if allowed to win – is a better fit than the winning model – i.e., is the winning model essentially flat or not.The constant model may never be selected as the winning model, but if the constant model has the lowest AIC compared to other models, the calculated continuous hitc will be zero.
-* $p2$: “at least one median response is greater than the cutoff.”
-* At least one dose group has a central tendency of the response values “outside” the cutoff band (consider bi-directional). Response is greater than cutoff in “+” direction and less than cutoff in “–” direction.
-* $p3$: “the top of the fitted curve is above the cutoff”
-* Determine whether the predicted maximal response exceeds the cutoff, i.e. the response corresponding to the effect size of interest.
+* $p_1$: “the winning AIC value is less than that of the constant model”
+* Determine whether the constant model – if it were allowed to win – is a better fit to the observed data than the winning model – i.e., is the winning model essentially flat or not. The constant model can never be selected as the winning model, but if the constant model has the lowest AIC compared to other models, the calculated continuous $hitc$ will be zero.
+* $p_2$: “at least one median response is outside the cutoff band”
+* At least one dose group has a median response value (central tendency of observed responses within the dose group) "outside" the cutoff band (when considering bidirectional fitting). Responses greater than the cutoff in the positive (“+”) direction and less than the cutoff in the negative (“–”) direction.
+* $p_3$: “the top of the fitted curve is outside the cutoff band”
+* Determine whether the predicted maximal response from baseline (`top`) exceeds the cutoff, i.e. the response corresponding to the effect size of interest is outside the cutoff band (less than cutoff in the negative direction and greater than cutoff in the positive direction).
+
+See the Hitcalling section of the [tcplfit2 R package](https://CRAN.R-project.org/package=tcplfit2) vignette for more information on how the weights are calculated.
 
 See [Sheffield et al., 2021](https://doi.org/10.1093/bioinformatics/btab779) for more information on tcplfit2.
 
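Note on the hunk above: the changed text describes continuous $hitc$ as the product of the three weights $p_1$, $p_2$, and $p_3$. The arithmetic can be sketched as follows; the numeric values are hypothetical, chosen only to illustrate the product, and are not taken from any tcplfit2 output.

$$
hitc = p_1 \times p_2 \times p_3, \qquad \text{e.g. } 0.99 \times 1.00 \times 0.95 = 0.9405
$$

Assuming each weight is a proportion between 0 and 1, $hitc$ can never exceed the smallest of the three, so one low-confidence weight is enough to pull the hit call down.
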
@@ -2174,11 +2176,11 @@ A 90% confidence interval around the BMD, bounded by the benchmark dose lower bo
 
 <center>![<font style="font-size:15px"><i>Fit Category Tree (<a target="_blank" rel="noopener noreferrer" href="https://github.com/user-attachments/assets/d60cecc2-2aa2-4198-b692-33c611195d71">view in full resolution</a>)</i></font>](img/Fig5_fitc_tree_10jul2023.png)</center>
 
-A hierarchical fit category ($\mathit{fitc}$) decision tree is used to bin each fit as shown in Figure 2. Each fit falls into one leaf of the tree using the described logic with the final $\mathit{fitc}$ indicated with gray boxes. Abbreviations are defined as: $\mathit{conc}$ = concentration; $\mathit{hitc}$ = hit call; $\mathit{|top|}$ = absolute value of the modeled curve top; $\mathit{coff}$ = cutoff; $log_c(min)$ = minimum log~10~ concentration tested; $log_c(max)$ = maximum log~10~ concentration tested; AC~50~ = $50 \%$ activity concentration; AC~95~ = $95 \%$ activity concentration.
+A hierarchical fit category ($\mathit{fitc}$) decision tree is used to bin each fit as shown in Figure 2. Each fit falls into one leaf of the tree using the described logic with the final $\mathit{fitc}$ indicated with gray boxes. Abbreviations are defined as: $\mathit{conc}$ = concentration; $\mathit{hitc}$ = hit call; $\mathit{|top|}$ = absolute value of the maximal predicted change in response from baseline (i.e. $\mathit{y=0}$); $\mathit{coff}$ = cutoff; $log_c(min)$ = minimum log~10~ concentration tested; $log_c(max)$ = maximum log~10~ concentration tested; AC~50~ = $50 \%$ activity concentration; AC~95~ = $95 \%$ activity concentration.
 
 After curve fitting, all concentration series are assigned a fit category ($\mathit{fitc}$) based on similar characteristics and shape. Logic is based on relative activity, efficacy, and potency comparisons as shown in Figure 5. For continuity purposes, $\mathit{fitc}$ numbering has been conserved from past <font face="CMTT10">tcpl</font> versions. Grouping all series into $\mathit{fitc}$ enables quality control and can be useful in data cleaning applications, especially when considered with Level 6 flags. In <font face="CMTT10">invitrodb v3-3.5</font>, a common filtering approach removed the least reproducible curve-fits, i.e. those with very low AC~50~ (below the screened $\mathit{conc}$ range) and low efficacy (within 1.2-fold of the cutoff) as well as 3+ flags. However, preliminary investigation into <font face="CMTT10">invitrodb v4.1-4.2</font> has suggested that removing curve fits with 4 or more flags, or possibly filtering based on specific flags in combination with fitc such as fitc 36, may be a more appropriate filtering approach due to changes in curve fitting and flags in <font face="CMTT10">invitrodb v4</font> and beyond. The stringency of filtering for flags should be explored in a fit-for-purpose way.
 
-Fit category is largely based upon the relative efficacy and, in the case of actives, the location of the AC~50~ and concentration at $95 \%$ activity (an estimate of maximum activity concentration, AC~95~) compared to the tested concentration range. All concentration response curves are first split into active, inactive, or cannot determine. “Cannot determine” is indicative of exceptions that cannot be curve-fit, e.g. a concentration series with fewer than 4 concentrations. Active designations are determined for $\mathit{fitc}$ based on whether the $\mathit{hitc}$ surpasses the 0.90 threshold. For those series that are designated inactive with a $\mathit{hitc}$ less than 0.90, $\mathit{fitc}$ can be used to indicate to what extent the curve represents borderline inactivity via comparison of top modeled efficacy to the cutoff (i.e, the absolute value of the modeled top is less than 0.8 times the cutoff).
+Fit category is largely based upon the relative efficacy and, in the case of actives, the location of the AC~50~ and concentration at $95 \%$ activity (an estimate of maximum activity concentration, AC~95~) compared to the tested concentration range. All concentration response curves are first split into active, inactive, or cannot determine. “Cannot determine” is indicative of exceptions that cannot be curve-fit, e.g. a concentration series with fewer than 4 concentrations. Active designations are determined for $\mathit{fitc}$ based on whether the $\mathit{hitc}$ surpasses the 0.90 threshold. For those series that are designated inactive with a $\mathit{hitc}$ less than 0.90, $\mathit{fitc}$ can be used to indicate to what extent the curve represents borderline inactivity via comparison of top modeled efficacy to the cutoff (i.e, the absolute value of the maximal predicted change from baseline is less than 0.8 times the cutoff).
 
 For active curves, efficacy, as represented by the modeled top, is compared to 1.2 times the cutoff (less than or equal to, or greater than), thereby differentiating curves that may represent borderline activity from moderate activity. Active curves also have potency metrics estimated, e.g., AC~50~ and AC~95~ values, that can be compared to the range of concentrations screened to indicate curves for which potency estimates are more quantitatively informative. Curves for which the AC~50~ is less than or equal to the minimum concentration tested ($\mathit{fitc}$ = 36, 40) may indicate AC~50~ values that are less quantitatively informative than AC~50~ values within the concentration range screened. When the AC~50~ is greater than the minimum concentration tested but the AC~95~ is greater than or equal to the maximum concentration tested ($\mathit{fitc}$ = 38, 42), it is possible the maximum activity was not fully observed in the concentration range screened. $\mathit{Fitc}$ for curves where the AC~50~ and AC~95~ are both within the concentration range screened ($\mathit{fitc}$ = 37, 41) represent the most quantitatively informative AC~50~ values.
 
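Reading aid for the hunk above: the comparisons described in the vignette text can be summarized as a piecewise rule. This is a sketch of the prose only, not the tcpl implementation. The symbols $conc_{min}$ and $conc_{max}$ are shorthand introduced here for the minimum and maximum concentrations tested, assumed to be on the same scale as AC~50~ and AC~95~. The two $\mathit{fitc}$ values in each pair correspond to the two efficacy bins ($|top| \le 1.2 \cdot coff$ versus $|top| > 1.2 \cdot coff$); the text does not state which value belongs to which bin, so the pairs are left unassigned.

$$
\text{active } (hitc \ge 0.90): \quad
\mathit{fitc} \in
\begin{cases}
\{36, 40\} & \text{if } AC_{50} \le conc_{min} \\
\{37, 41\} & \text{if } AC_{50} > conc_{min} \text{ and } AC_{95} < conc_{max} \\
\{38, 42\} & \text{if } AC_{50} > conc_{min} \text{ and } AC_{95} \ge conc_{max}
\end{cases}
$$

$$
\text{inactive } (hitc < 0.90): \quad |top| < 0.8 \cdot coff \quad \text{versus} \quad |top| \ge 0.8 \cdot coff
$$

The active threshold is written as $hitc \ge 0.90$ because the text defines inactive series as those with $hitc$ less than 0.90.
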
@@ -2223,7 +2225,7 @@ FlagDescription <- c("Flag series if model directionality is questionable, i.e.
 square error $(rmse)$ for the series is greater than the cutoff $(coff)$;
 $rmse > coff$",
 "Flag series if borderline activity is suspected based on modeled top
-parameter $(top)$ relative to cutoff $(coff)$; $0.8 * coff <= |top| <= 1.2 * coff$",
+$(top)$ relative to cutoff $(coff)$; $0.8 * coff <= |top| <= 1.2 * coff$",
 "Flag series if the average number of replicates per concentration is less than
 2; $nrep < 2$.",
 "Flag series if 4 concentrations or less were tested; $nconc <= 4$.",
