-
Notifications
You must be signed in to change notification settings - Fork 37
Description
Like other users, we came across these errors with our data and have done some debugging (qvalue version 2.32.0; R version 4.3.1). I think that these errors are due to unexpected behavior of findInterval() and tabulate(), the two functions used in pi0est(), which is in turn called by the qvalue() function.
Here is the input p-value vector we used:
> tmp.p
[1] 0.000000e+00 0.000000e+00 9.115477e-01 0.000000e+00 0.000000e+00
[6] 3.336338e-01 0.000000e+00 2.726045e-08 3.108624e-15 0.000000e+00
[11] 0.000000e+00 0.000000e+00 0.000000e+00 1.357958e-11 0.000000e+00
[16] 3.856843e-03 0.000000e+00 2.925816e-04 0.000000e+00 6.495249e-12
[21] 3.713274e-11 4.321365e-11 3.597123e-14 3.204471e-03 0.000000e+00
[26] 9.569107e-03 4.440892e-15 2.224478e-02 0.000000e+00 0.000000e+00
[31] 1.339865e-03 0.000000e+00 5.534483e-05 0.000000e+00 0.000000e+00
[36] 9.760413e-05 5.216207e-06 0.000000e+00 0.000000e+00 4.793859e-01
[41] 0.000000e+00 0.000000e+00 2.014265e-05 0.000000e+00 0.000000e+00
[46] 1.775049e-09 2.462234e-03 0.000000e+00 0.000000e+00 6.318696e-01
[51] 0.000000e+00 8.194909e-03 0.000000e+00 0.000000e+00 0.000000e+00
[56] 5.049079e-10 0.000000e+00 5.062230e-01 5.554316e-05 0.000000e+00
[61] 7.600417e-05 1.635782e-02 0.000000e+00 1.093950e-01 1.281121e-04
[66] 0.000000e+00 6.194023e-01 0.000000e+00 1.001962e-06 6.646150e-11
[71] 9.172546e-01 2.849969e-01 3.437511e-01 3.351566e-01 6.483649e-01
[76] 1.255449e-01 7.011413e-01 1.962199e-01 5.366918e-02 5.416075e-01
[81] 7.638786e-01 3.642310e-01 1.424749e-02 5.753318e-02 3.731248e-01
[86] 9.191109e-01 9.034551e-01 6.005921e-01 2.480674e-02 4.445593e-01
[91] 6.079983e-01 2.083319e-01 4.481868e-01 5.910949e-01 2.630329e-01
[96] 2.480999e-01 8.275802e-01 3.632738e-01 9.944623e-02 8.778394e-01
[101] 4.098542e-01
> summary (tmp.p)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000000 0.0000000 0.0000555 0.1704693 0.3336338 0.9191109
> sum (tmp.p < 0.05)
[1] 65
> qvalue (tmp.p)
Error in smooth.spline(lambda, pi0, df = smooth.df) :
missing or infinite values in inputs are not allowed
> ?qvalue
> qvalue (tmp.p, pi0.method = "bootstrap")
Error in quantile.default(pi0, prob = 0.1) :
missing values and NaN's not allowed if 'na.rm' is FALSE
> qvalue (tmp.p, fdr.level=0.05, pi0.method="bootstrap", adj=1.2)
Error in quantile.default(pi0, prob = 0.1) :
missing values and NaN's not allowed if 'na.rm' is FALSE
The qvalue() calls pi0est() to estimate pi0. The latter uses findInterval() and tabulate() to count the number of p-values in a series of nonoverlapping intervals given by lambda:
> lambda <- seq(0.05, 0.95, 0.05)
> lambda
[1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75
[16] 0.80 0.85 0.90 0.95
> length (lambda)
[1] 19
> findInterval(tmp.p, vec = lambda)
[1] 0 0 18 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[25] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0
[49] 0 12 0 0 0 0 0 0 0 10 0 0 0 0 0 2 0 0 12 0 0 0 18 5
[73] 6 6 12 2 14 3 1 10 15 7 0 1 7 18 18 12 0 8 12 4 8 11 5 4
[97] 16 7 1 17 8
# This output means that the intervals generated by lambda are as follows:
# 0: <0.05; 1: [0.05, 0.1); 2: [0.1, 0.15), ..., 18: [0.90, 0.95), 19: >=0.95.
# Note that the above output contains 65 0s and no 19.
> tabulate(findInterval(tmp.p, vec = lambda))
[1] 3 2 1 2 2 3 3 3 1 2 1 5 0 1 1 1 1 4
# Note that 65 is not in the output above
> length (tabulate(findInterval(tmp.p, vec = lambda)))
[1] 18
The problem here is that "tabulate(findInterval(tmp.p, vec = lambda))" ignores the intervals at the two ends: <0.05 and >=0.95. It ignores the left interval of <0.05 because the tabulate() function ignores zeros in the input by default (see its documentation), even though there are 65 values in this interval. It ignores the right interval of >=0.95 because tmp.p contains has no value above 0.95 and therefore "findInterval(tmp.p, vec = lambda)" returns no 19.
In fact, in the R package vignette, the 605 hedenfalk p-values that are <0.05 are also ignored in the pi0 estimation. The output from debugging the pi0est() function is as follows:
> debug (pi0est)
> length (hedenfalk$p)
[1] 3170
> summary (hedenfalk$p)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000032 0.0845647 0.2998155 0.3718702 0.6316112 0.9998517
> qobj <- qvalue(p = hedenfalk$p)
Browse[2]> length (p)
[1] 3170
Browse[2]> summary (p)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000032 0.0845647 0.2998155 0.3718702 0.6316112 0.9998517
Browse[2]> lambda
[1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70
[15] 0.75 0.80 0.85 0.90 0.95
Browse[2]> test.interval <- findInterval(p, vec = lambda)
Browse[2]> table (test.interval)
test.interval
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
605 263 194 190 180 154 128 130 134 120 108 101 113 83 115 118 109 122
18 19
94 109
Browse[2]> tabulate(findInterval(p, vec = lambda))
[1] 263 194 190 180 154 128 130 134 120 108 101 113 83 115 118 109 122
[18] 94 109
It is possible that the leftmost interval (<0.05) may not be necessary for later steps in estimating pi0; this part is not entirely clear to me. But to get around the programming problem that generates the error messages, we can at least change how lambda is specified:
> lambda <- seq(0, max(tmp.p), 0.05)
> cbind (lambda, count = tabulate(findInterval(tmp.p, vec = lambda)))
lambda count
[1,] 0.00 65
[2,] 0.05 3
[3,] 0.10 2
[4,] 0.15 1
[5,] 0.20 2
[6,] 0.25 2
[7,] 0.30 3
[8,] 0.35 3
[9,] 0.40 3
[10,] 0.45 1
[11,] 0.50 2
[12,] 0.55 1
[13,] 0.60 5
[14,] 0.65 0
[15,] 0.70 1
[16,] 0.75 1
[17,] 0.80 1
[18,] 0.85 1
[19,] 0.90 4
In principle, there is also an interval #0, which is <0 and of course has no count. We can therefore ignore this interval safely.
@ajbass suggested an upper bound of "max(tmp.p)-0.05" in issue #20:
> lambda <- seq(0, max(tmp.p)-0.05, 0.05)
> cbind (lambda, count = tabulate(findInterval(tmp.p, vec = lambda)))
lambda count
[1,] 0.00 65
[2,] 0.05 3
[3,] 0.10 2
[4,] 0.15 1
[5,] 0.20 2
[6,] 0.25 2
[7,] 0.30 3
[8,] 0.35 3
[9,] 0.40 3
[10,] 0.45 1
[11,] 0.50 2
[12,] 0.55 1
[13,] 0.60 5
[14,] 0.65 0
[15,] 0.70 1
[16,] 0.75 1
[17,] 0.80 1
[18,] 0.85 5
This upper bound results in a wider last interval. We can double check the counts in the last intervals:
> sum (tmp.p>=0.95)
[1] 0
> sum (tmp.p>=0.9)
[1] 4
> sum (tmp.p>=0.85)
[1] 5
These experiments suggest several ways to deal with the error messages, with a higher estimated pi0 corresponding to fewer positives:
> result <- qvalue (tmp.p, fdr.level=0.05, lambda = seq (0, max(tmp.p), 0.05))
> result$pi0
[1] 0.3198683
> sum (result$significant)
[1] 68
> result <- qvalue (tmp.p, fdr.level=0.05, lambda = seq (0, max(tmp.p)-0.05, 0.05))
> result$pi0
[1] 0.2835713
> sum (result$significant)
[1] 69
> result <- qvalue (tmp.p, fdr.level=0.05, lambda = seq (0, max(tmp.p), 0.05), pi0.method = "bootstrap")
> result$pi0
[1] 0.3060306
> sum (result$significant)
[1] 69
> result <- qvalue (tmp.p, fdr.level=0.05, pi0.method="bootstrap", adj=1.2, lambda = seq (0, max(tmp.p), 0.05))
> result$pi0
[1] 0.3060306
> sum (result$significant)
[1] 69
# this option uses a fixed lambda instead of a sequence of intervals
> result <- qvalue (tmp.p, fdr.level=0.05, lambda = 0.5)
> result$pi0
[1] 0.3168317
> sum (result$significant)
[1] 68
> result$lambda
[1] 0.5
We can also rerun the function on the hedenfalk p-values and compare the results:
Using the default lambda:
> qobj <- qvalue(p = hedenfalk$p, fdr.level=0.05)
> qobj$pi0
[1] 0.669926
> sum (qobj$significant)
[1] 162
Using a revised lambda:
> qobj <- qvalue(p = hedenfalk$p, fdr.level=0.05, lambda = seq (0, max(hedenfalk$p), 0.05))
> qobj$pi0
[1] 0.6681177
> sum (qobj$significant)
[1] 162
On this dataset, there is no qualitative difference. Again, perhaps the leftmost interval (<0.05) is not necessary in the estimation of pi0.