Dealing with the error messages of "missing or infinite values" or "missing values and NaN's" from calling qvalue() or pi0est()

Like other users, we came across these errors with our data and have done some debugging (qvalue version 2.32.0; R version 4.3.1).  I think that these errors are due to unexpected behavior of `findInterval()` and `tabulate()`, the two functions used in `pi0est()`, which is in turn called by the `qvalue()` function.  

Here is the input p-value vector we used:

```
> tmp.p
  [1] 0.000000e+00 0.000000e+00 9.115477e-01 0.000000e+00 0.000000e+00
  [6] 3.336338e-01 0.000000e+00 2.726045e-08 3.108624e-15 0.000000e+00
 [11] 0.000000e+00 0.000000e+00 0.000000e+00 1.357958e-11 0.000000e+00
 [16] 3.856843e-03 0.000000e+00 2.925816e-04 0.000000e+00 6.495249e-12
 [21] 3.713274e-11 4.321365e-11 3.597123e-14 3.204471e-03 0.000000e+00
 [26] 9.569107e-03 4.440892e-15 2.224478e-02 0.000000e+00 0.000000e+00
 [31] 1.339865e-03 0.000000e+00 5.534483e-05 0.000000e+00 0.000000e+00
 [36] 9.760413e-05 5.216207e-06 0.000000e+00 0.000000e+00 4.793859e-01
 [41] 0.000000e+00 0.000000e+00 2.014265e-05 0.000000e+00 0.000000e+00
 [46] 1.775049e-09 2.462234e-03 0.000000e+00 0.000000e+00 6.318696e-01
 [51] 0.000000e+00 8.194909e-03 0.000000e+00 0.000000e+00 0.000000e+00
 [56] 5.049079e-10 0.000000e+00 5.062230e-01 5.554316e-05 0.000000e+00
 [61] 7.600417e-05 1.635782e-02 0.000000e+00 1.093950e-01 1.281121e-04
 [66] 0.000000e+00 6.194023e-01 0.000000e+00 1.001962e-06 6.646150e-11
 [71] 9.172546e-01 2.849969e-01 3.437511e-01 3.351566e-01 6.483649e-01
 [76] 1.255449e-01 7.011413e-01 1.962199e-01 5.366918e-02 5.416075e-01
 [81] 7.638786e-01 3.642310e-01 1.424749e-02 5.753318e-02 3.731248e-01
 [86] 9.191109e-01 9.034551e-01 6.005921e-01 2.480674e-02 4.445593e-01
 [91] 6.079983e-01 2.083319e-01 4.481868e-01 5.910949e-01 2.630329e-01
 [96] 2.480999e-01 8.275802e-01 3.632738e-01 9.944623e-02 8.778394e-01
[101] 4.098542e-01

> summary (tmp.p)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000000 0.0000000 0.0000555 0.1704693 0.3336338 0.9191109 

> sum (tmp.p < 0.05)
[1] 65
```

```
> qvalue (tmp.p)
Error in smooth.spline(lambda, pi0, df = smooth.df) : 
  missing or infinite values in inputs are not allowed
> ?qvalue
> qvalue (tmp.p, pi0.method = "bootstrap")
Error in quantile.default(pi0, prob = 0.1) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE
> qvalue (tmp.p, fdr.level=0.05, pi0.method="bootstrap", adj=1.2)
Error in quantile.default(pi0, prob = 0.1) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE
```
The qvalue() calls pi0est() to estimate pi0.  The latter uses findInterval() and tabulate() to count the number of p-values in a series of nonoverlapping intervals given by lambda:
```
> lambda <- seq(0.05, 0.95, 0.05)
> lambda
 [1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75
[16] 0.80 0.85 0.90 0.95
> length (lambda)
[1] 19
> findInterval(tmp.p, vec = lambda)
  [1]  0  0 18  0  0  6  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [25]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  9  0  0  0  0  0  0  0  0
 [49]  0 12  0  0  0  0  0  0  0 10  0  0  0  0  0  2  0  0 12  0  0  0 18  5
 [73]  6  6 12  2 14  3  1 10 15  7  0  1  7 18 18 12  0  8 12  4  8 11  5  4
 [97] 16  7  1 17  8
# This output means that the intervals generated by lambda are as follows:
# 0: <0.05; 1: [0.05, 0.1); 2: [0.1, 0.15), ..., 18: [0.90, 0.95), 19: >=0.95.
# Note that the above output contains 65 0s and no 19.
> tabulate(findInterval(tmp.p, vec = lambda))
 [1] 3 2 1 2 2 3 3 3 1 2 1 5 0 1 1 1 1 4
# Note that 65 is not in the output above
> length (tabulate(findInterval(tmp.p, vec = lambda)))
[1] 18
```
The problem here is that "`tabulate(findInterval(tmp.p, vec = lambda))`" ignores the intervals at the two ends: <0.05 and >=0.95.  It ignores the left interval of <0.05 because the tabulate() function ignores zeros in the input by default (see its documentation), even though there are 65 values in this interval.  It ignores the right interval of >=0.95 because tmp.p contains has no value above 0.95 and therefore "`findInterval(tmp.p, vec = lambda)`" returns no 19.

In fact, in the R package vignette, the 605 hedenfalk p-values that are <0.05 are also ignored in the pi0 estimation.  The output from debugging the pi0est() function is as follows:

```
> debug (pi0est)
> length (hedenfalk$p)
[1] 3170
> summary (hedenfalk$p)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000032 0.0845647 0.2998155 0.3718702 0.6316112 0.9998517 
> qobj <- qvalue(p = hedenfalk$p)

Browse[2]> length (p)
[1] 3170
Browse[2]> summary (p)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000032 0.0845647 0.2998155 0.3718702 0.6316112 0.9998517 
Browse[2]> lambda
 [1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70
[15] 0.75 0.80 0.85 0.90 0.95
Browse[2]> test.interval <- findInterval(p, vec = lambda)
Browse[2]> table (test.interval)
test.interval
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
605 263 194 190 180 154 128 130 134 120 108 101 113  83 115 118 109 122 
 18  19 
 94 109 
Browse[2]> tabulate(findInterval(p, vec = lambda))
 [1] 263 194 190 180 154 128 130 134 120 108 101 113  83 115 118 109 122
[18]  94 109
```

It is possible that the leftmost interval (<0.05) may not be necessary for later steps in estimating pi0; this part is not entirely clear to me.  But to get around the programming problem that generates the error messages, we can at least change how lambda is specified:

```
> lambda <- seq(0, max(tmp.p), 0.05)
> cbind (lambda, count = tabulate(findInterval(tmp.p, vec = lambda)))
      lambda count
 [1,]   0.00    65
 [2,]   0.05     3
 [3,]   0.10     2
 [4,]   0.15     1
 [5,]   0.20     2
 [6,]   0.25     2
 [7,]   0.30     3
 [8,]   0.35     3
 [9,]   0.40     3
[10,]   0.45     1
[11,]   0.50     2
[12,]   0.55     1
[13,]   0.60     5
[14,]   0.65     0
[15,]   0.70     1
[16,]   0.75     1
[17,]   0.80     1
[18,]   0.85     1
[19,]   0.90     4
```
In principle, there is also an interval #0, which is <0 and of course has no count.  We can therefore ignore this interval safely.

@ajbass suggested an upper bound of "`max(tmp.p)-0.05`" in issue #20:
```
> lambda <- seq(0, max(tmp.p)-0.05, 0.05)
> cbind (lambda, count = tabulate(findInterval(tmp.p, vec = lambda)))
      lambda count
 [1,]   0.00    65
 [2,]   0.05     3
 [3,]   0.10     2
 [4,]   0.15     1
 [5,]   0.20     2
 [6,]   0.25     2
 [7,]   0.30     3
 [8,]   0.35     3
 [9,]   0.40     3
[10,]   0.45     1
[11,]   0.50     2
[12,]   0.55     1
[13,]   0.60     5
[14,]   0.65     0
[15,]   0.70     1
[16,]   0.75     1
[17,]   0.80     1
[18,]   0.85     5
```
This upper bound results in a wider last interval.  We can double check the counts in the last intervals:
```
> sum (tmp.p>=0.95)
[1] 0
> sum (tmp.p>=0.9)
[1] 4
> sum (tmp.p>=0.85)
[1] 5
```

These experiments suggest several ways to deal with the error messages, with a higher estimated pi0 corresponding to fewer positives:

```
> result <- qvalue (tmp.p, fdr.level=0.05, lambda = seq (0, max(tmp.p), 0.05))
> result$pi0
[1] 0.3198683
> sum (result$significant)
[1] 68

> result <- qvalue (tmp.p, fdr.level=0.05, lambda = seq (0, max(tmp.p)-0.05, 0.05))
> result$pi0
[1] 0.2835713
> sum (result$significant)
[1] 69

> result <- qvalue (tmp.p, fdr.level=0.05, lambda = seq (0, max(tmp.p), 0.05), pi0.method = "bootstrap")
> result$pi0
[1] 0.3060306
> sum (result$significant)
[1] 69

> result <- qvalue (tmp.p, fdr.level=0.05, pi0.method="bootstrap", adj=1.2, lambda = seq (0, max(tmp.p), 0.05))
> result$pi0
[1] 0.3060306
> sum (result$significant)
[1] 69

# this option uses a fixed lambda instead of a sequence of intervals
> result <- qvalue (tmp.p, fdr.level=0.05, lambda = 0.5)
> result$pi0
[1] 0.3168317
> sum (result$significant)
[1] 68
> result$lambda
[1] 0.5
```

We can also rerun the function on the hedenfalk p-values and compare the results:

Using the default lambda:
```
> qobj <- qvalue(p = hedenfalk$p, fdr.level=0.05)
> qobj$pi0
[1] 0.669926
> sum (qobj$significant)
[1] 162
```
Using a revised lambda:
```
> qobj <- qvalue(p = hedenfalk$p, fdr.level=0.05, lambda = seq (0, max(hedenfalk$p), 0.05))
> qobj$pi0
[1] 0.6681177
> sum (qobj$significant)
[1] 162
```
On this dataset, there is no qualitative difference.  Again, perhaps the leftmost interval (<0.05) is not necessary in the estimation of pi0.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Dealing with the error messages of "missing or infinite values" or "missing values and NaN's" from calling qvalue() or pi0est() #38

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Dealing with the error messages of "missing or infinite values" or "missing values and NaN's" from calling qvalue() or pi0est() #38

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions