Skip to content

Dealing with the error messages of "missing or infinite values" or "missing values and NaN's" from calling qvalue() or pi0est() #38

@audreyqyfu

Description

@audreyqyfu

Like other users, we came across these errors with our data and have done some debugging (qvalue version 2.32.0; R version 4.3.1). I think that these errors are due to unexpected behavior of findInterval() and tabulate(), the two functions used in pi0est(), which is in turn called by the qvalue() function.

Here is the input p-value vector we used:

> tmp.p
  [1] 0.000000e+00 0.000000e+00 9.115477e-01 0.000000e+00 0.000000e+00
  [6] 3.336338e-01 0.000000e+00 2.726045e-08 3.108624e-15 0.000000e+00
 [11] 0.000000e+00 0.000000e+00 0.000000e+00 1.357958e-11 0.000000e+00
 [16] 3.856843e-03 0.000000e+00 2.925816e-04 0.000000e+00 6.495249e-12
 [21] 3.713274e-11 4.321365e-11 3.597123e-14 3.204471e-03 0.000000e+00
 [26] 9.569107e-03 4.440892e-15 2.224478e-02 0.000000e+00 0.000000e+00
 [31] 1.339865e-03 0.000000e+00 5.534483e-05 0.000000e+00 0.000000e+00
 [36] 9.760413e-05 5.216207e-06 0.000000e+00 0.000000e+00 4.793859e-01
 [41] 0.000000e+00 0.000000e+00 2.014265e-05 0.000000e+00 0.000000e+00
 [46] 1.775049e-09 2.462234e-03 0.000000e+00 0.000000e+00 6.318696e-01
 [51] 0.000000e+00 8.194909e-03 0.000000e+00 0.000000e+00 0.000000e+00
 [56] 5.049079e-10 0.000000e+00 5.062230e-01 5.554316e-05 0.000000e+00
 [61] 7.600417e-05 1.635782e-02 0.000000e+00 1.093950e-01 1.281121e-04
 [66] 0.000000e+00 6.194023e-01 0.000000e+00 1.001962e-06 6.646150e-11
 [71] 9.172546e-01 2.849969e-01 3.437511e-01 3.351566e-01 6.483649e-01
 [76] 1.255449e-01 7.011413e-01 1.962199e-01 5.366918e-02 5.416075e-01
 [81] 7.638786e-01 3.642310e-01 1.424749e-02 5.753318e-02 3.731248e-01
 [86] 9.191109e-01 9.034551e-01 6.005921e-01 2.480674e-02 4.445593e-01
 [91] 6.079983e-01 2.083319e-01 4.481868e-01 5.910949e-01 2.630329e-01
 [96] 2.480999e-01 8.275802e-01 3.632738e-01 9.944623e-02 8.778394e-01
[101] 4.098542e-01

> summary (tmp.p)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000000 0.0000000 0.0000555 0.1704693 0.3336338 0.9191109 

> sum (tmp.p < 0.05)
[1] 65
> qvalue (tmp.p)
Error in smooth.spline(lambda, pi0, df = smooth.df) : 
  missing or infinite values in inputs are not allowed
> ?qvalue
> qvalue (tmp.p, pi0.method = "bootstrap")
Error in quantile.default(pi0, prob = 0.1) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE
> qvalue (tmp.p, fdr.level=0.05, pi0.method="bootstrap", adj=1.2)
Error in quantile.default(pi0, prob = 0.1) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE

The qvalue() calls pi0est() to estimate pi0. The latter uses findInterval() and tabulate() to count the number of p-values in a series of nonoverlapping intervals given by lambda:

> lambda <- seq(0.05, 0.95, 0.05)
> lambda
 [1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75
[16] 0.80 0.85 0.90 0.95
> length (lambda)
[1] 19
> findInterval(tmp.p, vec = lambda)
  [1]  0  0 18  0  0  6  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [25]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  9  0  0  0  0  0  0  0  0
 [49]  0 12  0  0  0  0  0  0  0 10  0  0  0  0  0  2  0  0 12  0  0  0 18  5
 [73]  6  6 12  2 14  3  1 10 15  7  0  1  7 18 18 12  0  8 12  4  8 11  5  4
 [97] 16  7  1 17  8
# This output means that the intervals generated by lambda are as follows:
# 0: <0.05; 1: [0.05, 0.1); 2: [0.1, 0.15), ..., 18: [0.90, 0.95), 19: >=0.95.
# Note that the above output contains 65 0s and no 19.
> tabulate(findInterval(tmp.p, vec = lambda))
 [1] 3 2 1 2 2 3 3 3 1 2 1 5 0 1 1 1 1 4
# Note that 65 is not in the output above
> length (tabulate(findInterval(tmp.p, vec = lambda)))
[1] 18

The problem here is that "tabulate(findInterval(tmp.p, vec = lambda))" ignores the intervals at the two ends: <0.05 and >=0.95. It ignores the left interval of <0.05 because the tabulate() function ignores zeros in the input by default (see its documentation), even though there are 65 values in this interval. It ignores the right interval of >=0.95 because tmp.p contains has no value above 0.95 and therefore "findInterval(tmp.p, vec = lambda)" returns no 19.

In fact, in the R package vignette, the 605 hedenfalk p-values that are <0.05 are also ignored in the pi0 estimation. The output from debugging the pi0est() function is as follows:

> debug (pi0est)
> length (hedenfalk$p)
[1] 3170
> summary (hedenfalk$p)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000032 0.0845647 0.2998155 0.3718702 0.6316112 0.9998517 
> qobj <- qvalue(p = hedenfalk$p)

Browse[2]> length (p)
[1] 3170
Browse[2]> summary (p)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000032 0.0845647 0.2998155 0.3718702 0.6316112 0.9998517 
Browse[2]> lambda
 [1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70
[15] 0.75 0.80 0.85 0.90 0.95
Browse[2]> test.interval <- findInterval(p, vec = lambda)
Browse[2]> table (test.interval)
test.interval
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
605 263 194 190 180 154 128 130 134 120 108 101 113  83 115 118 109 122 
 18  19 
 94 109 
Browse[2]> tabulate(findInterval(p, vec = lambda))
 [1] 263 194 190 180 154 128 130 134 120 108 101 113  83 115 118 109 122
[18]  94 109

It is possible that the leftmost interval (<0.05) may not be necessary for later steps in estimating pi0; this part is not entirely clear to me. But to get around the programming problem that generates the error messages, we can at least change how lambda is specified:

> lambda <- seq(0, max(tmp.p), 0.05)
> cbind (lambda, count = tabulate(findInterval(tmp.p, vec = lambda)))
      lambda count
 [1,]   0.00    65
 [2,]   0.05     3
 [3,]   0.10     2
 [4,]   0.15     1
 [5,]   0.20     2
 [6,]   0.25     2
 [7,]   0.30     3
 [8,]   0.35     3
 [9,]   0.40     3
[10,]   0.45     1
[11,]   0.50     2
[12,]   0.55     1
[13,]   0.60     5
[14,]   0.65     0
[15,]   0.70     1
[16,]   0.75     1
[17,]   0.80     1
[18,]   0.85     1
[19,]   0.90     4

In principle, there is also an interval #0, which is <0 and of course has no count. We can therefore ignore this interval safely.

@ajbass suggested an upper bound of "max(tmp.p)-0.05" in issue #20:

> lambda <- seq(0, max(tmp.p)-0.05, 0.05)
> cbind (lambda, count = tabulate(findInterval(tmp.p, vec = lambda)))
      lambda count
 [1,]   0.00    65
 [2,]   0.05     3
 [3,]   0.10     2
 [4,]   0.15     1
 [5,]   0.20     2
 [6,]   0.25     2
 [7,]   0.30     3
 [8,]   0.35     3
 [9,]   0.40     3
[10,]   0.45     1
[11,]   0.50     2
[12,]   0.55     1
[13,]   0.60     5
[14,]   0.65     0
[15,]   0.70     1
[16,]   0.75     1
[17,]   0.80     1
[18,]   0.85     5

This upper bound results in a wider last interval. We can double check the counts in the last intervals:

> sum (tmp.p>=0.95)
[1] 0
> sum (tmp.p>=0.9)
[1] 4
> sum (tmp.p>=0.85)
[1] 5

These experiments suggest several ways to deal with the error messages, with a higher estimated pi0 corresponding to fewer positives:

> result <- qvalue (tmp.p, fdr.level=0.05, lambda = seq (0, max(tmp.p), 0.05))
> result$pi0
[1] 0.3198683
> sum (result$significant)
[1] 68

> result <- qvalue (tmp.p, fdr.level=0.05, lambda = seq (0, max(tmp.p)-0.05, 0.05))
> result$pi0
[1] 0.2835713
> sum (result$significant)
[1] 69

> result <- qvalue (tmp.p, fdr.level=0.05, lambda = seq (0, max(tmp.p), 0.05), pi0.method = "bootstrap")
> result$pi0
[1] 0.3060306
> sum (result$significant)
[1] 69

> result <- qvalue (tmp.p, fdr.level=0.05, pi0.method="bootstrap", adj=1.2, lambda = seq (0, max(tmp.p), 0.05))
> result$pi0
[1] 0.3060306
> sum (result$significant)
[1] 69

# this option uses a fixed lambda instead of a sequence of intervals
> result <- qvalue (tmp.p, fdr.level=0.05, lambda = 0.5)
> result$pi0
[1] 0.3168317
> sum (result$significant)
[1] 68
> result$lambda
[1] 0.5

We can also rerun the function on the hedenfalk p-values and compare the results:

Using the default lambda:

> qobj <- qvalue(p = hedenfalk$p, fdr.level=0.05)
> qobj$pi0
[1] 0.669926
> sum (qobj$significant)
[1] 162

Using a revised lambda:

> qobj <- qvalue(p = hedenfalk$p, fdr.level=0.05, lambda = seq (0, max(hedenfalk$p), 0.05))
> qobj$pi0
[1] 0.6681177
> sum (qobj$significant)
[1] 162

On this dataset, there is no qualitative difference. Again, perhaps the leftmost interval (<0.05) is not necessary in the estimation of pi0.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions