
fast_pdf defaults can create likelihood approximation singularities #1

@mshvartsman


I ran into what I think is unintended behavior in the density function fast_pdf (RunDEMC.density).

Specifically, fast_1d_kde by default only covers the range of its input (here, the model samples), and np.interp by default returns the edge value (fp[0] below the grid, fp[-1] above it) when queried outside the grid. fast_pdf calls both with these defaults.
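The np.interp edge behavior is easy to check in isolation. The grid and density values below are made up for illustration and are not RunDEMC output:

```python
import numpy as np

# Toy density on a narrow grid, standing in for KDE output that only spans the model range.
xp = np.array([0.9, 1.0, 1.1])   # grid points
fp = np.array([5.0, 10.0, 5.0])  # density values on those grid points

queries = np.array([1.0, 2.0, 5.0])  # 2.0 and 5.0 lie past the right edge of the grid

# Default: queries past the edge receive the edge density fp[-1].
print(np.interp(queries, xp, fp))                   # values 10.0, 5.0, 5.0

# With left=0, right=0: queries outside [xp[0], xp[-1]] receive zero density.
print(np.interp(queries, xp, fp, left=0, right=0))  # values 10.0, 0.0, 0.0
```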

Suppose your model can get close to a point mass somewhere within the support of the data (e.g., a DDM with a very low threshold). The density at that point is then very high. fast_1d_kde computes the density only over that very narrow range, and np.interp then returns that high edge density for all the remaining data, which lie outside the grid the KDE returned. A reasonable optimizer would therefore keep tightening the point mass and obtain ever higher likelihoods.
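The runaway can be reproduced without RunDEMC. The sketch below assumes a plain Gaussian KDE with Silverman's rule-of-thumb bandwidth evaluated on a grid spanning only the model samples; `narrow_grid_kde` and `loglik_default_edges` are hypothetical stand-ins, not fast_1d_kde or fast_pdf themselves. As the model samples collapse toward 1, the bandwidth shrinks, the edge density grows roughly like 1/eps, and every data point past the grid inherits it:

```python
import numpy as np


def narrow_grid_kde(model, n_points=200):
    """Hypothetical grid KDE whose grid only spans the model samples.

    Gaussian kernel with Silverman's bandwidth; an assumption for
    illustration, not RunDEMC's fast_1d_kde.
    """
    grid = np.linspace(model.min(), model.max(), n_points)
    h = 1.06 * model.std() * len(model) ** (-1 / 5)
    z = (grid[:, None] - model[None, :]) / h  # shape (n_points, n_samples)
    dens = np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
    return grid, dens


def loglik_default_edges(model, data, n_points=200):
    grid, dens = narrow_grid_kde(model, n_points)
    # np.interp defaults: data beyond the grid get the edge density.
    return np.log(np.interp(data, grid, dens)).sum()


data = np.arange(1.0, 11.0)
for eps in [1e-1, 1e-3, 1e-5]:
    model = 1.0 + eps * np.arange(5)  # samples collapsing toward a point mass at 1
    print(eps, loglik_default_edges(model, data))  # increases as eps shrinks
```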

Below is a minimal (extreme) example; I originally saw this come up in a real model.

```python
from RunDEMC.density import fast_pdf
import numpy as np

model = np.array([1.00001, 1, 1, 1, 1], dtype=np.float64)
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float64)

# Default extrema: the KDE grid only spans the model samples
print(fast_pdf(model, data, 200))  # probably not intended
# [ 98667.44131307  25479.89240456  25479.89240456  25479.89240456  25479.89240456 25479.89240456  25479.89240456  25479.89240456 25479.89240456  25479.89240456]

# Explicit extrema (0, 12) covering all of the data
print(fast_pdf(model, data, 200, (0, 12)))  # looks better
# [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
```

An easy fix is either to default the extrema to the joint range (min(data, model), max(data, model)), or to call np.interp with left=0, right=0. I'm happy to submit a pull request with either option.
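Both proposed fixes can be sketched on the same kind of stand-in. `grid_kde` and `pdf_with_fixes` are hypothetical names, the Gaussian/Silverman KDE is an assumption, and this is not a patch to RunDEMC itself; it only shows where each fix would apply:

```python
import numpy as np


def grid_kde(model, lo, hi, n_points=200):
    """Hypothetical Gaussian KDE on a grid over [lo, hi] (Silverman bandwidth).

    An illustrative stand-in, not RunDEMC's fast_1d_kde.
    """
    grid = np.linspace(lo, hi, n_points)
    h = 1.06 * model.std() * len(model) ** (-1 / 5)
    z = (grid[:, None] - model[None, :]) / h
    dens = np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
    return grid, dens


def pdf_with_fixes(model, data, n_points=200, extrema=None):
    """Hypothetical fast_pdf-like wrapper applying both proposed fixes."""
    if extrema is None:
        # Fix 1: default the grid to the joint range of data and model.
        extrema = (min(data.min(), model.min()), max(data.max(), model.max()))
    grid, dens = grid_kde(model, extrema[0], extrema[1], n_points)
    # Fix 2: zero density outside the grid instead of repeating the edge value
    # (only matters when the caller passes extrema narrower than the data).
    return np.interp(data, grid, dens, left=0.0, right=0.0)


model = np.array([1.00001, 1, 1, 1, 1], dtype=np.float64)
data = np.arange(1.0, 11.0)
print(pdf_with_fixes(model, data))
print(pdf_with_fixes(model, data, extrema=(0.5, 1.5)))
```

With either fix, data away from the collapsed model get zero density instead of the edge value, so the log-likelihood of such a model is -inf rather than growing without bound.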
