I ran into what I think is unintended behavior in the density function in fast_pdf.
Specifically, fast_1d_kde defaults to only covering the range of the input data, and np.interp defaults to returning the edge value if queried past the edge. Both are set to their defaults in fast_pdf.
Suppose that your model can get close to a point mass somewhere over the support of the data (e.g. a DDM with a very low threshold). The density at that one point is then very high. fast_1d_kde computes the density only over that very narrow range, and np.interp returns the high edge value for all the rest of the data, which lies outside the bounds of the grid the KDE function returned. A reasonable optimizer would then keep tightening that point mass and getting ever higher likelihoods.
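To show the np.interp part in isolation, here is a small sketch. The grid and density values are made up for illustration; they are not actual fast_1d_kde output.

```python
import numpy as np

# Hypothetical KDE grid that spans only the narrow range of the model samples.
grid = np.linspace(1.0, 1.00001, 5)
# Made-up (large) density values on that grid, standing in for a near point mass.
dens = np.array([9.0e4, 1.0e5, 1.0e5, 1.0e5, 9.0e4])

# Observed data, most of which lies far to the right of the grid.
data = np.array([1.0, 2.0, 5.0, 10.0])

# With the default left/right arguments, np.interp returns the edge values
# (dens[0] or dens[-1]) for any query point outside the grid.
clamped = np.interp(data, grid, dens)
print(clamped)
```

Every data point beyond the right edge of the grid receives dens[-1], so the far-away observations all look highly likely.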
Below is a minimal (extreme) example, though I originally saw this come up in a real model.
from RunDEMC.density import fast_pdf
import numpy as np
model = np.array([1.00001,1,1,1,1], dtype=np.float64)
data = np.array([1,2,3,4,5,6,7,8,9,10],dtype=np.float64)
print(fast_pdf(model, data, 200)) # probably not intended
# [ 98667.44131307 25479.89240456 25479.89240456 25479.89240456 25479.89240456 25479.89240456 25479.89240456 25479.89240456 25479.89240456 25479.89240456]
print(fast_pdf(model, data, 200, (0, 12))) # looks better
# [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
An easy fix is to either default the extrema to the range (min(data, model), max(data, model)) or call np.interp with left=0, right=0. I'm happy to submit a pull request with either option.
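A rough sketch of what the two options could look like, using plain NumPy only. The grid and density values are placeholders standing in for the KDE output, and none of this is the actual fast_pdf code.

```python
import numpy as np

model = np.array([1.00001, 1, 1, 1, 1], dtype=np.float64)
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float64)

# Option 1: default the extrema to the combined range of model and data,
# so the KDE grid always covers every point that will be queried.
extrema = (min(model.min(), data.min()), max(model.max(), data.max()))
print(extrema)

# Option 2: keep the narrow grid, but return zero density outside it.
# Placeholder grid and density values (not real fast_1d_kde output).
grid = np.linspace(model.min(), model.max(), 5)
dens = np.full(grid.shape, 1.0e5)
pdf = np.interp(data, grid, dens, left=0.0, right=0.0)
print(pdf)
```

With option 2, points outside the grid get a density of exactly zero, so any caller that takes the log of the result would need to handle that case.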