Simulation Abnormality #250
danielagandrade asked this question in Q&A (Unanswered)
Hi Daniela,

Is it possible that you're just missing a minus sign in front of your LR calculation, or likewise are not using -lnL inside the parentheses? I think this would make all of your distribution plots make a lot more sense. Otherwise, I don't have any good idea what's happening.

Matt
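To illustrate Matt's point numerically, here is a small sketch using the -lnL values Daniela reports below for the 2L and 3L models. It assumes the standard nested LRT statistic LR = 2 * (lnL_complex - lnL_simple); feeding in the -lnL values as if they were lnL (the suspected sign error) negates the statistic:

```python
# -lnL values for the 2L and 3L models, taken from the post below.
neg_lnL_2L = 93942.016575889
neg_lnL_3L = 93887.766913779

# Correct nested LRT statistic: LR = 2 * (lnL_complex - lnL_simple).
# With -lnL values in hand, lnL = -(-lnL), so:
lr_correct = 2 * ((-neg_lnL_3L) - (-neg_lnL_2L))

# Sign error: treating the -lnL values as if they were lnL flips the sign.
lr_sign_error = 2 * (neg_lnL_3L - neg_lnL_2L)

print(round(lr_correct, 4))     # matches the empirical LR of 108.4993
print(round(lr_sign_error, 4))  # the negated value
```

A sign slip like this would make the null distribution (simulated under the simpler model) appear mostly negative, which matches what Daniela describes.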
Hi CAFE5 developers, I am seeing some interesting results with my simulations and would like some input.
I have tested four different nested models using the base model. I modeled with a Poisson root distribution and included the error model for all of my runs. Here are the -lnL values for the empirical data:
Global lambda model (GL): 96839.4
Two lambda model (2L): 93942.016575889
Three lambda model (3L): 93887.766913779
Four lambda model (4L): 93326.065646918
To select which model was best, I compared the GL to the 2L model, the 2L to the 3L model, and the 3L to the 4L model.
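For reference, a sketch of those three comparisons from the -lnL values above, assuming the standard nested LRT statistic LR = 2 * (lnL_complex - lnL_simple) (an assumption about the formula used, but it reproduces the empirical LR of 108.4993 reported below for the 2L vs. 3L comparison):

```python
# -lnL values reported above for each nested model
neg_lnL = {
    "GL": 96839.4,
    "2L": 93942.016575889,
    "3L": 93887.766913779,
    "4L": 93326.065646918,
}

def lrt(simple, complex_):
    """LR = 2 * (lnL_complex - lnL_simple), computed from -lnL values."""
    return 2 * (neg_lnL[simple] - neg_lnL[complex_])

# The three nested comparisons described in the post
for pair in [("GL", "2L"), ("2L", "3L"), ("3L", "4L")]:
    print(pair, round(lrt(*pair), 4))
```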
The following was my general procedure:
Simulate 1000 datasets using the root distribution from my data under the simpler of the two models. (Important note: I was not able to make this run with the base error model included; the error said "Trying to simulate leaf family size that was not included in error model".)
Fit both models to each one of the simulated datasets.
Calculate the likelihood ratio for every simulation and plot the distribution. Then calculate my empirical likelihood ratio and compare it to the distribution. I used an alpha cutoff of 0.05.
I have attached the plots of the three comparisons, with the empirical LR plotted on them. I have ruled out the global lambda model and the four lambda model because the plots for those comparisons are clear and straightforward. However, I am seeing some interesting results in the comparison of the two lambda model to the three lambda model, and I would like your input.
My empirical LR is 108.4993. I have run both models multiple times on the empirical data and see convergence, with the -lnL consistently indicating that the 3L model is better (which is to be expected due to the extra parameter). Nonetheless, almost all of the LR values from the simulated data are negative, indicating that the 3L model has a worse fit: almost all of the -lnL values for the 3L model are larger than those for the 2L model.
Because the empirical LR is a positive value, when I compare it to the distribution of mostly negative numbers and the p value cutoff, it appears that the 3L model is the better choice. The p value of the empirical data is 0.001, calculated as follows:
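The p-value calculation itself did not survive in the post, so here is a sketch of one common convention for a simulation-based (empirical) p-value, p = (1 + #{LR_sim >= LR_obs}) / (1 + N). This is an assumption about the method; note that it yields approximately 0.001 when none of 1000 simulated LRs reach the observed value:

```python
def empirical_p(lr_obs, lr_sims):
    """Simulation-based p-value under one common convention (an assumption;
    the exact formula used in the post is not shown):
        p = (1 + #{LR_sim >= LR_obs}) / (1 + N)
    The +1 terms avoid reporting p = 0 when no simulated LR reaches lr_obs.
    """
    exceed = sum(1 for lr in lr_sims if lr >= lr_obs)
    return (1 + exceed) / (1 + len(lr_sims))

# Toy example: 1000 hypothetical null LR values, none reaching the
# empirical LR of 108.4993 (mimicking the mostly-negative distribution)
sims = [-5.0 + 0.001 * i for i in range(1000)]
print(round(empirical_p(108.4993, sims), 3))  # ~1/1001, rounds to 0.001
```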
However, I would like your input because this decision does not sit well with me, since in almost all of the simulations the 3L model performed worse. I find this confusing: I would expect that adding parameters would almost always lead to a better fit, but that is not what I am seeing. Additionally, the distribution of LR test values is skewed to the left. Based on the simulated data, I am inclined to choose the 2 lambda model. Nonetheless, I would like to hear your thoughts and whether you have seen similar behavior in the past when comparing models.
I have uploaded all of my distribution plots, as well as the 2L and 3L trees. Below is my code for the 2L vs 3L procedure in case it is helpful.
3_separate_lambdas.txt
2_separate_lambdas.txt
Distribution_Plot_2Lvs3L.pdf
Distribution_Plot_3Lvs4L.pdf
Distribution_Plot_GLvs2L.pdf
Thank you for developing CAFE5 and for all of your help when questions and issues arise; it is always very appreciated.
Daniela
CODE: