Confidence interval of random forest #276
|
Cool! It's nice that the CIs shrink to zero as you get to the last data point. But these CIs generally seem too small if they're 95% intervals: in far more than 5% of examples, the final data point falls outside the confidence interval on the first prediction. |
|
I'm still confused about the CIs and their appropriateness. Maybe do an experiment with some simulated data, to see if a simple regression behaves as we expect? |
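A minimal sketch of what such a sanity check could look like, assuming a made-up data-generating process (`y = 2 + 3x` plus unit Gaussian noise) and a normal quantile as a large-sample stand-in for the t quantile. The function name and all parameters are hypothetical, not taken from the PR. If the interval is calibrated, the empirical coverage should land near the nominal 95%.

```python
import random
from statistics import NormalDist, mean


def coverage_of_simple_regression(n_train=200, n_test=2000, level=0.95, seed=0):
    """Fraction of held-out points inside the nominal prediction interval."""
    rng = random.Random(seed)

    # Hypothetical data-generating process: y = 2 + 3x + N(0, 1) noise.
    def draw(n):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]
        return xs, ys

    # Ordinary least squares fit for a single predictor.
    x, y = draw(n_train)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard error with n - 2 degrees of freedom.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = (rss / (n_train - 2)) ** 0.5

    # Normal quantile used in place of the t quantile (fine for large n_train).
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Check how often fresh observations fall inside the prediction interval.
    x_new, y_new = draw(n_test)
    hits = 0
    for x0, y0 in zip(x_new, y_new):
        half = z * s * (1 + 1 / n_train + (x0 - x_bar) ** 2 / sxx) ** 0.5
        pred = intercept + slope * x0
        hits += abs(y0 - pred) <= half
    return hits / n_test


coverage = coverage_of_simple_regression()
print(f"empirical coverage: {coverage:.3f}")
```

Running the same coverage check against the random forest intervals would show directly whether they are too narrow.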
|
Per discussion, we now use out-of-bag estimates to estimate the prediction error. Following [this doc](https://pages.pomona.edu/~jsh04747/Student%20Theses/BenjiLu17.pdf) and this post, the difference between the training data and the out-of-bag estimate serves as both the in-sample and the out-of-sample prediction error. With this implemented in the demo code, the prediction interval gets wider: |
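For reference, a rough sketch of one way to turn out-of-bag residuals into a prediction interval with scikit-learn. The simulated sine data and the use of empirical residual quantiles are assumptions of this sketch, not necessarily what the demo code does.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical simulated data: a noisy sine curve.
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=500)
X_new = rng.uniform(0, 10, size=(2000, 1))
y_new = np.sin(X_new[:, 0]) + rng.normal(0, 0.3, size=2000)

# oob_score=True makes the fitted forest expose oob_prediction_.
rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Each training point is predicted only by trees that did not see it,
# so these residuals behave like out-of-sample errors.
oob_residuals = y - rf.oob_prediction_
lo, hi = np.quantile(oob_residuals, [0.025, 0.975])

# Shift every new prediction by the residual quantiles.
pred = rf.predict(X_new)
lower, upper = pred + lo, pred + hi

coverage = np.mean((y_new >= lower) & (y_new <= upper))
print(f"OOB-residual interval coverage: {coverage:.3f}")
```

This gives every prediction the same interval width, which is a simplification; it assumes the error distribution does not depend on the input.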
|
See #278 (comment) |
|
We agreed to compute the interval over the individual trees' predictions.
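A sketch of what an interval over the trees' predictions could look like with scikit-learn, assuming a fitted `RandomForestRegressor` and percentile bounds. The data, the variable names, and the 2.5/97.5 percentiles are illustrative choices, not the PR's code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical simulated data: a noisy sine curve.
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)
X_new = np.linspace(0, 10, 50).reshape(-1, 1)

rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, y)

# Rows are trees, columns are query points.
per_tree = np.stack([tree.predict(X_new) for tree in rf.estimators_])

# Percentile interval across trees at each query point.
lower, upper = np.percentile(per_tree, [2.5, 97.5], axis=0)

# The forest's own prediction is the mean over trees.
center = per_tree.mean(axis=0)
```

Note that this interval describes how much the trees disagree with one another, which is not the same quantity as the spread of observed outcomes around the prediction, so it need not cover 95% of new data points.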




Using `forestci` to calculate the prediction error (variance) and convert it to a 95% confidence interval, assuming the error is normally distributed. So far this is based on the examples; I will check out the paper in detail later. Staying as a draft for now. @swo
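The conversion step on its own can be sketched as follows, assuming a variance estimate is already in hand for each prediction. The helper `normal_interval` is hypothetical and does not call `forestci`; clipping negative variances at zero is a choice made for this sketch, in case an estimate comes out below zero.

```python
from statistics import NormalDist


def normal_interval(prediction, variance, level=0.95):
    """Symmetric interval around a prediction, assuming normal errors."""
    # Two-sided normal quantile, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Clip at zero so a negative variance estimate yields a zero-width interval.
    half_width = z * max(variance, 0.0) ** 0.5
    return prediction - half_width, prediction + half_width


# Example with made-up numbers: prediction 10, estimated variance 4.
lo, hi = normal_interval(10.0, 4.0)
print(f"95% interval: ({lo:.2f}, {hi:.2f})")
```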