Confidence interval of random forest#276

Closed
Fuhan-Yang wants to merge 19 commits into main from fy_swo_rf

Conversation

@Fuhan-Yang
Contributor

This uses forestci to calculate the prediction error (variance) and converts it to a 95% confidence interval, assuming the error is normally distributed. So far this is based on the package examples; I will check out the paper in detail later. Staying as a draft for now. @swo
image
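
For reference, the variance-to-interval step described above can be sketched as follows. This is a minimal illustration using only the standard library, not the PR's code: the helper name `normal_interval` and the example numbers are hypothetical, and it assumes the variance estimate has already been obtained (for example from forestci).

```python
from math import sqrt
from statistics import NormalDist


def normal_interval(mean, variance, level=0.95):
    """Return (lower, upper) under a normal approximation with the given variance."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    # Two-sided critical value, roughly 1.96 when level=0.95
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(variance)
    return mean - half_width, mean + half_width


# Hypothetical numbers: predicted mean 0.40 with estimated variance 0.0004
lower, upper = normal_interval(0.40, 0.0004)
print(f"95% interval: ({lower:.4f}, {upper:.4f})")
```

The interval is only as good as the normality assumption and the variance estimate that feeds it.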

@swo
Collaborator

swo commented Feb 19, 2026

Cool! It's nice that the CIs shrink to zero as you get to the last data point. But generally these CIs seem too small if they're 95%: in much more than 5% of examples, the final data point is not inside the confidence interval on the first prediction.

@Fuhan-Yang
Contributor Author

One source of confusion in the plot above is that the points are the prediction means rather than the data. Here is a plot with the data and the forecasts across forecast dates:
image

The prediction error uses the bootstrap error to represent the sampling error from the data-generating process. When the number of bootstrap replicates is high enough, the bootstrap error should approximate the sampling error. But when it is low, the bootstrap error overestimates the sampling error.

The paper behind the forestci package uses a bias-corrected variance to represent the bootstrap error, so that the variance still reflects the sampling error robustly when the number of trees is low. When I tested with 10, 100, 1000, and 10000 trees, the prediction intervals looked pretty similar.

Also, it is not surprising to me that the prediction interval is small. My take is that the training data all show a pretty similar correlation among the monthly uptake values within a season (the curve is quite regular). We still observe lack of fit, though: compared with the target data, not all the prediction means and intervals capture the data for all the states at the beginning of the season, and it gets better as the season moves forward. This is expected, and is also observed in the LPL model.

@swo
Collaborator

swo commented Mar 9, 2026

I'm still confused about the CIs and their appropriateness. Maybe do an experiment with some simulated data, to see if a simple regression behaves as we expect?
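
A simulated-data check of this kind could look like the sketch below. It is an illustration only, not code from this PR: the true line, noise level, sample sizes, and function names are all assumptions, and it uses the normal critical value in place of the t quantile, which is a close approximation at n = 100. It fits a simple linear regression repeatedly and counts how often a fresh observation lands inside the nominal 95% prediction interval.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Returns (a, b, residual_sd, x_mean, sxx).
    """
    n = len(xs)
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sqrt(rss / (n - 2)), x_mean, sxx


def coverage(n_reps=2000, n=100, noise_sd=1.0, level=0.95, seed=0):
    """Fraction of simulated new observations that fall inside the interval."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(n_reps):
        # Assumed data-generating process: y = 2 + 0.5 x + Gaussian noise
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [2.0 + 0.5 * x + rng.gauss(0, noise_sd) for x in xs]
        a, b, s, x_mean, sxx = fit_line(xs, ys)
        # One fresh observation from the same process
        x0 = rng.uniform(0, 10)
        y0 = 2.0 + 0.5 * x0 + rng.gauss(0, noise_sd)
        pred = a + b * x0
        # Prediction interval half-width: noise plus uncertainty in the fit
        half = z * s * sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx)
        hits += (pred - half) <= y0 <= (pred + half)
    return hits / n_reps


print(f"empirical coverage: {coverage():.3f}")
```

If the empirical coverage sits near the nominal level here but far below it for the random forest intervals, that would point at the interval construction rather than at the data.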

Base automatically changed from swo_rf to main March 9, 2026 15:53
@Fuhan-Yang
Contributor Author

Some updates: random_forest.md now describes how the variance estimator is derived, and there are some plots showing how the variance estimator changes as the number of trees increases. When the number of trees is varied over 10^x, x = 1, 2, ..., 5, the numerator of the variance estimator increases:

image

The estimator still decreases:
image

This means forestci may not be ideal for estimating the prediction error, since its estimate is affected by the number of trees. Also, it estimates the variance of the average prediction over the trees (the bootstrap-smoothed estimator), so it may underestimate the prediction error because of the variance reduction from smoothing.

@Fuhan-Yang
Contributor Author

Per discussion, the approach now uses out-of-bag estimates to estimate the prediction error. Following [this doc](https://pages.pomona.edu/~jsh04747/Student%20Theses/BenjiLu17.pdf) and this post, the difference between the training data and the out-of-bag estimate is used as the in-sample prediction error as well as the out-of-sample prediction error.

With this implemented in the demo code, the prediction interval gets wider:
image
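
One way to turn out-of-bag errors into an interval is to add empirical quantiles of the out-of-bag residuals to a new point prediction. The sketch below illustrates that idea under stated assumptions and is not necessarily what the demo code does: the function names are hypothetical, and the out-of-bag predictions are assumed to be available already (with scikit-learn, for instance, from the `oob_prediction_` attribute of a `RandomForestRegressor` fitted with `oob_score=True`).

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of pre-sorted values, with 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def oob_interval(y_train, oob_pred, new_pred, level=0.95):
    """Shift a new prediction by empirical quantiles of the OOB residuals."""
    # Residual for each training point: observed value minus its OOB prediction
    residuals = sorted(y - p for y, p in zip(y_train, oob_pred))
    alpha = (1 - level) / 2
    return (
        new_pred + quantile(residuals, alpha),
        new_pred + quantile(residuals, 1 - alpha),
    )


# Hypothetical toy values, for illustration only
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
oob_pred = [1.5, 1.5, 3.0, 4.5, 4.5]
lower, upper = oob_interval(y_train, oob_pred, new_pred=3.0)
print(lower, upper)
```

Because the residuals come from observations that each tree did not see, they include the noise in the data, which is consistent with the interval getting wider than one based on the variance of the forest mean alone.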

@swo

@swo
Collaborator

swo commented Mar 12, 2026

See #278 (comment)

@swo
Collaborator

swo commented Mar 23, 2026

We agreed to do an interval over the trees' predictions.
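
As a rough sketch of what an interval over the trees' predictions could look like (an assumption about the agreed approach, not code from this repository): collect each tree's prediction for a given input and take empirical quantiles across trees. The function names are hypothetical; with scikit-learn, the per-tree predictions could be gathered by calling `predict` on each member of a fitted forest's `estimators_`.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of pre-sorted values, with 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def tree_interval(tree_preds, level=0.95):
    """Interval from the empirical quantiles of per-tree predictions for one input."""
    ordered = sorted(tree_preds)
    alpha = (1 - level) / 2
    return quantile(ordered, alpha), quantile(ordered, 1 - alpha)


# Hypothetical per-tree predictions for a single input: 0.0, 1.0, ..., 100.0
tree_preds = [float(i) for i in range(101)]
lower, upper = tree_interval(tree_preds)
print(lower, upper)
```

An interval built this way describes the spread among trees, so it does not by itself account for observation noise around the forest's prediction.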

@swo swo closed this Mar 23, 2026
@swo swo deleted the fy_swo_rf branch March 23, 2026 15:38