Confidence interval of random forest by Fuhan-Yang · Pull Request #276 · CDCgov/cfa-vaccination-coverage-forecasting

Fuhan-Yang · 2026-02-18T22:52:40Z

This uses forestci to calculate the prediction error (variance) and converts it to a 95% confidence interval, assuming the error is normally distributed. So far this is based on the package examples; I will check the paper in detail later. Staying as a draft for now. @swo
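For readers following along, the variance-to-interval step under the normality assumption can be sketched as below. This is a minimal illustration, not the PR's code: `normal_interval` is a hypothetical helper, and the mean and variance values are made up rather than taken from forestci output.

```python
from statistics import NormalDist


def normal_interval(mean, variance, level=0.95):
    """Return (lower, upper) for a normal approximation centered on `mean`.

    `variance` stands in for a per-prediction variance estimate.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    # Two-sided critical value, about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * variance ** 0.5
    return mean - half_width, mean + half_width


# Made-up example: predicted uptake 0.40 with variance 0.0004 (sd 0.02).
lower, upper = normal_interval(mean=0.40, variance=0.0004)
```

The interval is only as trustworthy as the variance estimate and the normality assumption that feed it.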

swo · 2026-02-19T22:45:33Z

Cool! It's nice that the CIs shrink to zero as you get to the last data point. But generally these CIs seem too small if they're 95%, since in much more than 5% of examples the final data point is not inside the confidence interval on the first prediction.
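The check described here, counting how often the observed value falls inside its interval, can be made explicit with a small sketch. The function name `empirical_coverage` and all numbers are illustrative assumptions, not values from this PR.

```python
def empirical_coverage(observed, lower, upper):
    """Fraction of observations that fall inside their [lower, upper] interval."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)


# Made-up values: four final data points against four first-forecast intervals.
observed = [0.52, 0.61, 0.47, 0.55]
lower = [0.50, 0.50, 0.50, 0.50]
upper = [0.60, 0.60, 0.60, 0.60]
coverage = empirical_coverage(observed, lower, upper)
```

For well-calibrated 95% intervals, this fraction should be close to 0.95 across many examples; a much lower value suggests the intervals are too narrow.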

Fuhan-Yang · 2026-03-04T22:29:40Z

One source of confusion in the plot above is that the points are the prediction means, not the data. Here is a plot with the data and the forecasts across forecast dates:

The prediction error uses the bootstrap error to represent the sampling error from the data-generating process. When the number of bootstrap replicates is high enough, the bootstrap error should approximate the sampling error. When it is low, the bootstrap error overestimates the sampling error.

The paper behind the forestci package uses a bias-corrected variance to represent the bootstrap error, so that the variance still reflects the sampling error robustly when the number of trees is low. When I tested with 10, 100, 1000, and 10000 trees, the prediction intervals looked pretty similar.

Also, I am not surprised that the prediction intervals are small. My take is that the training data all show a similar correlation among the monthly uptake values within a season (the curves are quite regular). We still observe lack of fit, though: compared with the target data, not all the prediction means and intervals capture the data for all states at the beginning of the season, and this improves as the season moves forward. This is expected, and was also observed in the LPL model.

swo · 2026-03-09T15:48:38Z

I'm still confused about the CIs and their appropriateness. Maybe do an experiment with some simulated data, to see if a simple regression behaves as we expect?
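One possible shape for such an experiment, sketched with ordinary least squares as the "simple regression": simulate data with known noise, build a normal-theory 95% interval from the residual standard deviation, and check coverage on fresh data. All names and parameter values here are assumptions for illustration; a random forest and its interval method would be swapped in to run the real check.

```python
import random
from statistics import NormalDist


def simulate(n, rng, intercept=1.0, slope=2.0, noise_sd=0.5):
    """Draw n points from y = intercept + slope * x + Gaussian noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns intercept, slope, residual sd."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (sse / (n - 2)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
x_train, y_train = simulate(200, rng)
x_test, y_test = simulate(2000, rng)

intercept, slope, resid_sd = fit_ols(x_train, y_train)
z = NormalDist().inv_cdf(0.975)

# Count fresh observations that land inside prediction +/- z * residual sd.
covered = sum(
    abs(y - (intercept + slope * x)) <= z * resid_sd
    for x, y in zip(x_test, y_test)
)
coverage = covered / len(y_test)
```

If the interval method is sound, `coverage` should land near 0.95; applying the same harness to the random forest intervals would show whether they are too narrow.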

Fuhan-Yang · 2026-03-11T21:01:05Z

Some updates on how the variance estimator is derived are in random_forest.md, along with some plots showing how the variance estimator changes as the number of trees increases. When the number of trees is varied over 10^x, x = 1, 2, ..., 5, the numerator of the variance estimator increases:

The estimator still decreases:

This means forestci may not be ideal for estimating the prediction error: the estimate is affected by the number of trees, and because it estimates the variance of the average prediction over the trees (the bootstrap-smoothed estimator), it may underestimate the prediction error given the variance reduction of the smoothed estimator.

Fuhan-Yang · 2026-03-11T21:07:31Z

Per discussion, we now use out-of-bag estimates to estimate the prediction error. Following [this doc](https://pages.pomona.edu/~jsh04747/Student%20Theses/BenjiLu17.pdf) and this post, the difference between the training data and the out-of-bag estimate is used as the in-sample prediction error as well as the out-of-sample prediction error.
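A minimal sketch of the out-of-bag idea, assuming the residuals are turned into an interval via empirical quantiles (the comment above does not say exactly how the demo code does this). The helpers `quantile` and `oob_prediction_interval` and all numbers are hypothetical. In scikit-learn, out-of-bag predictions for the training set are available as `oob_prediction_` on a `RandomForestRegressor` fitted with `oob_score=True`; here they are plain lists so the example stands alone.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (pos - low) * (ordered[high] - ordered[low])


def oob_prediction_interval(y_train, oob_pred, new_pred, level=0.95):
    """Shift each new prediction by empirical quantiles of the OOB residuals."""
    residuals = [y - p for y, p in zip(y_train, oob_pred)]
    alpha = 1 - level
    lo_shift = quantile(residuals, alpha / 2)
    hi_shift = quantile(residuals, 1 - alpha / 2)
    return [(p + lo_shift, p + hi_shift) for p in new_pred]


# Made-up training targets, their out-of-bag predictions, and one new prediction.
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
oob_pred = [1.2, 1.9, 3.3, 3.8, 5.1]
intervals = oob_prediction_interval(y_train, oob_pred, new_pred=[2.5])
```

Because the same residual quantiles are applied to every prediction, the interval width is constant across points in this sketch.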

Implemented in the demo code, this makes the prediction interval wider:

@swo

swo · 2026-03-12T21:03:01Z

See #278 (comment)

swo · 2026-03-23T15:36:34Z

We agreed to do an interval over the trees' predictions.
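One reading of "an interval over the trees' predictions" is to take empirical quantiles of the per-tree predictions at each point. The sketch below assumes that reading; the thread does not spell out the exact construction, and the helper names and numbers are hypothetical. With scikit-learn, per-tree predictions could be collected by calling `predict` on each fitted tree in `forest.estimators_`.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (pos - low) * (ordered[high] - ordered[low])


def tree_prediction_interval(per_tree_predictions, level=0.95):
    """per_tree_predictions[t][i] is tree t's prediction for point i."""
    alpha = 1 - level
    intervals = []
    # zip(*...) regroups the values so each tuple holds all trees' predictions
    # for a single point.
    for point_preds in zip(*per_tree_predictions):
        intervals.append(
            (quantile(point_preds, alpha / 2), quantile(point_preds, 1 - alpha / 2))
        )
    return intervals


# Made-up predictions from five trees for two points.
per_tree = [
    [0.40, 0.70],
    [0.42, 0.68],
    [0.38, 0.71],
    [0.41, 0.69],
    [0.39, 0.72],
]
intervals = tree_prediction_interval(per_tree)
```

This interval describes the spread among trees, so it reflects disagreement within the ensemble rather than residual noise in the data.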

Base automatically changed from swo_rf to main March 9, 2026 15:53

swo and others added 14 commits March 9, 2026 12:50

Demo: Random forest regression

65e4b7f

fixup

fe08ae9

fixup

186c5ca

checkpoint

8da79a7

Use one-hot encoding

29d7ede

Hot-encode season

9ac9fb8

Use a script, not a notebook

669a696

add ci

2b5a403

add forestci

2159508

it's not ci, it's error

709045c

random forest doc

ab786af

fix

09becc1

plot data with forecasts

5daa7f9

doc edit

1f9af3b

Fuhan-Yang force-pushed the fy_swo_rf branch from 1d1b897 to 1f9af3b Compare March 9, 2026 16:51

Fuhan-Yang added 4 commits March 9, 2026 13:13

update lock

43d3150

fix

bb4c522

error check on RF

7f0648b

more edits

dea550b

use in-sample oob error

52238ad

swo closed this Mar 23, 2026

swo deleted the fy_swo_rf branch March 23, 2026 15:38

Fuhan-Yang wants to merge 19 commits into main from fy_swo_rf