Compare two prediction errors from RF in toy data #278
Closed
Fuhan-Yang wants to merge 2 commits into
Collaborator
This is still kind of a complicated example. See #279 for something very, very simple. I'm also still confused about how we'd use any method that relies on a test/training split or an in- vs. out-of-bag distinction. We're interested in forecasting, so the target isn't in-bag or out-of-bag; it's just not in the dataset. My emerging conclusion is that we should either (1) take the interval over trees or (2) switch to gradient boosting.
Collaborator
We agreed to use an interval over the individual trees' predictions.
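For reference, here is a minimal sketch of what an interval over the trees' predictions could look like. It is an illustration only, not the code in this PR: the function name `interval_over_trees` and the synthetic `per_tree` matrix are hypothetical. With scikit-learn, the matrix could presumably be built from a fitted `RandomForestRegressor` via `np.stack([t.predict(X_new) for t in rf.estimators_])`. Note that the spread across trees mostly reflects disagreement between trees, so it may understate the full predictive uncertainty.

```python
import numpy as np


def interval_over_trees(per_tree_preds, level=0.95):
    """Percentile interval across trees for each prediction point.

    per_tree_preds: array-like of shape (n_trees, n_points), one row per tree.
    Returns (lower, upper), each of shape (n_points,).
    """
    per_tree_preds = np.asarray(per_tree_preds, dtype=float)
    alpha = (1.0 - level) / 2.0
    # Quantiles are taken along the tree axis, separately for each point
    lower, upper = np.quantile(per_tree_preds, [alpha, 1.0 - alpha], axis=0)
    return lower, upper


# Hypothetical stand-in for per-tree predictions (NOT real model output):
# the trees disagree more for later points, as they might further out in a forecast.
rng = np.random.default_rng(0)
n_trees, n_points = 500, 4
spread = np.array([0.5, 1.0, 2.0, 4.0])
per_tree = 10.0 + rng.normal(size=(n_trees, n_points)) * spread

lower, upper = interval_over_trees(per_tree)
width = upper - lower  # varies by point, unlike a constant-width interval
```

Unlike a constant-width interval, the width here is computed per prediction point, so it can grow for targets further from the forecast date.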
This is an exploratory data analysis (EDA) of random forest prediction errors on toy data. The data are generated in a way that suits linear regression. Linear regression and a random forest are each fit to the training data (80% of the total) and then used for out-of-sample prediction. The mean prediction and the prediction interval are plotted. For the random forest, the intervals are calculated in two ways: V_UIJ, used by Wager et al., and out-of-bag (OOB) error, used by Lu et al., with the number of trees set to 100, 1000, 5000, and 10000. The prediction intervals are fairly consistent as the number of trees increases. The OOB method produces wider intervals than V_UIJ. Note that the OOB interval has the same width for every prediction, which assumes that the uncertainty is constant over time. Applied to vaccine coverage, this would mean that the uncertainty in predicting end-of-season coverage is the same as the uncertainty in predicting coverage one month after the forecast date. @swo
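To show why the OOB interval has the same width everywhere, here is a simplified sketch of an interval built from out-of-bag residuals. It assumes the interval is formed by adding empirical quantiles of the OOB residuals to each point prediction, which may differ in detail from the procedure used in this PR or in the cited paper. The name `oob_interval` and the synthetic residuals are hypothetical. With scikit-learn, the residuals could presumably be obtained as `y_train - rf.oob_prediction_` from a `RandomForestRegressor` fit with `oob_score=True`.

```python
import numpy as np


def oob_interval(point_preds, oob_residuals, level=0.95):
    """Shift each point prediction by the empirical quantiles of the OOB residuals.

    The same two offsets are applied to every prediction, so the interval
    width is constant across predictions by construction.
    """
    alpha = (1.0 - level) / 2.0
    q_lo, q_hi = np.quantile(np.asarray(oob_residuals, dtype=float),
                             [alpha, 1.0 - alpha])
    point_preds = np.asarray(point_preds, dtype=float)
    return point_preds + q_lo, point_preds + q_hi


# Hypothetical OOB residuals and point predictions (NOT real model output)
rng = np.random.default_rng(1)
oob_residuals = rng.normal(scale=2.0, size=400)
point_preds = np.array([3.0, 5.5, 9.0])

lo, hi = oob_interval(point_preds, oob_residuals)
widths = hi - lo  # identical for every prediction
```

Because the offsets come from the pooled training residuals, nothing in this construction lets the interval widen for predictions further from the forecast date.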