Leaves with no samples from original distribution

This issue was observed and reported by Jack Wimberley.

If a region contains very few original samples, a decision tree can build a leaf that holds only samples from the target distribution (more than min_samples_leaf of them) and exactly zero samples from the original distribution.

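As a result, the 'corrections' made by the tree in such a leaf do not affect any training weights, because no original training sample falls into it. Original samples from the test set that do land in the leaf, however, get their weights blown up.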
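To make the failure mode concrete, here is a minimal sketch. It assumes the per-leaf correction is the log-ratio of the summed target and original weights in the leaf; that form is an assumption for illustration, not code taken from the library, and `leaf_log_ratio` is a hypothetical helper.

```python
import numpy as np

def leaf_log_ratio(w_target, w_original):
    """Hypothetical per-leaf correction: log(target weight / original weight)."""
    # Silence the divide-by-zero warning so the infinite value is visible.
    with np.errstate(divide="ignore"):
        return np.log(w_target) - np.log(w_original)

# Leaf A contains samples from both distributions.
healthy = leaf_log_ratio(w_target=300.0, w_original=200.0)
# Leaf B contains 250 target samples and exactly zero original samples.
empty = leaf_log_ratio(w_target=250.0, w_original=0.0)

# Multiplicative factor applied to the weight of an original sample in the leaf.
print("healthy leaf factor:", np.exp(healthy))
print("empty leaf factor:", np.exp(empty))
```

No original training sample ever receives the factor from leaf B, so the training weights look fine, while an original test sample that lands there is multiplied by an unbounded factor.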

## Workarounds

In practice, almost any of the following resolves the problem:
- increase min_samples_leaf
- use subsample=0.5
- increase regularization (available in the develop version)

Any combination of the above works well too.
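As a rough illustration of the regularization item only, the sketch below assumes the regularization constant is added to both weight sums before the log-ratio is taken. That form, the function `regularized_leaf_value`, and the parameter name `reg` are assumptions made for this example, not the library's confirmed implementation.

```python
import numpy as np

def regularized_leaf_value(w_target, w_original, reg):
    """Assumed regularized correction: log((w_target + reg) / (w_original + reg))."""
    return np.log(w_target + reg) - np.log(w_original + reg)

# The problematic leaf: 250 units of target weight, zero original weight.
regs = [1.0, 5.0, 50.0]
factors = [float(np.exp(regularized_leaf_value(250.0, 0.0, reg))) for reg in regs]

for reg, factor in zip(regs, factors):
    print(f"reg={reg:5.1f} -> weight factor {factor:.1f}")
```

Under this assumption the factor stays finite for any positive `reg` and shrinks as `reg` grows, which is consistent with stronger regularization taming the test weights.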


## Proper solution (not available now)

A good, correct solution would be to introduce a parameter for the 'minimal number of samples from the original distribution in a leaf', but this isn't supported by the decision trees of scikit-learn (or any other library).
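The constraint itself is easy to state. The sketch below shows the check a tree builder would need to run on each candidate split; `split_is_allowed` and `min_original_in_leaf` are hypothetical names, and nothing here hooks into scikit-learn's tree code.

```python
import numpy as np

def split_is_allowed(feature, is_original, threshold, min_original_in_leaf):
    """Return True if both children of the split keep enough original samples."""
    feature = np.asarray(feature)
    is_original = np.asarray(is_original, dtype=bool)
    goes_left = feature <= threshold
    n_original_left = np.count_nonzero(is_original & goes_left)
    n_original_right = np.count_nonzero(is_original & ~goes_left)
    return bool(min(n_original_left, n_original_right) >= min_original_in_leaf)

# Original samples sit only at low feature values; target samples fill the rest.
feature = np.array([0.1, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 1.1])
is_original = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)

# Splitting at 0.5 would leave the right child with no original samples.
rejected = split_is_allowed(feature, is_original, 0.5, min_original_in_leaf=1)
# Splitting at 0.25 keeps at least one original sample on each side.
accepted = split_is_allowed(feature, is_original, 0.25, min_original_in_leaf=1)
print(rejected, accepted)
```

A real implementation would have to apply this check inside the split search itself, which is the part that the existing tree implementations do not expose.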


