Description
Hello,
First of all: thanks for the great package! I have gotten a lot of good use out of it, especially the sequential feature selection.
SFS becomes problematic as the number of features d increases, since the number of candidate models that must be fit grows as O(d^2). I have found that one way to deal with this is to take a random subset of the remaining features to check at each step instead of trying all of them. If the random subset has size k, the number of model fits drops to O(dk).
Take an example of sequential forward selection with d=1000 and k=25.
During the first step, we can either try all 1000 univariate models or pick a random subset of 25 univariate models, and then take the best of them. It makes sense to try them all so as to start with a good baseline.
During the second step, instead of trying 999 bivariate models, we try only 25 of them.
In the third step, 25 instead of 998 trivariate models, and so on, until only 25 candidate features remain, at which point we revert to trying them all.
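
To make the idea concrete, here is a minimal sketch of what I have in mind. It is not based on the package's internals; the function name `randomized_sfs` and its parameters are hypothetical, and it just uses scikit-learn's `cross_val_score` for scoring each candidate subset.

```python
import numpy as np
from sklearn.model_selection import cross_val_score

def randomized_sfs(estimator, X, y, n_features_to_select, k=25, cv=5, random_state=None):
    """Greedy forward selection that scores only a random subset of the
    remaining candidate features at each step (hypothetical sketch)."""
    rng = np.random.default_rng(random_state)
    selected = []
    remaining = list(range(X.shape[1]))

    while len(selected) < n_features_to_select and remaining:
        # Try every remaining feature on the first step and once few are
        # left; otherwise score only a random subset of size k.
        if not selected or len(remaining) <= k:
            candidates = remaining
        else:
            candidates = rng.choice(remaining, size=k, replace=False)

        best_feature, best_score = None, -np.inf
        for f in candidates:
            cols = selected + [int(f)]
            score = cross_val_score(estimator, X[:, cols], y, cv=cv).mean()
            if score > best_score:
                best_feature, best_score = int(f), score

        selected.append(best_feature)
        remaining.remove(best_feature)

    return selected
```

In practice this could presumably be exposed as an option on the existing sequential selector (e.g. a parameter controlling the candidate subsample size) rather than a separate function.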
If you're interested in some empirical results, I wrote a blog post about this a while back: http://blog.explainmydata.com/2012/07/speeding-up-greedy-feature-selection.html
This would be a great feature to have!