dRFEtools is a package for dynamic recursive feature elimination with
scikit-learn.
Authors: Apuã Paquola, Kynon Jade Benjamin, and Tarun Katipalli
Package developed in Python 3.10+.
In addition to scikit-learn, dRFEtools is also built with NumPy, SciPy,
Pandas, matplotlib, plotnine, and statsmodels. Currently, dynamic RFE supports
models with coef_ or feature_importances_ attribute.
This package provides several functions to run dynamic recursive feature elimination (dRFE) for random forest and linear model classifier and regression models. For random forest workflows, dRFEtools assumes Out-of-Bag (OOB) scoring is enabled. Linear-model workflows build a developmental split internally. For both classification and regression, three measurements are calculated for feature selection:
Classification:
- Normalized mutual information
- Accuracy
- Area under the curve (AUC) ROC curve
Regression:
- R2 (this can be negative if model is arbitrarily worse)
- Explained variance
- Mean squared error
The repository is organized into focused modules to match the runtime architecture:
dRFEtools.py– core interfaces for random-forest and developmental-set elimination workflows.scoring/– metric implementations for developmental splits and random-forest OOB scoring.lowess/– helpers for smoothing elimination curves and extracting optimal feature counts.metrics/– feature ranking utilities used during elimination.plotting.py– visualization helpers re-exported from the top-level package.cli.py– command-line entry points for running full dRFE pipelines.utils.py– shared helpers for normalizing results and persisting plots.
If using please cite the following:
Kynon J M Benjamin, Tarun Katipalli, Apuã C M Paquola, dRFEtools: dynamic recursive feature elimination for omics, Bioinformatics, Volume 39, Issue 8, August 2023, btad513, https://doi.org/10.1093/bioinformatics/btad513
PMID: 37632789
DOI: 10.1093/bioinformatics/btad513.
pip install --user dRFEtools
We have two tutorials for optimization (version 0.2) and classification (version 0.3+).
In addition to this, we have example code used in the manuscript for scikit-learn simulation, biological simulation, and BrainSEQ Phase 1 at the link below.
https://github.com/LieberInstitute/dRFEtools_manuscript
rf_rfe– Runs random-forest feature elimination and returns a tuple of(results_by_feature_count, first_step). Each dictionary entry contains the number of retained features (n_features), the task-appropriate metrics, and the indices of surviving features.dev_rfe– Performs the same elimination loop for estimators that rely on a developmental split, yielding the same standardized result structure asrf_rfe.
features_rank_fnc– Ranks features during elimination and optionally persists the ranking table for each fold.- Developmental-set metrics (
dev_score_*) live underdRFEtools.scoring.dev. - Random-forest OOB metrics (
oob_score_*) live underdRFEtools.scoring.random_forest.
extract_max_lowess– Identifies the optimal feature count from the LOWESS-smoothed elimination curve.extract_peripheral_lowess– Detects the inflection point associated with peripheral features.optimize_lowess_plot– Visualizes the LOWESS curve with annotations about the selected feature counts.
Plotting helpers are defined in dRFEtools.plotting and re-exported from the
top-level package:
plot_metric– Render elimination trajectories for individual metrics.plot_with_lowess_vline– Overlay LOWESS-derived selection cutoffs on the metric trajectory plot.
normalize_rfe_result,get_feature_importances, andsave_plot_variantsare available underdRFEtools.utilsand support the standardized dictionary-based API.- The command-line interface in
dRFEtools.cliwraps the same workflows for CSV inputs: runpython -m dRFEtools.cli --helpto explore available commands.