Support for fastDummies to overcome memory problems of model.matrix? #35

@jarbet

When the number of predictors is large, model.matrix quickly exhausts memory when using the formula interface. For example, I get out-of-memory errors when trying to fit a model with 30k predictors on a machine with 100 GB of RAM.

A simple solution is to use the fastDummies R package to convert factor/character features to numeric dummy variables. Its dummy_columns function is much more memory efficient: I can now run the same model on a computer with 15 GB of RAM (according to gc(), at most 2 GB of RAM was used).
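For anyone who wants to reproduce the gc() measurement, here is a minimal sketch of one way to read peak memory from base R. The helper name `peak.memory.mb` is hypothetical and not part of any package; the figure it returns is the peak for the whole R session during the call, so it includes objects that were already loaded.

```r
# Hypothetical helper (not from any package): report the peak memory, in Mb,
# held by the R session while `expr` is evaluated.
peak.memory.mb <- function(expr) {
    invisible(gc(reset = TRUE));  # reset the "max used" counters
    force(expr);                  # evaluate the lazily passed expression
    mem <- gc();
    # the column right after "max used" holds the same value in Mb
    max.col <- which(colnames(mem) == "max used");
    sum(mem[, max.col + 1]);      # Ncells + Vcells
}

data(iris);
peak.mb <- peak.memory.mb(model.matrix(~ ., data = iris));
peak.mb;
```

Running the same helper around each encoding step on the real data should show which step is responsible for the memory growth.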

Here's an example of how to use fastDummies to set up the x matrix:

suppressPackageStartupMessages(library(fastDummies));
data(iris);

x <- iris;
head(x);
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

x.matrix <- as.matrix(fastDummies::dummy_columns(
    .data = x,
    remove_first_dummy = TRUE, # use K-1 dummy variables for a factor with K levels
    remove_selected_columns = TRUE # remove the original factor variables, otherwise it still keeps them by default
    ));
rownames(x.matrix) <- rownames(x); # if patient ids are rownames, re-add them here
head(x.matrix)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species_versicolor
#> 1          5.1         3.5          1.4         0.2                  0
#> 2          4.9         3.0          1.4         0.2                  0
#> 3          4.7         3.2          1.3         0.2                  0
#> 4          4.6         3.1          1.5         0.2                  0
#> 5          5.0         3.6          1.4         0.2                  0
#> 6          5.4         3.9          1.7         0.4                  0
#>   Species_virginica
#> 1                 0
#> 2                 0
#> 3                 0
#> 4                 0
#> 5                 0
#> 6                 0

Created on 2024-07-25 with reprex v2.0.2
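As a sanity check on the encoding above, the sketch below compares the fastDummies result against model.matrix on the same small data set. It assumes treatment contrasts (the R default) and that the intercept column is dropped from the model.matrix output; `x.reference` is an illustrative name. On iris the two matrices should hold the same values, differing only in the dummy column names (e.g. Species_versicolor vs Speciesversicolor).

```r
library(fastDummies);
data(iris);

# Encode with fastDummies, as in the example above
x.matrix <- as.matrix(fastDummies::dummy_columns(
    .data = iris,
    remove_first_dummy = TRUE,
    remove_selected_columns = TRUE
    ));

# Reference encoding from model.matrix, dropping the intercept column
x.reference <- model.matrix(~ ., data = iris)[, -1];

# as.matrix() returns a character matrix if any non-numeric column is left,
# so confirm that the result is numeric before passing it to a model
stopifnot(is.numeric(x.matrix));

# Same shape and same values; only the column names differ
stopifnot(identical(dim(x.matrix), dim(x.reference)));
stopifnot(all(unname(x.matrix) == unname(x.reference)));
```

This check is only practical on small inputs, since it calls model.matrix itself; on the full data, the is.numeric() test alone is the cheap safeguard.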
