When the number of predictors is large, building the design matrix with model.matrix through the formula interface quickly exhausts memory. For example, I got out-of-memory errors when trying to fit a model with 30k predictors on a machine with 100 GB of RAM.
A simple solution is to use the fastDummies R package to convert factor/character features to numeric dummy variables. Its dummy_columns function is much more memory efficient: I am now running the same model on a computer with 15 GB of RAM, and gc() reports that at most 2 GB was used.
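One way to take a peak-memory reading like the one quoted above is to reset R's counters with gc(reset = TRUE), run the step of interest, and read the "max used" column from a second gc() call. This is a minimal sketch on the small iris data set, not the original 30k-predictor model; the variable names mm, mem and peak_mb are placeholders introduced for illustration.

```r
# Reset the "max used" statistics so the next reading reflects only this step
invisible(gc(reset = TRUE))

# Step to measure: building a design matrix through the formula interface
mm <- model.matrix(~ ., data = iris)

# The last column of gc()'s result is "max used" in Mb, with one row for
# Ncells and one for Vcells; their sum approximates peak memory since the reset
mem <- gc()
peak_mb <- sum(mem[, ncol(mem)])
peak_mb
```

The same pattern can be wrapped around the fastDummies call to compare the two approaches on your own data.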
Here's an example of how to use fastDummies to set up the x matrix:
suppressPackageStartupMessages(library(fastDummies))
data(iris)
x <- iris
head(x)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
x.matrix <- as.matrix(fastDummies::dummy_columns(
  .data = x,
  remove_first_dummy = TRUE,     # use K-1 dummy variables for a factor with K levels
  remove_selected_columns = TRUE # drop the original factor columns, which are kept by default
))
rownames(x.matrix) <- rownames(x) # if patient ids are stored as rownames, re-add them here
head(x.matrix)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species_versicolor
#> 1          5.1         3.5          1.4         0.2                  0
#> 2          4.9         3.0          1.4         0.2                  0
#> 3          4.7         3.2          1.3         0.2                  0
#> 4          4.6         3.1          1.5         0.2                  0
#> 5          5.0         3.6          1.4         0.2                  0
#> 6          5.4         3.9          1.7         0.4                  0
#>   Species_virginica
#> 1                 0
#> 2                 0
#> 3                 0
#> 4                 0
#> 5                 0
#> 6                 0
Created on 2024-07-25 with reprex v2.0.2
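As a sanity check, the matrix built this way can be compared against what model.matrix produces for the same data. The sketch below assumes the iris example above; mm, fd and same_values are names introduced here for illustration. Only the values are compared, because the two approaches name the dummy columns differently (Speciesversicolor vs. Species_versicolor).

```r
library(fastDummies)

# Design matrix from the formula interface, with the intercept column dropped
mm <- model.matrix(~ ., data = iris)[, -1]

# Design matrix from dummy_columns(), using K-1 dummies per factor
fd <- as.matrix(dummy_columns(
  .data = iris,
  remove_first_dummy = TRUE,
  remove_selected_columns = TRUE
))

# Strip row/column names and compare the numeric content only
same_values <- all.equal(unname(mm), unname(fd))
same_values
```

If the comparison holds on a small subset of your own data, the dummy_columns matrix can be passed to any modelling function that accepts a numeric x matrix in place of a formula.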