When I first created train_rf(), I assumed the input data frames would contain only the columns docname, writer (optional), and cluster1,...,cluster40. The function runs without error if the input data frames have additional columns, but I need to make sure that train_rf() trains only on the cluster1,...,cluster40 columns.
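
One possible way to enforce this, as a minimal sketch in base R rather than the actual train_rf() code: select the cluster columns by name before fitting, and fail loudly if any are missing. The helper name select_cluster_cols() and its n_clusters argument are hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of the existing code): keep only the
# cluster1, ..., cluster<n_clusters> columns and drop everything else.
select_cluster_cols <- function(df, n_clusters = 40) {
  cluster_cols <- paste0("cluster", seq_len(n_clusters))
  missing_cols <- setdiff(cluster_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing cluster columns: ", paste(missing_cols, collapse = ", "))
  }
  # drop = FALSE keeps the result a data frame even for a single column
  df[, cluster_cols, drop = FALSE]
}

# Toy input with an extra column that should not reach the model.
clusters <- as.data.frame(matrix(0, nrow = 2, ncol = 40))
names(clusters) <- paste0("cluster", 1:40)
df <- cbind(
  data.frame(docname = c("a", "b"), writer = c("w1", "w2"), extra = c(9, 9)),
  clusters
)

features <- select_cluster_cols(df)
```

Selecting by explicit name, instead of dropping docname and writer, means any unexpected column is excluded by default, and a missing cluster column raises an error instead of silently changing the feature set.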