Summary
I think the Polars library is on track to replace pandas for the majority of use cases. It is already being adopted by the community. We use it internally at my company for new projects, and we try not to use pandas at all.
Motivation
Polars is blazingly fast and has a memory footprint several times lower than pandas. With native support, there would be no need to spend extra memory converting data into numpy or pandas before training in LightGBM.
Description
I would like the following to work, where data_train and data_test are instances of pl.DataFrame:
import lightgbm as lgb

# col_target is the name of the target column; cols_pred is the list of feature column names
y_train = data_train[col_target]
y_test = data_test[col_target]
X_train = data_train.select(cols_pred)
X_test = data_test.select(cols_pred)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
params = {
    "boosting_type": "gbdt",
    "objective": "regression",
    "metric": {"l2", "l1"},
    "learning_rate": 0.1,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "verbose": 0,
    "num_leaves": 42,
    "max_depth": 5,
    "num_iterations": 5000,
    "min_data_in_leaf": 500,
    "reg_alpha": 2,
    "reg_lambda": 5,
}
gbm = lgb.train(
    params,
    lgb_train,
    valid_sets=lgb_eval,
    callbacks=[lgb.early_stopping(stopping_rounds=500)],
)
As of now, I have to convert the data into numpy arrays first:
y_train = data_train[col_target].to_numpy()
y_test = data_test[col_target].to_numpy()
X_train = data_train.select(cols_pred).to_numpy()
X_test = data_test.select(cols_pred).to_numpy()