FLAML with mixed numerical and categorical features #1226
Unanswered · Sukantabasu asked this question in Q&A
Replies: 1 comment
-
automl does some simple preprocessing before invoking the trained estimator. That could be the reason. Could you try applying `automl.feature_transformer` to the data before using `bestMod` for prediction?
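A minimal sketch of that suggestion, assuming `automl.feature_transformer` exposes a scikit-learn-style `transform` method (attribute and method names may differ across FLAML versions):

```python
# Apply FLAML's internal preprocessing before calling the underlying estimator.
# Assumption: automl.feature_transformer provides a scikit-learn-style transform();
# check your FLAML version for the exact attribute name.
bestMod = automl.best_model_for_estimator("lgbm")

X_processed = automl.feature_transformer.transform(df_X)
Y2 = bestMod.predict(X_processed)

# Y2 should now agree with automl.predict(df_X), which applies the same
# preprocessing internally before invoking the estimator.
```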
-
Hi Everyone,
I have been using FLAML for a while. Thus far, all my datasets have included only numerical features. Recently, I started working with a new dataset that has both numerical and categorical features. I am using LightGBM ('lgbm') and noticed an interesting behavior. Suppose I have a dataframe `df_X` with a few categorical columns; `df_X.dtypes` is: float64, float64, float64, float64, category, category, float64, etc.
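For concreteness, here is a minimal, hypothetical reconstruction of such a setup (the column names, values, task, and budget below are illustrative, not the actual data):

```python
import numpy as np
import pandas as pd
from flaml import AutoML

# Illustrative mixed-dtype frame: float64 columns plus pandas 'category' columns.
df_X = pd.DataFrame({
    "x1": [0.1, 0.5, 0.9, 1.3, 0.2, 0.7, 1.1, 0.4],
    "x2": [2.0, 1.7, 0.3, 0.8, 1.2, 0.9, 2.4, 1.6],
    "site": pd.Categorical(["A", "B", "A", "C", "B", "C", "A", "B"]),
    "season": pd.Categorical(["DJF", "MAM", "JJA", "SON", "DJF", "MAM", "JJA", "SON"]),
})
y = np.array([1.2, 0.7, 2.3, 1.9, 1.1, 1.5, 2.0, 0.9])

automl = AutoML()
automl.fit(
    X_train=df_X,
    y_train=y,
    task="regression",        # assumed task; the original post does not state it
    estimator_list=["lgbm"],  # restrict the search to LightGBM, as in the post
    time_budget=30,           # illustrative budget in seconds
)
```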
After training, if I do the following, I get good results:

`Y1 = automl.predict(df_X)`

However, if I do the following, I get quite erroneous results:

`bestMod = automl.best_model_for_estimator('lgbm')`
`Y2 = bestMod.predict(df_X)`
Typically, with numerical data, I use multiple estimators: 'lgbm', 'xgboost', 'rf', etc. Since I am using mixed features (without one-hot encoding), I was testing my code with only 'lgbm'. My understanding is that LightGBM handles categorical data very efficiently via integer encoding.
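As an illustration of that native handling, LightGBM's scikit-learn API accepts pandas `category` columns directly; a hedged sketch, reusing the illustrative `df_X` and `y` from above:

```python
import lightgbm as lgb

# By default, LightGBM detects pandas 'category' dtype columns and
# integer-encodes them internally, so no one-hot encoding is required.
model = lgb.LGBMRegressor()
model.fit(df_X, y)
preds = model.predict(df_X)
```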
For my trial run, I was expecting Y1 and Y2 to be identical. Why is there a difference? I did not see any such discrepancy with purely numerical dataframes.
Best regards,
Sukanta