gbt dtrees always uses float instead of the real data type, causing unexpected prediction results. #1628

Open
@cmsxbc

Description

Describe the bug
I'm converting a LightGBM model to a daal GBT model, and the converted model's predictions differ substantially from
LightGBM's predictions.

I found that the gbt_dtrees model always uses float, as the code shows:

https://github.com/oneapi-src/oneDAL/blob/c6c5219c85e5bceb0392e54e653e00e8cc45e21f/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i#L44

If I change the typedef to typedef double ModelFPType, the results are as expected.
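
As a general illustration of why narrowing model values to float can change a tree's output, here is a minimal sketch in pure Python. It is not a trace of oneDAL's actual code path: the helper `to_float32` and the split rule "go left when feature <= threshold" are assumptions made for this example only. It shows that a threshold stored as float32 can send a double-precision feature value down a different branch.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32, then widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical split node: go left when feature_value <= threshold.
threshold = 0.7        # threshold as a double-precision model would store it
feature_value = 0.7    # the same value arriving as double-precision input

# Decision with the threshold kept in double precision.
goes_left_double = feature_value <= threshold

# Decision when only the stored threshold has been narrowed to float32.
# float32(0.7) is slightly below the double 0.7, so the comparison flips.
goes_left_mixed = feature_value <= to_float32(threshold)

print("float32(0.7) as double:", repr(to_float32(threshold)))
print("goes left (double threshold): ", goes_left_double)
print("goes left (float32 threshold):", goes_left_mixed)
```

A single flipped branch selects a different leaf, so the effect on the final prediction can be far larger than the rounding error itself, and it becomes more likely to show up as the number of trees grows.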

To Reproduce
Steps to reproduce the behavior:

  1. Use the example model from LightGBM's simple_example, and apply this change to generate a 200-iteration model so the error is easier to observe:
index 9af83008..debb339c 100644
--- a/examples/python-guide/simple_example.py                                                                          
+++ b/examples/python-guide/simple_example.py
@@ -35,9 +35,10 @@ print('Starting training...')
 # train                                                   
 gbm = lgb.train(params,                                   
                 lgb_train,                                
-                num_boost_round=20,
+                num_boost_round=340,
                 valid_sets=lgb_eval,
-                early_stopping_rounds=5)
+                verbose_eval=False,
+                early_stopping_rounds=100)
  2. Convert the model and predict:
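import daal4py  # in addition to the imports already in simple_example.py

# Convert the trained LightGBM booster and predict with daal4py.
# y_pred is LightGBM's own prediction from simple_example.py.
d4p_model = daal4py.get_gbt_model_from_lightgbm(gbm)
d4p_y_pred = daal4py.gbt_regression_prediction().compute(X_test, d4p_model).prediction.reshape(-1)
print("The rmse of daal4py's prediction is:", mean_squared_error(y_test, d4p_y_pred) ** 0.5)

print('are preds of daal4py and lightGBM equal:', (d4p_y_pred == y_pred).all())
print('The rmse of daal4py vs lightGBM is:', mean_squared_error(d4p_y_pred, y_pred) ** 0.5)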
  3. With the release package, the two prediction results are not equal.

  4. Apply the patch:

index 6b4ffc3af..16aac9b18 100644
--- a/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i
+++ b/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i
@@ -41,7 +41,7 @@ namespace prediction
 {
 namespace internal
 {
-typedef float ModelFPType;
+typedef double ModelFPType;
 typedef uint32_t FeatureIndexType;
 const FeatureIndexType VECTOR_BLOCK_SIZE = 64;
  5. Everything works as expected.
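
Exact equality on floating-point predictions fails even for harmless rounding differences, so it may help to report how far apart the two prediction vectors are. The following is a minimal sketch in pure Python. The function name `compare_predictions` and its default tolerances are hypothetical choices for this example and are not part of daal4py or LightGBM.

```python
import math


def compare_predictions(a, b, rel_tol=1e-6, abs_tol=1e-9):
    """Summarize the disagreement between two equal-length prediction sequences."""
    if len(a) != len(b):
        raise ValueError("prediction sequences must have the same length")
    if len(a) == 0:
        raise ValueError("prediction sequences must not be empty")

    # Element-wise absolute differences.
    diffs = [abs(float(x) - float(y)) for x, y in zip(a, b)]

    return {
        # True only if every pair is bit-for-bit the same value.
        "exactly_equal": all(d == 0.0 for d in diffs),
        # Largest single disagreement.
        "max_abs_diff": max(diffs),
        # Root mean squared difference between the two sequences.
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        # True if every pair agrees within the given tolerances.
        "all_close": all(
            math.isclose(float(x), float(y), rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(a, b)
        ),
    }


report = compare_predictions([1.0, 2.0], [1.0, 2.0000001])
print(report)
```

With the variables from the conversion step, this would be called as `compare_predictions(d4p_y_pred, y_pred)`. A large `max_abs_diff` with a small `rmse` would suggest a few samples taking a different branch rather than uniform rounding noise.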

Expected behavior
Use the real data type, or explain why float is used.

Output/Screenshots
The top screenshot shows the unexpected result; the bottom one shows the result after I recompiled with typedef double ModelFPType.

[Screenshot: 2021-05-22-012202_1885x913_scrot]

My patch:
[Screenshot: 2021-05-22-011158_1896x549_scrot]

Environment:

  • OS: Arch Linux
  • Compiler: gcc 11.1.0
  • Version: 2021.2.2
