add multi-label support by louis-huang · Pull Request #298 · ray-project/xgboost_ray

louis-huang · 2023-09-29T20:10:37Z

Hi, I added support for passing the label as a list, so we can read data with multiple labels. This should resolve #286.
I verified that the new unit tests pass, and all of test_matrix.py passes with my local setup.
I also verified locally by training an XGBoost model on parquet data, and it works well. So far it should work for the parquet data format. Thank you!

louis-huang · 2023-09-29T20:16:21Z

I verified the change works with the code example below:

# Generate a small multi-label dataset and write it to parquet.
from sklearn.datasets import make_multilabel_classification
import pandas as pd
import numpy as np

n_classes = 5
random_state = 0
X, y = make_multilabel_classification(
    n_samples=32, n_classes=n_classes, n_labels=3, random_state=random_state
)
features = [f"f{i}" for i in range(X.shape[1])]
labels = [f"label_{i}" for i in range(n_classes)]

# Put features and labels side by side in one frame, one column per label.
X_df = pd.DataFrame(X, columns=features)
y_df = pd.DataFrame(y, columns=labels)
data = pd.concat([X_df, y_df], axis=1)

data.to_parquet("~/Desktop/sample_data/data.parquet")

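# Load the parquet data with all label columns and train with xgboost_ray.
from xgboost_ray import RayDMatrix, RayParams, train, RayFileType

n_classes = 5
# make_multilabel_classification generates 20 features by default.
features = [f"f{i}" for i in range(20)]
labels = [f"label_{i}" for i in range(n_classes)]

training_data = "~/Desktop/sample_data"
# The label argument is a list of column names, which is what this PR adds.
train_set = RayDMatrix(
    training_data, labels, columns=features + labels, filetype=RayFileType.PARQUET
)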

evals_result = {}
bst = train(
    {
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "random_state": random_state,
    },
    train_set,
    num_boost_round = 1,
    evals_result=evals_result,
    evals=[(train_set, "train")],
    verbose_eval=False,
    ray_params=RayParams(
        num_actors=1,  # Number of remote actors
        cpus_per_actor=1))

#bst.save_model("model.xgb")
#print("Final training error: {:.4f}".format(
#    evals_result["train"]["error"][-1]))

from xgboost_ray import predict
pred_ray = predict(bst, train_set, ray_params=RayParams(num_actors=1))
print(pred_ray)


import xgboost as xgb

clf = xgb.XGBClassifier(tree_method="hist", n_estimators = 1, random_state=0)
clf.fit(X, y)
expected = clf.predict_proba(X)

np.testing.assert_allclose(expected, pred_ray)

heyitsmui · 2023-10-02T20:55:10Z

@Yard1 can you help take a look when you get a chance? thanks!

heyitsmui · 2023-10-02T20:56:21Z

xgboost_ray/data_sources/data_source.py

    def get_column(
        cls, data: pd.DataFrame, column: Any
-    ) -> Tuple[pd.Series, Optional[str]]:
+    ) -> Tuple[pd.Series, Optional[Union[str, List]]]:


should we open up a separate get_columns(...) instead of overloading this method?
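
To make the question concrete, here is a rough sketch of the two shapes being compared. It is not the code in data_source.py: `get_column` and `get_columns` below are hypothetical standalone functions written only for illustration, and the DataFrame contents are made up. The idea is that a separate multi-column helper keeps each return type simple instead of widening one signature with a Union.

```python
from typing import List, Optional, Tuple

import pandas as pd


def get_column(
    data: pd.DataFrame, column: Optional[str]
) -> Tuple[Optional[pd.Series], Optional[str]]:
    """Hypothetical single-label lookup: one column name in, one Series out."""
    if isinstance(column, str):
        return data[column], column
    return None, None


def get_columns(
    data: pd.DataFrame, columns: List[str]
) -> Tuple[pd.DataFrame, List[str]]:
    """Hypothetical multi-label lookup: a list of names in, a DataFrame out."""
    return data[columns], list(columns)


# Made-up data: one feature column and two label columns.
df = pd.DataFrame({"f0": [0.1, 0.2], "label_0": [0, 1], "label_1": [1, 1]})

single, single_name = get_column(df, "label_0")
multi, multi_names = get_columns(df, ["label_0", "label_1"])
```

With this split, callers that pass a single label keep getting a Series, and only the multi-label path has to deal with a two-dimensional label frame.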

Yard1

LGTM, thanks! cc @krfricke

louis-huang · 2023-11-09T00:53:21Z

Hi @Yard1, may I ask how to fix the lint test? It seems it still blocks the merge. Thank you!

Yard1 · 2023-11-09T01:01:42Z

Can you run the ./format.sh script in the root of the repo?

yc2984 · 2024-03-21T16:53:01Z

@louis-huang can you please run the above test?

add multi-label support

262f54b

heyitsmui reviewed Oct 2, 2023

Yard1 approved these changes Oct 2, 2023

Yard1 reviewed Oct 12, 2023

xgboost_ray/matrix.py (outdated, resolved)

xgboost_ray/tests/test_matrix.py (outdated, resolved)

xgboost_ray/tests/test_matrix.py (outdated, resolved)

Apply suggestions from code review

8d4cfaf

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Yard1 reviewed Oct 13, 2023

xgboost_ray/matrix.py (outdated, resolved)

Update xgboost_ray/matrix.py

462dcd5

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

louis-huang mentioned this pull request Feb 9, 2024

Add multi label support v2 #306

Merged

louis-huang wants to merge 3 commits into ray-project:master from louis-huang:add_multi_label_support
