This repository was archived by the owner on Jan 12, 2026. It is now read-only.

Add multi label support v2 #306

Merged
Yard1 merged 5 commits into ray-project:master from louis-huang:add_multi_label_support_v2 on Mar 2, 2024

Conversation

@louis-huang
Contributor

This is a new pull request to replace #298 because I'm unable to work on that branch.

I copied the content from that pull request:
Hi, I added support for passing the label as a list, so we can read data with multiple labels. This solves #286.
I verified that the new unit tests pass, and that test_matrix.py passes fully with my local setup.
I also verified locally by training an XGBoost model on data in Parquet format, and it works well. So far it should work well for the Parquet data format. Thank you!
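
To illustrate the new behavior, here is a minimal sketch (not from this PR; the DataFrame and column names are hypothetical) of passing a list of label columns to RayDMatrix:

import pandas as pd
from xgboost_ray import RayDMatrix

# Hypothetical in-memory data with two feature columns and two label columns.
df = pd.DataFrame({
    "f0": [0.1, 0.2, 0.3, 0.4],
    "f1": [1.0, 0.0, 1.0, 0.0],
    "label_0": [0, 1, 0, 1],
    "label_1": [1, 1, 0, 0],
})

# With this change, the label argument may be a list of column names;
# the listed columns are split off together as the multi-label target.
dtrain = RayDMatrix(df, label=["label_0", "label_1"])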

I verified the change works with the code example below:

from sklearn.datasets import make_multilabel_classification
import numpy as np
import pandas as pd

# Generate a small multi-label dataset and write it out as Parquet.
n_classes = 5
random_state = 0
X, y = make_multilabel_classification(
    n_samples=32, n_classes=n_classes, n_labels=3, random_state=random_state
)
features = [f"f{i}" for i in range(len(X[0]))]
labels = [f"label_{i}" for i in range(n_classes)]

X_df = pd.DataFrame(X, columns=features)
y_df = pd.DataFrame(y, columns=labels)
data = pd.concat([X_df, y_df], axis=1)

data.to_parquet("~/Desktop/sample_data/data.parquet")

# Build a RayDMatrix from the Parquet directory, passing the label columns as a list.
from xgboost_ray import RayDMatrix, RayParams, RayFileType, train, predict

training_data = "~/Desktop/sample_data"
train_set = RayDMatrix(
    training_data, labels, columns=features + labels, filetype=RayFileType.PARQUET
)

evals_result = {}
bst = train(
    {
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "random_state": random_state,
    },
    train_set,
    num_boost_round=1,
    evals_result=evals_result,
    evals=[(train_set, "train")],
    verbose_eval=False,
    ray_params=RayParams(
        num_actors=1,  # Number of remote actors
        cpus_per_actor=1,
    ),
)

# bst.save_model("model.xgb")
# print("Final training error: {:.4f}".format(
#     evals_result["train"]["error"][-1]))

# Predict with xgboost_ray.
pred_ray = predict(bst, train_set, ray_params=RayParams(num_actors=1))
print(pred_ray)

# Compare against a plain XGBoost classifier trained on the same data.
import xgboost as xgb

clf = xgb.XGBClassifier(tree_method="hist", n_estimators=1, random_state=random_state)
clf.fit(X, y)
expected = clf.predict_proba(X)

np.testing.assert_allclose(expected, pred_ray)

@louis-huang
Contributor Author

@Yard1 Hi, could you please review this again? Thank you so much!!

Member

@Yard1 left a comment


Thanks, LGTM!

@Yard1
Member

Yard1 commented Feb 27, 2024

@matthewdeng could you help get this merged and released?

@louis-huang
Contributor Author

louis-huang commented Feb 28, 2024

Hi @Yard1, thanks for your review! I had missed a change and just added it. I need approval to run the workflow. Thank you!

@Yard1
Member

Yard1 commented Mar 1, 2024

@louis-huang Looks like just the lint needs to be fixed.

@louis-huang
Contributor Author

@Yard1 Got it, fixed now. I'm not yet used to running format.sh whenever I make a change; I'll remember it next time. Thank you for your help!

@Yard1 merged commit e904925 into ray-project:master on Mar 2, 2024