[Data] Cap pandas to <3 for ray[data] and make SettingWithCopyWarning handling pandas-3-safe #60406

Open
daiping8 wants to merge 7 commits into ray-project:master from daiping8:pandas

Conversation

@daiping8
Contributor

Description

Summary

This PR prevents pip install "ray[data]" from pulling in pandas==3.* (which currently breaks Ray Data at runtime), and hardens Ray's warning-handling code to avoid crashing if SettingWithCopyWarning is missing.

Changes

  • Dependency fix: Set pandas requirement to >=1.3,<3 for ray[data] (and align ray[tune] to use the same bound) in python/setup.py.
  • Runtime robustness: Update the SettingWithCopyWarning lookup in python/ray/air/util/data_batch_conversion.py and python/ray/data/util/data_batch_conversion.py to degrade gracefully when the warning isn't available.
  • Dev requirements sync: Align python/requirements.txt to pandas>=1.3,<3.
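The runtime robustness change can be illustrated with a minimal sketch. This is not the code from this PR: the helper names `find_setting_with_copy_warning` and `suppress_setting_with_copy_warning` are hypothetical, and the list of candidate modules is an assumption about where pandas has exposed the warning across versions.

```python
import contextlib
import importlib
import warnings
from typing import Iterator, Optional, Type

# Assumption: modules where pandas has exposed SettingWithCopyWarning over time.
_CANDIDATE_MODULES = ("pandas.errors", "pandas.core.common")


def find_setting_with_copy_warning() -> Optional[Type[Warning]]:
    """Return the SettingWithCopyWarning class, or None if it cannot be found."""
    for module_name in _CANDIDATE_MODULES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # pandas (or this submodule) is not importable; try the next candidate.
            continue
        category = getattr(module, "SettingWithCopyWarning", None)
        if isinstance(category, type) and issubclass(category, Warning):
            return category
    return None


@contextlib.contextmanager
def suppress_setting_with_copy_warning() -> Iterator[None]:
    """Ignore SettingWithCopyWarning inside the block when the class exists."""
    category = find_setting_with_copy_warning()
    with warnings.catch_warnings():
        if category is not None:
            warnings.simplefilter("ignore", category=category)
        # When the class is missing, the block runs with the filters unchanged.
        yield
```

With this shape, a caller wraps the DataFrame mutation in `with suppress_setting_with_copy_warning():` and the block becomes a no-op filter change on pandas versions that no longer define the warning, instead of failing at import time.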

How to reproduce

pip install "ray[data]==2.53.0" "pandas==3.0.0"

Execute the following code before and after applying the patch of this PR.

import numpy as np
import ray


def main() -> None:
    import pandas as pd

    ray.init()
    dfs = []
    for _ in range(50):
        dfs.append(
            pd.DataFrame(
                {
                    "tensor": [np.zeros((2, 2), dtype=np.float32) for _ in range(50)],
                    "x": np.arange(50, dtype=np.int64),
                }
            )
        )
    ds = ray.data.from_pandas(dfs)
    ds = ds.map_batches(lambda df: df, batch_format="pandas")
    ds = ds.repartition(8)
    ds = ds.materialize()
    print("count =", ds.count())

    ray.shutdown()


if __name__ == "__main__":
    main()
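Before running the script above, it may help to confirm which pandas version is installed and whether it falls inside the proposed `>=1.3,<3` bound. The sketch below is an illustration only: `within_proposed_bound` and `installed_pandas_version` are hypothetical helpers, and the check compares only the major and minor release numbers, so it ignores pre-release suffixes that a full PEP 440 comparison would take into account.

```python
import re
from importlib import metadata
from typing import Optional


def within_proposed_bound(version: str) -> bool:
    """Return True if `version` satisfies >=1.3,<3, using major.minor only."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (1, 3) <= (major, minor) < (3, 0)


def installed_pandas_version() -> Optional[str]:
    """Return the installed pandas version, or None if pandas is not installed."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None
```

Calling `within_proposed_bound(installed_pandas_version())` when a version is present gives a quick signal of whether the environment matches the "before" case (pandas 3.x) or a version the cap would allow.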

Related issues

Closes #60402

…setup files

- Updated pandas dependency in requirements.txt to specify version range: >=1.3,<3.
- Adjusted setup.py to reflect the same version constraints for Ray Data, ensuring compatibility with future pandas versions.
- Modified data_batch_conversion.py and util/data_batch_conversion.py to handle potential changes in SettingWithCopyWarning across pandas versions.

This change is aimed at maintaining compatibility with upcoming pandas releases while avoiding breaking changes in the codebase.

Change-Id: I64b4b464ed63350839c365a81e65f0d6e4b0f53f
Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
@daiping8 requested review from a team, aslonnie, edoakes and richardliaw as code owners on January 22, 2026, 08:32
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly caps the pandas dependency to <3 to avoid runtime breakages in Ray Data and makes the SettingWithCopyWarning handling more robust. The changes to setup.py and requirements.txt are appropriate.

My review includes suggestions to improve the maintainability of the warning handling code in data_batch_conversion.py by simplifying the logic and addressing code duplication. I've also pointed out a duplicated dependency in requirements.txt that could be cleaned up.

1
Change-Id: Ibcbc6efba323adbf906cb844f9ea5d33d4c4ed30
Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
1
Change-Id: I952be3671074ae3778f2964dbd583170b7508b9c
Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
@ray-gardener bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels on Jan 22, 2026
@aslonnie
Collaborator

need @ray-project/ray-data comments.

@bveeramani
Member

@daiping8 after fix to SettingWithCopyWarning, does Ray Data still crash with pandas==3.0.0? If so, what's the traceback?

@daiping8
Contributor Author

daiping8 commented Jan 24, 2026

> @daiping8 after fix to SettingWithCopyWarning, does Ray Data still crash with pandas==3.0.0? If so, what's the traceback?

After fixing the warning, it no longer crashes. You can run the test cases in the PR Description.

@aslonnie
Collaborator

> it no longer crashes. You can run the test cases in the PR Description.

nice, so we can close this PR now? feel like ray data team is trying to support running with pandas>=3

@daiping8
Contributor Author

daiping8 commented Jan 24, 2026

>> it no longer crashes. You can run the test cases in the PR Description.

> nice, so we can close this PR now? feel like ray data team is trying to support running with pandas>=3

I think it can be merged, but we'd better check with others for their opinions.

Pandas 3.0.0 was released on January 21, 2026. I think it will take some time to identify bugs and formulate a support plan for Ray Data.

@aslonnie
Collaborator

we do not upper bound version if it is not broken. that is library user's choice and freedom.

> I think it will take some time to identify bugs and formulate a support plan for Ray Data.

user can cap with additional requirement constraints themselves if desired. or use our images / tested compiled lock files if stability is the priority.

> check with others for their opinions.

@bveeramani what do you think?

> it can be merged

just in case balaji/data team accepts this: before this PR can be merged, it needs to update the requirement / lock files in the repo. no rush though.

@bveeramani
Member

@daiping8 could you help me understand -- other than the breaking change to SettingWithCopyWarning, what are the other changes in pandas 3 that might affect Ray Data?

Collaborator

@aslonnie aslonnie left a comment

(waiting for @ray-project/ray-data to understand the context and make a decision)

@alexeykudinkin added the go label (add ONLY when ready to merge, run all tests) on Feb 7, 2026

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Labels

community-contribution (Contributed by the community), data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Data] Ray Data breaks with pandas 3.0.0 due to removed SettingWithCopyWarning

4 participants