[Data] Fix: Replace bare raise with TypeError in string concatenation#60795

Open
slfan1989 wants to merge 3 commits into ray-project:master from slfan1989:fix/pa-string-input-typeerror

@slfan1989
Contributor

Description

This PR fixes a bug in _to_pa_string_input(): concatenating a string column with a non-string column (e.g., a numeric type) reached a bare raise statement, which surfaced as RuntimeError: No active exception to reraise instead of a descriptive TypeError.

Changes:

  • Replaced the bare raise statement with a TypeError whose message states the expected and actual input types
  • Simplified control flow using early returns
  • Added unit test test_string_concat_invalid_input_type to verify the fix

Before: Bare raise caused cryptic RuntimeError: No active exception to reraise

After: Clear TypeError: Expected string or string-like pyarrow Array/ChunkedArray for string concatenation, got int64.
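The "Before" behavior follows from standard Python semantics rather than anything specific to Ray: a bare `raise` re-raises the exception currently being handled, and when there is none the interpreter raises a `RuntimeError`. A minimal sketch (the function names here are illustrative only and do not appear in the PR):

```python
def reraise_without_context():
    # A bare `raise` re-raises the exception currently being handled.
    # Called outside any `except` block, there is nothing to re-raise,
    # so Python raises RuntimeError instead.
    raise


def describe_failure():
    # Capture the error so its type and message can be inspected.
    try:
        reraise_without_context()
    except RuntimeError as exc:
        return f"{type(exc).__name__}: {exc}"


print(describe_failure())  # RuntimeError: No active exception to reraise
```

The message says nothing about the offending input, which is why the PR swaps it for an explicit TypeError.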

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Example of the fixed behavior:

import pyarrow as pa
from ray.data.expressions import col

table = pa.table({"name": ["Alice", "Bob"], "age": [25, 30]})
expr = col("name") + col("age")  # Attempting to concat string with int

# Evaluating expr against the table now raises:
# TypeError: Expected string or string-like pyarrow Array/ChunkedArray
# for string concatenation, got int64.

Replace the bare `raise` statement in `_to_pa_string_input()` with a
proper TypeError that includes a descriptive error message. This ensures
string concatenation operations fail with clear feedback when given
non-string inputs (e.g., numeric columns).

Changes:
- expression_evaluator.py: Add TypeError with descriptive message
- test_arithmetic.py: Add test for invalid input type rejection

Signed-off-by: slfan1989 <slfan1989@apache.org>
@slfan1989 slfan1989 requested a review from a team as a code owner February 6, 2026 01:40
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a solid improvement, replacing a bare raise with a descriptive TypeError for invalid string concatenation operations. The code is also simplified by using early returns. I've added one suggestion to make the error message even more informative by including the specific data type of the invalid input, which aligns with the goal stated in the pull request description. The new unit test is a great addition to prevent regressions.

Comment on lines 102 to 105
raise TypeError(
    "Expected string or string-like pyarrow Array/ChunkedArray for string "
    f"concatenation, got {type(x).__name__}."
)
medium

The error message can be made more specific to better align with the goal stated in the PR description. When x is a pyarrow.Array or pyarrow.ChunkedArray, using x.type instead of type(x).__name__ will provide the underlying data type (e.g., int64), which is more informative for debugging.

Suggested change
- raise TypeError(
-     "Expected string or string-like pyarrow Array/ChunkedArray for string "
-     f"concatenation, got {type(x).__name__}."
- )
+ type_name = x.type if isinstance(x, (pa.Array, pa.ChunkedArray)) else type(x).__name__
+ raise TypeError(
+     "Expected string or string-like pyarrow Array/ChunkedArray for string "
+     f"concatenation, got {type_name}."
+ )

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@ray-gardener ray-gardener bot added the community-contribution Contributed by the community label Feb 6, 2026
Comment on lines 100 to 109
      if isinstance(x, (pa.Array, pa.ChunkedArray)) and _is_pa_string_like(x):
          return _pa_decode_dict_string_array(x)
      if isinstance(x, (pa.Array, pa.ChunkedArray)):
          actual_type = str(x.type)
      else:
-         raise
-     return x
          actual_type = type(x).__name__
      raise TypeError(
          "Expected string or string-like pyarrow Array/ChunkedArray for string "
          f"concatenation, got {actual_type}."
      )
Contributor

Suggested cleanup:

    if isinstance(x, (pa.Array, pa.ChunkedArray)) and _is_pa_string_like(x):
        return _pa_decode_dict_string_array(x)
    actual_type = str(x.type) if isinstance(x, (pa.Array, pa.ChunkedArray)) else type(x).__name__
    raise TypeError(
        "Expected string or string-like pyarrow Array/ChunkedArray for string "
        f"concatenation, got {actual_type}."
    )

Contributor Author

Thanks for the suggestions! I’ve updated the PR accordingly—could you please take another look?

Labels

community-contribution Contributed by the community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

2 participants