Skip to content

[Data] Fixing BQ datasink to be able to handle empty blocks#60797

Open
alexeykudinkin wants to merge 2 commits intomasterfrom
ak/bq-empt-blk-fix
Open

[Data] Fixing BQ datasink to be able to handle empty blocks#60797
alexeykudinkin wants to merge 2 commits intomasterfrom
ak/bq-empt-blk-fix

Conversation

@alexeykudinkin
Copy link
Contributor

Thank you for contributing to Ray! 🚀
Please review the Ray Contribution Guide before opening a pull request.

⚠️ Remove these instructions before submitting your PR.

💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete.

Description

Briefly describe what this PR accomplishes and why it's needed.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin requested a review from a team as a code owner February 6, 2026 02:56
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes an issue in the BigQuery datasink to handle empty blocks by filtering them out before processing. A corresponding unit test has been added to validate the fix. While the fix itself is correct, the assertion in the new test case is flawed and should be corrected to accurately reflect the expected behavior.

ctx=ctx,
)

ray_get_mock.assert_not_called()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The assertion assert_not_called() is incorrect because ray.get() is still invoked even when the list of remote tasks is empty. To correctly verify that no tasks are submitted for an empty block, you should assert that ray.get() was called once with an empty list.

Suggested change
ray_get_mock.assert_not_called()
ray_get_mock.assert_called_once_with([])

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

ctx=ctx,
)

ray_get_mock.assert_not_called()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test assertion incorrect for empty block case

Medium Severity

The test assertion ray_get_mock.assert_not_called() is incorrect. When the write method is called with only empty blocks, the list comprehension filters them out, producing an empty list []. However, ray.get([]) is still called (the ray.get call is unconditional). The mock would be called with an empty list argument, causing assert_not_called() to fail. The assertion should verify ray.get was called with an empty list instead.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[data] Zero-sized blocks crashes write_bigquery Ray fails to serialize self-reference objects

1 participant