-
Notifications
You must be signed in to change notification settings - Fork 7.2k
[Data] Fixing BQ datasink to be able to handle empty blocks #60797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -255,6 +255,28 @@ def test_write_dataset_exists(self, ray_get_mock): | |
| ), | ||
| ) | ||
|
|
||
| def test_write_empty_block(self, ray_get_mock): | ||
| """Test that writing a zero-sized block doesn't crash. | ||
|
|
||
| See https://github.com/ray-project/ray/issues/51892 | ||
| """ | ||
| bq_datasink = BigQueryDatasink( | ||
| project_id=_TEST_GCP_PROJECT_ID, | ||
| dataset=_TEST_BQ_DATASET, | ||
| ) | ||
| # Create an empty block with schema but no rows | ||
| block = pa.Table.from_arrays( | ||
| [pa.array([], type=pa.int64())], names=["data"] | ||
| ) | ||
| ctx = TaskContext(1, "") | ||
| # This should not raise an error - empty blocks should be skipped | ||
| bq_datasink.write( | ||
| blocks=[block], | ||
| ctx=ctx, | ||
| ) | ||
|
|
||
| ray_get_mock.assert_not_called() | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Test assertion incorrect for empty block caseMedium Severity The test assertion |
||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| import sys | ||
|
|
||


There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion
assert_not_called()is incorrect becauseray.get()is still invoked even when the list of remote tasks is empty. To correctly verify that no tasks are submitted for an empty block, you should assert thatray.get()was called once with an empty list.