[Data] Fixing BQ datasink to be able to handle empty blocks #60797
alexeykudinkin wants to merge 2 commits into master
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Code Review
This pull request correctly fixes an issue in the BigQuery datasink to handle empty blocks by filtering them out before processing. A corresponding unit test has been added to validate the fix. While the fix itself is correct, the assertion in the new test case is flawed and should be corrected to accurately reflect the expected behavior.
        ctx=ctx,
    )

    ray_get_mock.assert_not_called()
The assertion assert_not_called() is incorrect because ray.get() is still invoked even when the list of remote tasks is empty. To correctly verify that no tasks are submitted for an empty block, you should assert that ray.get() was called once with an empty list.
Suggested change:
- ray_get_mock.assert_not_called()
+ ray_get_mock.assert_called_once_with([])
Test assertion incorrect for empty block case
Medium Severity
The test assertion ray_get_mock.assert_not_called() is incorrect. When the write method is called with only empty blocks, the list comprehension filters them out, producing an empty list []. However, ray.get([]) is still called (the ray.get call is unconditional). The mock would be called with an empty list argument, causing assert_not_called() to fail. The assertion should verify ray.get was called with an empty list instead.
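The reviewers' point can be checked without Ray, using only `unittest.mock`. The sketch below is a minimal illustration, not the datasink's actual code: `write_blocks`, `submit`, and `get` are hypothetical stand-ins for the write path, the remote task submission, and `ray.get`. It assumes, as the comments above describe, that empty blocks are filtered in a list comprehension and that the wait call is unconditional.

```python
from unittest import mock


def write_blocks(blocks, submit, get):
    """Hypothetical stand-in for the datasink write path (not Ray's code)."""
    # Filter out empty blocks before submitting any work.
    refs = [submit(block) for block in blocks if len(block) > 0]
    # The wait call is unconditional, so it runs even when `refs` is empty.
    return get(refs)


submit_mock = mock.Mock()
get_mock = mock.Mock(return_value=[])

# Write only empty blocks.
write_blocks([[], []], submit=submit_mock, get=get_mock)

# No task was submitted for the empty blocks.
submit_mock.assert_not_called()
# The wait function was still called once, with an empty list.
get_mock.assert_called_once_with([])

# The assertion from the PR's test raises under these assumptions.
try:
    get_mock.assert_not_called()
    not_called_passed = True
except AssertionError:
    not_called_passed = False
```

If the real `write` method instead skipped the wait call when nothing was submitted, `assert_not_called()` would hold, so the right assertion depends on which of the two behaviors the fix implements.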

