
refactor(bigframes): Make join nullity optimizations more robust #16541

Draft
TrevorBergeron wants to merge 2 commits into main from tbergeron_null_prop_rewriter

Conversation

@TrevorBergeron

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a `nulls_equal` attribute on `JoinNode` to explicitly handle pandas-style null equality in joins. It includes a new rewrite rule, `simplify_join`, which optimizes SQL generation by removing unnecessary nullity checks when join columns are non-nullable. The feedback suggests making `simplify_join` less restrictive by combining the nullability checks with `and` rather than `or`, so that the optimization applies more broadly. It also recommends reordering the compiler steps so that all join nodes are optimized, and removing an unused import in the new rewrite module.

Comment on lines +31 to +32
if node.left_child.field_by_id[left_ref.id].nullable or node.right_child.field_by_id[right_ref.id].nullable:
    return node

high

Using `or` here is too restrictive and causes a regression in SQL quality (unnecessary `COALESCE` operations) when only one side of a join condition is nullable. Since `NULL == NULL` can only occur if both sides are nullable, we can safely set `nulls_equal=False` as long as at least one side of every join condition is non-nullable. This aligns with the previous logic used in the `joins_nulls` property and would resolve the regression seen in the `test_compile_fromrange` snapshots.

Suggested change
- if node.left_child.field_by_id[left_ref.id].nullable or node.right_child.field_by_id[right_ref.id].nullable:
-     return node
+ if node.left_child.field_by_id[left_ref.id].nullable and node.right_child.field_by_id[right_ref.id].nullable:
+     return node
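
To illustrate the reasoning behind this suggestion, here is a minimal, self-contained sketch of the rewrite using `and`. The `Field` and `Join` classes are hypothetical stand-ins invented for this example, not the actual bigframes `JoinNode` API; only the names `nulls_equal`, `nullable`, and `simplify_join` come from the discussion above.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a column schema entry.
    name: str
    nullable: bool


@dataclass(frozen=True)
class Join:
    # Hypothetical stand-in for a join node; not the real JoinNode.
    left_fields: dict[str, Field]
    right_fields: dict[str, Field]
    conditions: tuple[tuple[str, str], ...]  # pairs of (left_id, right_id)
    nulls_equal: bool = True


def simplify_join(node: Join) -> Join:
    """Drop null-safe equality when no condition can ever compare NULL to NULL."""
    if not node.nulls_equal:
        return node
    for left_id, right_id in node.conditions:
        left_nullable = node.left_fields[left_id].nullable
        right_nullable = node.right_fields[right_id].nullable
        # NULL can only match NULL if BOTH sides may contain NULL.
        if left_nullable and right_nullable:
            return node
    return replace(node, nulls_equal=False)


one_side_nullable = Join(
    left_fields={"a": Field("a", nullable=True)},
    right_fields={"b": Field("b", nullable=False)},
    conditions=(("a", "b"),),
)
both_nullable = Join(
    left_fields={"a": Field("a", nullable=True)},
    right_fields={"b": Field("b", nullable=True)},
    conditions=(("a", "b"),),
)
```

In this sketch, `simplify_join(one_side_nullable)` clears `nulls_equal`, because a NULL on the left can never find a NULL partner on the right. `simplify_join(both_nullable)` returns the node unchanged. With `or` in place of `and`, the first case would also be left unchanged, which corresponds to the unnecessary `COALESCE` described above.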

Comment on lines 52 to 53
result_node = result_node.bottom_up(rewrites.simplify_join)
result_node = _replace_unsupported_ops(result_node)

medium

For consistency with the sqlglot compiler, and to ensure that any `JoinNode`s introduced during the lowering of unsupported operations are also optimized, `simplify_join` should be called after `_replace_unsupported_ops`.

Suggested change
- result_node = result_node.bottom_up(rewrites.simplify_join)
- result_node = _replace_unsupported_ops(result_node)
+ result_node = _replace_unsupported_ops(result_node)
+ result_node = result_node.bottom_up(rewrites.simplify_join)
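
The effect of pass ordering can be shown with a toy plan. Everything here is an assumption made for illustration: the `Node` class, the `"unsupported"` node kind, the `keys_nullable` flag, and the idea that lowering turns that node into a join are invented for this sketch and do not describe the real bigframes compiler.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical single-child plan node.
    kind: str  # "join", "unsupported", or "leaf"
    child: Optional[Node] = None
    nulls_equal: bool = True
    keys_nullable: bool = False  # hypothetical: can the join keys hold NULL?

    def bottom_up(self, fn: Callable[[Node], Node]) -> Node:
        # Rewrite the child first, then this node.
        child = self.child.bottom_up(fn) if self.child is not None else None
        return fn(replace(self, child=child))


def simplify_join(node: Node) -> Node:
    if node.kind == "join" and node.nulls_equal and not node.keys_nullable:
        return replace(node, nulls_equal=False)
    return node


def replace_unsupported_ops(root: Node) -> Node:
    # Toy lowering: an unsupported op becomes a (not yet simplified) join.
    def lower(node: Node) -> Node:
        return replace(node, kind="join") if node.kind == "unsupported" else node

    return root.bottom_up(lower)


def count_null_safe_joins(node: Optional[Node]) -> int:
    count = 0
    while node is not None:
        if node.kind == "join" and node.nulls_equal:
            count += 1
        node = node.child
    return count


plan = Node("join", child=Node("unsupported", child=Node("leaf")))

# Original order: the join created by lowering is never simplified.
simplify_first = replace_unsupported_ops(plan.bottom_up(simplify_join))
# Suggested order: lowering runs first, so every join is visited.
lower_first = replace_unsupported_ops(plan).bottom_up(simplify_join)
```

Under these assumptions, `count_null_safe_joins(simplify_first)` is 1 while `count_null_safe_joins(lower_first)` is 0: only the suggested order reaches the join that lowering introduces.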

# limitations under the License.

from __future__ import annotations
import itertools

medium

The `itertools` module is imported but not used in this file.

TrevorBergeron force-pushed the tbergeron_null_prop_rewriter branch from 84eccc4 to 7fd26f9 on April 2, 2026 22:14
TrevorBergeron force-pushed the tbergeron_null_prop_rewriter branch from 7fd26f9 to 7f635df on April 2, 2026 23:10