Skip to content

Conversation

@max-sixty
Copy link
Member

Summary

Fixes #5130: Filtering on combined aggregates was generating invalid SQL with CTEs missing GROUP BY.

The issue occurred when aggregate expressions contained operations like (sum x) + (sum y). These expressions were being extracted as Compute transforms, but when the pipeline was split for a following Filter, the Computes ended up in a separate CTE from their Aggregate, losing the GROUP BY clause.

Fix: In is_split_required, don't split between Compute and Filter when there's an Aggregate later in the pipeline. This keeps aggregate-related Computes together with their GROUP BY.

Trade-off: Some intermediate CTEs may include unused columns (documented in test comment). These don't leak to final output.

Test plan

  • Added regression tests for (sum x) + (sum y) and (sum x) * 2 with filter
  • All 636 tests pass
  • Verified fix produces valid SQL with proper GROUP BY/HAVING

🤖 Generated with Claude Code

max-sixty and others added 10 commits December 20, 2025 18:01
Replace internal compiler error with user-facing error message when a
bare lambda like `x -> y` is used at the top level. Now produces:
"expected a table, but found a function"

Fixes PRQL#4280

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
When using `group {} (sort a | take 1)`, the inner sort was being
incorrectly dropped because `sort_undone` was unconditionally set
to true for all groups. Now only set `sort_undone` when there's an
actual partition (non-empty group keys).

Fixes PRQL#5100

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The error message for bare lambda expressions (e.g., `x -> y`) was fixed
in PR PRQL#5634. Move the test from bad_error_messages.rs to error_messages.rs
since the error is now clear and helpful.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…QL#4432

Add inline snapshot tests to prevent regressions for bugs that have been
fixed:

- test_source_column_name: Using `source` as column name no longer causes
  "Ambiguous name" error (PRQL#5094)
- test_column_inference_with_into: Column inference works correctly with
  `into` statement (PRQL#4723)
- test_distinct_on_columns_propagated: DISTINCT ON propagates necessary
  columns to CTE (PRQL#4432)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
When `sort | take` was lowered to RQ, the sort was embedded in the Take
struct. But postprocessing only tracked explicit SqlTransform::Sort,
ignoring the embedded sort. This caused ORDER BY to be lost in CTEs.

Now extracts `take.sort` for simple takes (no partition) where the sort
was merged during RQ lowering.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
When aggregate expressions include operations like `(sum x) + (sum y)`,
the pipeline was being split such that aggregation functions ended up
in a CTE without GROUP BY, producing invalid SQL.

The fix modifies `is_split_required` to not split between Compute and
Filter when there's an Aggregate in the following transforms. Filter
after Aggregate becomes HAVING in the same SELECT, so all preceding
Computes must stay with the Aggregate for proper GROUP BY.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@max-sixty max-sixty merged commit b73bf89 into PRQL:main Dec 21, 2025
36 checks passed
@max-sixty max-sixty deleted the bug-fixes branch December 21, 2025 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue on query compilation when filtering on combined aggregates

1 participant