-
-
Notifications
You must be signed in to change notification settings - Fork 29
fix: correct array_agg inference for union subselects and CTEs #442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously, array_agg would fail to correctly infer types when the input came from union subqueries or CTEs. This caused type inference to crash.
🦋 Changeset detectedLatest commit: a41e2b1 The changes in this PR will be included in the next version bump. This PR includes changesets to release 17 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
📝 WalkthroughWalkthroughExpose CTE/select sources and thread prior statement sources through AST description; improve column resolution by using select-sourced descriptions for CTEs/subselects; guard array_agg handling for empty subquery results and normalize literal unions; add tests and changelog entry for these fixes. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as AST Describer
participant Resolver as Sources Resolver
participant Schema as Schema Lookup
participant Select as Select/CTE Analyzer
Client->>Resolver: request resolve columnRef (with prevSources / cteSelects)
Resolver->>Resolver: check sources map (tables, cteSources, prevSources)
alt source found (table/cte/subselect)
Resolver-->>Client: return source-derived column descriptions
else not found in sources
Resolver->>Schema: lookup by schema/table
Schema-->>Resolver: column definitions or not found
alt schema found
Resolver-->>Client: return schema-derived descriptions
else schema not found
Client->>Select: inspect referenced CTEs / subselects (cteSelects)
Select->>Resolver: build select-sourced column descriptions
Resolver-->>Client: return select-sourced descriptions (array/scalar handling)
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile OverviewGreptile SummaryThis PR fixes a critical bug where
The implementation follows existing patterns in the codebase, reusing the recursive The changes are well-structured and maintain backward compatibility while extending functionality to handle previously unsupported query patterns. Confidence Score: 4/5
Important Files ChangedFile Analysis
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a206eae069
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @packages/generate/src/ast-get-sources.ts:
- Around line 143-147: The early-return only handles inheritedSource?.kind ===
"cte" but must also treat "subselect" the same; locate the logic that sets
inheritedSource using prevSources?.get(node.RangeVar.relname) (and where
combinedPrevSources merges kind: "subselect" entries) and change the condition
to check for inheritedSource?.kind === "cte" || inheritedSource?.kind ===
"subselect" so that when a RangeVar.relname matches a previously recorded
subselect it is returned immediately just like a CTE.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.changeset/large-insects-peel.mdpackages/generate/src/ast-describe.tspackages/generate/src/ast-get-sources.tspackages/generate/src/generate.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- .changeset/large-insects-peel.md
🧰 Additional context used
🧬 Code graph analysis (1)
packages/generate/src/ast-describe.ts (3)
packages/generate/src/ast-get-sources.ts (1)
SourcesResolver(6-6)packages/ast-types/src/index.ts (3)
ColumnRef(1701-1704)SelectStmt(887-908)ParseResult(4-7)packages/generate/src/ast-decribe.utils.ts (1)
isColumnUnknownRef(15-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: check (17)
- GitHub Check: check (16)
🔇 Additional comments (12)
packages/generate/src/ast-get-sources.ts (2)
109-114: LGTM!The
cteSourcesmapping correctly transforms CTE entries intoCteSubselectSourceobjects withkind: "cte", making them available for subsequent resolution.
216-220: Verify source merge order for potential shadowing.The merge order places
cteSourceslast, meaning CTE sources will override any conflicting names fromprevSourcesorfromClause. This is likely intentional to allow CTEs to shadow outer scopes, but worth confirming this matches PostgreSQL's scoping semantics.packages/generate/src/ast-describe.ts (7)
30-31: LGTM!Adding
prevSourcesas an optional parameter toASTDescriptionOptionsenables threading source context through nested descriptions, which is essential for resolving columns in CTEs and subselects.
916-927: Core fix for issue #441 - correctly handles empty described arrays.This guard prevents the
TypeError: Cannot read properties of undefined (reading 'type')by checking iffirstArg.described.length === 0and returning an array type withunknownelements. This matches the approach used forjson_agghandling at lines 965-967.
943-943: Normalize literal unions for consistent array element types.Using
normalizeLiteralUnionhere ensures that when the first argument's type is a union of literals (e.g., from UNION subqueries), they are collapsed to their base types. This prevents overly specific literal types in the resulting array.
1091-1114: Extended context resolution for CTE and subselect sources.The updated
getContextForColumnRefnow properly sets theresolverandselectcontext when the source is a CTE or subselect, enabling accurate column description resolution from nested queries.
1134-1140: Two-phase resolution strategy for column references.The updated flow tries schema-based resolution first, then falls back to select-sources resolution. This is a sensible approach that maintains backward compatibility while enabling resolution from CTEs and subselects.
1197-1252: New select-sourced resolution functions.
getDescribedColumnRefFromSelectSourcesandgetDescribedColumnsFromSelectenable column resolution from CTE and subselect sources by re-describing the nested SELECT statement. The implementation correctly:
- Only handles unknown column references (single-field ColumnRef)
- Collects selects from CTEs and subselects
- Uses
getASTDescriptionto get column descriptions- Filters by matching column name
One note:
getDescribedColumnsFromSelectmay compute descriptions multiple times for the same select if called repeatedly. Consider memoization if performance becomes a concern.
1461-1474: LGTM!
normalizeLiteralUnioncorrectly collapses unions containing literal types to their base types usinggetBaseTypeandmergeDescribedColumnTypes. This ensures consistent type inference for array elements from UNION queries.packages/generate/src/generate.test.ts (3)
924-949: Good test coverage for the core issue.This test validates
array_aggover a UNION subquery, which was one of the crash scenarios. The expected type{ kind: "array", value: { kind: "type", value: "string", type: "text" } }correctly reflects that literal unions are normalized to their base types.
951-974: Excellent - this test covers the exact scenario from issue #441.The test combines:
- CTE with UNION subquery
array_aggon the UNION resultcoalescewrapping the nullable array resultThis directly validates the fix prevents the TypeError and produces correct type inference.
2393-2409: Validates CTE scalar subquery type inference.This test confirms that a scalar subquery referencing a CTE correctly infers the column type (
stringforTEXT NOT NULL). The non-nullable result is expected since the CTE'snamecolumn isNOT NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @packages/generate/src/generate.test.ts:
- Around line 2422-2438: The test "scalar subquery from CTE should infer correct
type" is wrong because scalar subqueries can return null; update the expected
result to a union of string and null (matching the other scalar subquery test),
and in the implementation change the call that handles EXPR_SUBLINK so it passes
nullable: true to resolveType (look for EXPR_SUBLINK handling where
resolveType(...) is invoked, and set nullable flag to true) so scalar subqueries
always include null in their inferred type.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/generate/src/ast-describe.tspackages/generate/src/generate.test.ts
🧰 Additional context used
🧬 Code graph analysis (1)
packages/generate/src/ast-describe.ts (2)
packages/generate/src/ast-get-sources.ts (1)
getSources(42-483)packages/generate/src/ast-decribe.utils.ts (1)
isColumnUnknownRef(15-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: check (16)
- GitHub Check: check (17)
🔇 Additional comments (12)
packages/generate/src/ast-describe.ts (9)
30-31: LGTM! Well-designed optional parameters for CTE context propagation.The addition of
prevSourcesandcteSelectsas optional parameters cleanly extends the API without breaking existing callers.
96-102: LGTM! Correct CTE collection with proper defensive checks.The logic correctly merges inherited CTEs from
params.cteSelectswith CTEs from the currentSELECTstatement'sWITHclause. The null checks onctequery?.SelectStmtandctenameensure robustness.
618-619: LGTM! Proper context threading for nested resolution.Passing
context.resolver.sourcesasprevSourcesandcontext.cteSelectsenables nestedSELECTstatements to resolve columns from parent CTEs and subqueries correctly.
927-938: LGTM! Core fix for Issue #441 - empty-array guard prevents TypeError.This guard correctly handles the case where
array_agg()is applied to UNION subqueries or CTEs that result in an emptydescribedarray. Returning an array ofunknownelements mirrors the approach ingetDescribedJsonAggFunCalland prevents theTypeErrorwhen accessingfirstArg.described[0].type.
1484-1496: LGTM! Clean normalization of literal unions to base types.This helper correctly collapses unions like
'a' | 'b'into their base typestring, ensuringarray_aggover literal-valued UNION branches infersstring[]rather than a union of literal types. The implementation properly short-circuits for non-unions and unions without literals.
1102-1128: LGTM! Correct context switching for CTE and subselect sources.The enhanced logic properly retrieves the
SelectStmtfor both CTE and subselect sources, enabling column references likecte.columnto be resolved against the correct resolver context. The fallback tocontext.selectwhen the select is not found is a reasonable defensive measure.
1145-1151: LGTM! Proper fallback chain for column resolution.The refactored
getDescribedColumnRefcorrectly prioritizes schema-based resolution and only falls back to select-sourced resolution when needed. This enables columns from CTEs and subqueries to be resolved without polluting the main resolution path.
1208-1246: LGTM! Comprehensive select-sourced column resolution.The helper correctly handles three source types:
RangeSubselect- inline subqueriesRangeVar- CTE references viacteSelectslookupJoinExpr- recursively processes both sidesThe flat-map over all matching columns enables proper type inference when multiple sources might provide the same column name.
1248-1274: LGTM! Clean helper for re-describing a SELECT statement.The function correctly wraps a
SelectStmtin a syntheticParseResultand threads all necessary context (includingprevSourcesandcteSelects) to the recursivegetASTDescriptioncall. This enables proper type inference for columns from nested selects and CTEs.packages/generate/src/generate.test.ts (3)
924-949: LGTM! Good regression test for Issue #441.This test directly validates the fix for the
TypeErrorthat occurred whenarray_agg()was applied to UNION subqueries. The expected typestring[] | nullis correct.
951-978: LGTM! Important test for CTE type isolation.This test ensures that only the referenced CTE (
foo) influences type inference, and the unused CTE (barwith integer type) doesn't leak into the result. This validates the fix mentioned in the commit message.
980-1003: LGTM! Good edge case test for coalesce with array_agg.This test validates that
coalesce(array_agg(...), ARRAY[]::TEXT[])correctly infers a non-nullablestring[]type, combining CTE resolution with nullability handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/generate/src/ast-get-sources.test.ts (1)
1-45: Test validates subselect inheritance correctly.The test appropriately covers the new
prevSourcesinheritance behavior for RangeVar lookups. The setup is clean and the assertion validates the core fix for issue #440.Consider adding edge case coverage:
- A test for CTE source inheritance (kind: "cte")
- A test verifying column resolution through inherited subselects
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
packages/generate/src/ast-describe.tspackages/generate/src/ast-get-sources.test.tspackages/generate/src/ast-get-sources.tspackages/generate/src/generate.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/generate/src/generate.test.ts
🧰 Additional context used
🧬 Code graph analysis (1)
packages/generate/src/ast-get-sources.test.ts (3)
packages/generate/src/generate.ts (1)
PgColRow(650-661)packages/generate/src/utils/get-relations-with-joins.ts (1)
FlattenedRelationWithJoins(74-79)packages/generate/src/ast-get-sources.ts (1)
getSources(42-483)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: check (16)
- GitHub Check: check (17)
🔇 Additional comments (14)
packages/generate/src/ast-get-sources.ts (3)
109-114: CTE sources map construction is correct.The transformation from the
ctesresolver map tocteSourceswith the{ kind: "cte", name, sources }shape is properly implemented and aligns with theCteSubselectSourcetype definition.
143-147: Inherited source lookup correctly prioritizes CTE/subselect propagation.The early return for inherited CTE and subselect sources enables proper type propagation for scalar subqueries referencing CTEs, addressing issue #440. The conditional correctly excludes table sources, which should be resolved fresh from schema metadata.
216-220: Source merging order correctly prioritizes local CTEs.The merge order ensures CTEs defined in the current
WITHclause take precedence over inherited sources and from-clause sources, which matches PostgreSQL's scoping semantics where CTEs shadow tables of the same name.packages/generate/src/ast-describe.ts (11)
30-31: New optional parameters correctly typed.The
prevSourcesandcteSelectsparameters are appropriately optional to maintain backward compatibility while enabling context propagation for nested CTE/subselect resolution.
104-116: Context initialization correctly threads CTE and source information.The
cteSelectsandprevSourcesare properly wired into the context and resolver, enabling the nested resolution features.
589-595: Correct nullability handling for scalar subqueries.Scalar subqueries (
EXPR_SUBLINK) can returnNULLwhen no rows match, so marking them as nullable is accurate.EXISTSsublinks always return a boolean, so they correctly remain non-nullable.
618-619: Proper context threading for nested SELECT descriptions.Passing
context.resolver.sourcesandcontext.cteSelectsto nestedgetASTDescriptioncalls enables correct type resolution for CTEs and subselects referenced in nested queries.
927-938: Core fix for issue #441: prevents crash on empty described array.This guard correctly handles the case when
array_agg()receives input from UNION subqueries or CTEs wherefirstArg.describedis empty. Returning an array ofunknownis an appropriate fallback that prevents theTypeErrorwhile maintaining type safety.
954-954: Normalizing literal unions improves type inference accuracy.Applying
normalizeLiteralUnionprevents overly specific literal types (e.g.,Array<1 | 2>) and instead produces the expected base type (e.g.,Array<number>), which is the correct inference for aggregated values.
1102-1125: Context switching for CTE and subselect column resolution is well-implemented.The code correctly updates both the
resolverandselectproperties when resolving columns from CTE or subselect sources. The fallback tocontext.selectwhen the SelectStmt isn't found prevents crashes while maintaining reasonable behavior.
1145-1151: Fallback to select-sourced resolution enables CTE column inference.The two-phase approach—schema-based resolution first, then select-sourced resolution—correctly handles columns from CTEs and subselects that aren't present in the database schema.
1208-1274: New select-sourced resolution functions correctly handle CTE and subselect columns.The implementation properly:
- Extracts SelectStmt from various source types (RangeSubselect, RangeVar via cteSelects, JoinExpr recursively)
- Recursively describes nested selects with proper context threading
- Filters columns by name matching
Note: If multiple sources define a column with the same name, all matching columns are returned via
flatMap. This is acceptable since PostgreSQL would require qualification in such ambiguous cases, but be aware this could return multiple results.
1483-1496:normalizeLiteralUnioncorrectly collapses literal unions to base types.The function properly detects unions containing literals and merges them to their base types, which is essential for producing sensible array element types from
array_aggover union subqueries.
96-102: No fix needed. The code correctly handlesundefinedvalues.
new Map(undefined)does not throw aTypeError—it returns an emptyMap(0) {}, which is the correct behavior. SincecteSelectsis defined as an optional property in the type signature, passingundefinedto the Map constructor is the intended pattern and works as expected.Likely an incorrect or invalid review comment.
fixes #441
fixes #440
Previously, array_agg would fail to correctly infer types when the input came from union subqueries or CTEs. This caused type inference to crash.
Summary by CodeRabbit
Bug Fixes
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.