[SPARK-52729][SQL] Add MetadataOnlyTable and CREATE/ALTER VIEW support for DS v2 catalogs #51419
cloud-fan wants to merge 40 commits into apache:master
I think the current view implementation which stores the original SQL text and a bunch of context is too convoluted to put into the public DS v2 API. It's better if the view text is context-independent. We are going to improve it in #51410
Until that improvement is done, we only allow reading DS v2 views that have context-independent SQL text.
Don't we store the current catalog and namespace in the view metadata? Do we expect the connectors to modify the view SQL text?
My proposal is to let Spark modify the view text before saving it into the catalog, so that the catalog does not need to store the current catalog/namespace.
So all identifiers in the view text will always include the catalog name as the first name part and the table name as the last name part? How hard will it be to modify the original SQL text? Will it cause any surprises to the users if the original and the persisted SQL text differ?
Hive already did so and its view has both "view_text" and "original_view_text" fields. All identifiers should be fully qualified (with catalog name and namespace) in the view text.
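To make "fully qualified" concrete, here is a minimal sketch of qualifying a single identifier against a captured catalog and namespace. It uses plain Java collections rather than Spark classes; the `IdentifierQualifier` class, the `knownCatalogs` set, and the resolution rule itself are simplifying assumptions for illustration, not Spark's resolver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: a simplified qualification rule, not Spark's analyzer logic. */
final class IdentifierQualifier {
    private IdentifierQualifier() {}

    /**
     * Expands a possibly partial name into [catalog, namespace..., name].
     * knownCatalogs is a hypothetical stand-in for the session's catalog registry.
     */
    static List<String> qualify(
            List<String> nameParts,
            String currentCatalog,
            List<String> currentNamespace,
            Set<String> knownCatalogs) {
        List<String> qualified = new ArrayList<>();
        if (nameParts.size() == 1) {
            // Bare name: prepend the captured catalog and namespace.
            qualified.add(currentCatalog);
            qualified.addAll(currentNamespace);
        } else if (!knownCatalogs.contains(nameParts.get(0))) {
            // Namespace-qualified name: only the catalog is missing.
            qualified.add(currentCatalog);
        }
        // Names that already start with a known catalog pass through unchanged.
        qualified.addAll(nameParts);
        return qualified;
    }
}
```

With a captured catalog `cat` and namespace `[ns1, ns2]`, the bare name `t` expands to `[cat, ns1, ns2, t]`, so the persisted text no longer depends on the session state at read time.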
How about SPARK_TABLE_OR_VIEW?
Shall we explicitly mention which operations will be affected (read/write/DDL/...)?
The capability has since been reshaped: TableCapability is gone in favor of the concrete MetadataOnlyTable for the read side, and a TableCatalogCapability.SUPPORTS_VIEW gate for the write side. The SUPPORTS_VIEW javadoc now spells out the affected operations explicitly (CREATE VIEW / CREATE OR REPLACE VIEW / CREATE VIEW IF NOT EXISTS via createTable, ALTER VIEW ... AS via dropTable+createTable or stageReplace, and the read-path round-trip through MetadataOnlyTable). PTAL and let me know if anything is still unclear.
I'd love to take a look on Monday.
I understand we may want to use the new API to expose views, but what about the case with Spark table? When would this be helpful?
For example, UC does not want to rely on the Spark file source, but just set the table provider to the file source name and leave read/write to Spark. Today this is done by a hack with V1Table: https://github.com/unitycatalog/unitycatalog/blob/main/connectors/spark/src/main/scala/io/unitycatalog/spark/UCSingleCatalog.scala#L303
Ah, got it. The use case is catalogs that govern tables but not necessarily implement read/write logic.
Have we considered offering a generic V2 table implementation for built-in formats that would be accessible to external connectors? Essentially, a public version of V1Table that doesn't need to expose CatalogTable. If we go with the table capability approach, then each connector will have to implement a custom V2 table just for it to be replaced by a Parquet table or view. Each connector would have to be aware of how the translation will be done in Spark. For instance, it seems like we assume that serdeProps will be prefixed with `option.`.
Just curious to know your thinking, I don't have a strong opinion here.
I see, this makes sense. A concrete class is easier to use than a table capability.
aaa56bc to b5c909e
5d5a508 to 5517dee
}

// TODO: move the v2 data source table handling from V2SessionCatalog to the analyzer
ignore("v2 data source table") {
Shall we add this test case later?
This will be fixed shortly after the PR is merged.
 * implementing read/write directly. It represents a general Spark data source table or
 * a Spark view, and relies on Spark to interpret the table metadata, resolve the table
 * provider into a data source, or read it as a view.
 * This affects the table read/write operations but not DDL operations.
This affects the table read/write operations but not DDL operations.
It seems a bit unclear. Before the change, DDL operations on DSV2 tables relied on Spark too.
maybe we should just remove this line? DDL operations do not call read/write APIs of Table anyway.
szehon-ho left a comment
Makes sense, just wondering if some of it can be less bound to HMS concepts
  return this;
}

public Builder withSerdeProps(Map<String, String> serdeProps) {
Just an opinion: serde is quite bound to HMS (?) and the newer DSV2 catalogs don't have that. Would it make sense to abstract this (maybe something like 'storageProperties')? Is this just for HMS table support?
good point. I was trying to save the work of splitting serde properties from table properties, but API simplicity is more important.
  database = Some(ident.namespace().lastOption.getOrElse("root")),
  catalog = Some(catalog.name())),
tableType = tableType,
storage = CatalogStorageFormat.empty.copy(
curious, do we not need to set inputFormat/outputFormat?
V1Table itself does not access the input/output format when translating a v1 table to a v2 table. I think Hive tables have never been supported by DS v2, so we will not handle them here.
szehon-ho left a comment
Not entirely familiar with the end use case, but the API looks good now, thanks
dongjoon-hyun left a comment
Hi, @cloud-fan. Do you have a plan to move this forward?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
The capability now gates both CREATE VIEW and ALTER VIEW, so the create-only name misrepresents the feature set. "SUPPORTS_VIEW" reads like the other TableCatalogCapability entries (SUPPORTS_CREATE_TABLE_* are about creation only; view support is the full lifecycle). Co-authored-by: Isaac
… tests
- uncacheTableOrView now uses ResolvedIdentifier overload so multi-level namespaces aren't narrowed to a single database part
- V2AlterViewPreparation.viewSchemaMode delegates to CatalogTable.viewSchemaMode to match v1 defaults and honor viewSchemaBindingEnabled
- drop unused referredTempFunctions field from V2 view execs
- gate on TableCatalog + SUPPORTS_VIEW together in DataSourceV2Strategy so non-TableCatalog plugins still see MISSING_CATALOG_ABILITY.VIEWS
- add tests for temp variable rejection and cyclic v2 view references
- split DataSourceV2MetadataOnlyTableSuite into table-read and view suites
- doc polish: PROP_VIEW_TEXT, MetadataOnlyTable javadoc, stale comments
…alyzer
- Add CatalogTable.fullIdentOpt / fullIdent so v2 catalogs with multi-level namespaces (via MetadataOnlyTable) can carry the real [catalog, ns..., name] that v1 TableIdentifier can't represent.
- V1Table.toCatalogTable populates fullIdentOpt from the v2 identifier.
- SessionCatalog.getRelation uses fullIdentOpt for the SubqueryAlias qualifier, falling back to qualifyIdentifier for v1 session-catalog tables. Fixes fully-qualified column references against non-session v2 catalogs (qualifier was hardcoded to spark_catalog).
- checkCyclicViewReference and recursiveViewDetectedError now take Seq[String] and compare via CatalogTable.fullIdent, so views in multi-level namespaces sharing the last segment (cat.ns1.a.v vs cat.ns2.a.v) no longer collide.
- Move the v2-path cyclic-view check from the four exec sites into the new CheckViewReferences analyzer rule, gated on replace for CreateView. v1 keeps its exec-time check as the Dataset API safety net.
- Replace two non-ASCII em-dashes in V1Table.scala comments with ASCII.
- Tests: fully-qualified column reference on v2 catalog (TableSuite), cyclic detection across multi-level namespaces for both CREATE OR REPLACE and ALTER paths (ViewSuite).

Co-authored-by: Isaac
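The collision that this commit removes can be reproduced with a few lines of plain Java. The sketch below is an assumption-laden illustration: `ViewNames`, `legacyName`, and `fullName` are hypothetical helpers, where `legacyName` mimics a v1-style identifier that keeps only the last namespace segment and `fullName` keeps every part. Quoting of special characters is ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers contrasting a lossy 3-part name with a full multi-part name. */
final class ViewNames {
    private ViewNames() {}

    /** v1-style rendering: catalog, then only the LAST namespace segment, then the name. */
    static String legacyName(String catalog, List<String> namespace, String name) {
        List<String> parts = new ArrayList<>();
        parts.add(catalog);
        if (!namespace.isEmpty()) {
            parts.add(namespace.get(namespace.size() - 1));
        }
        parts.add(name);
        return String.join(".", parts);
    }

    /** Full rendering: catalog, every namespace segment, then the name. */
    static String fullName(String catalog, List<String> namespace, String name) {
        List<String> parts = new ArrayList<>();
        parts.add(catalog);
        parts.addAll(namespace);
        parts.add(name);
        return String.join(".", parts);
    }
}
```

For the two views `cat.ns1.a.v` and `cat.ns2.a.v`, `legacyName` renders both as `cat.a.v`, so a cycle check keyed on it would treat them as the same view, while `fullName` keeps them distinct.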
- CreateV2ViewExec / AtomicCreateV2ViewExec: replace the separate `tableExists` + implicit-assume-view flow with a single `loadTable` round-trip and a `MetadataOnlyTable` + PROP_TABLE_TYPE=VIEW check. REPLACE'ing a non-view table as a view is rejected with EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE; plain CREATE surfaces TABLE_OR_VIEW_ALREADY_EXISTS; IF NOT EXISTS remains a no-op. Matches v1 CreateViewCommand semantics.
- V2AlterViewPreparation: stop stripping TABLE_RESERVED_PROPERTIES from the existing view's properties. PROP_OWNER (and other non-transient reserved fields) now flow through unchanged, matching v1 AlterViewAsCommand.alterPermanentView's viewMeta.copy semantics. Keys the ALTER actually changes are overwritten downstream.
- CheckViewReferences: collapse duplicated legacyNameFor/fullIdentFor extractors onto a shared `catalogAndIdent` helper.
- Tests: add three new cases: CREATE/REPLACE-over-non-view-table rejection on both plain and staging catalogs, PROP_OWNER preservation across ALTER VIEW AS, and SCHEMA EVOLUTION mode preservation across ALTER VIEW AS.
def unapply(resolved: LogicalPlan): Option[TableIdentifier] = resolved match {
  case ResolvedPersistentView(catalog, ident, _) =>
    assert(isSessionCatalog(catalog))
  case ResolvedPersistentView(catalog, ident, _) if isSessionCatalog(catalog) =>
The comment above says non-session views "fall through so they can be picked up by v2 strategies." Only AlterViewAs is actually picked up. ResolvedViewIdentifier is also used by SetViewProperties (line 176), UnsetViewProperties (line 179), AlterViewSchemaBinding (line 520), RenameTable-on-view (line 194), and DescribeRelation-on-view (lines 198-199), none of which has a v2 strategy case. After this PR, a user can run CREATE VIEW view_catalog.ns.v ..., and a subsequent ALTER VIEW view_catalog.ns.v SET TBLPROPERTIES('k'='v') then fails with a generic planner "no plan for..." error. Pre-PR, CreateView was rejected on non-session catalogs, so this orphan state was structurally unreachable; the pre-PR error path (MISSING_CATALOG_ABILITY.VIEWS) no longer applies.
There is no test coverage for any of these five plans on a v2 view: DataSourceV2MetadataOnlyViewSuite has no test for SET TBLPROPERTIES, UNSET TBLPROPERTIES, WITH SCHEMA BINDING, RENAME TO, DESCRIBE, SHOW TBLPROPERTIES, or SHOW CREATE VIEW against a v2-catalog view (the only WITH SCHEMA EVOLUTION reference is on the CREATE VIEW path, line 462). The TODO at lines 36-37 of that suite acknowledges these as follow-ups, but nothing pins the current failure mode, so any future change (e.g., a reshape of planner error classes) can silently regress the UX further.
Two options:
- Pin the current behavior: for each of the five plan types, add a test that runs the statement against a v2 view and asserts the error it throws today. Future changes then surface in the diff.
- Close the gap up front: add explicit DataSourceV2Strategy cases (or a fall-through in ResolveSessionCatalog) that throw a clean UNSUPPORTED_FEATURE / FEATURE_NOT_YET_SUPPORTED naming the statement. Tests become one-per-plan and the UX doesn't regress between this PR and the follow-ups.
Option 2 is the cleaner closure given this PR is already landing the architectural change that enables the orphaning. Option 1 is the minimum safety net.
partitionColumnNames = partCols,
bucketSpec = bucketSpec,
owner = props.getOrElse(TableCatalog.PROP_OWNER, "unknown"),
viewText = viewText,
viewText is read and assigned unconditionally. If a catalog returns a MetadataOnlyTable with PROP_VIEW_TEXT set but PROP_TABLE_TYPE is EXTERNAL/MANAGED (misconfiguration, or a future capability-composition catalog), the synthesized CatalogTable ends up with non-None viewText on a non-view — confusing downstream code that uses viewText.isDefined as an "is-view" proxy. Scoping the read to VIEW costs nothing:
viewText = viewText,
val viewText = if (tableType == CatalogTableType.VIEW) {
  props.get(TableCatalog.PROP_VIEW_TEXT)
} else {
  None
}
// asTableCatalog would throw).
val tableCatalog = catalog match {
  case tc: TableCatalog
      if tc.capabilities().contains(TableCatalogCapability.SUPPORTS_VIEW) => tc
Nice cleanup on the CheckViewReferences side with the new catalogAndIdent helper. The same case tc: TableCatalog if tc.capabilities().contains(TableCatalogCapability.SUPPORTS_VIEW) => tc; case _ => throw missingCatalogViewsAbilityError(catalog) pattern is still duplicated at lines 310-314 (CREATE VIEW) and 330-334 (ALTER VIEW AS). Similarly the TableIdentifier(ident.name, ident.namespace.lastOption, Some(catalog.name)) idiom for error-rendering is repeated in CreateV2ViewExec:60-64 and V1Table.toCatalogTable:153-156. Small helpers — e.g. CatalogV2Util.requireViewSupport(catalog) and an asLegacyTableIdentifier(catalogName) on IdentifierHelper — would eliminate the remaining drift risk. Non-blocking.
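As a rough illustration of the kind of shared helper suggested here, the sketch below folds the "is a table catalog and declares view support" check into one method. All types (`PluginSketch`, `TableCatalogSketch`, `CapabilitySketch`, `FixedCapabilityCatalog`) are stand-ins invented for this example, and the thrown exception is only a placeholder for the MISSING_CATALOG_ABILITY.VIEWS error; none of this is Spark's actual API.

```java
import java.util.Set;

// Stand-in types for illustration; they are NOT Spark's interfaces.
interface PluginSketch {
    String name();
}

enum CapabilitySketch { SUPPORTS_VIEW }

interface TableCatalogSketch extends PluginSketch {
    Set<CapabilitySketch> capabilities();
}

/** Minimal table catalog whose capability set is fixed at construction time. */
final class FixedCapabilityCatalog implements TableCatalogSketch {
    private final String name;
    private final Set<CapabilitySketch> capabilities;

    FixedCapabilityCatalog(String name, Set<CapabilitySketch> capabilities) {
        this.name = name;
        this.capabilities = capabilities;
    }

    @Override public String name() { return name; }
    @Override public Set<CapabilitySketch> capabilities() { return capabilities; }
}

final class ViewSupport {
    private ViewSupport() {}

    /** Returns the catalog narrowed to a table catalog, or fails with one uniform error. */
    static TableCatalogSketch requireViewSupport(PluginSketch catalog) {
        if (catalog instanceof TableCatalogSketch) {
            TableCatalogSketch tc = (TableCatalogSketch) catalog;
            if (tc.capabilities().contains(CapabilitySketch.SUPPORTS_VIEW)) {
                return tc;
            }
        }
        // Placeholder for the "catalog does not support views" error class.
        throw new UnsupportedOperationException(
            "Catalog " + catalog.name() + " does not support views");
    }
}
```

Keeping the check in one place means every call site produces the same error for the same condition, which is the drift risk the comment points at.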
  }
}

test("ALTER VIEW on a catalog without SUPPORTS_VIEW fails") {
TestingTableOnlyCatalog.loadTable always throws NoSuchTableException, so the ALTER fails at view resolution — the capability gate in DataSourceV2Strategy (line 330-333) is never reached. The test body's own comment acknowledges this. As a result the capability-gate rejection on the ALTER path has no real coverage.
Two fix options:
- Rename to "ALTER VIEW on a missing view fails": matches what it actually tests.
- Better: extend TestingTableOnlyCatalog to store a MetadataOnlyTable view so the gate rejection is genuinely exercised. This would also catch regressions if SUPPORTS_VIEW is inadvertently added to the default capability set.
  }
}

test("read view resolves unqualified refs via captured current catalog/namespace") {
The PR's multi-level-namespace correctness hinges on the QuotingUtils.quoted → parseMultipartIdentifier round-trip and the v1 unqualified-reference expansion both preserving the captured namespace. Today that's covered by (a) a builder-level serialization test (line 83) and (b) cycle-detection tests using ns1.inner.v-style identifiers. But there is no end-to-end read test that exercises the round-trip by actually resolving an unqualified reference inside a view whose captured namespace has >1 part. Add one: .withCurrentCatalogAndNamespace("spark_catalog", Array("db1", "db2")), create a table in that namespace, reference it unqualified in the view body, and check the result.
… errors, new tests
- drop unused `viewOnly` parameter on `Analyzer.lookupTableOrView`
- reorder `CreateV2ViewExec`/`AtomicCreateV2ViewExec` to short-circuit IF NOT EXISTS before building the TableInfo, matching v1 `CreateViewCommand.run`
- extract `CatalogTable.viewSchemaModeFromProperties` so `V2AlterViewPreparation` no longer round-trips through `V1Table.toCatalogTable` just to read the mode
- cross-reference v1/v2 view-check locations in `CreateViewCommand` and `AlterViewAsCommand` Scaladoc
- document `TableInfo.Builder.withProperties` / convenience-setter ordering on `withProperties` itself and add brief docs to the convenience setters
- require a view-typed `MetadataOnlyTable` at ALTER VIEW exec time (tightens the race-between-analysis-and-exec surface)
- rename `CatalogTable.fullIdentOpt` to `multipartIdentifier`
- widen `viewDepthExceedsMaxResolutionDepthError` to take `Seq[String]` so v2 multi-level namespaces are reflected in the error message
- move the `SUPPORTS_VIEW` gate from `DataSourceV2Strategy` into `CheckViewReferences`; strategy cases now cast directly since analysis verifies the capability first
- add regression tests: ALTER VIEW re-captures current session SQL configs; CREATE OR REPLACE VIEW whose new body references a nonexistent table fails at analysis

Co-authored-by: Isaac
… doc reconciliation

Read path:
- V1Table.toCatalogTable: gate viewText read on tableType == VIEW so a non-view MetadataOnlyTable with a stray PROP_VIEW_TEXT doesn't synthesize a non-view CatalogTable with non-None viewText.

ALTER VIEW execs:
- Replace `val _ = existingTable` (obscure lazy-val side effect) with a named `requireExistingView()` helper in V2AlterViewPreparation.
- Race between analysis and exec (target dropped or replaced as a non-view between lookup and run) now surfaces as EXPECT_VIEW_NOT_TABLE instead of SparkException.internalError.
- AtomicCreateV2ViewExec: reject plain CREATE on an existing view up front with viewAlreadyExists(), matching the non-atomic exec (non-atomic path relied on catalog-side TableAlreadyExistsException, which StagingTableCatalog doesn't formally require).

Orphan-plan pinning:
- Add DataSourceV2Strategy cases for v2-catalog plans that ResolveSessionCatalog no longer rewrites: SetViewProperties, UnsetViewProperties, AlterViewSchemaBinding, RenameTable, ShowCreateTable, ShowTableProperties, ShowColumns, DescribeRelation, DescribeColumn on ResolvedPersistentView. Each throws UNSUPPORTED_FEATURE.TABLE_OPERATION naming the statement, pinning the current UX until the follow-up PRs land.

SHOW VIEWS for v2:
- New ShowViewsExec enumerates via TableCatalog.listTableSummaries(namespace) and filters to TableSummary.VIEW_TABLE_TYPE; wired in DataSourceV2Strategy.
- ResolveSessionCatalog's ShowViews handler now skips (via guard) for SUPPORTS_VIEW catalogs so they reach the v2 strategy; non-session, non-SUPPORTS_VIEW catalogs still get the MISSING_CATALOG_ABILITY.VIEWS rejection.

API contract reconciliation:
- Javadocs on TableCatalog.loadTable / dropTable / tableExists / alterTable / renameTable / listTables / purgeTable and StagingTableCatalog.stageCreate / stageReplace / stageCreateOrReplace now spell out the SUPPORTS_VIEW split: loadTable returns views as MetadataOnlyTable, dropTable/tableExists/listTables include views (listTables also includes views for v1 parity with SHOW TABLES), while alterTable / renameTable / purgeTable / versioned+timestamped loadTable remain table-only.
- Add IdentifierHelper.asLegacyTableIdentifier(catalogName) to share the lossy multi-part -> v1 TableIdentifier idiom; use in V1Table.toCatalogTable, V2ViewPreparation.legacyName, CheckViewReferences.legacyNameFor.

Misc:
- ResolveSessionCatalog: rename local var `child` -> `query` in CreateView pattern to match the case-class field name; update the stale ResolvedViewIdentifier comment to describe the new v2-strategy behavior.

Tests:
- New multi-part namespace round-trip unit test in DataSourceV2MetadataOnlyViewSuite (Builder -> V1Table.toCatalogTable -> viewCatalogAndNamespace preserves [cat, db1, db2]).
- Orphan-plan pinning tests: UNSUPPORTED_FEATURE.TABLE_OPERATION for SET/UNSET TBLPROPERTIES, WITH SCHEMA, RENAME TO, SHOW CREATE TABLE, SHOW TBLPROPERTIES, SHOW COLUMNS, DESCRIBE TABLE; clean AnalysisException for DESCRIBE COLUMN (fails at column resolution before reaching the strategy).
- SHOW TABLES on a v2 catalog includes views (v1 parity); SHOW VIEWS returns only views; SHOW VIEWS with LIKE filter; SHOW VIEWS on non-SUPPORTS_VIEW rejected with MISSING_CATALOG_ABILITY.VIEWS.
- TestingTableOnlyCatalog now round-trips a view-typed MetadataOnlyTable so the ALTER VIEW capability-gate test actually reaches the gate (expected MISSING_CATALOG_ABILITY.VIEWS), closing a coverage hole.

Co-authored-by: Isaac
…comment, add multi-part captured-namespace read test
- CatalogV2Util.supportsView: shared predicate replacing duplicated TableCatalog+SUPPORTS_VIEW check in CheckViewReferences and ResolveSessionCatalog.
- DataSourceV2MetadataOnlyViewSuite: correct the misleading "body is validated first" comment around CREATE VIEW IF NOT EXISTS on the atomic exec (tryLoadTable short-circuits before buildTableInfo), and add an end-to-end SQL test exercising multi-part captured catalog/namespace round-trip for an unqualified view-body reference.

Co-authored-by: Isaac
…OP VIEW Co-authored-by: Isaac
Pre-PR, Analyzer.lookupTableOrView had a viewOnly gate that rejected all UnresolvedView lookups on non-session catalogs up front with UNSUPPORTED_FEATURE.CATALOG_OPERATION. That gate was removed earlier in this PR. For non-SUPPORTS_VIEW catalogs the ALTER VIEW path now falls through to CheckAnalysis, which surfaces TABLE_OR_VIEW_NOT_FOUND when the view does not exist. Either error is acceptable; this aligns the test with the simpler no-gate behavior. Co-authored-by: Isaac
…jection on MISSING_CATALOG_ABILITY.VIEWS Without the gate, ALTER VIEW variants on a non-SUPPORTS_VIEW v2 catalog fell through to TABLE_OR_VIEW_NOT_FOUND when the view did not exist -- misleading, since the catalog cannot host views at all. Bring back `lookupTableOrView`'s `viewOnly` flag and reject non-session non-SUPPORTS_VIEW catalogs upfront. Switch DROP VIEW's existing rejection path and the restored gate to use the same MISSING_CATALOG_ABILITY.VIEWS error class CheckViewReferences already uses for CREATE/ALTER VIEW AS, so users see one consistent error for the "catalog does not support views" condition across all view DDL. Co-authored-by: Isaac
…, drop defensive AlterViewAs gate, misc cleanups
- MetadataOnlyTable: drop the no-arg constructor (and the "data_source_table_or_view" placeholder it defaulted to); require callers to pass a name, typically ident.toString. Before this, DESCRIBE TABLE EXTENDED on a MetadataOnlyTable-backed table showed "Name: data_source_table_or_view" instead of the real identifier. Updated all (test-only) callsites and added a DESCRIBE pin.
- V2AlterViewPreparation.existingTable: fold through the parent trait's tryLoadTable helper so the load/view-check lives in one place.
- CheckViewReferences: remove the redundant requireSupportsView call on the AlterViewAs branch. The analyzer's lookupTableOrView(viewOnly=true) already rejects non-SUPPORTS_VIEW catalogs before we get here.
- V1Table.toCatalogTable: default owner to "" (matches v1 CatalogTable default) instead of "unknown".
- Tests: add ALTER VIEW rejections for temp views and temp variables to mirror the CREATE VIEW matrix; fix the stale ALTER-capability-gate test comment; add a DESCRIBE-extended pin for the MetadataOnlyTable name surface.
- Doc: fix a hardcoded line-number reference in DataSourceV2Strategy and a split Scaladoc link in v2Commands.ShowViews.

Co-authored-by: Isaac
…serve on ALTER; minor dash nit Co-authored-by: Isaac
…ti-part error rendering
- Introduce ViewInfo extends TableInfo carrying typed fields (queryText, currentCatalog, currentNamespace, sqlConfigs, schemaMode, queryColumnNames). SUPPORTS_VIEW catalogs branch on `instanceof ViewInfo` inside createTable and the StagingTableCatalog staging variants; loadTable returns MetadataOnlyTable wrapping a ViewInfo for views. ViewInfo's ctor auto-sets PROP_TABLE_TYPE=VIEW so generic viewers (listTableSummaries default impl, DESCRIBE) classify correctly.
- Remove the property-bag encoding: PROP_VIEW_TEXT, PROP_VIEW_CURRENT_CATALOG_AND_NAMESPACE, VIEW_CONF_PREFIX gone from TableCatalog; the corresponding TABLE_RESERVED_PROPERTIES entries gone from CatalogV2Util; CatalogTable.VIEW_SQL_CONFIG_PREFIX reverted to its pre-PR form.
- Delete the dormant ViewCatalog / View / ViewChange / old ViewInfo (@DeveloperAPI but never wired into analyzer/planner) since TableCatalog + SUPPORTS_VIEW subsumes it.
- Fix multi-level-namespace rendering in four view error constructors: viewAlreadyExistsError, unsupportedCreateOrReplaceViewOnTableError, and the two CREATE_VIEW_COLUMN_ARITY_MISMATCH errors now take Seq[String] instead of a lossy TableIdentifier (asLegacyTableIdentifier collapsed cat.ns1.ns2.v to cat.ns2.v). v2 callers pass catalog.name +: ident.asMultipartIdentifier; v1 callers pass name.nameParts.
- MetadataOnlyTable.constraints() now delegates to info.constraints() instead of returning an empty array.
…es in temp-object errors; restore 3-part v1 session-catalog SubqueryAlias Co-authored-by: Isaac
- PlanResolutionSuite drop-view v2: expect MISSING_CATALOG_ABILITY.VIEWS (Analyzer.lookupTableOrView now routes non-SUPPORTS_VIEW v2 catalogs through that error instead of UNSUPPORTED_FEATURE.CATALOG_OPERATION).
- explain golden files: regenerate to include CreateView.isAnalyzed in argString (new field from this PR's AnalysisOnlyCommand conversion).

Co-authored-by: Isaac
# Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
SPARK-39660 split v2 DESCRIBE TABLE PARTITION off into its own DescribeTablePartition plan and dropped `partitionSpec` from DescribeRelation. Our v2-view pin case had 4 wildcards; reduce to 3 to match the new 3-field case class. Co-authored-by: Isaac
Javadoc died mid-stream while generating CatalogV2Implicits.IdentifierHelper.html (the failing PR's log stops exactly there; the succeeding PRs continue past to MultipartIdentifierHelper and CatalogV2Util). The only diff in IdentifierHelper on this branch was the new asLegacyTableIdentifier method, whose scaladoc used `[[TableIdentifier]]` / `[[toQualifiedNameParts]]` / backtick-inlined code refs. Something in that doc tripped javadoc into a hard exit (not a warning) instead of a broken-link warning.

Fix: downgrade both new scaladoc blocks in the exposed-to-javadoc connector/catalog package to plain `//` comments so genjavadoc doesn't emit them into the Java stub at all:
- CatalogV2Implicits.IdentifierHelper.asLegacyTableIdentifier
- CatalogV2Util.supportsView (same risky pattern, hasn't been reached yet because javadoc died earlier, but would break next)

The method names are self-documenting; internal callers don't need the scaladoc.

Co-authored-by: Isaac
Restore ViewCatalog as the plugin-facing API for view-only catalogs and
view DDL operations, instead of routing views through TableCatalog under
a SUPPORTS_VIEW capability flag. Catalog-implementer ergonomics:
* Pure view-only: implement ViewCatalog. 5 methods (listViews,
loadView/createView/replaceView/dropView), default viewExists. No
instanceof, no capability declaration, no TableCatalog stubs.
* Pure tables: implement TableCatalog. Same as today.
* Mixed (Iceberg/UC): implement both interfaces independently. Single
cross-cutting invariant -- one identifier namespace; createTable
rejects view-collisions, createView rejects table-collisions.
ViewCatalog API:
Identifier[] listViews(String[] namespace);
ViewInfo loadView(Identifier);
ViewInfo createView(Identifier, ViewInfo);
ViewInfo replaceView(Identifier, ViewInfo); // atomic per-call
boolean dropView(Identifier);
default boolean viewExists(Identifier);
No StagingViewCatalog -- view REPLACE writes only metadata, so a single
transactional metastore call (or equivalent) is sufficient. CREATE OR
REPLACE VIEW probes viewExists then dispatches createView/replaceView.
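The probe-then-dispatch flow for CREATE OR REPLACE VIEW can be sketched as follows. This is a simplified stand-in, not the real interface: identifiers are plain strings, a view is reduced to its SQL text, and `ViewCatalogSketch`, `InMemoryViewCatalog`, and `CreateOrReplaceView` are names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the ViewCatalog shape quoted above.
interface ViewCatalogSketch {
    boolean viewExists(String ident);
    String loadView(String ident);
    void createView(String ident, String queryText);   // fails if the view exists
    void replaceView(String ident, String queryText);  // fails if the view is missing
}

final class InMemoryViewCatalog implements ViewCatalogSketch {
    private final Map<String, String> views = new HashMap<>();

    @Override public boolean viewExists(String ident) {
        return views.containsKey(ident);
    }

    @Override public String loadView(String ident) {
        String text = views.get(ident);
        if (text == null) throw new IllegalStateException("No such view: " + ident);
        return text;
    }

    @Override public void createView(String ident, String queryText) {
        // putIfAbsent returns the previous value, so non-null means a collision.
        if (views.putIfAbsent(ident, queryText) != null) {
            throw new IllegalStateException("View already exists: " + ident);
        }
    }

    @Override public void replaceView(String ident, String queryText) {
        // Map.replace returns null when there was no mapping to replace.
        if (views.replace(ident, queryText) == null) {
            throw new IllegalStateException("No such view: " + ident);
        }
    }
}

final class CreateOrReplaceView {
    private CreateOrReplaceView() {}

    /** Probe first, then call exactly one of createView / replaceView. */
    static void run(ViewCatalogSketch catalog, String ident, String queryText) {
        if (catalog.viewExists(ident)) {
            catalog.replaceView(ident, queryText);
        } else {
            catalog.createView(ident, queryText);
        }
    }
}
```

Note that in this sketch the probe and the write are two separate calls, so a concurrent create or drop between them would surface as an exception from the catalog; only each individual call is assumed to be atomic.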
Spark-side dispatch:
* Analyzer.lookupTableOrView: try TableCatalog.loadTable first; on
NoSuchTableException, if catalog is ViewCatalog, fall back to
loadView and synthesize ResolvedPersistentView.
* Mixed-catalog perf opt-in: loadTable may return
MetadataOnlyTable(ViewInfo) for view idents, short-circuiting the
second RPC. Documented on TableCatalog#loadTable.
* DataSourceV2Strategy: routes CREATE/ALTER/DROP/SHOW VIEWS to
ViewCatalog only; staging branches removed.
* ResolveSessionCatalog: SUPPORTS_VIEW guards replaced with
instanceof ViewCatalog.
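The table-first, view-fallback lookup described in the list above might look roughly like this. The interfaces and the `NotFoundSketch` exception are stand-ins invented for illustration (a single unchecked exception replaces the real not-found errors), and results are tagged strings rather than resolved plans.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the "no such table / view" errors; unchecked to keep the sketch short. */
final class NotFoundSketch extends RuntimeException {
    NotFoundSketch(String ident) { super("Not found: " + ident); }
}

// Stand-in catalog interfaces, not Spark's.
interface CatalogSketch {}
interface TablesSketch extends CatalogSketch { String loadTable(String ident); }
interface ViewsSketch extends CatalogSketch { String loadView(String ident); }

/** A mixed catalog that serves both tables and views. */
final class MixedSketchCatalog implements TablesSketch, ViewsSketch {
    final Map<String, String> tables = new HashMap<>();
    final Map<String, String> views = new HashMap<>();

    @Override public String loadTable(String ident) {
        String t = tables.get(ident);
        if (t == null) throw new NotFoundSketch(ident);
        return t;
    }

    @Override public String loadView(String ident) {
        String v = views.get(ident);
        if (v == null) throw new NotFoundSketch(ident);
        return v;
    }
}

final class LookupSketch {
    private LookupSketch() {}

    static String lookupTableOrView(CatalogSketch catalog, String ident) {
        if (catalog instanceof TablesSketch) {
            try {
                return "table:" + ((TablesSketch) catalog).loadTable(ident);
            } catch (NotFoundSketch e) {
                // Not a table; fall through and try the view path if there is one.
            }
        }
        if (catalog instanceof ViewsSketch) {
            return "view:" + ((ViewsSketch) catalog).loadView(ident);
        }
        throw new NotFoundSketch(ident);
    }
}
```

A catalog that implements only the view interface never goes through the table branch, so the absence of table support cannot mask a legitimate view lookup.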
Internal: V1Table.toCatalogTable for ViewInfo is now public so the
analyzer can synthesize CatalogTable from a loadView result for the
session-catalog v1 view-resolution path.
Out of scope for this commit:
* Test suite rewrite (DataSourceV2MetadataOnlyViewSuite still uses
SUPPORTS_VIEW and TestingStagingCatalog) -- broken until the
follow-up commit.
* Lifting the session-catalog gate on DESCRIBE/SHOW CREATE TABLE/SHOW
COLUMNS/SHOW TBLPROPERTIES for v2 views -- still pinned with
UNSUPPORTED_FEATURE.TABLE_OPERATION; tracked as follow-up.
Co-authored-by: Isaac
…ewCatalog The structural rework removed TableCatalogCapability.SUPPORTS_VIEW and introduced ViewCatalog as the plugin-facing API for views. The existing test catalogs (TestingViewCatalog, TestingStagingCatalog) now implement both TableCatalog and ViewCatalog, sharing one identifier-keyed map per the mixed-catalog contract. Storage value's runtime type (ViewInfo vs TableInfo) distinguishes views from tables on each lookup; tableExists / listTables exclude view entries, viewExists / listViews include only views, and createTable / createView each reject cross-type collisions. Test-name renames replace "without SUPPORTS_VIEW" with "without ViewCatalog" to track the new API. The rest of the test bodies are unchanged. Co-authored-by: Isaac
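A minimal sketch of the mixed-catalog contract described in this commit: one identifier-keyed map, with the runtime type of the stored value separating views from tables. `TableEntry`, `ViewEntry`, and `MixedCatalogSketch` are hypothetical stand-ins (with `ViewEntry extends TableEntry` mirroring the ViewInfo/TableInfo relationship), not the test catalogs from the PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for table metadata. */
class TableEntry {
    final String provider;
    TableEntry(String provider) { this.provider = provider; }
}

/** Stand-in for view metadata; a view is a specialised table entry. */
final class ViewEntry extends TableEntry {
    final String queryText;
    ViewEntry(String queryText) {
        super("view");
        this.queryText = queryText;
    }
}

final class MixedCatalogSketch {
    // One shared identifier namespace for tables and views.
    private final Map<String, TableEntry> entries = new HashMap<>();

    boolean tableExists(String ident) {
        TableEntry e = entries.get(ident);
        return e != null && !(e instanceof ViewEntry); // views are excluded
    }

    boolean viewExists(String ident) {
        return entries.get(ident) instanceof ViewEntry; // only views count
    }

    void createTable(String ident, TableEntry table) {
        if (table instanceof ViewEntry) {
            throw new IllegalArgumentException("Use createView for views: " + ident);
        }
        rejectCollision(ident);
        entries.put(ident, table);
    }

    void createView(String ident, ViewEntry view) {
        rejectCollision(ident);
        entries.put(ident, view);
    }

    /** Any existing entry, table or view, blocks the create. */
    private void rejectCollision(String ident) {
        if (entries.containsKey(ident)) {
            throw new IllegalStateException("Identifier already in use: " + ident);
        }
    }
}
```

Because both create paths check the same map, a table and a view can never share an identifier, which is the single cross-cutting invariant the mixed case relies on.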
…ec test names
- Analyzer.lookupTableOrView and RelationResolution.tryResolvePersistent skip CatalogV2Util.loadTable for pure ViewCatalogs (no TableCatalog mixin), so asTableCatalog no longer throws MISSING_CATALOG_ABILITY.TABLES and masks the legitimate loadView fallback. SELECT and ALTER VIEW now work end-to-end on a pure ViewCatalog.
- Add a TestingViewOnlyCatalog fixture (no TableCatalog mixin) plus read and ALTER VIEW tests that exercise the loadView fallback.
- DataSourceV2MetadataOnlyViewSuite: rename "uses the atomic exec" tests to reflect that view DDL routes through ViewCatalog.createView / replaceView (no separate staging variant); drop now-dead RecordingStagedTable; replace TestingStagingCatalog's stage* method bodies with explicit "must not be invoked by view DDL" throws so any future regression that misroutes through the staging API surfaces immediately.

Co-authored-by: Isaac
… over a non-view table; align SHOW TABLES test with new listTables contract; defensive null check on MetadataOnlyTable
- CreateV2ViewExec.run: probe both viewExists and tableExists up front. CREATE VIEW IF NOT EXISTS over a non-view table is now a no-op (v1 parity: see SQLViewSuite "existing a table with the duplicate name when CREATE VIEW IF NOT EXISTS"); the previous code called rejectIfTable() unconditionally before the allowExisting check and threw TABLE_OR_VIEW_ALREADY_EXISTS for what should be a no-op. Non-IF-NOT-EXISTS CREATE / OR REPLACE still surfaces the dedicated EXPECT_VIEW_NOT_TABLE / TABLE_OR_VIEW_ALREADY_EXISTS error. Drop the now-unused rejectIfTable / replaceArg trait helpers (and the AlterV2ViewExec override).
- DataSourceV2MetadataOnlyViewSuite: rename "SHOW TABLES on a v2 catalog includes views (v1 parity)" to "SHOW TABLES on a v2 catalog returns only tables" and flip the assertion. The new TableCatalog.listTables contract excludes views (per the file Javadoc); the previous test name + body asserted v1-parity which the implementation does not provide and ShowTablesExec is not changed by this PR. Documents the intentional v2 divergence.
- MetadataOnlyTable: Objects.requireNonNull on `info` and `name` so a connector that constructs the wrapper with nulls fails fast at construction time rather than producing cryptic NPEs in downstream consumers (DescribeTableExec's Name row, DataSourceV2Relation logging).

Co-authored-by: Isaac
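The CREATE VIEW outcomes discussed in these commits can be summarised as a small decision function. This is a reading of the commit messages, not the actual exec code: the `Mode` and `Outcome` enums and the `CreateViewDecision` class are hypothetical, and the two failure outcomes are named after the error classes mentioned above.

```java
/** Which form of the statement was issued. */
enum Mode { CREATE, IF_NOT_EXISTS, OR_REPLACE }

/** What the statement should do, given what already sits at the identifier. */
enum Outcome {
    CREATE_VIEW,
    REPLACE_VIEW,
    NO_OP,
    FAIL_TABLE_OR_VIEW_ALREADY_EXISTS,
    FAIL_EXPECT_VIEW_NOT_TABLE
}

final class CreateViewDecision {
    private CreateViewDecision() {}

    static Outcome decide(Mode mode, boolean viewExists, boolean tableExists) {
        if (!viewExists && !tableExists) {
            // Nothing in the way: every mode creates the view.
            return Outcome.CREATE_VIEW;
        }
        switch (mode) {
            case IF_NOT_EXISTS:
                // A no-op whether the existing entry is a view or a table.
                return Outcome.NO_OP;
            case OR_REPLACE:
                // Replacing a view is fine; replacing a table as a view is not.
                return viewExists ? Outcome.REPLACE_VIEW : Outcome.FAIL_EXPECT_VIEW_NOT_TABLE;
            default:
                // Plain CREATE over any existing entry.
                return Outcome.FAIL_TABLE_OR_VIEW_ALREADY_EXISTS;
        }
    }
}
```

Writing the matrix out this way makes the v1-parity case easy to see: IF NOT EXISTS is decided before any table-versus-view distinction is considered.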
- ViewInfo class doc: complete the dangling "construct." sentence with its
direct object ("construct a ViewInfo") so the line reads as a complete
thought.
- TableInfo Builder: replace the awkward use of "write" as a noun
("discards the convenience setter's write") with verb form ("discards
the value the convenience setter wrote").
Co-authored-by: Isaac
…talog gating Commit 66fa409 added `catalog.isInstanceOf[TableCatalog]` to RelationResolution.tryResolvePersistent's gating but didn't add TableCatalog to the explicit-list import block; CI failed at catalyst compile with `not found: type TableCatalog`. Add the import. Co-authored-by: Isaac
What changes were proposed in this pull request?
This PR exposes a DS v2 API for metadata-only tables (read side), CREATE VIEW, and ALTER VIEW ... AS (write side) so that third-party v2 catalogs can participate in Spark's resolution and creation flows without reimplementing read/write themselves.
1. Read path - `MetadataOnlyTable`:
- A `Table` implementation that carries a `TableInfo` and delegates everything to it. Catalogs return it from `loadTable` to signal "Spark, interpret this via V1 paths": data-source reads for file-source tables, view-text expansion for views (the latter via the perf opt-in described in section 4).
- `Analyzer.lookupTableOrView` and `RelationResolution.createRelation` detect `MetadataOnlyTable` and route through a new `V1Table.toCatalogTable` adapter to the existing V1 data-source / view machinery.
2. Shared DTO - `TableInfo`:
- `TableInfo.Builder` gains convenience setters that write reserved keys into `properties`: `withProvider`, `withLocation`, `withComment`, `withCollation`, `withOwner`, `withTableType`, plus `withSchema(StructType)`. The read side (`MetadataOnlyTable`) and the write side (`createTable(ident, TableInfo)`) use the same struct. View-specific fields live on a typed subclass (see section 3) so they are not encoded as string properties.
- `withProperties` takes a defensive copy so convenience setters don't mutate the caller's map.
3. Typed view DTO - `ViewInfo`:
- `ViewInfo extends TableInfo` carries the view-specific fields that cannot be represented as string table properties: `queryText`, `currentCatalog`, `currentNamespace` (multi-part, never `null`; empty when no namespace was captured), `sqlConfigs` (unprefixed SQL config keys), `schemaMode` (`BINDING` / `COMPENSATION` / `TYPE EVOLUTION` / `EVOLUTION`), and `queryColumnNames` (mapping query output to the view's declared columns; empty in `EVOLUTION` mode).
- `ViewInfo.Builder extends TableInfo.BaseBuilder<Builder>` adds typed setters: `withQueryText`, `withCurrentCatalog`, `withCurrentNamespace`, `withSqlConfigs`, `withSchemaMode`, `withQueryColumnNames`. The inherited `TableInfo.BaseBuilder` setters (schema, properties, owner, comment, collation, etc.) are available on the same builder so view and table writes share one fluent API.
- The `ViewInfo` constructor stamps `PROP_TABLE_TYPE = TableSummary.VIEW_TABLE_TYPE` into `properties()` so catalogs and generic viewers reading `PROP_TABLE_TYPE` from the properties bag (e.g. `TableCatalog.listTableSummaries` default impl, `DESCRIBE`) classify the entry as `VIEW` without requiring authors to remember `withTableType(VIEW)`.
- `ViewInfo` is the typed payload returned by `ViewCatalog.loadView` and accepted by `createView` / `replaceView`. It still extends `TableInfo` so a mixed catalog can opt into the perf path described in section 4 (returning `MetadataOnlyTable(ViewInfo)` from `loadTable`); pure view-only catalogs never see `TableInfo` directly because the typed builder covers everything they construct.
4. View support - `ViewCatalog` interface:
- The `ViewCatalog` interface is the plugin-facing API for views. It is independent from `TableCatalog`: a connector implements just `ViewCatalog` (view-only catalog), just `TableCatalog` (table-only catalog), or both (mixed catalog like Hive / Iceberg / Unity Catalog). There is no capability flag; interface presence is the signal.
- Methods: `listViews(namespace)`, `loadView(ident)` returning `ViewInfo`, `createView(ident, ViewInfo)`, `replaceView(ident, ViewInfo)`, `dropView(ident)`, default `viewExists(ident)`, default `invalidateView(ident)`. No staging variant: `replaceView` is a single atomic-swap call.
- In a mixed catalog, `createTable` rejects view-collisions and `createView` rejects table-collisions (one extra existence check the catalog already needs internally).
- `loadTable` may return a `MetadataOnlyTable` wrapping a `ViewInfo` for a view identifier; Spark's resolver discriminates by `instanceof ViewInfo` and routes through view resolution without a follow-up `loadView` RPC. If the catalog instead throws `NoSuchTableException` for a view identifier, Spark falls back to `loadView` (one extra RPC on cold cache).
- `Analyzer.lookupTableOrView` (and `RelationResolution.tryResolvePersistent`) tries `loadTable` first only when the catalog is a `TableCatalog` (or the session catalog); otherwise the underlying `asTableCatalog` cast would throw `MISSING_CATALOG_ABILITY.TABLES` for a pure `ViewCatalog` and mask the legitimate `loadView` fallback. On `NoSuchTableException` (or when `loadTable` is skipped), if the catalog is a `ViewCatalog`, it calls `loadView` and synthesizes a `ResolvedPersistentView` from the resulting `ViewInfo` via `V1Table.toCatalogTable(catalog, ident, viewInfo)`.
- Catalogs that do not implement `ViewCatalog` get `MISSING_CATALOG_ABILITY.VIEWS` from the resolver gate (for `UnresolvedView`) and from `CheckViewReferences` (for CREATE / ALTER VIEW), matching the previous capability-flag rejection.
5. Write path - DS v2 CREATE VIEW:
- `DataSourceV2Strategy` routes `CreateView(ResolvedIdentifier(catalog, ident), …)` to `CreateV2ViewExec(catalog: ViewCatalog, …)`, which dispatches: `createView` for plain CREATE / IF NOT EXISTS; `replaceView` for CREATE OR REPLACE on an existing view (with a `NoSuchViewException → createView` fallback for the race where the view disappears between probe and replace); `createView` for CREATE OR REPLACE on a non-existent view.
- Cross-type collision (CREATE VIEW over a non-view table in a mixed catalog) is detected up front: CREATE OR REPLACE is rejected with `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`, plain CREATE is rejected with `TABLE_OR_VIEW_ALREADY_EXISTS`, and CREATE VIEW IF NOT EXISTS is a no-op (v1 parity).
- The exec builds the `ViewInfo` via a `V2ViewPreparation` trait reusing v1 `ViewHelper` helpers (`aliasPlan`, `sqlConfigsToProps`) to populate a `ViewInfo.Builder` with the current session's captured catalog/namespace and SQL configs. Cyclic-reference detection and auto-generated-alias rejection run once at analysis time in `CheckViewReferences` (see section 7).
- The `CreateView` logical plan extends `AnalysisOnlyCommand` (same shape as `V2CreateTableAsSelectPlan`) so `HandleSpecialCommand.markAsAnalyzed` captures `referredTempFunctions` from `AnalysisContext`. The v1 rewriting path (`ResolveSessionCatalog` → `CreateViewCommand`) is unchanged.
6. Write path - DS v2 ALTER VIEW ... AS:
- The `AlterViewAs` logical plan also extends `AnalysisOnlyCommand` so `referredTempFunctions` is captured for the non-session path.
- `DataSourceV2Strategy` routes `AlterViewAs(ResolvedPersistentView(catalog, ident, _), …)` to `AlterV2ViewExec(catalog: ViewCatalog, …)`, which calls `replaceView` (the single atomic-swap entry point; there is no separate staging variant, since view REPLACE writes only metadata).
- The `V2AlterViewPreparation` trait (extends `V2ViewPreparation`) calls `catalog.loadView(ident)` once and uses the result to preserve user TBLPROPERTIES, comment, collation, owner, and schema-binding mode when constructing the replacement `ViewInfo`. Session-scoped fields (SQL configs, query column names) are re-emitted by `buildViewInfo()` from the active `SparkSession`, matching v1 `AlterViewAsCommand.alterPermanentView`. A racing DDL between analysis and exec (the view dropped, or replaced with a non-view table in a mixed catalog) surfaces `NoSuchViewException` / `EXPECT_VIEW_NOT_TABLE` rather than a stale-resolution error.
- `ResolvedViewIdentifier.unapply` (in `ResolveSessionCatalog`) replaces its `assert(isSessionCatalog)` with an `if isSessionCatalog` guard so non-session `ResolvedPersistentView` plans fall through to the v2 strategy instead of tripping the assertion.
7. Post-analysis check - `CheckViewReferences`:
- Registered in `BaseSessionStateBuilder.extendedCheckRules`. Rejects permanent views that reference temporary objects and rejects view bodies with auto-generated aliases for both `CreateView` and `AlterViewAs` (v2 paths). v1 `CreateViewCommand` / `AlterViewAsCommand` keep their existing exec-time safety net, because Dataset-built commands can be constructed with `isAnalyzed=true` directly and bypass the analyzer's re-capture path.
8. Listing - `SHOW TABLES` / `SHOW VIEWS`:
- `TableCatalog.listTables` returns table identifiers only; views (if the catalog also implements `ViewCatalog`) are listed separately via `ViewCatalog.listViews`. `listTableSummaries`'s default impl enumerates via `listTables` + `loadTable` and returns one summary per table. This is an intentional v2 divergence from v1 `SHOW TABLES`, which includes both tables and views; restoring the v1-parity output for `SHOW TABLES` on a v2 catalog (i.e. routing it through both `listTables` and `listViews`) is left as a follow-up so this PR's API surface stays narrowly scoped.
- `SHOW VIEWS` on a non-session `ViewCatalog` is routed through a new `ShowViewsExec` that enumerates via `ViewCatalog.listViews(namespace)`. `ResolveSessionCatalog.ShowViews` skips (via guard) for `ViewCatalog` catalogs so they fall through to this strategy; non-session, non-`ViewCatalog` catalogs still hit the existing `MISSING_CATALOG_ABILITY.VIEWS` rejection. v2 catalogs have no temp views, so the `isTemporary` column is always false (mirroring v1, which only sets it true for local/global temp views).
Why are the changes needed?
A v2 `Table` is not always backed by a connector that implements read/write. Catalogs like HMS and Unity Catalog store only metadata and rely on Spark to interpret the table provider as a data source or to execute the view SQL. Previously the only way to achieve that was a hack around `V1Table`, which leaks private v1 types into v2 connectors (example: https://github.com/unitycatalog/unitycatalog/blob/main/connectors/spark/src/main/scala/io/unitycatalog/spark/UCSingleCatalog.scala).
Separately, v2 catalogs had no public way to handle CREATE VIEW or ALTER VIEW. `ResolveSessionCatalog` rejected CREATE VIEW on any non-session catalog with `MISSING_CATALOG_ABILITY.VIEWS`, so third-party catalogs could not own view lifecycle at all. The new `ViewCatalog` interface gives catalogs a clean view-shaped API (`listViews` / `loadView` / `createView` / `replaceView` / `dropView`) that is independent of `TableCatalog`: a view-only catalog implements just `ViewCatalog` (no `TableCatalog` boilerplate), a mixed catalog implements both, and the cross-type collision invariant is one extra existence check at `createTable` / `createView` time.
Does this PR introduce any user-facing change?
Yes, for connector developers:
- `TableCatalog` implementations can now return a `MetadataOnlyTable` from `loadTable` to delegate reads to Spark.
- Catalogs can implement the new `ViewCatalog` interface to handle CREATE VIEW / CREATE OR REPLACE VIEW / CREATE VIEW IF NOT EXISTS / ALTER VIEW … AS / DROP VIEW / SHOW VIEWS, with view text, schema, captured current catalog+namespace, SQL configs, and temp-object-reference rejection handled the same way as for session-catalog views. View-only catalogs implement just `ViewCatalog`; mixed catalogs implement both `TableCatalog` and `ViewCatalog`.
No SQL-level or user-visible behavior change for existing deployments.
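For connector authors, the CREATE VIEW shapes listed above can be read as a small decision table over two existence probes. The following is a minimal sketch of that table as described in section 5 and in the test notes, not the actual `CreateV2ViewExec` code. Every type here (`ToyCatalog`, `Action`, `ViewDdlException`, `dispatch`) is a hypothetical stand-in and not a Spark class. The error class used for plain CREATE over an existing view is an assumption, since the description only says that case fails. The `NoSuchViewException → createView` race fallback is not modeled.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative model only; none of these types are Spark classes. */
final class CreateViewDispatchSketch {

    /** Catalog call the exec node would end up making (hypothetical enum). */
    enum Action { CREATE_VIEW, REPLACE_VIEW, NO_OP }

    /** Hypothetical stand-in for an analysis error carrying an error class. */
    static final class ViewDdlException extends RuntimeException {
        final String errorClass;

        ViewDdlException(String errorClass) {
            super(errorClass);
            this.errorClass = errorClass;
        }
    }

    /** Toy mixed catalog: tables and views share one identifier space. */
    static final class ToyCatalog {
        final Set<String> tables = new HashSet<>();
        final Set<String> views = new HashSet<>();

        boolean tableExists(String ident) { return tables.contains(ident); }
        boolean viewExists(String ident) { return views.contains(ident); }
    }

    static Action dispatch(ToyCatalog catalog, String ident,
                           boolean ifNotExists, boolean orReplace) {
        // Probe both kinds up front, before deciding anything.
        boolean viewExists = catalog.viewExists(ident);
        boolean tableExists = catalog.tableExists(ident);

        // IF NOT EXISTS: any existing entry, view or table, makes this a no-op.
        if (ifNotExists && (viewExists || tableExists)) {
            return Action.NO_OP;
        }
        // Cross-type collision: a non-view table already owns the identifier.
        if (tableExists) {
            throw new ViewDdlException(orReplace
                ? "EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE"
                : "TABLE_OR_VIEW_ALREADY_EXISTS");
        }
        if (viewExists) {
            if (orReplace) {
                return Action.REPLACE_VIEW;
            }
            // Assumption: the description does not name this error class.
            throw new ViewDdlException("TABLE_OR_VIEW_ALREADY_EXISTS");
        }
        // Nothing exists yet: every CREATE shape ends in createView.
        return Action.CREATE_VIEW;
    }

    public static void main(String[] args) {
        ToyCatalog catalog = new ToyCatalog();
        catalog.tables.add("t");
        catalog.views.add("v");
        System.out.println(dispatch(catalog, "v", false, true));
        System.out.println(dispatch(catalog, "t", true, false));
    }
}
```

Checking the IF NOT EXISTS branch before the collision branch is what makes `CREATE VIEW IF NOT EXISTS` over a table a no-op rather than an error, which is the ordering fix called out in the commit notes.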
Remaining work (follow-up PRs)
This PR covers the core read path, CREATE VIEW (all shapes), `ALTER VIEW ... AS`, `DROP VIEW`, and `SHOW VIEWS`. The following view-scoped plans for DS v2 catalogs are not yet supported and are tracked for follow-ups. Until the follow-ups land, each currently surfaces a clean `UNSUPPORTED_FEATURE.TABLE_OPERATION` error (wired up in `DataSourceV2Strategy` and pinned by tests in `DataSourceV2MetadataOnlyViewSuite`), so users get a meaningful message rather than a generic planner failure:
- `ALTER VIEW ... SET/UNSET TBLPROPERTIES`: separate logical plans (`SetViewProperties`, `UnsetViewProperties`); need their own `DataSourceV2Strategy` cases backed by new `TableChange` routing.
- `ALTER VIEW ... RENAME TO`: `RenameTable` at the logical level; needs v2 view awareness and the catalog-side rename semantics.
- `ALTER VIEW ... WITH SCHEMA BINDING`: `AlterViewSchemaBinding` logical plan; needs the same treatment as `ALTER VIEW AS` (AnalysisOnlyCommand shape + v2 exec).
- `DESCRIBE` / `SHOW CREATE TABLE` / `SHOW TBLPROPERTIES` / `SHOW COLUMNS` on v2 views: currently route through `ResolvedViewIdentifier`, which only matches session-catalog views; the v2 equivalents need dedicated handling.
- `SHOW TABLES` v1-parity output (include views) on a v2 catalog: `TableCatalog.listTables` now intentionally returns tables only; restoring v1-parity in the SQL layer (route `SHOW TABLES` through both `listTables` and `listViews`) is a separate piece of work.
How was this patch tested?
New `DataSourceV2MetadataOnlyTableSuite` covering:
- `TableCatalog`): end-to-end CREATE, `CREATE VIEW IF NOT EXISTS` (no-op on existing), CREATE on existing (failure), `CREATE OR REPLACE VIEW` (replacement); user-specified columns (too-few / too-many); `DEFAULT COLLATION` propagation into `ViewInfo`.
- `ViewCatalog` end-to-end (no `TableCatalog` mixin): dedicated `TestingViewOnlyCatalog` fixture exercises view-text expansion on read + `ALTER VIEW … AS` against a pure `ViewCatalog`, ensuring the resolver's `loadView` fallback fires correctly when `loadTable` is skipped.
- Rejects `CREATE OR REPLACE VIEW` against a non-view table entry with `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`; rejects plain `CREATE VIEW` against a non-view table entry with `TABLE_OR_VIEW_ALREADY_EXISTS`; `CREATE VIEW IF NOT EXISTS` over a table is a no-op (matches v1 `SQLViewSuite` "existing a table with the duplicate name when CREATE VIEW IF NOT EXISTS").
- `TestingTableOnlyCatalog` exercises the rejection on each path (expected `MISSING_CATALOG_ABILITY.VIEWS`).
- Preserves `PROP_OWNER` and `SCHEMA EVOLUTION` binding mode across the ALTER (v1-parity); missing view surfaces as `AnalysisException`.
- `ViewInfo.Builder.withCurrentCatalog(cat).withCurrentNamespace([db1, db2])` -> `V1Table.toCatalogTable` -> `CatalogTable.viewCatalogAndNamespace` preserves the full multi-part form, including namespace parts containing dots (which flow through structurally, not via any string encoding). A companion test pins the absent-branch: a `ViewInfo` built without `withCurrentCatalog` yields an empty `viewCatalogAndNamespace`.
- `SET TBLPROPERTIES`, `UNSET TBLPROPERTIES`, `WITH SCHEMA`, `RENAME TO`, `SHOW CREATE TABLE`, `SHOW TBLPROPERTIES`, `SHOW COLUMNS`, `DESCRIBE TABLE` against a v2 view all surface `UNSUPPORTED_FEATURE.TABLE_OPERATION`; `DESCRIBE TABLE ... COLUMN` surfaces a clean `AnalysisException`. Pins the current failure mode so a future regression to a generic planner error is caught in the diff.
- `SHOW TABLES` / `SHOW VIEWS` on a v2 catalog: `SHOW TABLES` returns tables only; `SHOW VIEWS` returns views only (`isTemporary=false` throughout for v2 catalogs); `SHOW VIEWS ... LIKE` filters on the view name; `SHOW VIEWS` against a non-`ViewCatalog` is rejected with `MISSING_CATALOG_ABILITY.VIEWS`.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (Anthropic)