[SPARK-56626][SQL] Introduce SupportsReportCatalogStatistics mixin for Table #55546

Open
anton5798 wants to merge 2 commits into apache:master from anton5798:alykov/dsv2-catalog-stats-mixin

anton5798 commented Apr 24, 2026

What changes were proposed in this pull request?

Adds a new @Evolving mix-in on org.apache.spark.sql.connector.catalog.Table:

public interface SupportsReportCatalogStatistics extends Table {
  Statistics catalogStatistics();
}

The returned stats describe the full table (pre-filter, pre-pruning, scan-independent), analogous to DSv1's CatalogStatistics. The mixin is distinct from the Scan-level SupportsReportStatistics, which reports post-pushdown stats; a table may implement both.
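
For illustration, here is a minimal sketch of how a connector table might implement the mixin. It is not part of this patch. To keep it compilable without Spark on the classpath, `Statistics` and `Table` are simplified stand-ins for the real connector interfaces (only `numRows`, `sizeInBytes`, and `name` are kept), and `MetastoreBackedTable` and its constructor arguments are hypothetical.

```java
import java.util.OptionalLong;

// Sketch only: the nested interfaces are simplified stand-ins, not Spark's real types.
final class CatalogStatsExample {

    /** Stand-in for org.apache.spark.sql.connector.read.Statistics (column stats omitted). */
    interface Statistics {
        OptionalLong sizeInBytes();
        OptionalLong numRows();
    }

    /** Stand-in for org.apache.spark.sql.connector.catalog.Table (only name() kept). */
    interface Table {
        String name();
    }

    /** Same shape as the mixin proposed in this PR. */
    interface SupportsReportCatalogStatistics extends Table {
        Statistics catalogStatistics();
    }

    /** Hypothetical table that already holds table-wide numbers from its catalog. */
    static final class MetastoreBackedTable implements SupportsReportCatalogStatistics {
        private final String name;
        private final long totalRows;
        private final long totalBytes;

        MetastoreBackedTable(String name, long totalRows, long totalBytes) {
            this.name = name;
            this.totalRows = totalRows;
            this.totalBytes = totalBytes;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public Statistics catalogStatistics() {
            // Full-table numbers: no filter, no partition pruning, no Scan is built.
            return new Statistics() {
                @Override
                public OptionalLong sizeInBytes() {
                    return OptionalLong.of(totalBytes);
                }

                @Override
                public OptionalLong numRows() {
                    return OptionalLong.of(totalRows);
                }
            };
        }
    }

    public static void main(String[] args) {
        SupportsReportCatalogStatistics table =
            new MetastoreBackedTable("db.events", 1_000_000L, 64L * 1024 * 1024);
        Statistics stats = table.catalogStatistics();
        System.out.println(table.name()
            + ": rows=" + stats.numRows().getAsLong()
            + ", bytes=" + stats.sizeInBytes().getAsLong());
    }
}
```

A real implementor would extend the actual `Table` interface and could additionally expose Scan-level `SupportsReportStatistics` from its scans, since the two contracts are independent.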

Wires it into DataSourceV2ScanRelation.computeStats additively: if the table implements the mixin and catalogStatistics().numRows() is present, those stats are used via transformV2Stats; otherwise control falls through to the existing Scan.estimateStatistics() path unchanged. The numRows-present guard ensures the fallthrough when the mixin has nothing meaningful to report.
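
The control flow described above can be modeled roughly as follows. This is a sketch of the decision only, not the actual change: the real `computeStats` is Scala and passes the chosen stats through `transformV2Stats`, which is omitted here. All types are simplified stand-ins, and `chooseStats`, `ScanStats`, `stats`, and `mixinTable` are hypothetical names introduced for the example.

```java
import java.util.OptionalLong;

// Sketch only: models the "prefer the mixin when numRows is present" rule.
final class ComputeStatsSketch {

    /** Stand-in for the connector Statistics interface (column stats omitted). */
    interface Statistics {
        OptionalLong sizeInBytes();
        OptionalLong numRows();
    }

    /** Stand-in for the connector Table interface. */
    interface Table {
        String name();
    }

    /** Same shape as the mixin proposed in this PR. */
    interface SupportsReportCatalogStatistics extends Table {
        Statistics catalogStatistics();
    }

    /** Hypothetical stand-in for the existing scan-based estimation path. */
    interface ScanStats {
        Statistics estimateStatistics();
    }

    /** Mixin first, but only if it reports numRows; otherwise the scan path, unchanged. */
    static Statistics chooseStats(Table table, ScanStats scan) {
        if (table instanceof SupportsReportCatalogStatistics) {
            Statistics catalogStats =
                ((SupportsReportCatalogStatistics) table).catalogStatistics();
            if (catalogStats.numRows().isPresent()) {
                return catalogStats;
            }
        }
        return scan.estimateStatistics();
    }

    /** Helper: builds an immutable Statistics value. */
    static Statistics stats(OptionalLong numRows, OptionalLong sizeInBytes) {
        return new Statistics() {
            @Override
            public OptionalLong sizeInBytes() {
                return sizeInBytes;
            }

            @Override
            public OptionalLong numRows() {
                return numRows;
            }
        };
    }

    /** Helper: builds a table that implements the mixin with fixed stats. */
    static SupportsReportCatalogStatistics mixinTable(String name, Statistics s) {
        return new SupportsReportCatalogStatistics() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public Statistics catalogStatistics() {
                return s;
            }
        };
    }

    public static void main(String[] args) {
        Statistics scanStats = stats(OptionalLong.of(5L), OptionalLong.of(50L));
        Statistics fullStats = stats(OptionalLong.of(100L), OptionalLong.of(1000L));
        Statistics noRowCount = stats(OptionalLong.empty(), OptionalLong.of(1000L));
        ScanStats scan = () -> scanStats;
        Table plain = () -> "plain";

        // No mixin: scan path. Mixin with numRows: catalog stats. Mixin without numRows: scan path.
        System.out.println(chooseStats(plain, scan) == scanStats);
        System.out.println(chooseStats(mixinTable("t", fullStats), scan) == fullStats);
        System.out.println(chooseStats(mixinTable("t", noRowCount), scan) == scanStats);
    }
}
```

The third case is the one the numRows-present guard exists for: a table can implement the mixin yet have nothing meaningful to report, and the planner then behaves exactly as it did before the change.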

No built-in Table implements the mixin in this PR.

Why are the changes needed?

DSv2 currently conflates catalog-level and scan-level statistics on Scan.estimateStatistics(), so reading table-wide stats requires building a ScanBuilder — which, depending on the connector, can trigger file listing, planScanFiles, or remote metadata round-trips. This mixin is a scan-independent accessor with a stable, pre-pushdown contract, suitable for CBO decisions on the unfiltered relation.

Does this PR introduce any user-facing change?

No. The interface is new; no built-in table implements it, so existing queries observe identical behavior.

How was this patch tested?

No new tests. The computeStats change is a guarded prefix on the existing implementation — with no implementor it degenerates to the unchanged scan path, covered by existing DSv2 suites. A follow-up adding an in-tree implementor (e.g. V1Table) will add targeted coverage.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

…r Table

Adds a Table mixin that lets DSv2 connectors expose table-level
(pre-filter, pre-pruning) statistics without going through a Scan.
DataSourceV2ScanRelation.computeStats prefers the mixin when the
table implements it and reports numRows; otherwise it falls through
to the existing Scan-based path unchanged.
anton5798 changed the title from "[SPARK-XXXXX][SQL] Introduce SupportsReportCatalogStatistics mixin for Table" to "[SPARK-56626][SQL] Introduce SupportsReportCatalogStatistics mixin for Table" on Apr 24, 2026
anton5798 requested a review from yyanyy on April 24, 2026 20:20
* @since 4.2.0
*/
@Evolving
public interface SupportsReportCatalogStatistics extends Table {

We need a usage of this new trait, e.g. in DataSourceV2Relation.

Also, there should be test cases for the changes.
