[SPARK-56626][SQL] Introduce SupportsReportCatalogStatistics mixin for Table #55546
Open
anton5798 wants to merge 2 commits into apache:master
…r Table Adds a Table mixin that lets DSv2 connectors expose table-level (pre-filter, pre-pruning) statistics without going through a Scan. DataSourceV2ScanRelation.computeStats prefers the mixin when the table implements it and reports numRows; otherwise it falls through to the existing Scan-based path unchanged.
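The preference order described in the commit message can be sketched as follows. This is a minimal, self-contained illustration, not the PR's code: `Statistics`, `Table`, `Scan`, and `SupportsReportCatalogStatistics` are local stand-ins for the Spark connector types, `chooseStats`, `SimpleStats`, and `StatsTable` are hypothetical names, and the return type of `catalogStatistics()` is an assumption.

```java
import java.util.OptionalLong;

public class CatalogStatsDispatchSketch {
    // Stand-ins for the Spark connector types named in the PR; not the real interfaces.
    public interface Statistics {
        OptionalLong sizeInBytes();
        OptionalLong numRows();
    }

    public interface Table { String name(); }

    public interface Scan { Statistics estimateStatistics(); }

    // Assumed shape of the mixin: a single accessor returning table-level statistics.
    public interface SupportsReportCatalogStatistics extends Table {
        Statistics catalogStatistics();
    }

    // Hypothetical immutable Statistics implementation used only by this sketch.
    public static final class SimpleStats implements Statistics {
        private final OptionalLong sizeInBytes;
        private final OptionalLong numRows;

        public SimpleStats(OptionalLong sizeInBytes, OptionalLong numRows) {
            this.sizeInBytes = sizeInBytes;
            this.numRows = numRows;
        }

        @Override public OptionalLong sizeInBytes() { return sizeInBytes; }
        @Override public OptionalLong numRows() { return numRows; }
    }

    // Hypothetical table that implements the mixin with fixed statistics.
    public static final class StatsTable implements SupportsReportCatalogStatistics {
        private final Statistics stats;

        public StatsTable(Statistics stats) { this.stats = stats; }

        @Override public String name() { return "stats_table"; }
        @Override public Statistics catalogStatistics() { return stats; }
    }

    // Prefer table-level stats only when the mixin is present and reports numRows;
    // otherwise fall through to the scan-level estimate.
    public static Statistics chooseStats(Table table, Scan scan) {
        if (table instanceof SupportsReportCatalogStatistics) {
            Statistics s = ((SupportsReportCatalogStatistics) table).catalogStatistics();
            if (s != null && s.numRows().isPresent()) {
                return s;
            }
        }
        return scan.estimateStatistics();
    }

    public static void main(String[] args) {
        Scan scan = () -> new SimpleStats(OptionalLong.of(64L), OptionalLong.of(10L));
        Table plain = () -> "plain_table";
        Table withRows = new StatsTable(
            new SimpleStats(OptionalLong.of(4096L), OptionalLong.of(1000L)));
        Table withoutRows = new StatsTable(
            new SimpleStats(OptionalLong.of(4096L), OptionalLong.empty()));

        System.out.println(chooseStats(plain, scan).numRows());
        System.out.println(chooseStats(withRows, scan).numRows());
        System.out.println(chooseStats(withoutRows, scan).numRows());
    }
}
```

Only the second table in `main` takes the table-level path; the plain table and the table without a row count both end up on the scan estimate, which is the fallthrough behavior the commit message describes.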
yyanyy
reviewed
Apr 24, 2026
 * @since 4.2.0
 */
@Evolving
public interface SupportsReportCatalogStatistics extends Table {
Member
This new trait needs a usage, e.g. in DataSourceV2Relation.
Also, there should be test cases for the changes.
What changes were proposed in this pull request?
Adds a new @Evolving mixin, SupportsReportCatalogStatistics, on org.apache.spark.sql.connector.catalog.Table. The returned stats describe the full table (pre-filter, pre-pruning, scan-independent), analogous to DSv1's CatalogStatistics. The mixin is distinct from the Scan-level SupportsReportStatistics, which reports post-pushdown stats; a table may implement both.
Wires it into DataSourceV2ScanRelation.computeStats additively: if the table implements the mixin and catalogStatistics().numRows() is present, those stats are used via transformV2Stats; otherwise control falls through to the existing Scan.estimateStatistics() path unchanged. The numRows-present guard ensures the fallthrough when the mixin has nothing meaningful to report.
No built-in Table implements the mixin in this PR.
Why are the changes needed?
DSv2 currently conflates catalog-level and scan-level statistics on Scan.estimateStatistics(), so reading table-wide stats requires building a ScanBuilder, which, depending on the connector, can trigger file listing, planScanFiles, or remote metadata round-trips. This mixin is a scan-independent accessor with a stable, pre-pushdown contract, suitable for CBO decisions on the unfiltered relation.
Does this PR introduce any user-facing change?
No. The interface is new; no built-in table implements it, so existing queries observe identical behavior.
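Since no built-in table implements the interface, a connector-side implementation might look roughly like the sketch below. It is illustrative only: the interfaces are local stand-ins rather than the Spark types, `MetastoreTable`, `buildScan`, and `scanBuilds` are hypothetical names, and the assumption is that row count and size are already available from catalog metadata, so no scan has to be built to answer `catalogStatistics()`.

```java
import java.util.OptionalLong;

public class CatalogBackedTableSketch {
    // Stand-ins for the Spark connector types; not the real interfaces.
    public interface Statistics {
        OptionalLong sizeInBytes();
        OptionalLong numRows();
    }

    public interface Table { String name(); }

    // Assumed shape of the mixin proposed in this PR.
    public interface SupportsReportCatalogStatistics extends Table {
        Statistics catalogStatistics();
    }

    // Hypothetical connector table whose statistics come from catalog metadata.
    public static final class MetastoreTable implements SupportsReportCatalogStatistics {
        private final String name;
        private final Long rowCount;    // null when the catalog never recorded a row count
        private final Long totalBytes;  // null when the catalog never recorded a size
        private int scanBuilds = 0;     // counts simulated expensive scan planning

        public MetastoreTable(String name, Long rowCount, Long totalBytes) {
            this.name = name;
            this.rowCount = rowCount;
            this.totalBytes = totalBytes;
        }

        @Override public String name() { return name; }

        // Answers from metadata only; never touches the scan path.
        @Override public Statistics catalogStatistics() {
            final OptionalLong rows = toOptional(rowCount);
            final OptionalLong bytes = toOptional(totalBytes);
            return new Statistics() {
                @Override public OptionalLong sizeInBytes() { return bytes; }
                @Override public OptionalLong numRows() { return rows; }
            };
        }

        // Stand-in for building a scan, which in a real connector may list files
        // or call a remote metadata service.
        public void buildScan() { scanBuilds++; }

        public int scanBuilds() { return scanBuilds; }

        private static OptionalLong toOptional(Long value) {
            return value == null ? OptionalLong.empty() : OptionalLong.of(value);
        }
    }

    public static void main(String[] args) {
        MetastoreTable table = new MetastoreTable("db.events", 1000L, 4096L);
        Statistics stats = table.catalogStatistics();
        System.out.println("numRows=" + stats.numRows()
            + " sizeInBytes=" + stats.sizeInBytes()
            + " scanBuilds=" + table.scanBuilds());
    }
}
```

Returning an empty `numRows` when the catalog has no row count keeps such a table compatible with the guard in `computeStats`, which would then fall through to the scan-based path.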
How was this patch tested?
No new tests. The computeStats change is a guarded prefix on the existing implementation: with no implementor it degenerates to the unchanged scan path, which is covered by existing DSv2 suites. A follow-up adding an in-tree implementor (e.g. V1Table) will add targeted coverage.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)