Refactor PushDownWidenCast to source-side coercion annotation, extensible to other operators #28077

Description

@yingsu00

PushDownWidenCast (PR #28038) currently runs at fragmentation time and substitutes
narrowVar → wideVar through each fragment. This eliminates the CAST Project at
runtime but creates an observability gap, as @aaneja pointed out:

Command                                   CAST Project visible?
EXPLAIN <query> (default TYPE LOGICAL)    yes (pre-fragmentation)
EXPLAIN ANALYZE <query>                   no (post-fragmentation)

Users who run both commands see the cast appear in the first and disappear in
the second. The class javadoc (EXPLAIN vs runtime divergence) documents this as
the price of running per-fragment.

Expected Behavior or Use Case

  • EXPLAIN <query> and EXPLAIN ANALYZE <query> show the same plan structure (no
    CAST Project in either).
  • Wire format on every Exchange stays narrow.
  • No FilterProject operator survives for elided casts.
  • All existing TestPushDownWidenCast* tests pass plus new EXPLAIN-parity tests.

Presto Component, Service, or Connector

planner

Possible Implementation

As @aaneja suggested, we could replace variable-type substitution with a source-side coercion annotation on
producer nodes:

public interface CoercionAwareNode {
    /** narrowVar → target type to emit at runtime; the variable's declared type is unchanged. */
    Map<VariableReferenceExpression, Type> getCoercions();
}

A pre-fragmentation rule walks Project(... CAST(narrow AS T) ...) patterns over
any CoercionAwareNode and populates the map. A coercion-aware pruner then elides
the now-redundant CAST Project. The variable's declared type never changes, so
the annotation cannot widen the producer fragment's outputLayout, and the
wire-format invariant comes for free.
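
The two steps above (annotate, then prune) can be sketched with toy types. This is only an illustration under assumptions: `Var`, `Scan`, `CastProject`, `annotate`, and `prune` are hypothetical stand-ins, not Presto classes; types are plain strings; only a single-cast Project directly over the producer is handled; and rewriting downstream references to the elided variable is left out. It assumes Java 17 for records and pattern matching.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-ins for planner types; none of these are the real Presto classes. */
final class CoercionSketch {
    private CoercionSketch() {}

    /** A typed variable, standing in for VariableReferenceExpression. */
    record Var(String name, String type) {}

    /** Common supertype of the toy plan nodes. */
    interface Node {}

    /** Mirrors the proposed CoercionAwareNode, with type names instead of Type. */
    interface CoercionAware extends Node {
        Map<Var, String> coercions();

        CoercionAware withCoercions(Map<Var, String> coercions);
    }

    /** Producer node: outputs `output`, optionally coerced at runtime. */
    record Scan(Var output, Map<Var, String> coercions) implements CoercionAware {
        Scan(Var output) {
            this(output, Map.of()); // default: no coercions
        }

        @Override
        public Scan withCoercions(Map<Var, String> newCoercions) {
            return new Scan(output, Map.copyOf(newCoercions));
        }
    }

    /** A Project of the single shape `wide := CAST(narrow AS wide.type)`. */
    record CastProject(Var wide, Var narrow, Node source) implements Node {}

    /** Step 1: record the cast's target type on the producer; the Project stays. */
    static Node annotate(Node node) {
        if (node instanceof CastProject p && p.source() instanceof CoercionAware src) {
            Map<Var, String> merged = new LinkedHashMap<>(src.coercions());
            merged.put(p.narrow(), p.wide().type());
            return new CastProject(p.wide(), p.narrow(), src.withCoercions(merged));
        }
        return node;
    }

    /** Step 2: elide the Project only when the producer's map covers the cast. */
    static Node prune(Node node) {
        if (node instanceof CastProject p
                && p.source() instanceof CoercionAware src
                && p.wide().type().equals(src.coercions().get(p.narrow()))) {
            return src;
        }
        return node;
    }
}
```

In this sketch `prune` leaves an unannotated cast in place, so the pruner is safe to run even when the annotating rule is gated off. The declared type of `narrow` is never touched; only the map on the producer changes.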

Phase 1 (this issue)

  • Add CoercionAwareNode interface in presto-spi/.../plan/.
  • Add coercions field to TableScanNode and RemoteSourceNode (constructor +
    withCoercions, JSON, equals/hashCode, default empty so connectors compile
    unchanged).
  • New optimizer (or refactor of PushDownWidenCast) that populates
    coercions. Wire in PlanOptimizers.java after PushdownSubfields, same gates
    (push_down_widen_cast_enabled + native_execution_enabled).
  • Coercion-aware pruner that elides wide := CAST(narrow AS T) Projects when
    the producer's coercion map covers them. Add to the cleanup pass that already
    includes pruneIdentityProjects.
  • Drop applyPushDownWidenCastPerFragment from PlanFragmenterUtils once the
    new path covers all cases. Keep pruneIdentityProjects — it's independently
    useful.
  • Velox: teach HiveDataSource and the Exchange page source to consult
    coercions (paired issue in prestissimo).
  • Audit out-of-tree-friendly connectors (Iceberg, JDBC, …) for custom
    TableScanNode construction.
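
As a rough illustration of the second bullet, the sketch below shows one way a node could gain a coercions field while existing call sites keep compiling. `ScanNodeSketch` is a hypothetical stand-in, not the real TableScanNode; it keys the map by variable name and type name as plain strings, and it omits JSON serialization.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a plan node; not the real TableScanNode. */
final class ScanNodeSketch {
    private final String table;
    // variable name -> type name to emit at runtime (assumption: strings for brevity)
    private final Map<String, String> coercions;

    /** Legacy constructor: existing callers compile unchanged and get no coercions. */
    ScanNodeSketch(String table) {
        this(table, Collections.emptyMap());
    }

    ScanNodeSketch(String table, Map<String, String> coercions) {
        this.table = Objects.requireNonNull(table, "table is null");
        Objects.requireNonNull(coercions, "coercions is null");
        // Defensive copy so the node stays immutable.
        this.coercions = Collections.unmodifiableMap(new LinkedHashMap<>(coercions));
    }

    Map<String, String> getCoercions() {
        return coercions;
    }

    /** Returns a copy of this node carrying the given coercions. */
    ScanNodeSketch withCoercions(Map<String, String> newCoercions) {
        return new ScanNodeSketch(table, newCoercions);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ScanNodeSketch)) {
            return false;
        }
        ScanNodeSketch other = (ScanNodeSketch) o;
        // Coercions participate in equality so annotated and plain nodes differ.
        return table.equals(other.table) && coercions.equals(other.coercions);
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, coercions);
    }
}
```

Including the map in equals/hashCode matters for any code that compares or caches plan nodes: an annotated scan and a plain scan of the same table should not be treated as the same node.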

Phase 2 (follow-up issues)

Same pattern, different producer operators:

  • AggregationNode — emit aggregation result at a wider type (also unblocks
    re-enabling testCastAboveExchangePushedIntoRemoteSourceConsumerSide, currently
    removed).
  • WindowNode, JoinNode, UnnestNode, ValuesNode.

Each phase needs paired prestissimo work to teach the operator to honor the
coercion map.

Example Screenshots (if appropriate):

N/A

Context

There's an observability gap in the current implementation:

A user who runs EXPLAIN q and then EXPLAIN ANALYZE q sees the CAST appear in
the first and disappear in the second. The class javadoc on PushDownWidenCast
(<h3>EXPLAIN vs runtime divergence</h3>) documents this as the price of running
per-fragment, but the cleaner long-term shape is to make the optimization visible
in the LOGICAL plan via an annotation while still keeping the wire format narrow.

The annotation model also generalizes: any operator that produces a value through
some computation can carry a coercion telling the runtime "emit this output at a
different (compatible) type than the operator's natural output type". That unlocks
similar push-downs against Aggregation, Window, Join, etc. without inventing
a new pass per operator.
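
To make "emit this output at a different (compatible) type" concrete, here is a toy producer that consults a coercion map when it emits a column. Everything in it is an assumption for illustration: `WideningEmitSketch` and `emit` are hypothetical names, the map is keyed by column name, and only integer to bigint widening is modeled. The real runtime work lives in prestissimo and is not represented here.

```java
import java.util.Map;

/** Toy producer; real operators in prestissimo are not modeled here. */
final class WideningEmitSketch {
    private WideningEmitSketch() {}

    /**
     * Emits a column of 32-bit values. If the coercion map asks for "bigint",
     * each value is widened to 64 bits at the source; with no entry for the
     * column, the values are emitted at their natural type.
     */
    static Object emit(String column, int[] values, Map<String, String> coercions) {
        String target = coercions.get(column);
        if (target == null) {
            return values.clone();
        }
        if (!target.equals("bigint")) {
            throw new IllegalArgumentException(
                    "unsupported coercion for " + column + ": " + target);
        }
        long[] widened = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            widened[i] = values[i]; // int -> long never loses information
        }
        return widened;
    }
}
```

Because the producer does the widening itself, no separate cast step is needed downstream, which is the property the annotation model relies on for each operator it is extended to.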
