PushDownWidenCast (PR #28038) currently runs at fragmentation time and substitutes
narrowVar → wideVar through each fragment. This eliminates the CAST Project at
runtime but creates an observability gap as @aaneja pointed out:
| Command | CAST Project visible? |
|---|---|
| EXPLAIN <query> (default TYPE LOGICAL) | yes (pre-fragmentation) |
| EXPLAIN ANALYZE <query> | no (post-fragmentation) |
Users running both commands see the cast appear in the first plan and disappear in
the second. The class javadoc (EXPLAIN vs runtime divergence) documents this as the
price of running per-fragment.
Expected Behavior or Use Case
EXPLAIN <query> and EXPLAIN ANALYZE <query> show the same plan structure (no
CAST Project in either).
- Wire format on every Exchange stays narrow.
- No FilterProject operator survives for elided casts.
- All existing TestPushDownWidenCast* tests pass plus new EXPLAIN-parity tests.
Presto Component, Service, or Connector
planner
Possible Implementation
As @aaneja suggested, we could replace variable-type substitution with a source-side coercion annotation on
producer nodes:
public interface CoercionAwareNode {
    /** narrowVar → target type to emit at runtime; the variable's declared type is unchanged. */
    Map<VariableReferenceExpression, Type> getCoercions();
}
A pre-fragmentation rule walks Project(... CAST(narrow AS T) ...) patterns over
any CoercionAwareNode and populates the map. A coercion-aware pruner then elides
the now-redundant CAST Project. The variable's declared type never changes, so the
substitution can't widen the producer fragment's outputLayout — the wire-format
invariant comes for free.
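The rule and the pruner described above can be sketched on a toy plan model. This is only an illustration under stated assumptions: CoercionSketch, Scan, CastProject, populateCoercions and isElidable are hypothetical names, types are plain strings, and rewiring downstream references to the elided Project's output is not modelled. None of this is Presto's actual planner API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: every class here is a hypothetical stand-in, not a Presto plan node.
final class CoercionSketch {
    /** Stand-in for a producer node (e.g. a scan) with declared types and a coercion map. */
    static final class Scan {
        final Map<String, String> declaredTypes;                      // variable -> declared (narrow) type
        final Map<String, String> coercions = new LinkedHashMap<>();  // variable -> type to emit at runtime

        Scan(Map<String, String> declaredTypes) {
            this.declaredTypes = declaredTypes;
        }
    }

    /** Stand-in for Project(out := CAST(in AS targetType)) sitting directly over a Scan. */
    static final class CastProject {
        final Scan source;
        final String out;
        final String in;
        final String targetType;

        CastProject(Scan source, String out, String in, String targetType) {
            this.source = source;
            this.out = out;
            this.in = in;
            this.targetType = targetType;
        }
    }

    /** Rule: record the cast's target type on the producer; declared types are never touched. */
    static void populateCoercions(CastProject project) {
        if (project.source.declaredTypes.containsKey(project.in)) {
            project.source.coercions.put(project.in, project.targetType);
        }
    }

    /** Pruner check: the CAST Project is redundant once the producer's coercion map covers it. */
    static boolean isElidable(CastProject project) {
        return project.targetType.equals(project.source.coercions.get(project.in));
    }

    public static void main(String[] args) {
        Scan scan = new Scan(Map.of("x", "integer"));
        CastProject project = new CastProject(scan, "x_wide", "x", "bigint");
        populateCoercions(project);
        System.out.println("elidable: " + isElidable(project));                   // prints "elidable: true"
        System.out.println("declared type of x: " + scan.declaredTypes.get("x")); // prints "declared type of x: integer"
    }
}
```

The point of the sketch is the last line: after the rule runs, the producer's declared type for x is still the narrow one, which is why the output layout cannot widen.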
Phase 1 (this issue)
- Add CoercionAwareNode interface in presto-spi/.../plan/.
- Add coercions field to TableScanNode and RemoteSourceNode (constructor + withCoercions, JSON, equals/hashCode, default empty so connectors compile unchanged).
- New optimizer (or refactor of PushDownWidenCast) that populates coercions. Wire in PlanOptimizers.java after PushdownSubfields, same gates (push_down_widen_cast_enabled + native_execution_enabled).
- Coercion-aware pruner that elides wide := CAST(narrow AS T) Projects when the producer's coercion map covers them. Add to the cleanup pass that already includes pruneIdentityProjects.
- Drop applyPushDownWidenCastPerFragment from PlanFragmenterUtils once the new path covers all cases. Keep pruneIdentityProjects, which is independently useful.
- Velox: teach HiveDataSource and the Exchange page source to consult coercions (paired issue in prestissimo).
- Audit out-of-tree-friendly connectors (Iceberg, JDBC, …) for custom TableScanNode construction.
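For the coercions field item, one possible shape is an extra constructor argument with an empty default, a withCoercions copy method, and equals/hashCode that include the map. The sketch below uses a hypothetical ScanNodeSketch class with string-typed variables and types. It is not the real TableScanNode, and it leaves out JSON serialization.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a plan node; it is not Presto's TableScanNode.
final class ScanNodeSketch {
    private final String table;
    private final Map<String, String> coercions; // variable name -> type to emit at runtime

    /** Legacy-shaped constructor: coercions default to empty, so existing callers compile unchanged. */
    ScanNodeSketch(String table) {
        this(table, Collections.emptyMap());
    }

    ScanNodeSketch(String table, Map<String, String> coercions) {
        this.table = Objects.requireNonNull(table, "table is null");
        // Defensive, unmodifiable copy keeps the node immutable.
        this.coercions = Collections.unmodifiableMap(new LinkedHashMap<>(coercions));
    }

    Map<String, String> getCoercions() {
        return coercions;
    }

    /** Returns a copy of this node carrying the given coercion map; this node is unchanged. */
    ScanNodeSketch withCoercions(Map<String, String> newCoercions) {
        return new ScanNodeSketch(table, newCoercions);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ScanNodeSketch)) {
            return false;
        }
        ScanNodeSketch other = (ScanNodeSketch) o;
        // Coercions take part in equality, so nodes differing only in coercions stay distinct.
        return table.equals(other.table) && coercions.equals(other.coercions);
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, coercions);
    }

    public static void main(String[] args) {
        ScanNodeSketch plain = new ScanNodeSketch("orders");
        ScanNodeSketch widened = plain.withCoercions(Map.of("x", "bigint"));
        System.out.println(plain.getCoercions().isEmpty()); // prints true
        System.out.println(widened.getCoercions());         // prints {x=bigint}
        System.out.println(plain.equals(widened));          // prints false
    }
}
```

Including the map in equals/hashCode matters because a node with coercions and one without would otherwise compare equal, and any code that compares or deduplicates nodes could then lose the annotation.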
Phase 2 (follow-up issues)
Same pattern, different producer operators:
- AggregationNode: emit the aggregation result at a wider type (also unblocks re-enabling testCastAboveExchangePushedIntoRemoteSourceConsumerSide, currently removed).
- WindowNode, JoinNode, UnnestNode, ValuesNode.
Each phase needs paired prestissimo work to teach the operator to honor the
coercion map.
Example Screenshots (if appropriate):
N/A
Context
There's an observability gap in the current implementation:
A user who runs EXPLAIN q and then EXPLAIN ANALYZE q sees the CAST appear in
the first and disappear in the second. The class javadoc on PushDownWidenCast
(<h3>EXPLAIN vs runtime divergence</h3>) documents this as the price of running
per-fragment, but the cleaner long-term shape is to make the optimization visible
in the LOGICAL plan via an annotation while still keeping the wire format narrow.
The annotation model also generalizes: any operator that produces a value through
some computation can carry a coercion telling the runtime "emit this output at a
different (compatible) type than the operator's natural output type". That unlocks
similar push-downs against Aggregation, Window, Join, etc. without inventing
a new pass per operator.