[AI-AUDIT] Introduce iterator API for pandas grouped agg UDF (SQL_GROUPED_AGG_PANDAS_ITER_UDF) #14485
Description
Audit: SPARK-53616
Commit Info
- Hash: f601aa65c0dc48a28afd1277a1aafe1b784f4eb6
- Date: 2025-12-12
- Author: Yicong Huang
- Subject: [SPARK-53616][PYTHON] Introduce iterator API for pandas grouped agg UDF
- Files Changed: 12
Classification
NO_IMPACT
Reason: New Python eval type for pandas UDF, falls back to CPU. Duplicate of existing issue.
Confidence: HIGH
Analysis
Summary
Introduces an iterator-based API for pandas grouped aggregation UDFs (SQL_GROUPED_AGG_PANDAS_ITER_UDF). Adds the new eval type to ArrowAggregatePythonExec.supportedEvalTypes. This enables batch-by-batch processing for memory efficiency.
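The batch-by-batch idea can be illustrated with a minimal sketch in plain Python. This is not the PySpark API: plain sequences of floats stand in for the pandas batches a grouped aggregation UDF would receive, and the names iter_mean and split_into_batches are hypothetical. The point is only that an aggregate can be computed from running state without holding the whole group in memory at once.

```python
from typing import Iterator, List, Sequence


def iter_mean(batches: Iterator[Sequence[float]]) -> float:
    """Aggregate one group by consuming its batches one at a time.

    Only the running sum and count are kept, so memory use does not
    grow with the size of the group.
    """
    total = 0.0
    count = 0
    for batch in batches:
        total += sum(batch)
        count += len(batch)
    # An empty group has no mean; NaN is an assumed convention here.
    return total / count if count else float("nan")


def split_into_batches(values: List[float], batch_size: int) -> Iterator[List[float]]:
    """Hypothetical helper: yield a group's values in fixed-size chunks."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


group_values = [1.0, 2.0, 3.0, 4.0, 5.0]
result = iter_mean(split_into_batches(group_values, batch_size=2))
```

A non-iterator grouped aggregation would instead receive all values of the group in a single object, which is what the new eval type avoids.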
Changed Components
- ArrowAggregatePythonExec.scala: Added SQL_GROUPED_AGG_PANDAS_ITER_UDF to supported eval types (2-line change)
- UserDefinedPythonFunction.scala: New eval type constant
- Python worker files: Iterator support implementation
spark-rapids Impact
No impact on existing shims. The new SQL_GROUPED_AGG_PANDAS_ITER_UDF eval type is a Spark 4.2 addition. When the spark420 shim is added, GpuArrowAggregatePythonExecMeta will need a guard against this eval type — same pattern as SPARK-53615 / SQL_GROUPED_AGG_ARROW_ITER_UDF.
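The decision the guard has to make can be modelled as follows. This is a sketch of the logic only, written in Python for illustration; the real guard would live in the Scala meta class and call willNotWorkOnGpu. The function name gpu_fallback_reason and the use of string constants are assumptions, not spark-rapids code.

```python
from typing import Optional

# Eval type names as they appear in this audit; strings are used here
# instead of Spark's integer constants, whose values are not assumed.
GROUPED_AGG_PANDAS = "SQL_GROUPED_AGG_PANDAS_UDF"
GROUPED_AGG_PANDAS_ITER = "SQL_GROUPED_AGG_PANDAS_ITER_UDF"
GROUPED_AGG_ARROW_ITER = "SQL_GROUPED_AGG_ARROW_ITER_UDF"

# Iterator-based eval types that the GPU path should reject.
UNSUPPORTED_ON_GPU = frozenset({GROUPED_AGG_PANDAS_ITER, GROUPED_AGG_ARROW_ITER})


def gpu_fallback_reason(eval_type: str) -> Optional[str]:
    """Return a reason to fall back to CPU, or None if the GPU path may proceed.

    Hypothetical model of the check; in the Scala meta class the reason
    string would be passed to willNotWorkOnGpu.
    """
    if eval_type in UNSUPPORTED_ON_GPU:
        return f"{eval_type} is not supported on GPU"
    return None
```

Keeping both iterator eval types in one set mirrors the observation that SPARK-53615 and SPARK-53616 need the same guard.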
References Found
- GpuOverrides.scala: ArrowAggregatePythonExec registration
- GpuAggregateInPandasExec.scala: GPU implementation
- GpuArrowAggregatePythonExecMeta.scala (spark400db173 shim): Meta class for GPU override
- AggregateInPandasExecShims.scala: Shim layer
Recommended Actions
- No new issue needed — covered by existing issue
Existing Issue
Similar to spark-rapids#14432 — that issue covers SPARK-53615 (SQL_GROUPED_AGG_ARROW_ITER_UDF) and the same guard pattern applies to this commit's SQL_GROUPED_AGG_PANDAS_ITER_UDF. Both need a willNotWorkOnGpu guard in GpuArrowAggregatePythonExecMeta for the spark420 shim.
Generated Test
N/A
Notes
SPARK-53616 (pandas iterator) is the companion to SPARK-53615 (arrow iterator). Both add new iterator-based eval types for grouped aggregation UDFs. The fix for both is identical: guard in the meta class when adding the spark420 shim.
Audited by: Cursor AI
Date: 2026-03-30