[AI-AUDIT] Introduce iterator API for pandas grouped agg UDF (SQL_GROUPED_AGG_PANDAS_ITER_UDF) #14485

@abellina

Description

Audit: SPARK-53616

Commit Info

  • Hash: f601aa65c0dc48a28afd1277a1aafe1b784f4eb6
  • Date: 2025-12-12
  • Author: Yicong Huang
  • Subject: [SPARK-53616][PYTHON] Introduce iterator API for pandas grouped agg UDF
  • Files Changed: 12

Classification

NO_IMPACT

Reason: The commit adds a new Python eval type for pandas UDFs; plans that use it fall back to CPU. Already covered by existing issue spark-rapids#14432.
Confidence: HIGH

Analysis

Summary

Introduces an iterator-based API for pandas grouped aggregation UDFs (SQL_GROUPED_AGG_PANDAS_ITER_UDF) and adds the new eval type to ArrowAggregatePythonExec.supportedEvalTypes. The iterator API lets a UDF process a group batch by batch, which improves memory efficiency.
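To illustrate the batch-by-batch idea, here is a minimal sketch in plain Python. It is not the PySpark API: batches are modelled as plain lists rather than pandas.Series so the example has no dependencies, and the names mean_over_batches and split_into_batches are hypothetical helpers introduced only for this illustration. The assumption is that an iterator-style aggregation receives one group's data as a sequence of batches and returns a single scalar.

```python
from typing import Iterator, List


def mean_over_batches(batches: Iterator[List[float]]) -> float:
    """Aggregate one group by consuming its batches one at a time.

    Only a running sum and count are kept, so the whole group never
    has to be held in memory at once.
    """
    total = 0.0
    count = 0
    for batch in batches:
        total += sum(batch)
        count += len(batch)
    return total / count if count else float("nan")


def split_into_batches(values: List[float], batch_size: int) -> Iterator[List[float]]:
    """Hypothetical helper: yield consecutive slices of `values`."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


# One group's values, delivered in batches of two rows.
group_values = [1.0, 2.0, 3.0, 4.0, 5.0]
result = mean_over_batches(split_into_batches(group_values, batch_size=2))
```

The non-iterator form would instead receive the group's full column in one call, which is where the memory difference comes from.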

Changed Components

  • ArrowAggregatePythonExec.scala: Added SQL_GROUPED_AGG_PANDAS_ITER_UDF to supported eval types (2-line change)
  • UserDefinedPythonFunction.scala: New eval type constant
  • Python worker files: Iterator support implementation

spark-rapids Impact

No impact on existing shims. The new SQL_GROUPED_AGG_PANDAS_ITER_UDF eval type is a Spark 4.2 addition. When the spark420 shim is added, GpuArrowAggregatePythonExecMeta will need a guard against this eval type, following the same pattern as SPARK-53615 / SQL_GROUPED_AGG_ARROW_ITER_UDF.

References Found

  • GpuOverrides.scala: ArrowAggregatePythonExec registration
  • GpuAggregateInPandasExec.scala: GPU implementation
  • GpuArrowAggregatePythonExecMeta.scala (spark400db173 shim): Meta class for GPU override
  • AggregateInPandasExecShims.scala: Shim layer

Recommended Actions

  • No new issue needed; covered by existing issue spark-rapids#14432

Existing Issue

Similar to spark-rapids#14432. That issue covers SPARK-53615 (SQL_GROUPED_AGG_ARROW_ITER_UDF), and the same guard pattern applies to this commit's SQL_GROUPED_AGG_PANDAS_ITER_UDF. Both need a willNotWorkOnGpu guard in GpuArrowAggregatePythonExecMeta for the spark420 shim.
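The guard logic can be sketched as a small language-neutral model. The real change would be Scala inside GpuArrowAggregatePythonExecMeta; the Python below only mirrors the shape of the check. ExecMetaModel, tag_for_gpu, and UNSUPPORTED_EVAL_TYPES are hypothetical names for this illustration, and eval types are represented by name rather than by their numeric constants, which are not given in this audit.

```python
from dataclasses import dataclass, field
from typing import List

# Eval types the GPU path is assumed not to handle (names from this audit).
UNSUPPORTED_EVAL_TYPES = frozenset({
    "SQL_GROUPED_AGG_ARROW_ITER_UDF",
    "SQL_GROUPED_AGG_PANDAS_ITER_UDF",
})


@dataclass
class ExecMetaModel:
    """Hypothetical stand-in for a plan-node meta class."""
    eval_type: str
    fallback_reasons: List[str] = field(default_factory=list)

    def will_not_work_on_gpu(self, reason: str) -> None:
        # Record why this node has to stay on the CPU.
        self.fallback_reasons.append(reason)

    def tag_for_gpu(self) -> None:
        # The guard: reject iterator eval types before GPU planning.
        if self.eval_type in UNSUPPORTED_EVAL_TYPES:
            self.will_not_work_on_gpu(
                f"{self.eval_type} is not supported on the GPU"
            )

    @property
    def can_run_on_gpu(self) -> bool:
        return not self.fallback_reasons


iter_meta = ExecMetaModel("SQL_GROUPED_AGG_PANDAS_ITER_UDF")
iter_meta.tag_for_gpu()      # records a fallback reason

classic_meta = ExecMetaModel("SQL_GROUPED_AGG_PANDAS_UDF")
classic_meta.tag_for_gpu()   # no reason recorded
```

Keeping both iterator eval types in one set reflects the note that a single guard covers SPARK-53615 and SPARK-53616 together.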

Generated Test

N/A

Notes

SPARK-53616 (pandas iterator) is the companion to SPARK-53615 (arrow iterator). Both add new iterator-based eval types for grouped aggregation UDFs. The fix for both is identical: guard in the meta class when adding the spark420 shim.


Audited by: Cursor AI
Date: 2026-03-30
