[WIP][SPARK-56608][PYTHON] Migrate grouped/cogrouped map Arrow UDF verify checks into enforce_schema by Yicong-Huang · Pull Request #55530 · apache/spark

Yicong-Huang · 2026-04-24T07:40:58Z

What changes were proposed in this pull request?

Make ArrowBatchTransformer.enforce_schema the single entry point for Arrow UDF output schema enforcement, and migrate the grouped/cogrouped map Arrow UDF paths (SQL_COGROUPED_MAP_ARROW_UDF, SQL_GROUPED_MAP_ARROW_UDF, SQL_GROUPED_MAP_ARROW_ITER_UDF) to use it instead of the separate verify_arrow_table / verify_arrow_batch helpers followed by a manual column reorder.

enforce_schema is generalized to:

Accept both pa.RecordBatch and pa.Table.
Add reorder_by_name: bool = True. When True, columns are matched by name, then reordered and renamed to the expected schema (raising RESULT_COLUMN_NAMES_MISMATCH on a name mismatch); when False, columns are matched by position and keep their input names.
Collect all missing/extra/type mismatches before raising (previously it raised on the first one).
Raise PySparkRuntimeError with the existing errorClasses (RESULT_COLUMN_NAMES_MISMATCH / RESULT_COLUMN_TYPES_MISMATCH / RESULT_COLUMN_SCHEMA_MISMATCH) instead of a PySparkTypeError with a bare-string message, matching what verify_arrow_result already did.
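The matching rules above can be sketched in plain Python. This is a minimal, hypothetical illustration, not PySpark's code: to stay dependency-free it models a schema as an ordered list of (name, type-name) pairs instead of a pyarrow schema, and enforce_schema_sketch and SchemaEnforcementError are invented names. It assumes names are checked before types and that a positional column-count mismatch maps to RESULT_COLUMN_SCHEMA_MISMATCH; within each check it gathers every offending column before raising.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an Arrow schema: ordered (column name, type name) pairs.
Schema = List[Tuple[str, str]]


class SchemaEnforcementError(Exception):
    """Hypothetical error carrying an error-class name plus every problem found."""

    def __init__(self, error_class: str, problems: List[str]):
        self.error_class = error_class
        self.problems = problems
        super().__init__(f"[{error_class}] " + "; ".join(problems))


def enforce_schema_sketch(
    actual: Schema, expected: Schema, reorder_by_name: bool = True
) -> Schema:
    """Hypothetical sketch, not PySpark's enforce_schema: align `actual` to `expected`."""
    if reorder_by_name:
        actual_types: Dict[str, str] = dict(actual)
        expected_names = [name for name, _ in expected]
        expected_set = set(expected_names)
        # Collect every missing and every unexpected column, not just the first.
        missing = [n for n in expected_names if n not in actual_types]
        extra = [n for n, _ in actual if n not in expected_set]
        if missing or extra:
            problems = []
            if missing:
                problems.append(f"missing: {', '.join(missing)}")
            if extra:
                problems.append(f"unexpected: {', '.join(extra)}")
            raise SchemaEnforcementError("RESULT_COLUMN_NAMES_MISMATCH", problems)
        # Reorder to the expected column order; names already equal the expected
        # ones here, so no rename is needed in this sketch.
        aligned = [(name, actual_types[name]) for name in expected_names]
    else:
        # Positional matching: only the column count must agree (assumed error class).
        if len(actual) != len(expected):
            raise SchemaEnforcementError(
                "RESULT_COLUMN_SCHEMA_MISMATCH",
                [f"expected {len(expected)} columns, got {len(actual)}"],
            )
        # Keep the input's own column names.
        aligned = list(actual)

    # Collect every type mismatch before raising.
    type_problems = [
        f"{name}: expected {exp_type}, got {act_type}"
        for (name, act_type), (_, exp_type) in zip(aligned, expected)
        if act_type != exp_type
    ]
    if type_problems:
        raise SchemaEnforcementError("RESULT_COLUMN_TYPES_MISMATCH", type_problems)
    return aligned
```

With reorder_by_name=False, [("x", "int64")] checked against [("a", "int64")] passes and keeps the name x, whereas the default name-based mode rejects it as missing a and unexpected x.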

verify_arrow_table and verify_arrow_batch are deleted; their pa.Table / pa.RecordBatch instance check is inlined at the call site. verify_arrow_result remains only for SQL_ARROW_TABLE_UDF, which is out of scope for this PR because it has no benchmark yet.

Why are the changes needed?

Part of SPARK-55388 (Refactor PythonEvalType processing logic). Today, output validation is split between verify_arrow_result (friendly errorClass errors) in worker.py and enforce_schema (bare f-string errors) in conversion.py. Consolidating behind enforce_schema gives one code path and one error convention, and drops the redundant "verify + manual reorder" step in the grouped-map paths.

Does this PR introduce any user-facing change?

Yes, minor: error messages for SQL_ARROW_UDTF (the pre-existing enforce_schema consumer) switch from bare f-strings to the same friendly errorClass-templated format already used by other Arrow UDFs. Error-class names and message formats for grouped/cogrouped map Arrow UDFs are unchanged.

How was this patch tested?

Existing integration tests in test_arrow_grouped_map.py / test_arrow_cogrouped_map.py already assert the errorClass-templated error format and pass unchanged.
Unit tests in test_conversion.py updated and extended for the new reorder_by_name, pa.Table input, and count-mismatch paths.
Regexes in test_arrow_udtf.py updated for the two SQL_ARROW_UDTF error tests.
ASV benchmarks (repeat=3) on CogroupedMapArrowUDFTimeBench, GroupedMapArrowUDFTimeBench, and GroupedMapArrowIterUDFTimeBench, compared against upstream master: 52 parameter combinations, 0 regressions at -f 1.05 (no benchmark more than 5% slower).
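The -f 1.05 threshold can be read as a simple ratio test. Assuming asv's documented meaning of asv compare --factor (a result is reported when it gets more than that factor slower or faster), the hypothetical helper below flags only the slowdown side. asv may additionally account for measurement uncertainty, which this sketch ignores, and the timings in the example are made up.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    factor: float = 1.05,
) -> List[str]:
    """Hypothetical helper: names whose candidate time exceeds baseline * factor."""
    return sorted(
        name
        for name, base_time in baseline.items()
        # Compare only benchmarks present in both runs; a ratio above `factor`
        # counts as a regression.
        if name in candidate and candidate[name] / base_time > factor
    )


# Made-up timings in seconds, for illustration only.
baseline = {"bench_a": 1.00, "bench_b": 2.00, "bench_c": 3.00}
candidate = {"bench_a": 1.04, "bench_b": 2.20, "bench_c": 2.50}
print(find_regressions(baseline, candidate))  # ['bench_b']
```

Here bench_a is 4% slower and stays under the threshold, bench_b is 10% slower and is flagged, and bench_c got faster, so only bench_b is reported.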

Was this patch authored or co-authored using generative AI tooling?

No.

refactor: migrate verify_arrow_result into enforce_schema for grouped/cogrouped map Arrow UDF paths (commit 39dd966)

Yicong-Huang marked this pull request as draft April 24, 2026 07:59

Yicong-Huang changed the title ~~[SPARK-56608][PYTHON] Migrate grouped/cogrouped map Arrow UDF verify checks into enforce_schema~~ [WIP][SPARK-56608][PYTHON] Migrate grouped/cogrouped map Arrow UDF verify checks into enforce_schema Apr 24, 2026

Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-56608