
fix: fall back to Spark for str_to_map when legacy regex split truncation is enabled #4627

Merged
andygrove merged 1 commit into apache:main from andygrove:fix-str-to-map-legacy-truncate on Jun 12, 2026


@andygrove

Member

Which issue does this PR close?

Closes #4477.

Rationale for this change

Spark 4.1.1 added the spark.sql.legacy.truncateForEmptyRegexSplit flag, which makes StringToMap truncate trailing empty entries from the split result when enabled. Comet's native str_to_map always behaves as if the flag were false, so with legacy truncation enabled it returns trailing empty entries that Spark would have dropped, producing incorrect results.

What changes are included in this PR?

CometStrToMap now reports Incompatible when spark.sql.legacy.truncateForEmptyRegexSplit=true, so the expression falls back to Spark unless the user explicitly opts in via spark.comet.expression.StringToMap.allowIncompatible=true. The default (non-legacy) behavior is unchanged: str_to_map continues to run natively. The config is read by string key with a false default so it resolves on Spark versions where the config is not registered.
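The gating described above can be sketched in a few lines of Scala. This is a minimal, self-contained illustration and not Comet's actual code: the `SupportLevel` types, the `StrToMapSupport` object, and the use of a plain `Map[String, String]` in place of Spark's session configuration are all hypothetical stand-ins. Only the two config keys are taken from the PR description.

```scala
// Hypothetical stand-ins for Comet's support-level types (names are illustrative).
sealed trait SupportLevel
case object Compatible extends SupportLevel
final case class Incompatible(reason: String) extends SupportLevel

object StrToMapSupport {
  // Config keys as quoted in the PR description.
  val LegacyTruncateKey = "spark.sql.legacy.truncateForEmptyRegexSplit"
  val AllowIncompatibleKey = "spark.comet.expression.StringToMap.allowIncompatible"

  // Read a boolean by string key. A missing key resolves to false, which is
  // what lets the lookup work on Spark versions that do not register the config.
  private def flag(conf: Map[String, String], key: String): Boolean =
    conf.get(key).exists(_.trim.equalsIgnoreCase("true"))

  // Report Incompatible only when legacy truncation is enabled.
  def supportLevel(conf: Map[String, String]): SupportLevel =
    if (flag(conf, LegacyTruncateKey)) {
      Incompatible("native str_to_map does not truncate trailing empty entries")
    } else {
      Compatible
    }

  // An Incompatible expression falls back to Spark unless the user opts in.
  def runsNatively(conf: Map[String, String]): Boolean =
    supportLevel(conf) match {
      case Compatible      => true
      case Incompatible(_) => flag(conf, AllowIncompatibleKey)
    }
}
```

Under these assumptions, an empty configuration runs natively, setting only the legacy flag triggers the fallback, and setting both the legacy flag and the opt-in restores native execution (with the known result mismatch).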

How are these changes tested?

Added a SQL file test expressions/map/str_to_map_legacy_truncate.sql that sets the legacy flag and asserts that str_to_map falls back to Spark (for both literal and column inputs) while still producing Spark-matching results. The existing str_to_map.sql test confirms native execution is unaffected when the flag is off.

…tion is enabled

Spark 4.1.1 added spark.sql.legacy.truncateForEmptyRegexSplit, which makes
StringToMap truncate trailing empty entries from the split result. Comet's
native str_to_map always behaves as if the flag were false, producing
incorrect results in legacy mode.

Downgrade CometStrToMap to Incompatible when the flag is enabled so it falls
back to Spark unless the user explicitly opts in via allowIncompatible. The
config is read by string key so it resolves on Spark versions where it is not
registered.

Closes apache#4477

@comphead comphead left a comment

Contributor


Thanks @andygrove

I was initially wondering whether we should just disable Comet altogether if any spark.sql.legacy.* config is set to true.

@andygrove andygrove merged commit af134cf into apache:main Jun 12, 2026
70 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug] str_to_map does not honour Spark 4.1.1 legacy.truncateForEmptyRegexSplit

2 participants