fix: fall back to Spark for str_to_map when legacy regex split truncation is enabled #4627
Merged
…tion is enabled
Spark 4.1.1 added spark.sql.legacy.truncateForEmptyRegexSplit, which makes StringToMap truncate trailing empty entries from the split result. Comet's native str_to_map always behaves as if the flag were false, producing incorrect results in legacy mode.
Downgrade CometStrToMap to Incompatible when the flag is enabled so it falls back to Spark unless the user explicitly opts in via allowIncompatible. The config is read by string key so it resolves on Spark versions where it is not registered.
Closes apache#4477
comphead
approved these changes
Jun 12, 2026
comphead
left a comment
Contributor
Thanks @andygrove
My initial thought was: should we just disable Comet altogether if any spark.sql.legacy* config is set to true?
Which issue does this PR close?
Closes #4477.
Rationale for this change
Spark 4.1.1 added the spark.sql.legacy.truncateForEmptyRegexSplit flag, which makes StringToMap truncate trailing empty entries from the split result when enabled. Comet's native str_to_map always behaves as if the flag were false, so with legacy truncation enabled it returns trailing empty entries that Spark would have dropped, producing incorrect results.
What changes are included in this PR?
CometStrToMap now reports Incompatible when spark.sql.legacy.truncateForEmptyRegexSplit=true, so the expression falls back to Spark unless the user explicitly opts in via spark.comet.expression.StringToMap.allowIncompatible=true. The default (non-legacy) behavior is unchanged: str_to_map continues to run natively. The config is read by string key with a false default so it resolves on Spark versions where the config is not registered.
How are these changes tested?
Added a SQL file test, expressions/map/str_to_map_legacy_truncate.sql, that sets the legacy flag and asserts that str_to_map falls back to Spark (for both literal and column inputs) while still producing Spark-matching results. The existing str_to_map.sql test confirms native execution is unaffected when the flag is off.
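For readers who want the fallback rule from "What changes are included in this PR?" in executable form, here is a minimal, language-neutral sketch in Python. It is not the Comet implementation (that code is Scala and is not shown in this description). The two config keys are taken from the text above; SupportLevel, conf_bool, str_to_map_support_level, and runs_natively are hypothetical names introduced only for illustration, and the configuration is modeled as a plain dictionary of strings.

```python
from enum import Enum
from typing import Mapping

# Config keys quoted in the PR description.
LEGACY_TRUNCATE_KEY = "spark.sql.legacy.truncateForEmptyRegexSplit"
ALLOW_INCOMPAT_KEY = "spark.comet.expression.StringToMap.allowIncompatible"


class SupportLevel(Enum):
    """Hypothetical stand-in for the support levels mentioned in the PR."""
    COMPATIBLE = "compatible"
    INCOMPATIBLE = "incompatible"


def conf_bool(conf: Mapping[str, str], key: str, default: bool = False) -> bool:
    """Read a boolean setting by its string key.

    A key that is absent (for example, because this Spark version does
    not register it) resolves to the default instead of raising.
    """
    raw = conf.get(key)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def str_to_map_support_level(conf: Mapping[str, str]) -> SupportLevel:
    """Report INCOMPATIBLE only when legacy truncation is switched on."""
    if conf_bool(conf, LEGACY_TRUNCATE_KEY):
        return SupportLevel.INCOMPATIBLE
    return SupportLevel.COMPATIBLE


def runs_natively(conf: Mapping[str, str]) -> bool:
    """Native execution unless incompatible and the user has not opted in."""
    if str_to_map_support_level(conf) is SupportLevel.COMPATIBLE:
        return True
    return conf_bool(conf, ALLOW_INCOMPAT_KEY)
```

In this sketch an empty configuration runs natively, enabling only the legacy flag falls back to Spark, and enabling both the legacy flag and the allowIncompatible key runs natively again. Looking the flag up by string with a false default is what keeps the check safe on versions where the key does not exist.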