[Spark] Relax type check in SchemaUtils.normalizeColumnNamesInDataType #4143

Open
wants to merge 1 commit into
base: master
from


Contributor

@ala ala commented Feb 11, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR modifies the nested field name normalization code, which runs when we write to a Delta table. It makes sure that the names of all columns in the source data (that is, the data being written to the table) match the target table's column names in terms of case. For example, if the source contains column address.CITY while the table contains column address.city, the source column is renamed to match the table.
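The renaming described above can be sketched in a few lines. This is a simplified illustration, not the code in SchemaUtils.normalizeColumnNamesInDataType: it assumes a toy schema model in which a struct is a dict from field name to type and an atomic type is a plain string, and the function name `normalize_names` is hypothetical.

```python
# Toy schema model (an assumption for this sketch, not Delta's types):
# a struct is a dict of field name -> type, an atomic type is a string.

def normalize_names(source, target):
    """Rename source fields to the target's casing, recursing into structs."""
    if not (isinstance(source, dict) and isinstance(target, dict)):
        return source  # nothing to rename below this level
    # Index target fields by lower-cased name for case-insensitive lookup.
    target_by_lower = {name.lower(): name for name in target}
    normalized = {}
    for name, dtype in source.items():
        target_name = target_by_lower.get(name.lower())
        if target_name is None:
            normalized[name] = dtype  # no counterpart in the table; keep as-is
        else:
            normalized[target_name] = normalize_names(dtype, target[target_name])
    return normalized

source_schema = {"address": {"CITY": "string", "zip": "string"}}
table_schema = {"address": {"city": "string", "zip": "string"}}
normalized_schema = normalize_names(source_schema, table_schema)
print(normalized_schema)  # {'address': {'city': 'string', 'zip': 'string'}}
```

Only field names change in this sketch; the source types are left untouched, so `normalize_names({"ID": "int"}, {"id": "long"})` yields `{"id": "int"}`.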

If the schemas of the source data and the table differ enough to interfere with name normalization, we log delta.assertions.schemaNormalization.nonNestedTypeMismatch. However, this check is too restrictive, so the assertion is logged too often. This PR relaxes the check: we now assume that any two atomic columns can be matched to each other.
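To make the relaxation concrete, here is a minimal sketch under stated assumptions. It reuses a toy model where a struct is a dict and an atomic type is a string. The exact form of the previous check is not spelled out in this description, so `can_match_strict` (requiring identical atomic types) is a hypothetical stand-in for "too restrictive"; `can_match_relaxed` follows the rule stated above. All function names are illustrative and do not come from the Delta code base.

```python
# Toy model (an assumption for this sketch): struct = dict, atomic type = str.
EVENT = "delta.assertions.schemaNormalization.nonNestedTypeMismatch"

def is_atomic(dtype):
    return isinstance(dtype, str)

def can_match_strict(source, target):
    # ASSUMPTION for illustration only: a stricter check that requires
    # the two atomic types to be identical.
    return is_atomic(source) and is_atomic(target) and source == target

def can_match_relaxed(source, target):
    # The rule described in this PR: any two atomic columns can be matched.
    return is_atomic(source) and is_atomic(target)

def check_leaf(source, target, can_match, events):
    """Record the assertion event when a non-struct pair cannot be matched."""
    if not can_match(source, target):
        events.append(EVENT)

strict_events, relaxed_events = [], []
check_leaf("int", "long", can_match_strict, strict_events)
check_leaf("int", "long", can_match_relaxed, relaxed_events)
print(len(strict_events), len(relaxed_events))  # 1 0
```

In this sketch an int source column written to a long table column no longer records the event, while a pairing of a struct with an atomic type still does.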

How was this patch tested?

New tests in SchemaUtilsSuite.

Does this PR introduce any user-facing changes?

No.

Contributor

@c27kwan c27kwan left a comment


lgtm! Don't forget to tag the PR title with [Spark] prefix. I would also remove the whitespace after SchemaUtils. in the PR title. :)

@ala ala changed the title Relax type check in SchemaUtils. normalizeColumnNamesInDataType [Spark] Relax type check in SchemaUtils.normalizeColumnNamesInDataType Feb 12, 2025
2 participants