[Spark] Relax type check in `SchemaUtils.normalizeColumnNamesInDataType` #4143

ala · 2025-02-11T15:14:13Z

Which Delta project/connector is this regarding?

Description

This PR modifies the nested field name normalization code, which runs when we write to a Delta table. It makes sure that the names of all columns in the source data (that is, the data being written to the table) match the target table's column names in terms of case. For example, if the source contains column `address.CITY` while the table contains column `address.city`, the source column is renamed to match the table.
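The renaming described above can be illustrated with a minimal sketch. This is not the Delta implementation: it models a struct as a plain Python dict and an atomic type as a string, and the function name `normalize_names` is hypothetical.

```python
def normalize_names(source, target):
    """Rename fields in `source` to the casing used in `target`.

    Hypothetical model: a struct is a dict of field name -> type,
    and an atomic type is a plain string such as "string" or "long".
    """
    if not (isinstance(source, dict) and isinstance(target, dict)):
        return source  # nothing to rename below an atomic type

    # Case-insensitive lookup of the table's field names.
    by_lower = {name.lower(): name for name in target}

    result = {}
    for name, dtype in source.items():
        target_name = by_lower.get(name.lower())
        if target_name is None:
            # No counterpart in the table: keep the source name unchanged.
            result[name] = dtype
        else:
            # Adopt the table's casing and recurse into nested fields.
            result[target_name] = normalize_names(dtype, target[target_name])
    return result


source = {"id": "long", "address": {"CITY": "string", "Zip": "string"}}
table = {"id": "long", "address": {"city": "string", "zip": "string"}}
normalized = normalize_names(source, table)
print(normalized)
# → {'id': 'long', 'address': {'city': 'string', 'zip': 'string'}}
```

Only the names change; the source types are carried over untouched.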

If the schemas of the source data and the table differ enough to interfere with the name normalization, we log `delta.assertions.schemaNormalization.nonNestedTypeMismatch`. However, this check is too restrictive, and as a result we log this assertion too often. This PR relaxes the check: we now assume that any two atomic columns can be matched to each other.
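A rough sketch of the relaxed rule, using a hypothetical model in which a struct is a dict and an atomic type is a string. The function name `find_mismatches` is made up for illustration, and reporting a struct paired with an atomic type as a mismatch is an assumption of this sketch; the PR description only states that any two atomic columns now match.

```python
def find_mismatches(source, target, path=""):
    """Return the paths where name normalization cannot pair the two types.

    Hypothetical model: dict = struct, str = atomic type.
    """
    src_nested = isinstance(source, dict)
    tgt_nested = isinstance(target, dict)

    if not src_nested and not tgt_nested:
        # Relaxed rule: any two atomic types are treated as matching,
        # even if they are different types (e.g. "int" vs "long").
        return []
    if src_nested != tgt_nested:
        # Assumption of this sketch: struct vs atomic is still reported.
        return [path or "<root>"]

    by_lower = {name.lower(): name for name in target}
    mismatches = []
    for name, dtype in source.items():
        target_name = by_lower.get(name.lower())
        if target_name is not None:
            child = f"{path}.{name}" if path else name
            mismatches += find_mismatches(dtype, target[target_name], child)
    return mismatches


source = {"id": "int", "address": {"CITY": "string"}, "tags": "string"}
table = {"id": "long", "address": {"city": "string"}, "tags": {"k": "string"}}
result = find_mismatches(source, table)
print(result)
# → ['tags']
```

In this sketch `id` is not reported although its source and table types differ, since both sides are atomic.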

How was this patch tested?

New tests in SchemaUtilsSuite.

Does this PR introduce any user-facing changes?

No.

c27kwan

lgtm! Don't forget to tag the PR title with [Spark] prefix. I would also remove the whitespace after SchemaUtils. in the PR title. :)

c27kwan

still lgtm. The whitespace changes improve readability.

c27kwan · 2025-02-12T13:31:03Z

actually, maybe we can leave out the whitespace for another time. :)

ala · 2025-02-13T09:50:39Z

All right. Back to the original minimal change then.

make type check more permissive

11e11c3

c27kwan approved these changes Feb 11, 2025

ala changed the title ~~Relax type check in SchemaUtils. normalizeColumnNamesInDataType~~ [Spark] Relax type check in SchemaUtils.normalizeColumnNamesInDataType Feb 12, 2025

c27kwan approved these changes Feb 12, 2025

ala force-pushed the permissive_type_check branch from 682f931 to 39c79ef · February 12, 2025 12:57

ala force-pushed the permissive_type_check branch from cfd1f75 to 11e11c3 · February 13, 2025 09:49

allisonport-db merged commit 0b818bf into delta-io:master Feb 13, 2025
30 of 38 checks passed
