[Kernel] Add ability to store type changes on StructField #4519
Changes from all commits
313f4b2
1b2676b
fff4355
2f32731
89eadb6
3f3ba31
0c067e2
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
/* | ||
* Copyright (2023) The Delta Lake Project Authors. | ||
* Copyright (2025) The Delta Lake Project Authors. | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
|
@@ -17,8 +17,10 @@ | |
package io.delta.kernel.types; | ||
|
||
import io.delta.kernel.annotation.Evolving; | ||
import io.delta.kernel.exceptions.KernelException; | ||
import io.delta.kernel.internal.util.Tuple2; | ||
import java.util.ArrayList; | ||
import java.util.Collections; | ||
import java.util.List; | ||
import java.util.Objects; | ||
|
||
|
@@ -48,7 +50,8 @@ public class StructField { | |
METADATA_ROW_INDEX_COLUMN_NAME, | ||
LongType.LONG, | ||
false, | ||
FieldMetadata.builder().putBoolean(IS_METADATA_COLUMN_KEY, true).build()); | ||
FieldMetadata.builder().putBoolean(IS_METADATA_COLUMN_KEY, true).build(), | ||
Collections.emptyList()); | ||
|
||
public static final String COLLATIONS_METADATA_KEY = "__COLLATIONS"; | ||
|
||
|
@@ -60,19 +63,36 @@ public class StructField { | |
private final DataType dataType; | ||
private final boolean nullable; | ||
private final FieldMetadata metadata; | ||
private final List<TypeChange> typeChanges; | ||
|
||
public StructField(String name, DataType dataType, boolean nullable) { | ||
this(name, dataType, nullable, FieldMetadata.empty()); | ||
} | ||
|
||
public StructField(String name, DataType dataType, boolean nullable, FieldMetadata metadata) { | ||
this(name, dataType, nullable, metadata, Collections.emptyList()); | ||
} | ||
|
||
public StructField( | ||
String name, | ||
DataType dataType, | ||
boolean nullable, | ||
FieldMetadata metadata, | ||
List<TypeChange> typeChanges) { | ||
Review comment: Hmm, we need a builder for StructField soon (not related to this PR).
Reply: Yep :)
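For reference, a builder of the kind the reviewer mentions could look roughly like the sketch below. It is not part of this PR or of Delta Kernel: `FieldSketch` and its `Builder` are hypothetical stand-ins, and the data type, metadata, and type changes are simplified to `String`, `Map<String, String>`, and `List<String>` so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for StructField; NOT the Delta Kernel class.
final class FieldSketch {
  final String name;
  final String dataType; // simplified: Kernel uses DataType
  final boolean nullable;
  final Map<String, String> metadata; // simplified: Kernel uses FieldMetadata
  final List<String> typeChanges; // simplified: Kernel uses List<TypeChange>

  private FieldSketch(Builder b) {
    this.name = b.name;
    this.dataType = b.dataType;
    this.nullable = b.nullable;
    // Copy the builder's collections so the built field is immutable.
    this.metadata = Collections.unmodifiableMap(new LinkedHashMap<>(b.metadata));
    this.typeChanges = Collections.unmodifiableList(new ArrayList<>(b.typeChanges));
  }

  static Builder builder(String name, String dataType) {
    return new Builder(name, dataType);
  }

  static final class Builder {
    private final String name;
    private final String dataType;
    private boolean nullable = true; // assumed default for this sketch
    private final Map<String, String> metadata = new LinkedHashMap<>();
    private final List<String> typeChanges = new ArrayList<>();

    private Builder(String name, String dataType) {
      this.name = Objects.requireNonNull(name, "name cannot be null");
      this.dataType = Objects.requireNonNull(dataType, "dataType cannot be null");
    }

    Builder nullable(boolean value) {
      this.nullable = value;
      return this;
    }

    Builder putMetadata(String key, String value) {
      metadata.put(key, value);
      return this;
    }

    Builder addTypeChange(String change) {
      typeChanges.add(Objects.requireNonNull(change, "change cannot be null"));
      return this;
    }

    FieldSketch build() {
      return new FieldSketch(this);
    }
  }
}
```

With such a builder, a call like `FieldSketch.builder("id", "long").nullable(false).addTypeChange("integer->long").build()` would replace the growing set of telescoping constructors.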
||
this.name = name; | ||
this.dataType = dataType; | ||
this.nullable = nullable; | ||
this.typeChanges = typeChanges == null ? Collections.emptyList() : typeChanges; | ||
Review comment: IMO we should discourage
Reply: +1. We can probably either (1) use requireNonNull if we want this to always be present but empty when there is no type widening, or (2) make it optional.
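The two options raised in this thread can be compared with a small standalone sketch. `TypeChangesHolder` is a hypothetical class and the type changes are represented as plain strings; only `Objects.requireNonNull`, `Optional`, and the `java.util` collection helpers are real APIs. Reading "make it optional" as an `Optional` parameter is an assumption about what the reviewer meant.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical illustration of the two null-handling options; not Kernel code.
final class TypeChangesHolder {
  private final List<String> typeChanges;

  // Option 1: reject null. Callers pass an empty list when there is no type widening.
  TypeChangesHolder(List<String> typeChanges) {
    Objects.requireNonNull(typeChanges, "typeChanges cannot be null");
    // Defensive copy, so later mutation of the caller's list is not visible here.
    this.typeChanges = Collections.unmodifiableList(new ArrayList<>(typeChanges));
  }

  // Option 2 (assumed reading): callers state absence with Optional instead of null.
  static TypeChangesHolder fromOptional(Optional<List<String>> typeChanges) {
    Objects.requireNonNull(typeChanges, "pass Optional.empty() instead of null");
    return new TypeChangesHolder(typeChanges.orElse(Collections.emptyList()));
  }

  List<String> getTypeChanges() {
    return typeChanges;
  }
}
```

Either way, the silent `null`-to-empty conversion in the constructor goes away, and a caller that passes `null` fails fast with a descriptive message.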
||
|
||
FieldMetadata collationMetadata = fetchCollationMetadata(); | ||
this.metadata = | ||
new FieldMetadata.Builder().fromMetadata(metadata).fromMetadata(collationMetadata).build(); | ||
if (!this.typeChanges.isEmpty() | ||
Review comment: If the map contains a struct either as key or value, is the type change allowed in this case?
Reply: Yes, the idea of this assert is that it enforces that type changes are only recorded on leaf nodes of the schema. We could also assert this the other way, I think: that the DataType is a primitive type. I think this is a core design question, and we should decide which option we want to pursue here.
Reply: I think the question is which pattern we want to follow (and then migrate all of these common elements to the common pattern).
Reply: @allisonport-db this is somewhat related to the bug on populating nested field IDs, not sure if you have thoughts on which approach you prefer.
Reply: I am a little confused, because I think this is the level where we would want the type changes for map/array? From the protocol: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#type-change-metadata
Reply: We wouldn't want to support TypeChanges on their element or key/value types though. I think I maybe understand what you're saying, however: we could choose to store them at either level and then serialize them at this level? I honestly find the use of StructField for the key/value/element really confusing, since there is no such thing as a key's fieldMetadata (it does not exist!). For that reason, I'd prefer to store it here at this level. And honestly I've been meaning to take a look at that code and see if there's a better way to make sure there isn't confusion when using StructField in those data types.
Reply: But then we need to add the additional "fieldPath" to the TypeChange class, right?
Reply: Also feel free to disagree. I think choosing where to store this info (and other things like the nested IDs) is an important discussion to be had.
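To make the "fieldPath" alternative from this thread concrete, here is a minimal sketch of a type change that carries an optional path, so that changes to a map key/value or array element could be stored on the enclosing field. `PathedTypeChange` is hypothetical and not part of this PR, types are simplified to strings, and the path value used in the usage note is only illustrative.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical variant of TypeChange with a path; NOT part of this PR.
final class PathedTypeChange {
  private final String fieldPath; // null when the change applies to the field itself
  private final String from; // simplified: the PR uses BasePrimitiveType
  private final String to;

  private PathedTypeChange(String fieldPath, String from, String to) {
    this.fieldPath = fieldPath;
    this.from = Objects.requireNonNull(from, "from type cannot be null");
    this.to = Objects.requireNonNull(to, "to type cannot be null");
  }

  // Change recorded directly on a primitive field.
  static PathedTypeChange onField(String from, String to) {
    return new PathedTypeChange(null, from, to);
  }

  // Change recorded on the enclosing field for a nested key, value, or element.
  static PathedTypeChange onNested(String fieldPath, String from, String to) {
    Objects.requireNonNull(fieldPath, "fieldPath cannot be null");
    return new PathedTypeChange(fieldPath, from, to);
  }

  Optional<String> getFieldPath() {
    return Optional.ofNullable(fieldPath);
  }

  String getFrom() {
    return from;
  }

  String getTo() {
    return to;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    PathedTypeChange that = (PathedTypeChange) o;
    return Objects.equals(fieldPath, that.fieldPath)
        && from.equals(that.from)
        && to.equals(that.to);
  }

  @Override
  public int hashCode() {
    return Objects.hash(fieldPath, from, to);
  }
}
```

Under this design, `PathedTypeChange.onNested("element", "integer", "long")` would sit on the array-typed field, while the leaf-only rule in this PR would instead require the change to live on the element itself.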
||
&& (dataType instanceof MapType | ||
|| dataType instanceof StructType | ||
|| dataType instanceof ArrayType)) { | ||
throw new KernelException("Type changes are not supported on nested types."); | ||
} | ||
} | ||
|
||
/** @return the name of this field */ | ||
|
@@ -95,6 +115,15 @@ public boolean isNullable() { | |
return nullable; | ||
} | ||
|
||
/** | ||
* Returns the list of type changes for this field. A field can go through multiple type changes | ||
* (e.g. {@code int->long->decimal}). Changes are ordered from least recent to most recent in the | ||
* list (index 0 is the oldest change). | ||
*/ | ||
public List<TypeChange> getTypeChanges() { | ||
return Collections.unmodifiableList(typeChanges); | ||
} | ||
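The ordering contract in the Javadoc above (index 0 is the oldest change) can be illustrated with a small standalone sketch. `TypeChangeChain` and its nested `Change` class are hypothetical, with types simplified to strings. The consistency check is an assumption about what a caller might want to verify; the constructor in this PR does not enforce it.

```java
import java.util.List;

// Hypothetical helper; Change is a simplified stand-in for TypeChange.
final class TypeChangeChain {
  static final class Change {
    final String from;
    final String to;

    Change(String from, String to) {
      this.from = from;
      this.to = to;
    }
  }

  // Index 0 is the oldest change, so its `from` is the type the field started with.
  static String originalType(List<Change> changes, String currentType) {
    return changes.isEmpty() ? currentType : changes.get(0).from;
  }

  // Assumed invariant (not enforced by the PR): each change starts where the
  // previous one ended, and the newest change ends at the field's current type.
  static boolean isConsistent(List<Change> changes, String currentType) {
    String expectedFrom = null;
    for (Change c : changes) {
      if (expectedFrom != null && !expectedFrom.equals(c.from)) {
        return false;
      }
      expectedFrom = c.to;
    }
    return expectedFrom == null || expectedFrom.equals(currentType);
  }
}
```

For a field that went `integer -> long -> decimal(20,0)`, `originalType` returns `integer`, and `isConsistent` holds only when the field's current type is `decimal(20,0)`.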
|
||
public boolean isMetadataColumn() { | ||
return metadata.contains(IS_METADATA_COLUMN_KEY) | ||
&& (boolean) metadata.get(IS_METADATA_COLUMN_KEY); | ||
|
@@ -107,7 +136,8 @@ public boolean isDataColumn() { | |
@Override | ||
public String toString() { | ||
return String.format( | ||
"StructField(name=%s,type=%s,nullable=%s,metadata=%s)", name, dataType, nullable, metadata); | ||
"StructField(name=%s,type=%s,nullable=%s,metadata=%s,typeChanges=%s)", | ||
name, dataType, nullable, metadata, typeChanges); | ||
} | ||
|
||
@Override | ||
|
@@ -122,16 +152,27 @@ public boolean equals(Object o) { | |
return nullable == that.nullable | ||
&& name.equals(that.name) | ||
&& dataType.equals(that.dataType) | ||
&& metadata.equals(that.metadata); | ||
&& metadata.equals(that.metadata) | ||
&& Objects.equals(typeChanges, that.typeChanges); | ||
} | ||
|
||
@Override | ||
public int hashCode() { | ||
return Objects.hash(name, dataType, nullable, metadata); | ||
return Objects.hash(name, dataType, nullable, metadata, typeChanges); | ||
} | ||
|
||
public StructField withNewMetadata(FieldMetadata metadata) { | ||
return new StructField(name, dataType, nullable, metadata); | ||
return new StructField(name, dataType, nullable, metadata, typeChanges); | ||
} | ||
|
||
/** | ||
* Creates a copy of this StructField with the specified type changes. | ||
* | ||
* @param typeChanges The list of type changes to set | ||
* @return A new StructField with the same properties but with the specified type changes | ||
*/ | ||
public StructField withTypeChanges(List<TypeChange> typeChanges) { | ||
return new StructField(name, dataType, nullable, metadata, typeChanges); | ||
} | ||
|
||
private List<Tuple2<String, String>> getNestedCollatedFields(DataType parent, String path) { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
/* | ||
* Copyright (2025) The Delta Lake Project Authors. | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package io.delta.kernel.types; | ||
|
||
import java.util.Objects; | ||
|
||
/** | ||
* Represents a type change for a field, containing the original and new primitive types. | ||
* | ||
* <p>Type changes are actually persisted in metadata attached to StructFields but the rules for | ||
* where the metadata is attached depend on if the change is for nested arrays/maps or primitive | ||
* types. | ||
*/ | ||
public class TypeChange { | ||
private final BasePrimitiveType from; | ||
private final BasePrimitiveType to; | ||
|
||
public TypeChange(BasePrimitiveType from, BasePrimitiveType to) { | ||
this.from = Objects.requireNonNull(from, "from type cannot be null"); | ||
this.to = Objects.requireNonNull(to, "to type cannot be null"); | ||
} | ||
|
||
public BasePrimitiveType getFrom() { | ||
return from; | ||
} | ||
|
||
public BasePrimitiveType getTo() { | ||
return to; | ||
} | ||
|
||
@Override | ||
public boolean equals(Object o) { | ||
if (this == o) { | ||
return true; | ||
} | ||
if (o == null || getClass() != o.getClass()) { | ||
return false; | ||
} | ||
TypeChange that = (TypeChange) o; | ||
return Objects.equals(from, that.from) && Objects.equals(to, that.to); | ||
} | ||
|
||
@Override | ||
public int hashCode() { | ||
return Objects.hash(from, to); | ||
} | ||
|
||
@Override | ||
public String toString() { | ||
return String.format("TypeChange(from=%s,to=%s)", from, to); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,176 @@ | ||||||
/* | ||||||
* Copyright (2023) The Delta Lake Project Authors. | ||||||
Review comment: Suggested change
|
||||||
* | ||||||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||||||
* you may not use this file except in compliance with the License. | ||||||
* You may obtain a copy of the License at | ||||||
* | ||||||
* http://www.apache.org/licenses/LICENSE-2.0 | ||||||
* | ||||||
* Unless required by applicable law or agreed to in writing, software | ||||||
* distributed under the License is distributed on an "AS IS" BASIS, | ||||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||
* See the License for the specific language governing permissions and | ||||||
* limitations under the License. | ||||||
*/ | ||||||
|
||||||
package io.delta.kernel.types | ||||||
|
||||||
import java.util.ArrayList | ||||||
|
||||||
import io.delta.kernel.exceptions.KernelException | ||||||
import io.delta.kernel.types.StructField.COLLATIONS_METADATA_KEY | ||||||
|
||||||
import collection.JavaConverters._ | ||||||
import org.scalatest.funsuite.AnyFunSuite | ||||||
|
||||||
/** | ||||||
* Test suite for [[StructField]] class. | ||||||
*/ | ||||||
class StructFieldSuite extends AnyFunSuite { | ||||||
|
||||||
// Test equality and hashcode | ||||||
test("equality and hashcode") { | ||||||
val field1 = new StructField( | ||||||
"field", | ||||||
LongType.LONG, | ||||||
true, | ||||||
FieldMetadata.empty(), | ||||||
Seq(new TypeChange(IntegerType.INTEGER, LongType.LONG)).asJava) | ||||||
val field2 = new StructField( | ||||||
"field", | ||||||
LongType.LONG, | ||||||
true, | ||||||
FieldMetadata.empty(), | ||||||
Seq(new TypeChange(IntegerType.INTEGER, LongType.LONG)).asJava) | ||||||
val field3 = new StructField("differentField", IntegerType.INTEGER, true) | ||||||
val field4 = new StructField("field", StringType.STRING, true) | ||||||
val field5 = new StructField("field", IntegerType.INTEGER, false) | ||||||
val field6 = new StructField( | ||||||
"field", | ||||||
IntegerType.INTEGER, | ||||||
true, | ||||||
FieldMetadata.builder().putBoolean("a", true).build(), | ||||||
Seq(new TypeChange(IntegerType.INTEGER, LongType.LONG)).asJava) | ||||||
val field7 = new StructField( | ||||||
"field", | ||||||
LongType.LONG, | ||||||
true, | ||||||
FieldMetadata.empty(), | ||||||
Seq(new TypeChange(IntegerType.INTEGER, StringType.STRING)).asJava) | ||||||
|
||||||
assert(field1 == field2) | ||||||
assert(field1.hashCode() == field2.hashCode()) | ||||||
|
||||||
assert(field1 != field3) | ||||||
assert(field1 != field4) | ||||||
assert(field1 != field5) | ||||||
assert(field1 != field6) | ||||||
assert(field1 != field7) | ||||||
} | ||||||
|
||||||
Seq( | ||||||
new StructType(), | ||||||
new ArrayType(LongType.LONG, false), | ||||||
new MapType(LongType.LONG, LongType.LONG, false)).foreach { dataType => | ||||||
test(s"withType should throw exception with change types for nested types $dataType") { | ||||||
val field = new StructField( | ||||||
"field", | ||||||
dataType, | ||||||
true) | ||||||
assertThrows[KernelException] { | ||||||
field.withTypeChanges(Seq(new TypeChange(IntegerType.INTEGER, LongType.LONG)).asJava) | ||||||
} | ||||||
} | ||||||
|
||||||
test(s"Constructor should throw exception with change types for nested types $dataType") { | ||||||
|
||||||
assertThrows[KernelException] { | ||||||
new StructField( | ||||||
"field", | ||||||
dataType, | ||||||
true, | ||||||
FieldMetadata.empty(), | ||||||
Seq(new TypeChange(IntegerType.INTEGER, LongType.LONG)).asJava) | ||||||
} | ||||||
} | ||||||
} | ||||||
|
||||||
// Test metadata column detection | ||||||
test("metadata column detection") { | ||||||
val regularField = new StructField("regularField", IntegerType.INTEGER, true) | ||||||
assert(!regularField.isMetadataColumn) | ||||||
assert(regularField.isDataColumn) | ||||||
|
||||||
// Create a metadata field | ||||||
val metadataFieldName = "_metadata.custom" | ||||||
val metadataBuilder = FieldMetadata.builder() | ||||||
metadataBuilder.putBoolean("isMetadataColumn", true) | ||||||
val metadataField = | ||||||
new StructField(metadataFieldName, LongType.LONG, false, metadataBuilder.build()) | ||||||
|
||||||
assert(metadataField.isMetadataColumn) | ||||||
assert(!metadataField.isDataColumn) | ||||||
} | ||||||
|
||||||
// Test withNewMetadata method | ||||||
test("withNewMetadata") { | ||||||
val originalField = new StructField("field", IntegerType.INTEGER, true) | ||||||
assert(originalField.getMetadata() == FieldMetadata.empty()) | ||||||
|
||||||
val newMetadataBuilder = FieldMetadata.builder() | ||||||
newMetadataBuilder.putString("key", "value") | ||||||
val newMetadata = newMetadataBuilder.build() | ||||||
|
||||||
val updatedField = originalField.withNewMetadata(newMetadata) | ||||||
|
||||||
assert(updatedField.getName == originalField.getName) | ||||||
assert(updatedField.getDataType == originalField.getDataType) | ||||||
assert(updatedField.isNullable == originalField.isNullable) | ||||||
assert(updatedField.getMetadata == newMetadata) | ||||||
assert(updatedField.getMetadata.getString("key") == "value") | ||||||
} | ||||||
|
||||||
// Test type changes | ||||||
test("type changes") { | ||||||
val originalField = new StructField( | ||||||
"field", | ||||||
IntegerType.INTEGER, | ||||||
true, | ||||||
FieldMetadata.builder().putString("a", "b").build()) | ||||||
assert(originalField.getTypeChanges.isEmpty) | ||||||
|
||||||
val typeChanges = new ArrayList[TypeChange]() | ||||||
typeChanges.add(new TypeChange(IntegerType.INTEGER, LongType.LONG)) | ||||||
|
||||||
val updatedField = originalField.withTypeChanges(typeChanges) | ||||||
|
||||||
assert(updatedField.getName == originalField.getName) | ||||||
assert(updatedField.getDataType == originalField.getDataType) | ||||||
assert(updatedField.isNullable == originalField.isNullable) | ||||||
assert(updatedField.getMetadata == originalField.getMetadata) | ||||||
assert(updatedField.getTypeChanges.size() == 1) | ||||||
|
||||||
val typeChange = updatedField.getTypeChanges.get(0) | ||||||
assert(typeChange.getFrom == IntegerType.INTEGER) | ||||||
assert(typeChange.getTo == LongType.LONG) | ||||||
} | ||||||
|
||||||
// Test TypeChange class | ||||||
test("TypeChange class") { | ||||||
val from = IntegerType.INTEGER | ||||||
val to = LongType.LONG | ||||||
val typeChange = new TypeChange(from, to) | ||||||
|
||||||
assert(typeChange.getFrom == from) | ||||||
assert(typeChange.getTo == to) | ||||||
|
||||||
// Test equals and hashCode | ||||||
val sameTypeChange = new TypeChange(IntegerType.INTEGER, LongType.LONG) | ||||||
val differentTypeChange = new TypeChange(IntegerType.INTEGER, StringType.STRING) | ||||||
|
||||||
assert(typeChange == sameTypeChange) | ||||||
assert(typeChange.hashCode() == sameTypeChange.hashCode()) | ||||||
assert(typeChange != differentTypeChange) | ||||||
} | ||||||
} |
Review comment: Oh, I guess this is probably a very important question: how do we expect connectors to communicate type widening schema changes?
(1) We expect them to add the full metadata with the TypeChanges class to the schema?
(2) They just provide a schema with the new type, and behind the scenes we add this TypeChanges metadata?
- If this one, it seems like once again we come back to this being another good place where it would be preferred to have an internal vs. external interface.
Maybe we want to support both?
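Option (2) could be sketched as follows: the connector supplies only the new type, and the history entry is derived by comparing it with the existing type. `TypeChangeRecorder` is a hypothetical helper, with types and changes simplified to strings. It is not how Kernel necessarily implements this, and it does not check whether the widening is actually allowed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of option (2); not Kernel's implementation.
final class TypeChangeRecorder {
  // Returns the history to store on the updated field: the existing history,
  // plus one "from->to" entry when the connector-supplied type differs.
  static List<String> recordIfChanged(List<String> existing, String oldType, String newType) {
    List<String> result = new ArrayList<>(existing);
    if (!oldType.equals(newType)) {
      // Newest change goes last, matching the oldest-first ordering of getTypeChanges().
      result.add(oldType + "->" + newType);
    }
    return Collections.unmodifiableList(result);
  }
}
```

A design that supports both options might run the same comparison even when the connector supplies explicit type changes, and reject schemas where the two disagree.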