Skip to content

Conversation

@dttung2905
Copy link
Contributor

@dttung2905 dttung2905 commented Oct 19, 2025

  • Write the logic for nested type checking
  • Add more test for the nested field validator
  • move validate unknown type from init() to somewhere like checkSchemaCompatibility

schema.go Outdated
Comment on lines 81 to 86
func (s *Schema) init() {
// Validate unknown type requirements
if err := s.validateUnknownTypes(); err != nil {
panic(fmt.Sprintf("Invalid schema: %v", err))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this return error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I think returning this in init() will have a cascading effect on many places in the code base. Let me look for some other places to validate the unknownTypes else, I'm happy to remove the validation of unknown type altogether :D . wydt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some schema validations in checkSchemaCompatibility, maybe you can fit it in there.

Eventually, we should bite the bullet and make the schema constructor fallible. While it's going to cascade and cause a lot of breakage, it'll give us a much safer type where we know that if it exists, it's in a valid configuration.

@dttung2905
Copy link
Contributor Author

@zeroshade Not sure if my understanding of the specs is correct but do we support UnknownType as nested type or only in the top-level column type? 🤔 Could you advise me on this?

@twuebi
Copy link
Contributor

twuebi commented Oct 27, 2025

@dttung2905, I'll dig a bit in the java code to see how they're handling that. Without having done that - to me it makes perfect sense to allow Unknown types within nested types. There may be a bunch of known types mixed with a single unsupported type in such a nested type. Allowing unknown here enables reading that.

@twuebi
Copy link
Contributor

twuebi commented Oct 27, 2025

@dttung2905, it's supported in nested types:

org.apache.iceberg.TestSchema#testUnknownSupport

  @Test
  public void testUnknownSupport() {
    // this needs a different schema because it cannot be used in required fields
    Schema schemaWithUnknown =
        new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()),
            Types.NestedField.optional(2, "top", Types.UnknownType.get()),
            Types.NestedField.optional(
                3, "arr", Types.ListType.ofOptional(4, Types.UnknownType.get())),
            Types.NestedField.required(
                5,
                "struct",
                Types.StructType.of(
                    Types.NestedField.optional(6, "inner_op", Types.UnknownType.get()),
                    Types.NestedField.optional(
                        7,
                        "inner_map",
                        Types.MapType.ofOptional(
                            8, 9, Types.StringType.get(), Types.UnknownType.get())),
                    Types.NestedField.optional(
                        10,
                        "struct_arr",
                        Types.StructType.of(
                            Types.NestedField.optional(11, "deep", Types.UnknownType.get()))))));

    assertThatThrownBy(() -> Schema.checkCompatibility(schemaWithUnknown, 2))
        .isInstanceOf(IllegalStateException.class)
        .hasMessage(
            "Invalid schema for v%s:\n"
                + "- Invalid type for top: %s is not supported until v%s\n"
                + "- Invalid type for arr.element: %s is not supported until v%s\n"
                + "- Invalid type for struct.inner_op: %s is not supported until v%s\n"
                + "- Invalid type for struct.inner_map.value: %s is not supported until v%s\n"
                + "- Invalid type for struct.struct_arr.deep: %s is not supported until v%s",
            2,
            Types.UnknownType.get(),
            MIN_FORMAT_VERSIONS.get(Type.TypeID.UNKNOWN),
            Types.UnknownType.get(),
            MIN_FORMAT_VERSIONS.get(Type.TypeID.UNKNOWN),
            Types.UnknownType.get(),
            MIN_FORMAT_VERSIONS.get(Type.TypeID.UNKNOWN),
            Types.UnknownType.get(),
            MIN_FORMAT_VERSIONS.get(Type.TypeID.UNKNOWN),
            Types.UnknownType.get(),
            MIN_FORMAT_VERSIONS.get(Type.TypeID.UNKNOWN));

    assertThatCode(() -> Schema.checkCompatibility(schemaWithUnknown, 3))
        .doesNotThrowAnyException();
  }

@zeroshade
Copy link
Member

thanks @twuebi for digging and confirming that

Signed-off-by: dttung2905 <[email protected]>
Signed-off-by: dttung2905 <[email protected]>
@dttung2905 dttung2905 marked this pull request as ready for review November 3, 2025 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants