Arrow to Iceberg schema conversion does not preserve names #1039

Open
@feniljain

Description

Apache Iceberg Rust version

0.4.0 (latest version)

Describe the bug

I was trying to debug a failing test I wrote for the NaN value count PR and realized that the name of the primitive column inside an Arrow list type gets changed during conversion, which is why the test was failing. On further debugging, I found that the conversion uses the constant name "element" for the list element field instead of the name from the Arrow field.
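To make the name loss concrete, here is a minimal, self-contained sketch. `ArrowField`, `IcebergField` and both conversion functions are hypothetical stand-ins, not the arrow-rs or iceberg-rust types; only the constant name "element" comes from the behavior described above. The field id survives the conversion while the name does not.

```rust
// Hypothetical, simplified stand-ins for an Arrow field and an Iceberg nested
// field. These are NOT the arrow-rs or iceberg-rust types.
#[derive(Debug, Clone, PartialEq)]
struct ArrowField {
    id: i32,
    name: String,
}

#[derive(Debug, Clone, PartialEq)]
struct IcebergField {
    id: i32,
    name: String,
}

// The constant name the issue reports being used for list elements.
const LIST_FIELD_NAME: &str = "element";

// Mirrors the reported behavior: the element keeps its id, but its name is
// replaced with the constant.
fn convert_list_element_hardcoded(element: &ArrowField) -> IcebergField {
    IcebergField {
        id: element.id,
        name: LIST_FIELD_NAME.to_string(),
    }
}

// Hypothetical alternative that keeps the name carried by the Arrow element.
fn convert_list_element_preserving(element: &ArrowField) -> IcebergField {
    IcebergField {
        id: element.id,
        name: element.name.clone(),
    }
}

fn main() {
    let element = ArrowField { id: 1, name: "col1".to_string() };
    let hardcoded = convert_list_element_hardcoded(&element);
    let preserved = convert_list_element_preserving(&element);
    println!("hardcoded: {} (id {})", hardcoded.name, hardcoded.id);
    println!("preserved: {} (id {})", preserved.name, preserved.id);
}
```

Both variants keep id 1; only the name handling differs, so a fix would presumably look more like the second function for the element field.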

The same seems to be the case for the map type.
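The map case can be sketched the same way. The types are again hypothetical, and the "key"/"value" constants are an assumption made by analogy with the list's "element", not names confirmed from the iceberg-rust source.

```rust
// Hypothetical stand-in types; not arrow-rs or iceberg-rust.
#[derive(Debug, Clone)]
struct ArrowField {
    id: i32,
    name: String,
}

#[derive(Debug, Clone)]
struct IcebergMap {
    key_id: i32,
    key_name: String,
    value_id: i32,
    value_name: String,
}

// Assumed constant names, by analogy with the "element" name used for lists.
const MAP_KEY_FIELD_NAME: &str = "key";
const MAP_VALUE_FIELD_NAME: &str = "value";

// Keeps the ids of the Arrow key/value fields but replaces their names.
fn convert_map_hardcoded(key: &ArrowField, value: &ArrowField) -> IcebergMap {
    IcebergMap {
        key_id: key.id,
        key_name: MAP_KEY_FIELD_NAME.to_string(),
        value_id: value.id,
        value_name: MAP_VALUE_FIELD_NAME.to_string(),
    }
}

fn main() {
    let key = ArrowField { id: 2, name: "k".to_string() };
    let value = ArrowField { id: 3, name: "v".to_string() };
    let map = convert_map_hardcoded(&key, &value);
    // Prints "k -> key (id 2), v -> value (id 3)": ids survive, names do not.
    println!(
        "{} -> {} (id {}), {} -> {} (id {})",
        key.name, map.key_name, map.key_id, value.name, map.value_name, map.value_id
    );
}
```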

One doubt I have: is it worth fixing? Is it okay to write tests with hardcoded names instead? Could changing this behavior break anything for external consumers?
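On the question of hardcoded names in tests, one option is to key assertions on field ids, which appear to survive the conversion, and compare names per id. The helper below is a hypothetical sketch over flattened `(id, name)` pairs rather than real schema types. It reports the ids whose names changed, so a test can assert either that the result is empty (names preserved) or that it matches an explicitly expected set (renaming accepted).

```rust
use std::collections::HashMap;

// Hypothetical helper over flattened schemas given as (field id, field name)
// pairs. Returns, sorted, the ids present in both schemas whose names differ.
fn renamed_field_ids(source: &[(i32, &str)], converted: &[(i32, &str)]) -> Vec<i32> {
    let converted_by_id: HashMap<i32, &str> = converted.iter().copied().collect();
    let mut renamed: Vec<i32> = source
        .iter()
        .filter(|(id, name)| matches!(converted_by_id.get(id), Some(n) if n != name))
        .map(|(id, _)| *id)
        .collect();
    renamed.sort();
    renamed
}

fn main() {
    // Mirrors the reproduction test: id 0 is the list "col0", id 1 its element "col1".
    let arrow_side = [(0, "col0"), (1, "col1")];
    let iceberg_side = [(0, "col0"), (1, "element")];
    println!("renamed ids: {:?}", renamed_field_ids(&arrow_side, &iceberg_side));
}
```

With the current behavior this prints `renamed ids: [1]`.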

To Reproduce

Test to reproduce:

file: arrow/schema.rs

test:

    #[test]
    fn test_arrow_schema_to_schema_list() {
        // List element field named "col1", carrying field id 1.
        let schema_list_float_field = Field::new("col1", DataType::Float32, true).with_metadata(
            HashMap::from([(PARQUET_FIELD_ID_META_KEY.to_string(), "1".to_string())]),
        );

        // Arrow schema with a single list column "col0" (field id 0) wrapping "col1".
        let arrow_schema = {
            let fields = vec![Field::new_list(
                "col0",
                schema_list_float_field.clone(),
                true,
            )
            .with_metadata(HashMap::from([(
                PARQUET_FIELD_ID_META_KEY.to_string(),
                "0".to_string(),
            )]))];
            Arc::new(arrow_schema::Schema::new(fields))
        };

        let converted_schema =
            arrow_schema_to_schema(&arrow_schema).expect("Could not convert to iceberg schema");

        // The element should keep its Arrow name; currently it comes back as "element".
        assert_eq!(converted_schema.field_by_id(1).unwrap().name, String::from("col1"));
    }

Expected behavior

The test above should pass, i.e. the list element should keep the name "col1" :)

Willingness to contribute

  • I can contribute a fix for this bug independently

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working)