
fix: Manifest partition-spec serialization/deserialization is not interoperable with iceberg-java / iceberg-cpp / pyiceberg #2365

@123digits

Apache Iceberg Rust version

None

Describe the bug

Manifest partition-spec serialization/deserialization is not interoperable with iceberg-java / iceberg-cpp / pyiceberg

Problem

ManifestWriter in iceberg-rust writes the partition-spec entry of the Avro manifest user metadata as a bare JSON array of fields, and ManifestMetadata::try_from_avro_bytes only accepts that same bare-array shape. Every other Iceberg implementation (iceberg-java, iceberg-cpp, pyiceberg) writes (and requires) the spec-compliant serialized PartitionSpec object: {"spec-id": N, "fields": [...]}.
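To make the two shapes concrete, here is a minimal std-only sketch. The payloads are illustrative (the field values are made up), and `SpecShape` and `classify` are hypothetical names that do not exist in iceberg-rust. The classifier only looks at the first non-whitespace byte, so it shows the outer shape of the payload and does not validate the JSON.

```rust
/// The two `partition-spec` payload shapes described above.
#[derive(Debug, PartialEq)]
enum SpecShape {
    /// Bare JSON array of fields: `[...]` (what iceberg-rust writes today).
    BareArray,
    /// Full object: `{"spec-id": N, "fields": [...]}`.
    SpecObject,
    /// Empty input, or input that starts with anything else.
    Unknown,
}

/// Classify a payload by its first non-whitespace byte.
/// This inspects only the outer shape; it is not a JSON parser.
fn classify(payload: &[u8]) -> SpecShape {
    match payload.iter().copied().find(|b| !b.is_ascii_whitespace()) {
        Some(b'[') => SpecShape::BareArray,
        Some(b'{') => SpecShape::SpecObject,
        _ => SpecShape::Unknown,
    }
}

// Illustrative payloads only; the field values are invented for this example.
const BARE_ARRAY: &str =
    r#"[{"source-id":1,"field-id":1000,"name":"ts_day","transform":"day"}]"#;
const SPEC_OBJECT: &str =
    r#"{"spec-id":0,"fields":[{"source-id":1,"field-id":1000,"name":"ts_day","transform":"day"}]}"#;

fn main() {
    println!("{:?}", classify(BARE_ARRAY.as_bytes()));
    println!("{:?}", classify(SPEC_OBJECT.as_bytes()));
}
```

A reader that only deserializes into `Vec<PartitionField>` can handle the first payload but not the second, which matches the "invalid type: map, expected a sequence" error quoted below.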

Consequences:

  • iceberg-rust fails on commit when the table has any snapshot committed by a non-rust writer. During fast_append, iceberg-rust loads the parent snapshot's manifests for the duplicate-file check (transaction/snapshot.rs), and deserialization blows up with:
    DataInvalid => Fail to parse partition spec in manifest metadata,
    source: invalid type: map, expected a sequence at line 1 column 0
    
  • Other Iceberg engines fail to read iceberg-rust manifests, because they expect the {"spec-id":N,"fields":[...]} object.

Net result: iceberg-rust can't share a table with any other implementation in either direction.

Reproduce

  1. Create a partitioned table (any partition spec) using iceberg-java, iceberg-cpp, or pyiceberg.
  2. Commit at least one snapshot via that non-rust writer.
  3. Attempt a fast_append commit from iceberg-rust on the same table.

The error surfaces from crates/iceberg/src/spec/manifest/metadata.rs:83, in the call serde_json::from_slice::<Vec<PartitionField>>(bs).

Observed on 0.8.0, 0.9.0, and main (commit 4b0b3525) as of 2026-04-24.

Expected behavior

iceberg-rust should:

  1. Write partition-spec as the spec-compliant object form so other implementations can read our manifests.
  2. Accept both shapes on read — the spec-compliant object AND the historical bare-array form — so existing rust-written manifests continue to deserialize.

Proposed fix

Two small changes in crates/iceberg/src/spec/manifest/:

writer.rs — serialize the full PartitionSpec:

to_vec(&self.metadata.partition_spec).map_err(...)  // was: to_vec(&...partition_spec.fields())

metadata.rs — accept either shape on read, preferring the bare-array path for backwards compatibility:

serde_json::from_slice::<Vec<PartitionField>>(bs)
    .or_else(|_| {
        #[derive(serde::Deserialize)]
        struct PartitionSpecJson { fields: Vec<PartitionField> }
        serde_json::from_slice::<PartitionSpecJson>(bs).map(|s| s.fields)
    })
    .map_err(...)
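One detail of the `or_else(|_| ...)` pattern is that the first error is discarded, so when both shapes fail only the object-form error is reported. The sketch below is a possible variant that keeps both messages, not part of the patch. It is std-only, `parse_with_fallback` and `parse_number` are hypothetical names, and the integer parsers are stand-ins for the two `serde_json::from_slice` calls above.

```rust
use std::fmt::Display;

/// Try `primary`; if it fails, try `fallback`. If both fail, return a
/// message that contains both errors instead of only the last one.
fn parse_with_fallback<T, E1, E2>(
    primary: impl FnOnce() -> Result<T, E1>,
    fallback: impl FnOnce() -> Result<T, E2>,
) -> Result<T, String>
where
    E1: Display,
    E2: Display,
{
    primary().or_else(|first_err| {
        fallback().map_err(|second_err| {
            format!("primary parse failed: {first_err}; fallback parse failed: {second_err}")
        })
    })
}

/// Stand-in for the manifest case: accept decimal first, then `0x`-prefixed hex.
/// In metadata.rs the two closures would be the bare-array and object parsers.
fn parse_number(s: &str) -> Result<i64, String> {
    parse_with_fallback(
        || s.parse::<i64>(),
        || i64::from_str_radix(s.trim_start_matches("0x"), 16),
    )
}

fn main() {
    println!("{:?}", parse_number("42"));
    println!("{:?}", parse_number("0x2a"));
    println!("{:?}", parse_number("zz"));
}
```

Whether the combined message is worth the extra code is a judgment call; the simpler `or_else(|_| ...)` form is enough to fix the interoperability problem.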

The 17 existing manifest unit tests (including writer::tests::test_add_delete_existing and test_v3_delete_manifest_delete_file_roundtrip) pass unchanged with both patches applied. Total diff: 29 insertions and 8 deletions across the two files.

I have a local patch working against a table with mixed rust/cpp writers, and I plan to open a PR for it shortly.

Willingness to contribute

I can contribute a fix for this bug independently

Labels: bug
