@XuQianJin-Stars XuQianJin-Stars commented Dec 28, 2025

Purpose

Linked issue: closes #2252

This PR adds support for array type conversion between Fluss and Iceberg, enabling tiering of tables with array columns to Iceberg lakehouse storage.

Brief change log

  • Updated FlussDataTypeToIcebergDataType to convert Fluss ARRAY type to Iceberg LIST type instead of throwing UnsupportedOperationException
  • Created FlussArrayAsIcebergList adapter class to wrap Fluss InternalArray as Java List for Iceberg, supporting:
    • All primitive types (boolean, byte, short, int, long, float, double)
    • Other scalar types (string, char, decimal, timestamp, timestamp_ltz, date, time, binary, bytes)
    • Nested arrays (array of arrays)
    • Null handling for both array elements and null arrays
  • Updated FlussRowAsIcebergRecord to handle array field conversion using the new adapter
  • Enhanced IcebergConversions to support bidirectional type conversion (Iceberg LIST ↔ Fluss ARRAY)
  • Updated documentation to include ARRAY → LIST in the Iceberg data type mapping table
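
For reviewers unfamiliar with the adapter approach, the idea behind wrapping an array as a `java.util.List` can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SimpleArray`, `ObjectBackedArray`, and `ArrayAsListView` are hypothetical names, and `SimpleArray` is only a simplified stand-in for Fluss's `InternalArray`. The sketch shows the three behaviours listed above: null arrays, null elements, and nested arrays.

```java
import java.util.AbstractList;
import java.util.List;

// Hypothetical, simplified stand-in for Fluss's InternalArray (an assumption for this sketch).
interface SimpleArray {
    int size();
    boolean isNullAt(int pos);
    Object get(int pos); // an element may itself be a SimpleArray (nested array)
}

// Trivial Object[]-backed implementation, used only to drive the sketch.
final class ObjectBackedArray implements SimpleArray {
    private final Object[] values;

    ObjectBackedArray(Object... values) {
        this.values = values;
    }

    @Override public int size() { return values.length; }
    @Override public boolean isNullAt(int pos) { return values[pos] == null; }
    @Override public Object get(int pos) { return values[pos]; }
}

// Read-only List view: elements are converted on access instead of being copied up front.
final class ArrayAsListView extends AbstractList<Object> {
    private final SimpleArray array;

    private ArrayAsListView(SimpleArray array) {
        this.array = array;
    }

    // A null array maps to a null list.
    static List<Object> wrap(SimpleArray array) {
        return array == null ? null : new ArrayAsListView(array);
    }

    @Override
    public int size() {
        return array.size();
    }

    @Override
    public Object get(int index) {
        if (index < 0 || index >= array.size()) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + array.size());
        }
        if (array.isNullAt(index)) {
            return null; // null elements are preserved
        }
        Object value = array.get(index);
        // Nested arrays are wrapped recursively; other values pass through unchanged.
        return value instanceof SimpleArray ? wrap((SimpleArray) value) : value;
    }
}

final class ArrayViewDemo {
    public static void main(String[] args) {
        SimpleArray inner = new ObjectBackedArray(1, null, 3);
        SimpleArray outer = new ObjectBackedArray(inner, null);
        System.out.println(ArrayAsListView.wrap(outer)); // prints [[1, null, 3], null]
    }
}
```

Extending `AbstractList` keeps the view read-only by default and avoids materializing a copy of every array; the real adapter additionally needs per-type conversion of element values, which this sketch leaves out.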

Tests

Unit Tests (FlussRowAsIcebergRecordTest - 9 test cases):

  • testArrayWithIntElements - Array of integers
  • testArrayWithStringElements - Array of strings
  • testNestedArrayType - Nested arrays (array of arrays)
  • testArrayWithAllPrimitiveTypes - Arrays of all primitive types
  • testArrayWithDecimalElements - Array of decimal values
  • testArrayWithTimestampElements - Arrays of TIMESTAMP and TIMESTAMP_LTZ
  • testArrayWithNullElements - Arrays with null elements
  • testNullArray - Null array handling
  • testArrayWithBinaryElements - Array of binary data

Integration Tests (IcebergTieringTest):

  • testTieringWriteTableWithArrayType - Parameterized test (4 cases) covering:
    • Primary key tables with array columns (partitioned/non-partitioned)
    • Log tables with array columns (partitioned/non-partitioned)

Test Results: All 92 tests pass (increased from 88 tests before this PR)

API and Format

API Changes: None. No breaking changes to public APIs.

Storage Format: No changes to Fluss storage format. This only affects the conversion layer between Fluss and Iceberg during tiering operations.

Type Mapping:

  • Fluss ARRAY<T> → Iceberg LIST<T> (element type T is converted recursively)
  • Supports nullable and non-nullable arrays
  • Maintains element type semantics through recursive conversion
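
The recursive mapping can be illustrated with a small, self-contained sketch. The type model here (`SrcType`, `TypeMappingSketch`) is hypothetical and exists only for illustration; it renders the target type as a string, whereas real Iceberg list types are built with `Types.ListType.ofOptional` / `ofRequired` and also carry element field IDs, which the sketch omits.

```java
import java.util.Locale;

// Hypothetical, minimal source type model (not the Fluss DataType API).
final class SrcType {
    final String name;      // e.g. "INT", "STRING", or "ARRAY"
    final boolean nullable;
    final SrcType element;  // non-null only when name is "ARRAY"

    private SrcType(String name, boolean nullable, SrcType element) {
        this.name = name;
        this.nullable = nullable;
        this.element = element;
    }

    static SrcType primitive(String name, boolean nullable) {
        return new SrcType(name, nullable, null);
    }

    static SrcType array(SrcType element, boolean nullable) {
        return new SrcType("ARRAY", nullable, element);
    }
}

final class TypeMappingSketch {
    // Renders an Iceberg-like type string, recursing into array element types.
    static String toLakeType(SrcType type) {
        String rendered = "ARRAY".equals(type.name)
                ? "list<" + toLakeType(type.element) + ">"   // ARRAY<T> -> LIST<T>
                : type.name.toLowerCase(Locale.ROOT);        // scalar types pass through
        // Nullability is carried over at every nesting level.
        return (type.nullable ? "optional " : "required ") + rendered;
    }
}

final class TypeMappingDemo {
    public static void main(String[] args) {
        // ARRAY<ARRAY<INT NOT NULL>> NOT NULL, with a nullable inner array
        SrcType nested = SrcType.array(
                SrcType.array(SrcType.primitive("INT", false), true), false);
        System.out.println(TypeMappingSketch.toLakeType(nested));
        // prints: required list<optional list<required int>>
    }
}
```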

Documentation

New Feature: Yes, this introduces array type support for Iceberg tiering.

Documentation Changes:

  • Updated website/docs/streaming-lakehouse/integrate-data-lakes/iceberg.md
  • Added ARRAY → LIST mapping to the data type compatibility table

This commit adds support for array type conversion between Fluss and Iceberg,
enabling tiering of tables with array columns to Iceberg lakehouse.

Key changes:
- Updated FlussDataTypeToIcebergDataType to convert Fluss ARRAY to Iceberg LIST type
- Created FlussArrayAsIcebergList adapter to convert Fluss InternalArray to Java List
- Updated FlussRowAsIcebergRecord to handle array field conversion
- Added array type support in IcebergConversions for bidirectional type mapping
- Added comprehensive unit tests for array type conversion with various element types
- Added integration tests for array type tiering with both primary key and log tables
- Updated documentation to include ARRAY -> LIST type mapping
@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/issue-2252-iceberg-array-type branch from 8d99aa6 to 5973558 Compare December 28, 2025 02:21
Development

Successfully merging this pull request may close these issues.

Support tier array type for iceberg