Flink: Dynamic Iceberg Sink: Add HashKeyGenerator / RowDataEvolver / TableUpdateOperator #13277
Conversation
This change adds the following components for the Flink Dynamic Iceberg Sink:

HashKeyGenerator

A hash key generator which will be used in the DynamicIcebergSink class (next PR) to implement one of Iceberg's DistributionModes (NONE, HASH, RANGE). The HashKeyGenerator is responsible for creating the appropriate hash key for Flink's keyBy operation. The hash key is generated based on the user-provided DynamicRecord and the table metadata. Under the hood, we maintain a set of Flink KeySelectors which implement the appropriate Iceberg DistributionMode. For every table, we randomly select a consistent subset of writer subtasks which receive data via their associated keys, depending on the chosen DistributionMode. Caching ensures that a new key selector is created when the table metadata (e.g. schema, spec) or the user-provided metadata (e.g. distribution mode, write parallelism) changes.

RowDataEvolver

RowDataEvolver is responsible for changing the input RowData to make it compatible with the target schema. This is done when:
1. The input schema has fewer fields than the target schema.
2. The table types are wider than the input types.
3. The field order differs between the source and target schema.

The resolution is as follows: in the first case, we add a null value for the missing field (if the field is optional); in the second case, we convert the data for the input field to the wider type, e.g. int (input type) => long (table type); in the third case, we rearrange the input data to match the target table.

DynamicTableUpdateOperator

A dedicated operator for updating the schema / spec of the table associated with a DynamicRecord.
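To make the RowDataEvolver cases above concrete, here is a minimal, hypothetical Java sketch of the three resolutions (null padding, int-to-long widening, name-based reordering). It is illustrative only; the class and method names are assumptions, and the actual RowDataEvolver in this PR may differ.

```java
// Hypothetical sketch of the three RowDataEvolver cases; not the actual implementation in this PR.
import org.apache.flink.table.data.GenericRowData;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types;

class RowDataConversionSketch {

  // Builds a row matching the target schema from an input row written with the source schema.
  static GenericRowData convert(GenericRowData input, Schema sourceSchema, Schema targetSchema) {
    GenericRowData result = new GenericRowData(targetSchema.columns().size());
    for (int pos = 0; pos < targetSchema.columns().size(); pos++) {
      Types.NestedField targetField = targetSchema.columns().get(pos);
      Types.NestedField sourceField = sourceSchema.findField(targetField.name());
      if (sourceField == null) {
        // Case 1: field missing in the input -> pad with null (the target field must be optional).
        result.setField(pos, null);
      } else {
        // Case 3: values are looked up by field name, so a different field order is handled here.
        int sourcePos = sourceSchema.columns().indexOf(sourceField);
        Object value = input.getField(sourcePos);
        result.setField(pos, widenIfNeeded(value, sourceField.type(), targetField.type()));
      }
    }
    return result;
  }

  // Case 2: promote the value when the table type is wider than the input type, e.g. int -> long.
  private static Object widenIfNeeded(Object value, Type sourceType, Type targetType) {
    if (value != null
        && sourceType.typeId() == Type.TypeID.INTEGER
        && targetType.typeId() == Type.TypeID.LONG) {
      return ((Integer) value).longValue();
    }

    // Only the int -> long promotion is sketched; other widenings would follow the same pattern.
    return value;
  }
}
```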
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/NonThrowingKeySelector.java
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicSinkUtil.java
private DynamicSinkUtil() {}

static List<Integer> getEqualityFieldIds(List<String> equalityFields, Schema schema) {
It is a bit strange to me to ignore the empty equalityFields. Maybe a javadoc to highlight this?
What do you mean by ignoring empty equalityFields? There may be none defined.
If I understand correctly, if the equalityFields is empty, then we fall back to what the Schema defines.
I feel that falling back when equalityFields == null is natural, but handling an empty list as "not set" is a bit strange.
Minimally we need to document it.
Can we have at least a javadoc for this?
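To illustrate the documentation request, here is a hedged sketch of what getEqualityFieldIds with a javadoc for the empty-list fallback could look like. The fallback to the schema's identifier fields and the error handling are assumptions based on this thread, not the exact code in DynamicSinkUtil.

```java
// Hypothetical sketch of DynamicSinkUtil#getEqualityFieldIds with the fallback documented;
// the actual implementation in this PR may differ in details.
import java.util.List;
import org.apache.iceberg.Schema;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;
import org.apache.iceberg.types.Types;

class DynamicSinkUtilSketch {

  /**
   * Resolves the equality field ids for a write.
   *
   * <p>If {@code equalityFields} is null or empty, it is treated as "not set" and the schema's
   * identifier field ids are returned instead.
   */
  static List<Integer> getEqualityFieldIds(List<String> equalityFields, Schema schema) {
    if (equalityFields == null || equalityFields.isEmpty()) {
      // Fallback discussed above: empty means "not set", so use the schema's identifier fields.
      return Lists.newArrayList(schema.identifierFieldIds());
    }

    List<Integer> fieldIds = Lists.newArrayList();
    for (String name : equalityFields) {
      Types.NestedField field = schema.findField(name);
      if (field == null) {
        throw new IllegalArgumentException("Equality field not found in schema: " + name);
      }

      fieldIds.add(field.fieldId());
    }

    return fieldIds;
  }
}
```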
Catalog catalog = catalogLoader.loadCatalog();
this.updater =
    new TableUpdater(
        new TableMetadataCache(catalog, cacheMaximumSize, cacheRefreshMs), catalog);
Could we share the TableMetadataCache in a static way, so it is shared between the operator instances?
Yes, possible. Let me look into it.
I would like to defer this change to a follow-up. There are some challenges with regard to concurrent get / put operations on the cache.
sure
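One possible shape for the follow-up discussed above (sharing the TableMetadataCache across operator instances). This is purely speculative: the keying by catalog name and the use of a static ConcurrentHashMap are assumptions, and the concurrency concerns raised in the thread would still need to be addressed in the cache itself.

```java
// Speculative sketch: one TableMetadataCache per catalog, shared across operator instances.
// TableMetadataCache is the class referenced in this PR; the sharing mechanism is an assumption.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.iceberg.catalog.Catalog;

class SharedTableMetadataCaches {

  private static final Map<String, TableMetadataCache> CACHES = new ConcurrentHashMap<>();

  static TableMetadataCache getOrCreate(
      Catalog catalog, int cacheMaximumSize, long cacheRefreshMs) {
    // computeIfAbsent makes create-if-missing atomic; concurrent get/put on the cache contents
    // themselves is the part deferred to the follow-up mentioned in the review.
    return CACHES.computeIfAbsent(
        catalog.name(),
        name -> new TableMetadataCache(catalog, cacheMaximumSize, cacheRefreshMs));
  }
}
```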
....0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicTableUpdateOperator.java
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/HashKeyGenerator.java
  }
}

case RANGE:
Will the RANGE distribution mode work with the dyntable sink?
Range needed statistics collection and complicated infrastructure to work. It relied on Partitioners to direct the records to the correct writer.
For now we fall back to HASH. Adding support for RANGE in the Dynamic Sink will be a follow-up.
Sure - leave a comment/TODO
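A minimal sketch of the fallback this thread settles on (RANGE treated as HASH with a TODO). The enclosing class and method are hypothetical; the actual HashKeyGenerator code may structure this differently.

```java
// Hypothetical sketch of the RANGE -> HASH fallback discussed above; not the PR's exact code.
import org.apache.iceberg.DistributionMode;

class DistributionModeFallbackSketch {

  static DistributionMode resolve(DistributionMode requested) {
    switch (requested) {
      case NONE:
      case HASH:
        return requested;
      case RANGE:
        // TODO: RANGE needs statistics collection and a dedicated partitioner; until the
        // dynamic sink supports it, fall back to HASH as agreed in the review.
        return DistributionMode.HASH;
      default:
        throw new IllegalArgumentException("Unknown distribution mode: " + requested);
    }
  }
}
```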
private final Integer specId;
private final Schema schema;
private final PartitionSpec spec;
private final List<String> equalityFields;
Should this be a Set?
Perhaps. Need to double-check.
So far, we have always used List for equality fields (IcebergSink V1 / V2). It doesn't seem that this is required, though. Updated everywhere. See 0412697.
There is only a single place where this touches a public API.
Maybe for backward compatibility we could use Collection there. Or we could deprecate the old methods and add new ones (only the ones which are currently used).
Seems like a good change to me, but I would ask @stevenzwu. He might know more about why the equalityFields are stored in a List instead of a Set.
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/RowDataEvolver.java
...link/src/test/java/org/apache/iceberg/flink/sink/dynamic/TestDynamicTableUpdateOperator.java
@@ -60,7 +60,7 @@ public RowDataTaskWriterFactory(
     long targetFileSizeBytes,
     FileFormat format,
     Map<String, String> writeProperties,
-    List<Integer> equalityFieldIds,
+    Set<Integer> equalityFieldIds,
This is public API, since we don't have an annotation on this.
Ok. I've reverted this change in favor of an internal conversion. See 05fd580
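A hedged sketch of the "internal conversion" mentioned above: the public constructor keeps the List-based signature and the Set view is built internally. The field and parameter names below are assumptions, not the exact code in 05fd580.

```java
// Hypothetical sketch of keeping the public List-based signature and converting internally.
import java.util.List;
import java.util.Set;
import org.apache.iceberg.relocated.com.google.common.collect.Sets;

class RowDataTaskWriterFactorySketch {

  private final Set<Integer> equalityFieldIds;

  RowDataTaskWriterFactorySketch(List<Integer> equalityFieldIds) {
    // Public signature stays List-based for compatibility; deduplicate internally.
    this.equalityFieldIds =
        equalityFieldIds == null ? Sets.newHashSet() : Sets.newHashSet(equalityFieldIds);
  }
}
```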
+1 pending tests
Merged to main.
Thanks for reviewing / merging @pvary 🙏
I'll prepare the backports.