[Enhancement] Prioritize using the "compression_codec" table property in connector sink modules. #67205
base: main
Conversation
Commits:
- … in connector sink modules. Signed-off-by: Gavin <[email protected]>
- …dules Signed-off-by: Gavin <[email protected]>

Force-pushed from e1c1432 to 389519b.
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 4 / 4 (100.00%)
[BE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)

@cursor review
```diff
         .toLowerCase();
-        this.compressionType = sessionVariable.getConnectorSinkCompressionCodec();
+        this.compressionType = nativeTable.properties().getOrDefault(PARQUET_COMPRESSION,
+                sessionVariable.getConnectorSinkCompressionCodec());
```
Missing compression validation for Iceberg table properties
The IcebergTableSink now reads compressionType from native table properties (lines 65-66), but unlike HiveTableSink, there's no validation before using it in toThrift(). At line 124, PARQUET_COMPRESSION_TYPE_MAP.get(compressionType) can return null if the table property contains an unsupported compression codec. This null is then passed to setCompression_type(). In contrast, HiveTableSink validates using Preconditions.checkState() before calling .get(). If an Iceberg table was created outside StarRocks with an unusual compression codec, this could cause unexpected behavior in the backend.
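A guard mirroring HiveTableSink's check could close this gap. The following is a minimal sketch, not the PR's actual code: `PARQUET_COMPRESSION_TYPE_MAP` and `compressionType` are names taken from the comment above, and the helper's shape is assumed for illustration.

```java
import com.google.common.base.Preconditions;
import java.util.Map;

// Sketch of the missing guard: reject unsupported codecs before the looked-up
// value can reach setCompression_type(), mirroring HiveTableSink's
// Preconditions.checkState() approach. The map/field names follow the review
// comment; the helper class itself is illustrative.
final class CompressionCodecGuard {
    static <T> T requireSupported(Map<String, T> parquetCompressionTypeMap, String compressionType) {
        Preconditions.checkState(parquetCompressionTypeMap.containsKey(compressionType),
                "Unsupported compression codec '%s' in Iceberg table properties", compressionType);
        return parquetCompressionTypeMap.get(compressionType); // never null after the check
    }
}
```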



Why I'm doing:

Per the Iceberg spec, the table write property `write.parquet.compression-codec` controls the Parquet compression codec of the table's data files, and its default value is `zstd`. StarRocks now supports the `compression_codec` property when creating Iceberg and Hive tables, and for Iceberg native tables it is stored as the `write.parquet.compression-codec` property. However, when writing to Iceberg tables we ignore this property: StarRocks currently uses the system variable `connector_sink_compression_codec` to control the compression codec of all connector tables, and its default value is `uncompressed`. So, by default, an Iceberg table created in StarRocks carries `write.parquet.compression-codec = zstd` while the data written to it is `uncompressed`, which confuses users.

What I'm doing:

Use the `write.parquet.compression-codec` table property when writing an Iceberg table, following the Iceberg spec; if the property is not set, fall back to `connector_sink_compression_codec` instead (see the sketch below).
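The resulting precedence mirrors the `getOrDefault` call in the diff above. This is a self-contained sketch of that lookup logic, not the PR's code: the surrounding class and method names are assumptions for illustration.

```java
import java.util.Map;

// Sketch of the codec-resolution precedence this PR introduces: the Iceberg
// table property wins; the session variable is only a fallback. The property
// key matches the Iceberg spec; the wrapper method is illustrative.
final class CodecResolution {
    static final String PARQUET_COMPRESSION = "write.parquet.compression-codec";

    static String resolveCodec(Map<String, String> tableProperties, String sessionCodec) {
        // Table property first (e.g. "zstd"), session variable otherwise.
        return tableProperties.getOrDefault(PARQUET_COMPRESSION, sessionCodec);
    }

    public static void main(String[] args) {
        // Table created with compression_codec = "zstd": the property wins.
        System.out.println(resolveCodec(Map.of(PARQUET_COMPRESSION, "zstd"), "uncompressed")); // zstd
        // No property set: fall back to connector_sink_compression_codec.
        System.out.println(resolveCodec(Map.of(), "uncompressed")); // uncompressed
    }
}
```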
Note

Implements compression precedence for connector sinks and aligns docs.

- Sinks prefer the table compression property (`compression_codec`, `write.parquet.compression-codec` / `PARQUET_COMPRESSION`) first, falling back to the session variable `connector_sink_compression_codec`; Textfile remains `NO_COMPRESSION`.
- Docs: add the `compression_codec` property, document its precedence over the system variable, clarify when `connector_sink_compression_codec` applies, and set the Iceberg default compression to `zstd`.

Written by Cursor Bugbot for commit 389519b. This will update automatically on new commits.