[Enhancement] Prioritize using the "compression_codec" table property in connector sink modules. #67205
base: main
Conversation
Commits:
- … in connector sink modules. Signed-off-by: Gavin <[email protected]>
- …dules Signed-off-by: Gavin <[email protected]>

Force-pushed from e1c1432 to 389519b.
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 4 / 4 (100.00%)
[BE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)

@cursor review
```diff
         .toLowerCase();
-        this.compressionType = sessionVariable.getConnectorSinkCompressionCodec();
+        this.compressionType = nativeTable.properties().getOrDefault(PARQUET_COMPRESSION,
+                sessionVariable.getConnectorSinkCompressionCodec());
```
Missing compression validation for Iceberg table properties
The IcebergTableSink now reads compressionType from native table properties (lines 65-66), but unlike HiveTableSink, there's no validation before using it in toThrift(). At line 124, PARQUET_COMPRESSION_TYPE_MAP.get(compressionType) can return null if the table property contains an unsupported compression codec. This null is then passed to setCompression_type(). In contrast, HiveTableSink validates using Preconditions.checkState() before calling .get(). If an Iceberg table was created outside StarRocks with an unusual compression codec, this could cause unexpected behavior in the backend.
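A guard mirroring HiveTableSink's check could close this gap. The following is a minimal sketch, not the PR's actual code: `PARQUET_COMPRESSION_TYPE_MAP` and `compressionType` are names taken from the comment above, and the helper's shape is assumed for illustration.

```java
import com.google.common.base.Preconditions;
import java.util.Map;

// Sketch of the missing guard: reject unsupported codecs before the looked-up
// value can reach setCompression_type(), mirroring HiveTableSink's
// Preconditions.checkState() approach. The map/field names follow the review
// comment; the helper class itself is illustrative.
final class CompressionCodecGuard {
    static <T> T requireSupported(Map<String, T> parquetCompressionTypeMap, String compressionType) {
        Preconditions.checkState(parquetCompressionTypeMap.containsKey(compressionType),
                "Unsupported compression codec '%s' in Iceberg table properties", compressionType);
        return parquetCompressionTypeMap.get(compressionType); // never null after the check
    }
}
```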



Why I'm doing:

Per the Iceberg spec, the table write property `write.parquet.compression-codec` controls the Parquet compression codec of the table's data files, and its default value is `zstd`. StarRocks now supports the `compression_codec` property when creating Iceberg and Hive tables, and for Iceberg native tables it is stored as the `write.parquet.compression-codec` property. However, when writing to Iceberg tables we ignore this property: StarRocks currently uses the system variable `connector_sink_compression_codec` to control the compression codec of all connector tables, and its default value is `uncompressed`. So, by default, an Iceberg table created in StarRocks carries `write.parquet.compression-codec = zstd` while the data written to it is `uncompressed`, which confuses users.

What I'm doing:

Use the `write.parquet.compression-codec` table property when writing an Iceberg table, following the Iceberg spec; if the property is not set, fall back to `connector_sink_compression_codec` instead (see the sketch below).
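The resulting precedence mirrors the `getOrDefault` call in the diff above. This is a self-contained sketch of that lookup logic, not the PR's code: the surrounding class and method names are assumptions for illustration.

```java
import java.util.Map;

// Sketch of the codec-resolution precedence this PR introduces: the Iceberg
// table property wins; the session variable is only a fallback. The property
// key matches the Iceberg spec; the wrapper method is illustrative.
final class CodecResolution {
    static final String PARQUET_COMPRESSION = "write.parquet.compression-codec";

    static String resolveCodec(Map<String, String> tableProperties, String sessionCodec) {
        // Table property first (e.g. "zstd"), session variable otherwise.
        return tableProperties.getOrDefault(PARQUET_COMPRESSION, sessionCodec);
    }

    public static void main(String[] args) {
        // Table created with compression_codec = "zstd": the property wins.
        System.out.println(resolveCodec(Map.of(PARQUET_COMPRESSION, "zstd"), "uncompressed")); // zstd
        // No property set: fall back to connector_sink_compression_codec.
        System.out.println(resolveCodec(Map.of(), "uncompressed")); // uncompressed
    }
}
```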
Note

Implements compression precedence for connector sinks and aligns docs.

- Sinks prefer the table compression property (`compression_codec`, `write.parquet.compression-codec` / `PARQUET_COMPRESSION`) first, falling back to the session variable `connector_sink_compression_codec`; Textfile remains `NO_COMPRESSION`.
- Docs: add the `compression_codec` property, document its precedence over the system variable, clarify when `connector_sink_compression_codec` applies, and set the Iceberg default compression to `zstd`.

Written by Cursor Bugbot for commit 389519b. This will update automatically on new commits.