
Conversation

@GavinMar (Contributor) commented Dec 24, 2025

Why I'm doing:

According to the Iceberg spec, the table write property `write.parquet.compression-codec` controls the Parquet compression codec of the table's data, and its default value is `zstd`.

StarRocks already supports the `compression_codec` property when creating Iceberg and Hive tables, and for Iceberg native tables it is mapped to the `write.parquet.compression-codec` table property.

However, this property is ignored when writing to Iceberg tables. StarRocks currently uses the system variable `connector_sink_compression_codec` to control the compression codec for all connector tables, and its default value is `uncompressed`.

So, by default, when an Iceberg table is created in StarRocks, its table property `write.parquet.compression-codec` is `zstd`, while the data written to the table is uncompressed, which confuses users.

What I'm doing:

  • Use the `write.parquet.compression-codec` property when writing to an Iceberg table, following the Iceberg spec.
  • If the table does not define a compression codec property, fall back to the system variable `connector_sink_compression_codec`.
    • Similarly, update the compression logic for Hive tables.
  • Update the related documentation.
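The precedence above can be sketched roughly as follows. This is a minimal illustration, not the actual StarRocks sink code: the class and method names here are hypothetical, and only the `write.parquet.compression-codec` key and the "table property wins, session variable is the fallback" rule come from the PR description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the compression-codec precedence for an Iceberg sink.
class CompressionPrecedence {
    // Iceberg write property key, per the Iceberg table-configuration spec.
    static final String PARQUET_COMPRESSION = "write.parquet.compression-codec";

    // Table property takes precedence; otherwise fall back to the
    // connector_sink_compression_codec session variable's value.
    static String resolveCompressionCodec(Map<String, String> tableProperties,
                                          String sessionCodec) {
        return tableProperties.getOrDefault(PARQUET_COMPRESSION, sessionCodec);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // No table property: the session variable applies.
        System.out.println(resolveCompressionCodec(props, "uncompressed"));
        // Table property present: it wins over the session variable.
        props.put(PARQUET_COMPRESSION, "zstd");
        System.out.println(resolveCompressionCodec(props, "uncompressed"));
    }
}
```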

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

Note

Implements compression precedence for connector sinks and aligns docs.

  • HiveTableSink/IcebergTableSink: use table properties (compression_codec, write.parquet.compression-codec/PARQUET_COMPRESSION) first; fallback to session var connector_sink_compression_codec; Textfile remains NO_COMPRESSION.
  • UTs: add coverage for table-property usage and session fallback for both Hive and Iceberg.
  • Docs (EN/ZH): reintroduce compression_codec property, document precedence over system variable, clarify when connector_sink_compression_codec applies, and set Iceberg default compression to zstd.

Written by Cursor Bugbot for commit 389519b.

@github-actions bot added the behavior_changed, documentation, and 4.0 labels Dec 24, 2025
@sonarqubecloud

@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[FE Incremental Coverage Report]

pass : 4 / 4 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 com/starrocks/planner/IcebergTableSink.java | 2 | 2 | 100.00% | [] |
| 🔵 com/starrocks/planner/HiveTableSink.java | 2 | 2 | 100.00% | [] |

@github-actions

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@alvin-celerdata (Contributor)

@cursor review

```java
.toLowerCase();
this.compressionType = sessionVariable.getConnectorSinkCompressionCodec();
this.compressionType = nativeTable.properties().getOrDefault(PARQUET_COMPRESSION,
        sessionVariable.getConnectorSinkCompressionCodec());
```


Missing compression validation for Iceberg table properties

The IcebergTableSink now reads compressionType from native table properties (line 65-66), but unlike HiveTableSink, there's no validation before using it in toThrift(). At line 124, PARQUET_COMPRESSION_TYPE_MAP.get(compressionType) can return null if the table property contains an unsupported compression codec. This null is then passed to setCompression_type(). In contrast, HiveTableSink validates using Preconditions.checkState() before calling .get(). If an Iceberg table was created outside StarRocks with an unusual compression codec, this could cause unexpected behavior in the backend.
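A minimal sketch of the fail-fast validation the reviewer suggests, analogous to the `Preconditions.checkState()` pattern in HiveTableSink. The map contents, class name, and method name below are illustrative assumptions, not the actual StarRocks code; the point is to reject an unsupported codec before a null thrift type can propagate to the backend.

```java
import java.util.Map;

// Hypothetical sketch: validate the compression codec read from table
// properties before converting it to the thrift enum value.
class CodecValidation {
    // Illustrative subset of a codec-to-thrift-value map; the real
    // PARQUET_COMPRESSION_TYPE_MAP in StarRocks maps to thrift enum types.
    static final Map<String, Integer> PARQUET_COMPRESSION_TYPE_MAP = Map.of(
            "uncompressed", 0,
            "snappy", 1,
            "zstd", 2);

    static int toThriftCompression(String compressionType) {
        Integer thriftType = PARQUET_COMPRESSION_TYPE_MAP.get(compressionType);
        // Fail fast on codecs set outside StarRocks that we do not support,
        // instead of passing null to setCompression_type().
        if (thriftType == null) {
            throw new IllegalStateException(
                    "Unsupported compression codec: " + compressionType);
        }
        return thriftType;
    }
}
```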

