Conversation

@Youngwb Youngwb commented Dec 27, 2025

Why I'm doing:

#66944

What I'm doing:

This pull request introduces support for delete operations on Iceberg tables by adding a new IcebergDeleteSink class, its associated serialization logic, and comprehensive unit tests. The main focus is on enabling the planner to write position delete files to Iceberg tables, ensuring correct tuple validation and integration with the existing data sink infrastructure.

Iceberg Delete Sink Implementation:

  • Added a new IcebergDeleteSink class in fe/fe-core/src/main/java/com/starrocks/planner/IcebergDeleteSink.java to support delete operations for Iceberg tables, including tuple validation, configuration handling, and Thrift serialization.
  • Updated the TDataSinkType enum in gensrc/thrift/DataSinks.thrift to include the new ICEBERG_DELETE_SINK type for proper Thrift serialization and planner integration.

Testing and Validation:

  • Added a comprehensive test suite in fe/fe-core/src/test/java/com/starrocks/planner/IcebergDeleteSinkTest.java to verify tuple validation, Thrift serialization, and explain string output for the new sink.
    Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

Note

Iceberg delete sink

  • Adds IcebergDeleteSink (fe/fe-core/.../IcebergDeleteSink.java) to write Iceberg position delete files; validates tuple has _file (VARCHAR) and _pos (BIGINT), sets locations, compression, target file size, cloud config; provides explain output and Thrift serialization via TDataSinkType.ICEBERG_DELETE_SINK into TIcebergTableSink (uses data_location, file_format=parquet).

Thrift and tests

  • Extends gensrc/thrift/DataSinks.thrift with ICEBERG_DELETE_SINK.
  • Adds unit tests (IcebergDeleteSinkTest) covering tuple validation errors, Thrift serialization fields, and getExplainString().
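The tuple validation described above can be sketched as follows. This is a minimal, self-contained illustration of the check the Note describes (a position delete tuple must carry a VARCHAR `_file` column and a BIGINT `_pos` column); the `Slot`/`Type` stand-ins are simplified placeholders, not the actual StarRocks planner classes.

```java
import java.util.List;

public class PositionDeleteTupleCheck {
    // Iceberg metadata column names used by position delete files.
    static final String FILE_PATH = "_file";    // path of the data file
    static final String ROW_POSITION = "_pos";  // row ordinal inside that file

    enum Type { VARCHAR, BIGINT, INT }

    // Simplified stand-in for a planner SlotDescriptor.
    record Slot(String name, Type type) {}

    // Validate that the output tuple carries the two columns a position
    // delete file requires, with the expected types.
    static void validate(List<Slot> slots) {
        boolean hasFile = false;
        boolean hasPos = false;
        for (Slot slot : slots) {
            if (FILE_PATH.equals(slot.name())) {
                if (slot.type() != Type.VARCHAR) {
                    throw new IllegalStateException("_file column must be VARCHAR");
                }
                hasFile = true;
            } else if (ROW_POSITION.equals(slot.name())) {
                if (slot.type() != Type.BIGINT) {
                    throw new IllegalStateException("_pos column must be BIGINT");
                }
                hasPos = true;
            }
        }
        if (!hasFile || !hasPos) {
            throw new IllegalStateException("delete tuple must contain _file and _pos");
        }
    }

    public static void main(String[] args) {
        validate(List.of(new Slot("_file", Type.VARCHAR), new Slot("_pos", Type.BIGINT)));
        System.out.println("valid tuple accepted");
    }
}
```

A tuple missing either column, or carrying it with the wrong type, is rejected before the sink is serialized to Thrift.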

Written by Cursor Bugbot for commit f63c4c7. This will update automatically on new commits.

@wanpengfei-git wanpengfei-git requested a review from a team December 27, 2025 09:59
@Youngwb Youngwb changed the title [Feature] Add IcebergDeleteSink to support delete operations on Iceberg tables [Enhancement] Add IcebergDeleteSink to support delete operations on Iceberg tables Dec 27, 2025
@sonarqubecloud

Quality Gate failed

Failed conditions
5.9% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[FE Incremental Coverage Report]

pass : 54 / 59 (91.53%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 com/starrocks/planner/IcebergDeleteSink.java | 54 | 59 | 91.53% | [95, 100, 146, 151, 156] |

@github-actions

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@alvin-celerdata

@cursor review

```java
tIcebergTableSink.setCompression_type(compression);
tIcebergTableSink.setTarget_max_file_size(targetMaxFileSize);
com.starrocks.thrift.TCloudConfiguration tCloudConfiguration = new com.starrocks.thrift.TCloudConfiguration();
cloudConfiguration.toThrift(tCloudConfiguration);
```

NullPointerException when toThrift() called before init()

The cloudConfiguration field is only initialized in the init() method, not in the constructor. However, toThrift() uses cloudConfiguration.toThrift(tCloudConfiguration) without any null check. If toThrift() is called before init(), this will throw a NullPointerException. This is inconsistent with the similar IcebergTableSink class, which initializes cloudConfiguration directly in its constructor as a final field, making it safe to call toThrift() immediately after construction.
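The initialization-order hazard can be reduced to a small sketch. The class and field names below are simplified stand-ins for the real StarRocks types, contrasting the fragile init()-assigned field with the constructor-initialized final field used by `IcebergTableSink`.

```java
public class InitOrderDemo {
    static class CloudConfiguration {
        void toThrift(StringBuilder out) { out.append("cloud-config"); }
    }

    // Fragile variant: the field is assigned only in init(), mirroring the
    // pattern flagged in the review comment.
    static class FragileSink {
        private CloudConfiguration cloudConfiguration; // null until init()

        void init() { cloudConfiguration = new CloudConfiguration(); }

        String toThrift() {
            StringBuilder out = new StringBuilder();
            cloudConfiguration.toThrift(out); // NPE if init() was never called
            return out.toString();
        }
    }

    // Safer variant: a final field initialized in the constructor, so
    // toThrift() is safe immediately after construction.
    static class SafeSink {
        private final CloudConfiguration cloudConfiguration = new CloudConfiguration();

        String toThrift() {
            StringBuilder out = new StringBuilder();
            cloudConfiguration.toThrift(out);
            return out.toString();
        }
    }

    public static void main(String[] args) {
        try {
            new FragileSink().toThrift();
        } catch (NullPointerException e) {
            System.out.println("fragile: NPE before init()");
        }
        System.out.println("safe: " + new SafeSink().toThrift());
    }
}
```

Making the field final and constructor-initialized removes the implicit "must call init() first" contract from the class's API.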


```java
if (IcebergTable.FILE_PATH.equals(colName)) {
    hasFilePathColumn = true;
    if (!slot.getType().equals(VarcharType.VARCHAR)) {
        throw new StarRocksConnectorException("_file column must be type of VARCHAR");
```

Type validation too strict, rejects valid VARCHAR lengths

The type validation uses equals() to compare types, but ScalarType.equals() for VARCHAR types also compares the length field. Since VarcharType.VARCHAR has len=-1 (wildcard), any column with a specific length like VARCHAR(255) would fail validation with the error "_file column must be type of VARCHAR" even though it is a valid VARCHAR. The codebase provides isVarchar() and matchesType() methods for flexible type checking that ignore length differences. The same issue applies to the BIGINT check, though BIGINT doesn't have length variants so it's less likely to manifest there.
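The failure mode can be shown with a reduced model. The `Varchar` record below is a simplified stand-in for a parameterized scalar type (where `len == -1` denotes the wildcard `VARCHAR`), not the actual StarRocks `ScalarType`; it only illustrates why strict `equals()` rejects columns that a length-insensitive check would accept.

```java
public class VarcharCheckDemo {
    // len == -1 models the wildcard VARCHAR with no declared length.
    record Varchar(int len) {
        static final Varchar WILDCARD = new Varchar(-1);

        // Length-insensitive check, analogous to isVarchar()/matchesType():
        // any Varchar instance qualifies regardless of its length.
        boolean isVarchar() { return true; }
    }

    public static void main(String[] args) {
        Varchar column = new Varchar(255); // e.g. a VARCHAR(255) column

        // Strict comparison: record equals() compares len, so 255 != -1
        // fails even though the column is a perfectly valid VARCHAR.
        System.out.println("equals: " + column.equals(Varchar.WILDCARD));

        // Length-insensitive check accepts it.
        System.out.println("isVarchar: " + column.isVarchar());
    }
}
```

Comparing with a length-insensitive predicate instead of `equals()` keeps the validation intent ("this must be a VARCHAR") without tying it to one specific declared length.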


```java
boolean hasPosColumn = false;

for (SlotDescriptor slot : desc.getSlots()) {
    if (slot.getColumn() != null) {
```

Suggested change

```diff
-if (slot.getColumn() != null) {
+if (slot.getColumn() == null) {
+    continue;
+}
```

This guard clause reduces the nesting depth, which makes the code easier to read.

Comment on lines +96 to +98
```java
    }
} else if (IcebergTable.ROW_POSITION.equals(colName)) {
    hasPosColumn = true;
```

Suggested change

```diff
-    }
-} else if (IcebergTable.ROW_POSITION.equals(colName)) {
-    hasPosColumn = true;
+    }
+    continue;
+}
+if (IcebergTable.ROW_POSITION.equals(colName)) {
+    hasPosColumn = true;
```
