@wyb wyb commented Dec 25, 2025

Why I'm doing:

What I'm doing:

This PR introduces functionality to identify and handle missing data files (segments, delete vectors, primary key index sst, cols files) when repairing cloud-native tables.

  1. Refactor TabletMetadatas into TabletResult and introduce TabletMetadataEntry, which carries the missing_files information.
  2. Update LakeServiceImpl::get_tablet_metadatas to optionally perform missing file checks.
  3. Update TabletRepairHelper.java to leverage the new structures and integrate the missing file detection and repair logic.
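
The missing-file check in item 2 can be illustrated with a minimal sketch. It is written in Java only for illustration; the actual check lives in the C++ backend (lake_service.cpp). The class name, method name, file names, and the in-memory set standing in for object storage are all hypothetical and do not come from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; class, method, and file names are illustrative only.
public class MissingFileCheckSketch {
    // Returns, in input order, every referenced file that the existence probe cannot find.
    public static List<String> findMissingFiles(List<String> referencedFiles,
                                                Predicate<String> exists) {
        List<String> missing = new ArrayList<>();
        for (String file : referencedFiles) {
            if (!exists.test(file)) {
                missing.add(file);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for object storage: only these files are present.
        Set<String> stored = new HashSet<>(Arrays.asList("seg_1.dat", "index_1.sst"));
        // Files that one metadata version is assumed to reference.
        List<String> referenced =
                Arrays.asList("seg_1.dat", "seg_2.dat", "index_1.sst", "dv_1.delvec");
        System.out.println(findMissingFiles(referenced, stored::contains));
        // prints [seg_2.dat, dv_1.delvec]
    }
}
```

Passing the existence probe as a predicate keeps the sketch independent of any particular file system API.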

#66015

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

Note

Introduces per-version tablet metadata results with optional missing-file detection to improve repair workflows.

  • Proto changes: add check_missing_files to GetTabletMetadatasRequest; replace TabletMetadatas with TabletResult and TabletMetadataEntry (with missing_files); response now returns tablet_results.
  • BE: get_tablet_metadatas now emits tablet_results and, when enabled, checks existence of segments, delete vectors, pk index sst, and cols files; new helper check_missing_files; updated status handling and logs; corresponding tests added/updated.
  • FE: TabletRepairHelper updated to consume tablet_results, request missing-file checks, select valid metadata (an entry with missing files is accepted only when every missing file is a pk index sst file, in which case sstableMeta is cleared), and proceed with repair; comprehensive unit tests adjusted and expanded.

Written by Cursor Bugbot for commit 3ab362e. This will update automatically on new commits.

@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[FE Incremental Coverage Report]

pass : 89 / 93 (95.70%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 com/starrocks/lake/TabletRepairHelper.java | 89 | 93 | 95.70% | [287, 343, 344, 384] |

@github-actions

[BE Incremental Coverage Report]

pass : 45 / 55 (81.82%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 be/src/service/service_be/lake_service.cpp | 45 | 55 | 81.82% | [1408, 1424, 1425, 1431, 1432, 1438, 1439, 1440, 1516, 1517] |

Copilot AI left a comment

Pull request overview

This PR enhances cloud-native table repair functionality to detect and handle missing data files (segments, delete vectors, primary key index sst, cols files). When metadata files exist but their referenced data files are missing, the system can now identify these issues and roll back to valid previous versions during repair operations.
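
The rollback idea described above can be sketched as follows. This is a simplified illustration, not the PR's implementation: the `Entry` class and `newestIntactVersion` method are hypothetical names, and the sketch treats any missing file as disqualifying, whereas the PR also accepts versions that are missing only sst files.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names come from the PR.
public class VersionRollbackSketch {
    // Stand-in for one per-version entry: the version plus the files found missing for it.
    public static final class Entry {
        public final long version;
        public final List<String> missingFiles;

        public Entry(long version, List<String> missingFiles) {
            this.version = version;
            this.missingFiles = missingFiles;
        }
    }

    // Returns the highest version with no missing files, or -1 if every version is damaged.
    public static long newestIntactVersion(List<Entry> entries) {
        long best = -1;
        for (Entry e : entries) {
            if (e.missingFiles.isEmpty() && e.version > best) {
                best = e.version;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry(5, Arrays.asList("seg_9.dat")),            // newest, but a segment is gone
                new Entry(4, Collections.<String>emptyList()),       // intact
                new Entry(3, Collections.<String>emptyList()));      // intact, but older
        System.out.println(newestIntactVersion(entries)); // prints 4
    }
}
```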

Key Changes:

  • Introduced TabletMetadataEntry structure to wrap metadata with missing file information
  • Refactored TabletMetadatas to TabletResult to better represent tablet-level results with status
  • Added backend file existence checking functionality that's optionally enabled via check_missing_files flag
  • Implemented validation logic to determine if metadata with missing files can be recovered (e.g., missing only SST files that can be rebuilt)
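
A minimal sketch of the recoverability test in the last bullet, assuming that pk index sst files can be recognized by a `.sst` suffix. That suffix rule, the class name, and the method name are assumptions made for illustration; the PR's own helper may classify files differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the ".sst" suffix rule is an assumption, not taken from the PR.
public class SstOnlyCheckSketch {
    // True only when at least one file is missing and every missing file is an sst file.
    public static boolean onlySstFilesMissing(List<String> missingFiles) {
        if (missingFiles == null || missingFiles.isEmpty()) {
            return false;
        }
        for (String file : missingFiles) {
            if (!file.endsWith(".sst")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Only index files are gone: the index can be rebuilt, so this is recoverable.
        System.out.println(onlySstFilesMissing(Arrays.asList("a.sst", "b.sst")));     // prints true
        // A segment is gone as well: data is lost, so this is not recoverable.
        System.out.println(onlySstFilesMissing(Arrays.asList("a.sst", "seg_1.dat"))); // prints false
    }
}
```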

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| gensrc/proto/lake_service.proto | Refactored protobuf schema: renamed TabletMetadatas to TabletResult, introduced TabletMetadataEntry with missing_files field, and added check_missing_files option to request |
| be/src/service/service_be/lake_service.cpp | Implemented check_missing_files() function to validate file existence for segments, delete vectors, pk index sst, and cols files; integrated checking into get_tablet_metadatas() workflow |
| be/test/service/lake_service_test.cpp | Updated tests to use new TabletResult and TabletMetadataEntry structures; added test cases for missing file detection scenarios |
| fe/fe-core/src/main/java/com/starrocks/lake/TabletRepairHelper.java | Added validation methods checkTabletMetadataValid() and getValidTabletMetadata() to determine if metadata with missing files can be used for repair; integrated missing file detection into repair workflow |
| fe/fe-core/src/test/java/com/starrocks/lake/TabletRepairHelperTest.java | Added comprehensive test coverage for missing file scenarios including cases where only SST files are missing (recoverable) vs. data files missing (non-recoverable) |

Comment on lines +329 to +344
private static TabletMetadataPB getValidTabletMetadata(TabletMetadataEntry metadataEntry) {
    TabletMetadataPB metadata = metadataEntry.metadata;
    List<String> missingFiles = metadataEntry.missingFiles;
    if (missingFiles == null || missingFiles.isEmpty()) {
        // no missing files, metadata is valid
        return metadata;
    }

    // only missing pk index sst files, clear sstableMeta
    if (checkOnlySstFilesMissing(missingFiles)) {
        metadata.sstableMeta = null;
        return metadata;
    }

    Preconditions.checkState(false, "should not reach here");
    return null;

Copilot AI Dec 26, 2025

The method mutates the original metadata object by setting metadata.sstableMeta = null on line 339. Since the metadata object comes from the metadataEntry.metadata field, this directly modifies the metadata stored in the entry, which could have unintended side effects if the entry is accessed later. Consider creating a copy of the metadata before modification to avoid mutating shared state.
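
The copy-before-modify suggestion can be sketched as below. The `Metadata` class is a simplified, hypothetical stand-in for TabletMetadataPB with a hand-written copy constructor; how the real generated class should be copied is not shown in this PR, so the sketch only illustrates the pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; Metadata is a simplified stand-in, not the real TabletMetadataPB.
public class DefensiveCopySketch {
    public static final class Metadata {
        public long version;
        public List<String> sstableMeta;

        public Metadata(long version, List<String> sstableMeta) {
            this.version = version;
            this.sstableMeta = sstableMeta;
        }

        // Copy constructor: duplicates the list so the two objects share no mutable state.
        public Metadata(Metadata other) {
            this.version = other.version;
            this.sstableMeta =
                    other.sstableMeta == null ? null : new ArrayList<>(other.sstableMeta);
        }
    }

    // Returns a copy with sstableMeta cleared; the caller's object is left untouched.
    public static Metadata withoutSstableMeta(Metadata original) {
        Metadata copy = new Metadata(original);
        copy.sstableMeta = null;
        return copy;
    }

    public static void main(String[] args) {
        Metadata original = new Metadata(7, new ArrayList<>(Arrays.asList("index_1.sst")));
        Metadata repaired = withoutSstableMeta(original);
        System.out.println(original.sstableMeta); // prints [index_1.sst]
        System.out.println(repaired.sstableMeta); // prints null
    }
}
```

Whether the extra copy is worthwhile depends on whether the entry is read again after validation; if it is discarded immediately, mutating in place is harmless.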

@alvin-celerdata

@cursor review

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!
