[BugFix] Store KVStore reference in Rowset to prevent metadata access errors #67266
base: main
… errors Signed-off-by: luohaha <[email protected]>
[BE Incremental Coverage Report] ❌ fail : 50 / 66 (75.76%)
Signed-off-by: luohaha <[email protected]>
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
@cursor review
    RETURN_IF_ERROR(
            RowsetFactory::create_rowset(_context.tablet_schema, _context.rowset_path_prefix, rowset_meta, &rowset));
    TabletSharedPtr tablet = StorageEngine::instance()->tablet_manager()->get_tablet(_context.tablet_id);
    RETURN_IF(tablet == nullptr, Status::InvalidArgument(fmt::format("Not Found tablet:{}", _context.tablet_id)));
Rowset build fails when tablet doesn't exist during restore
RowsetWriter::build() now requires the tablet to exist: it looks the tablet up via tablet_manager()->get_tablet() and returns an error if the result is null. Previously, the rowset was created without requiring the tablet to exist. This breaks snapshot restore flows in which _rename_rowset_id calls rs_writer->build() before the tablet has been created. In SnapshotManager::convert_rowset_ids, rowsets are renamed and rebuilt before the tablet exists, so the new null check fails with a "Not Found tablet" error.
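The failure mode described in this comment can be reproduced in isolation. The sketch below is not StarRocks code: Tablet, TabletManager, BuildResult, and build_rowset are hypothetical stand-ins that only mirror the shape of the new null check, assuming the lookup returns a null pointer for an unknown tablet id.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins; these do not mirror the real StarRocks classes.
struct Tablet {
    int64_t id;
};

struct TabletManager {
    std::unordered_map<int64_t, std::shared_ptr<Tablet>> tablets;

    // Returns nullptr when the tablet has not been registered yet.
    std::shared_ptr<Tablet> get_tablet(int64_t id) const {
        auto it = tablets.find(id);
        return it == tablets.end() ? nullptr : it->second;
    }
};

struct BuildResult {
    bool ok;
    std::string message;
};

// Same shape as the check under review: building fails if the lookup is null.
BuildResult build_rowset(const TabletManager& mgr, int64_t tablet_id) {
    std::shared_ptr<Tablet> tablet = mgr.get_tablet(tablet_id);
    if (tablet == nullptr) {
        return {false, "Not Found tablet:" + std::to_string(tablet_id)};
    }
    return {true, ""};
}

// Restore-like ordering: the rowset is rebuilt before the tablet is created.
void example_usage() {
    TabletManager mgr;
    BuildResult before = build_rowset(mgr, 10001);
    assert(!before.ok); // fails: the tablet does not exist yet

    mgr.tablets[10001] = std::make_shared<Tablet>(Tablet{10001});
    BuildResult after = build_rowset(mgr, 10001);
    assert(after.ok); // succeeds once the tablet is registered
}
```

In this toy ordering the first build fails only because of when it runs, which is the concern raised for the snapshot restore path.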
Why I'm doing:
This PR fixes metadata access errors in Primary Key tables when the same tablet is duplicated across different disks on the same BE node.
Problem
For Primary Key tables, rowset data includes not only data files but also metadata in KVStore (Delete Vectors and Delta Column Groups). Previously, KVStore references were passed as parameters through multiple method calls, which could lead to accessing the wrong KVStore when:
This resulted in:
What I'm doing:
1. Store KVStore reference in Rowset
- Added _kvstore member variable to the Rowset class
2. Simplify method signatures
Removed KVStore parameters from methods that now use the internal reference:
- remove_delta_column_group() - no longer needs KVStore parameter
- link_files_to() - simplified signature
- copy_files_to() - simplified signature
- get_segment_iterators2() - replaced separate meta/dcg_meta params with MetaLoadMode
3. Introduce MetaLoadMode enum
Added explicit control for metadata loading:
- NONE: No metadata loaded
- DELETE_VEC_ONLY: Load only Delete Vectors
- DCG_ONLY: Load only Delta Column Groups
- ALL: Load both Delete Vectors and DCGs
This makes the API clearer and prevents ambiguity about which metadata to load.
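The four modes can be read as two independent questions: load Delete Vectors or not, and load Delta Column Groups or not. The sketch below uses the enumerator names from this description; the declaration order, the helper functions loads_delete_vectors and loads_dcg, and example_usage are assumptions made for illustration and may not match the actual definitions in rowset.h.

```cpp
#include <cassert>

// Enumerator names are taken from the PR description; everything else in
// this block is an assumption made for the sketch.
enum class MetaLoadMode { NONE, DELETE_VEC_ONLY, DCG_ONLY, ALL };

// Hypothetical helper: does this mode request Delete Vectors?
constexpr bool loads_delete_vectors(MetaLoadMode mode) {
    return mode == MetaLoadMode::DELETE_VEC_ONLY || mode == MetaLoadMode::ALL;
}

// Hypothetical helper: does this mode request Delta Column Groups?
constexpr bool loads_dcg(MetaLoadMode mode) {
    return mode == MetaLoadMode::DCG_ONLY || mode == MetaLoadMode::ALL;
}

// The four modes cover every combination of the two questions exactly once.
static_assert(!loads_delete_vectors(MetaLoadMode::NONE) && !loads_dcg(MetaLoadMode::NONE), "NONE");
static_assert(loads_delete_vectors(MetaLoadMode::DELETE_VEC_ONLY) && !loads_dcg(MetaLoadMode::DELETE_VEC_ONLY),
              "DELETE_VEC_ONLY");
static_assert(!loads_delete_vectors(MetaLoadMode::DCG_ONLY) && loads_dcg(MetaLoadMode::DCG_ONLY), "DCG_ONLY");
static_assert(loads_delete_vectors(MetaLoadMode::ALL) && loads_dcg(MetaLoadMode::ALL), "ALL");

// Caller-side view: an index build would ask for everything, a key dump for nothing.
void example_usage() {
    MetaLoadMode for_index = MetaLoadMode::ALL;
    MetaLoadMode for_dump = MetaLoadMode::NONE;
    assert(loads_delete_vectors(for_index) && loads_dcg(for_index));
    assert(!loads_delete_vectors(for_dump) && !loads_dcg(for_dump));
}
```

A single enum argument replaces two separate flags, so a call site states its intent in one place.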
Changes
Core files:
- be/src/storage/rowset/rowset.h - Added _kvstore member and MetaLoadMode enum
- be/src/storage/rowset/rowset.cpp - Updated methods to use internal _kvstore
- be/src/storage/rowset/segment_iterator.cpp - Simplified DCG loading logic
- be/src/storage/rowset/segment_options.h - Removed redundant flag
Callers updated:
- tablet_updates.cpp - Use MetaLoadMode::ALL for schema conversion
- primary_index.cpp - Use MetaLoadMode::ALL for index construction
- local_primary_key_recover.cpp - Use MetaLoadMode::NONE for recovery
- primary_key_dump.cpp - Use MetaLoadMode::NONE for key dumping
Benefits
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
Note
Addresses PK-table metadata correctness by centralizing metadata access in Rowset and making loading explicit.
- Add _kvstore to Rowset; constructor/factory updated and used across tablet load, writer, clone, snapshot, migration, and tests
- … get_segment_iterators2, link_files_to, copy_files_to, remove_delta_column_group with internal _kvstore
- Introduce MetaLoadMode {NONE, DELETE_VEC_ONLY, DCG_ONLY, ALL} and wire through segment iteration; remove read_by_generated_column_adding
- … _kvstore to avoid cross-disk metadata mix-ups
- … (ALL for index/compaction/schema-change; NONE for dumps/recovery)
- … segment_iterator.cpp; tidy related options
Written by Cursor Bugbot for commit df26c5e. This will update automatically on new commits.
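The cross-disk mix-up that the stored _kvstore reference is meant to prevent can be illustrated with a toy model. This is only a sketch under stated assumptions: the KVStore and Rowset classes below are simplified stand-ins, and the "delvec_" key prefix, load_delete_vector, and example_usage are hypothetical names, not the real StarRocks API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a per-disk metadata store.
class KVStore {
public:
    void put(const std::string& key, const std::string& value) { _data[key] = value; }

    // Returns false when the key is absent; otherwise copies the value out.
    bool get(const std::string& key, std::string* value) const {
        auto it = _data.find(key);
        if (it == _data.end()) return false;
        *value = it->second;
        return true;
    }

private:
    std::map<std::string, std::string> _data;
};

// Simplified rowset that is bound to its store at construction, so callers
// cannot hand it the store of a different disk by mistake.
class Rowset {
public:
    Rowset(std::string rowset_id, KVStore* kvstore) : _rowset_id(std::move(rowset_id)), _kvstore(kvstore) {}

    // The "delvec_" key prefix is an assumption made for this sketch.
    bool load_delete_vector(std::string* out) const { return _kvstore->get("delvec_" + _rowset_id, out); }

private:
    std::string _rowset_id;
    KVStore* _kvstore; // not owned; must outlive the rowset
};

// Two copies of the same rowset id on different disks read their own metadata.
void example_usage() {
    KVStore disk_a;
    KVStore disk_b;
    disk_a.put("delvec_r1", "delvec-from-disk-a");
    disk_b.put("delvec_r1", "delvec-from-disk-b");

    Rowset on_a("r1", &disk_a);
    Rowset on_b("r1", &disk_b);

    std::string value;
    assert(on_a.load_delete_vector(&value) && value == "delvec-from-disk-a");
    assert(on_b.load_delete_vector(&value) && value == "delvec-from-disk-b");
}
```

Because the store is fixed when the rowset is constructed, no call site has to choose which disk's metadata to read.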