V2 serialization format for wide columns with blob references#14314
V2 serialization format for wide columns with blob references#14314xingbowang wants to merge 8 commits intofacebook:mainfrom
Conversation
aebf39b to
09fdfab
Compare
003bfee to
578db34
Compare
Introduce a new V2 serialization format for wide column entities that supports storing individual column values in blob files. The V2 format adds a column type section that marks each column as either inline or blob-index, enabling per-column blob storage for large values. Key additions: - SerializeWithBlobIndices(): serialize entities with blob column references - DeserializeColumns(): deserialize V2 entities extracting blob column metadata - HasBlobColumns(): lightweight check for V2 blob references - GetVersion(): return the format version of a serialized entity - ResolveEntityBlobColumns(): resolve all blob references in an entity - Updated Deserialize() and GetValueOfDefaultColumn() for V2 compatibility
578db34 to
154c72a
Compare
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D92832066. |
pdillinger
left a comment
There was a problem hiding this comment.
My primary concern about this approach is that the natural evolution of the data model is to have columns function as "subkeys" and pick up more and more features per-column that you get with keys and no wide columns, like merge, single column overwrite, single column delete, TimedPut, and likely more (kTypeValuePreferredSeqno). And we eventually end up re-implementing the cross product of all those features rather than making the data model leverage the features (almost) without re-implementation, by having a wide column ValueType entry itself contain ValueType entries for each column.
|
Updated the code to use ValueType in wide column type. |
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D92832066. |
pdillinger
left a comment
There was a problem hiding this comment.
Thanks for considering my feedback. LGTM overall
db/wide/wide_column_serialization.h
Outdated
| // Future per-column types (kTypeMerge, kTypeDeletion, etc.) can be | ||
| // added without format changes. | ||
| // | ||
| // Section 3: SKIP INFO (3 varints) |
There was a problem hiding this comment.
Suppose we write some SIMD code that's good at decoding a sequence of varints faster. In that case there might be an advantage to swapping the order of Section 2 and Section 3. Something to consider.
| // | bytes| bytes| | bytes | | ||
| // +------+------+---...---+--------+ | ||
| // | ||
| // When ct = kTypeBlobIndex, the cv contains a serialized BlobIndex. |
There was a problem hiding this comment.
Suppose we wanted to add a kind of index to speed up looking up a known column name (such as a perfect hash from column name to [0,N) index). Do you think we should have a provision in this V2 schema for that (currently rejected as a future feature) or save that for next schema version?
There was a problem hiding this comment.
yes, good point. Right now, it is only optimized for first column look up. There are lots of optimization we could do in future. E.g. given most of the record would have similar wide column schema, we could store common schema at table level to reduce the overhead of storing key repeatedly. I would save this for future optimization for now..
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
|
| Check | Count |
|---|---|
bugprone-argument-comment |
4 |
cert-err58-cpp |
1 |
cppcoreguidelines-avoid-non-const-global-variables |
1 |
cppcoreguidelines-pro-type-member-init |
4 |
cppcoreguidelines-special-member-functions |
1 |
google-explicit-constructor |
1 |
modernize-use-override |
1 |
performance-move-const-arg |
1 |
performance-unnecessary-value-param |
1 |
readability-braces-around-statements |
3 |
readability-isolate-declaration |
4 |
| Total | 22 |
Details
db/compaction/compaction_picker_fifo.cc (1 warning(s))
db/compaction/compaction_picker_fifo.cc:227:3: warning: multiple declarations in a single statement reduces readability [readability-isolate-declaration]
db/compaction/compaction_picker_test.cc (10 warning(s))
db/compaction/compaction_picker_test.cc:5213:64: warning: the parameter 'configure' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]
db/compaction/compaction_picker_test.cc:5467:25: warning: argument name 'max_table' in comment does not match parameter name 'max_table_files_size' [bugprone-argument-comment]
db/compaction/compaction_picker_test.cc:5468:25: warning: argument name 'max_data' in comment does not match parameter name 'max_data_files_size' [bugprone-argument-comment]
db/compaction/compaction_picker_test.cc:5522:25: warning: argument name 'max_table' in comment does not match parameter name 'max_table_files_size' [bugprone-argument-comment]
db/compaction/compaction_picker_test.cc:5523:25: warning: argument name 'max_data' in comment does not match parameter name 'max_data_files_size' [bugprone-argument-comment]
db/compaction/compaction_picker_test.cc:5676:21: warning: statement should be inside braces [readability-braces-around-statements]
db/compaction/compaction_picker_test.cc:5722:34: warning: statement should be inside braces [readability-braces-around-statements]
db/compaction/compaction_picker_test.cc:5746:23: warning: statement should be inside braces [readability-braces-around-statements]
db/compaction/compaction_picker_test.cc:5749:5: warning: multiple declarations in a single statement reduces readability [readability-isolate-declaration]
db/compaction/compaction_picker_test.cc:5886:7: warning: multiple declarations in a single statement reduces readability [readability-isolate-declaration]
table/block_based/block.cc (1 warning(s))
table/block_based/block.cc:878:3: warning: multiple declarations in a single statement reduces readability [readability-isolate-declaration]
table/block_based/block_based_table_factory.cc (2 warning(s))
table/block_based/block_based_table_factory.cc:189:5: warning: initialization of 'block_base_table_index_search_type_string_map' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
table/block_based/block_based_table_factory.cc:189:5: warning: variable 'block_base_table_index_search_type_string_map' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
table/table_test.cc (5 warning(s))
table/table_test.cc:701:15: warning: uninitialized record type: 'one_arg' [cppcoreguidelines-pro-type-member-init]
table/table_test.cc:2298:26: warning: passing result of std::move() as a const reference argument; no move will actually happen [performance-move-const-arg]
table/table_test.cc:5112:5: warning: uninitialized record type: 'fake_footer' [cppcoreguidelines-pro-type-member-init]
table/table_test.cc:5131:5: warning: uninitialized record type: 'fake_footer' [cppcoreguidelines-pro-type-member-init]
table/table_test.cc:5151:5: warning: uninitialized record type: 'fake_footer' [cppcoreguidelines-pro-type-member-init]
utilities/merge_operators/string_append/stringappend_test.cc (1 warning(s))
utilities/merge_operators/string_append/stringappend_test.cc:76:3: warning: single-argument constructors must be marked explicit to avoid unintentional implicit conversions [google-explicit-constructor]
utilities/transactions/write_prepared_transaction_test_seqno.cc (2 warning(s))
utilities/transactions/write_prepared_transaction_test_seqno.cc:30:7: warning: class 'WritePreparedTransactionSeqnoTest' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator [cppcoreguidelines-special-member-functions]
utilities/transactions/write_prepared_transaction_test_seqno.cc:56:3: warning: annotate this function with 'override' or (rarely) 'final' [modernize-use-override]
…y current PR" This reverts commit 80e59f1.
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D92832066. |
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D92832066. |
Introduce a new V2 serialization format for wide column entities that supports storing individual column values in blob files. The V2 format adds a column type section that marks each column as either inline or blob-index, enabling per-column blob storage for large values.