Skip to content

Comments

V2 serialization format for wide columns with blob references#14314

Open
xingbowang wants to merge 8 commits intofacebook:mainfrom
xingbowang:2026_02_06_blob_wide_column
Open

V2 serialization format for wide columns with blob references#14314
xingbowang wants to merge 8 commits intofacebook:mainfrom
xingbowang:2026_02_06_blob_wide_column

Conversation

@xingbowang
Copy link
Contributor

@xingbowang xingbowang commented Feb 6, 2026

Introduce a new V2 serialization format for wide column entities that supports storing individual column values in blob files. The V2 format adds a column type section that marks each column as either inline or blob-index, enabling per-column blob storage for large values.

@meta-cla meta-cla bot added the CLA Signed label Feb 6, 2026
@xingbowang xingbowang force-pushed the 2026_02_06_blob_wide_column branch from aebf39b to 09fdfab Compare February 7, 2026 14:20
@xingbowang xingbowang force-pushed the 2026_02_06_blob_wide_column branch 5 times, most recently from 003bfee to 578db34 Compare February 8, 2026 23:31
@xingbowang xingbowang marked this pull request as ready for review February 9, 2026 17:26
Introduce a new V2 serialization format for wide column entities that
supports storing individual column values in blob files. The V2 format
adds a column type section that marks each column as either inline or
blob-index, enabling per-column blob storage for large values.

Key additions:
- SerializeWithBlobIndices(): serialize entities with blob column references
- DeserializeColumns(): deserialize V2 entities extracting blob column metadata
- HasBlobColumns(): lightweight check for V2 blob references
- GetVersion(): return the format version of a serialized entity
- ResolveEntityBlobColumns(): resolve all blob references in an entity
- Updated Deserialize() and GetValueOfDefaultColumn() for V2 compatibility
@xingbowang xingbowang force-pushed the 2026_02_06_blob_wide_column branch from 578db34 to 154c72a Compare February 9, 2026 17:38
@meta-codesync
Copy link

meta-codesync bot commented Feb 10, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D92832066.

Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My primary concern about this approach is that the natural evolution of the data model is to have columns function as "subkeys" and pick up more and more features per-column that you get with keys and no wide columns, like merge, single column overwrite, single column delete, TimedPut, and likely more (kTypeValuePreferredSeqno). And we eventually end up re-implementing the cross product of all those features rather than making the data model leverage the features (almost) without re-implementation, by having a wide column ValueType entry itself contain ValueType entries for each column.

@xingbowang
Copy link
Contributor Author

Updated the code to use ValueType in wide column type.

@meta-codesync
Copy link

meta-codesync bot commented Feb 17, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D92832066.

Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for considering my feedback. LGTM overall

// Future per-column types (kTypeMerge, kTypeDeletion, etc.) can be
// added without format changes.
//
// Section 3: SKIP INFO (3 varints)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose we write some SIMD code that's good at decoding a sequence of varints faster. In that case there might be an advantage to swapping the order of Section 2 and Section 3. Something to consider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swapped

// | bytes| bytes| | bytes |
// +------+------+---...---+--------+
//
// When ct = kTypeBlobIndex, the cv contains a serialized BlobIndex.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose we wanted to add a kind of index to speed up looking up a known column name (such as a perfect hash from column name to [0,N) index). Do you think we should have a provision in this V2 schema for that (currently rejected as a future feature) or save that for next schema version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good point. Right now, it is only optimized for first column look up. There are lots of optimization we could do in future. E.g. given most of the record would have similar wide column schema, we could store common schema at table level to reduce the overhead of storing key repeatedly. I would save this for future optimization for now..

@github-actions
Copy link

github-actions bot commented Feb 21, 2026

⚠️ clang-tidy: 22 warning(s) on changed lines

Completed in 3850.9s.

Summary by check

Check Count
bugprone-argument-comment 4
cert-err58-cpp 1
cppcoreguidelines-avoid-non-const-global-variables 1
cppcoreguidelines-pro-type-member-init 4
cppcoreguidelines-special-member-functions 1
google-explicit-constructor 1
modernize-use-override 1
performance-move-const-arg 1
performance-unnecessary-value-param 1
readability-braces-around-statements 3
readability-isolate-declaration 4
Total 22

Details

db/compaction/compaction_picker_fifo.cc (1 warning(s))
db/compaction/compaction_picker_fifo.cc:227:3: warning: multiple declarations in a single statement reduces readability [readability-isolate-declaration]
db/compaction/compaction_picker_test.cc (10 warning(s))
db/compaction/compaction_picker_test.cc:5213:64: warning: the parameter 'configure' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]
db/compaction/compaction_picker_test.cc:5467:25: warning: argument name 'max_table' in comment does not match parameter name 'max_table_files_size' [bugprone-argument-comment]
db/compaction/compaction_picker_test.cc:5468:25: warning: argument name 'max_data' in comment does not match parameter name 'max_data_files_size' [bugprone-argument-comment]
db/compaction/compaction_picker_test.cc:5522:25: warning: argument name 'max_table' in comment does not match parameter name 'max_table_files_size' [bugprone-argument-comment]
db/compaction/compaction_picker_test.cc:5523:25: warning: argument name 'max_data' in comment does not match parameter name 'max_data_files_size' [bugprone-argument-comment]
db/compaction/compaction_picker_test.cc:5676:21: warning: statement should be inside braces [readability-braces-around-statements]
db/compaction/compaction_picker_test.cc:5722:34: warning: statement should be inside braces [readability-braces-around-statements]
db/compaction/compaction_picker_test.cc:5746:23: warning: statement should be inside braces [readability-braces-around-statements]
db/compaction/compaction_picker_test.cc:5749:5: warning: multiple declarations in a single statement reduces readability [readability-isolate-declaration]
db/compaction/compaction_picker_test.cc:5886:7: warning: multiple declarations in a single statement reduces readability [readability-isolate-declaration]
table/block_based/block.cc (1 warning(s))
table/block_based/block.cc:878:3: warning: multiple declarations in a single statement reduces readability [readability-isolate-declaration]
table/block_based/block_based_table_factory.cc (2 warning(s))
table/block_based/block_based_table_factory.cc:189:5: warning: initialization of 'block_base_table_index_search_type_string_map' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
table/block_based/block_based_table_factory.cc:189:5: warning: variable 'block_base_table_index_search_type_string_map' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
table/table_test.cc (5 warning(s))
table/table_test.cc:701:15: warning: uninitialized record type: 'one_arg' [cppcoreguidelines-pro-type-member-init]
table/table_test.cc:2298:26: warning: passing result of std::move() as a const reference argument; no move will actually happen [performance-move-const-arg]
table/table_test.cc:5112:5: warning: uninitialized record type: 'fake_footer' [cppcoreguidelines-pro-type-member-init]
table/table_test.cc:5131:5: warning: uninitialized record type: 'fake_footer' [cppcoreguidelines-pro-type-member-init]
table/table_test.cc:5151:5: warning: uninitialized record type: 'fake_footer' [cppcoreguidelines-pro-type-member-init]
utilities/merge_operators/string_append/stringappend_test.cc (1 warning(s))
utilities/merge_operators/string_append/stringappend_test.cc:76:3: warning: single-argument constructors must be marked explicit to avoid unintentional implicit conversions [google-explicit-constructor]
utilities/transactions/write_prepared_transaction_test_seqno.cc (2 warning(s))
utilities/transactions/write_prepared_transaction_test_seqno.cc:30:7: warning: class 'WritePreparedTransactionSeqnoTest' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator [cppcoreguidelines-special-member-functions]
utilities/transactions/write_prepared_transaction_test_seqno.cc:56:3: warning: annotate this function with 'override' or (rarely) 'final' [modernize-use-override]

@meta-codesync
Copy link

meta-codesync bot commented Feb 21, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D92832066.

@meta-codesync
Copy link

meta-codesync bot commented Feb 21, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D92832066.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants