-
Notifications
You must be signed in to change notification settings - Fork 2.2k
[Enhancement] optimize variant key query #67235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@cursor review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Bugbot reviewed your changes and found no bugs!
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
a742bfe to
c2527fa
Compare
alvin-celerdata
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cursor review
|
@cursor review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Bugbot reviewed your changes and found no bugs!
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
Signed-off-by: yan zhang <[email protected]>
| } | ||
|
|
||
| keys_builder.append(Slice(std::string(key))); | ||
| keys_builder.append(Slice(key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove memcpy
| for (size_t seg_idx = 0; seg_idx < variant_path->segments.size(); ++seg_idx) { | ||
| const auto& segment = variant_path->segments[seg_idx]; | ||
|
|
||
| StatusOr<VariantValue> sub; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reduce ctor of StatusOr, which has memory allocation operations inside.
| _metadata(_metadata_raw), | ||
| _value(_value_raw) {} | ||
| : _raw(), _metadata_size(metadata.size()), _metadata(), _value() { | ||
| if (metadata.data() != VariantMetadata::kEmptyMetadata.data() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fastpath for creating empty row value, to avoid memory allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and cocnate metadata and value into a continous memory area.
| return (header() & kSortedStringMask) != 0; | ||
| } | ||
| Status VariantMetadata::_build_lookup_index() const { | ||
| uint32_t dict_sz = dict_size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build dict_strings in a single path. in this building process, we only need to call read_little_endian_unsigned N times and use speicliaize template methods.
and during get_index method, we can use linear scan or binary search in a lightweight way.
| } | ||
|
|
||
| StatusOr<std::string> VariantMetadata::get_key(uint32_t index) const { | ||
| StatusOr<std::string_view> VariantMetadata::get_key(uint32_t index) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to return string, just return string_view suffices.
|
|
||
| if (is_sorted_unique) { | ||
| if (dict_sz > kBinarySearchThreshold) { | ||
| auto it = std::lower_bound(dict_strings.begin(), dict_strings.end(), key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use off-the-shelf algorithm.
|
|
||
| std::vector<uint32_t> VariantMetadata::get_index(std::string_view key) const { | ||
| Status VariantMetadata::_get_index(std::string_view key, void* _indexes, int hint) const { | ||
| KeyIndexVector& indexes = *static_cast<KeyIndexVector*>(_indexes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use specialized vector index to avoid mmeory allocation.
| case VariantType::INT8: | ||
| json_str << std::to_string(*variant.get_int8()); | ||
| case VariantType::INT8: { | ||
| ASSIGN_OR_RETURN(int8_t value, variant.get_int8()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check status right here?
| return str.status(); | ||
| } | ||
|
|
||
| result.append(Slice(std::string(str.value()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use string_view instead of string to avoid extra memory operation.
Signed-off-by: yan zhang <[email protected]>
|
@cursor review |
Signed-off-by: yan zhang <[email protected]>
|
@cursor review |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]❌ fail : 2 / 4 (50.00%) file detail
|
Why I'm doing
What I'm doing
VariantRowValue Memory Layout Optimization
VariantMetadata Lookup Index Caching
Fast Inline Little-Endian Readers
VariantValue Access Optimizations
CastVariantToMap Improvements
Naming Refactoring
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
Note
Focuses on performance and copy-reduction across Variant handling.
VariantRowValuenow stores[metadata][value]in a single_rawbuffer with a predefined null binary;serialize/serialize_sizecopy one blockVariantMetadatalazily computes dict size, caches dict string views via_build_lookup_index, andget_keyreturnsstd::string_view; fast inline LE readers replace previous helpersget_object_by_keyusing cached indexes and fixed-size reads; adds bounds/overflow checks and UNLIKELY pathsVariantObjectExtraction/VariantArrayExtraction;VariantPath::seekusesstd::visitwithASSIGN_OR_RETURNCastVariantToMapusesstring_viewkeys andSlice(key); improved error messages;variant_converterreduces copies and simplifies boolean/string handlingMemTrackerwith unreleased childrenvariant_path_parser_testandvariant_value_testto new types/behaviorsWritten by Cursor Bugbot for commit 4435c60. This will update automatically on new commits. Configure here.