@dirtysalt dirtysalt commented Dec 25, 2025

Why I'm doing

What I'm doing

  1. VariantRowValue Memory Layout Optimization

    • Changed from storing [metadata] and [value] in separate std::string objects to a single contiguous buffer _raw
    • Reduces memory allocations from 2 to 1, improves cache locality
    • Added predefined null-binary default for empty variants
    • serialize()/serialize_size() now copy single block instead of two separate copies
  2. VariantMetadata Lookup Index Caching

    • Added lazy dict_size calculation and _lookup_index struct to cache parsed dictionary string views
    • get_key() now returns std::string_view instead of std::string to avoid allocations
    • New _build_lookup_index() builds dictionary string view cache once
    • Fast _get_index() using cached views with std::lower_bound for sorted unique dictionaries
  3. Fast Inline Little-Endian Readers

    • Added inline_read_little_endian_unsigned32_fixed_size() template for 1-4 byte reads
    • Uses __builtin_memcpy to load the bytes and arrow::bit_util::FromLittleEndian for the endianness conversion, which is a no-op on little-endian hosts
    • Replaces slower VariantUtil::read_little_endian_unsigned32() calls throughout
  4. VariantValue Access Optimizations

    • Inlined value_header() and basic_type() accessors
    • Cached dictionary index scan in get_object_by_key() with hint parameter
    • Template-based field ID search (SEARCH_DICT_INDEX_SZ) for different sizes (1-4 bytes)
    • Added UNLIKELY hints for error path optimization
  5. CastVariantToMap Improvements

    • get_key() returns std::string_view, uses Slice(key) directly to avoid key copies
    • Error messages use fmt::format for better formatting
  6. Naming Refactoring

    • Renamed ObjectExtraction → VariantObjectExtraction
    • Renamed ArrayExtraction → VariantArrayExtraction
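
Item 3 above (fixed-size little-endian reads) can be illustrated with a minimal sketch. The names `read_le_u32_fixed` and `read_le_u32` are hypothetical stand-ins, not the functions added in this PR, and the sketch assembles the value with shifts instead of `__builtin_memcpy` plus `arrow::bit_util::FromLittleEndian` so that it compiles without Arrow.

```cpp
#include <cstdint>

// Hypothetical stand-in for a fixed-size little-endian reader.
// N is a compile-time constant, so the loop can be fully unrolled.
template <int N>
inline uint32_t read_le_u32_fixed(const void* src) {
    static_assert(N >= 1 && N <= 4, "only 1-4 byte reads are supported");
    const unsigned char* p = static_cast<const unsigned char*>(src);
    uint32_t v = 0;
    for (int i = 0; i < N; ++i) {
        // Byte i is the i-th least significant byte of the result.
        v |= static_cast<uint32_t>(p[i]) << (8 * i);
    }
    return v;
}

// Runtime dispatch to the fixed-size versions; returns 0 for an invalid size.
inline uint32_t read_le_u32(const void* src, int size) {
    switch (size) {
    case 1: return read_le_u32_fixed<1>(src);
    case 2: return read_le_u32_fixed<2>(src);
    case 3: return read_le_u32_fixed<3>(src);
    case 4: return read_le_u32_fixed<4>(src);
    default: return 0;
    }
}
```

Dispatching on the width once, outside a hot loop, lets the loop body call the fixed-size template directly.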

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

Note

Focuses on performance and copy-reduction across Variant handling.

  • Variant storage/layout: VariantRowValue now stores [metadata][value] in a single _raw buffer with a predefined null binary; serialize/serialize_size copy one block
  • Dictionary lookup: VariantMetadata lazily computes dict size, caches dict string views via _build_lookup_index, and get_key returns std::string_view; fast inline LE readers replace previous helpers
  • Variant access: Inlines headers/type checks; accelerates get_object_by_key using cached indexes and fixed-size reads; adds bounds/overflow checks and UNLIKELY paths
  • Path parsing: Renames to VariantObjectExtraction/VariantArrayExtraction; VariantPath::seek uses std::visit with ASSIGN_OR_RETURN
  • Casting/conversion: CastVariantToMap uses string_view keys and Slice(key); improved error messages; variant_converter reduces copies and simplifies boolean/string handling
  • Misc: DEBUG-only warning on destroying MemTracker with unreleased children
  • Tests: Updated variant_path_parser_test and variant_value_test to new types/behaviors

Written by Cursor Bugbot for commit 4435c60.

@dirtysalt dirtysalt requested review from a team as code owners December 25, 2025 21:15
@alvin-celerdata

@cursor review

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!

@dirtysalt dirtysalt marked this pull request as draft December 26, 2025 08:23
@dirtysalt dirtysalt marked this pull request as ready for review December 26, 2025 14:34

@alvin-celerdata alvin-celerdata left a comment

@cursor review

@alvin-celerdata

@cursor review

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!

}

- keys_builder.append(Slice(std::string(key)));
+ keys_builder.append(Slice(key));

Removes a memcpy: Slice(key) wraps the string_view directly instead of copying it into a temporary std::string.

for (size_t seg_idx = 0; seg_idx < variant_path->segments.size(); ++seg_idx) {
const auto& segment = variant_path->segments[seg_idx];

StatusOr<VariantValue> sub;

Reduce the number of StatusOr constructions; its constructor allocates memory internally.

- _metadata(_metadata_raw),
- _value(_value_raw) {}
+ : _raw(), _metadata_size(metadata.size()), _metadata(), _value() {
+ if (metadata.data() != VariantMetadata::kEmptyMetadata.data() ||

Fast path for creating an empty row value, to avoid memory allocation.

And concatenate metadata and value into one contiguous memory area.
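
A minimal sketch of the single-buffer layout, assuming a hypothetical class `RowValueSketch` (not the actual `VariantRowValue`). It stores `[metadata][value]` in one `std::string` and remembers only the metadata length. The empty-value fast path from the comment above is not modeled here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical illustration of storing [metadata][value] in one buffer.
class RowValueSketch {
public:
    RowValueSketch(std::string_view metadata, std::string_view value)
            : _metadata_size(metadata.size()) {
        // One allocation sized for both parts, then two appends into it.
        _raw.reserve(metadata.size() + value.size());
        _raw.append(metadata);
        _raw.append(value);
    }

    // Views are computed on demand, so they stay valid after copy or move.
    std::string_view metadata() const {
        return std::string_view(_raw).substr(0, _metadata_size);
    }
    std::string_view value() const {
        return std::string_view(_raw).substr(_metadata_size);
    }

    // Serialization can copy this single block.
    std::string_view raw() const { return _raw; }

private:
    std::string _raw;
    size_t _metadata_size;
};
```

Computing the views from `_raw` on each call is a design choice of this sketch. Caching views as members would require re-pointing them whenever the object is copied or moved.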

return (header() & kSortedStringMask) != 0;
}
Status VariantMetadata::_build_lookup_index() const {
uint32_t dict_sz = dict_size();

Build dict_strings in a single pass. While building, we only need to call read_little_endian_unsigned N times, and we can use specialized template methods.

Then, in get_index, a lightweight linear scan or binary search suffices.
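
A sketch of the single-pass build, under an assumed, simplified dictionary layout: `[count][count + 1 offsets][string bytes]`, where `count` and each offset are N-byte little-endian integers. The real Variant metadata also has a header byte, which is omitted here. `read_le` and `build_dict_views` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Reads an N-byte little-endian unsigned integer (N in 1..4).
template <int N>
uint32_t read_le(const unsigned char* p) {
    uint32_t v = 0;
    for (int i = 0; i < N; ++i) v |= static_cast<uint32_t>(p[i]) << (8 * i);
    return v;
}

// Builds one string_view per dictionary entry in a single pass.
// Returns false if the buffer is truncated or the offsets are inconsistent.
template <int N>
bool build_dict_views(std::string_view buf, std::vector<std::string_view>* out) {
    constexpr size_t W = N;
    const auto* p = reinterpret_cast<const unsigned char*>(buf.data());
    if (buf.size() < W) return false;
    const size_t count = read_le<N>(p);
    const size_t offsets_begin = W;
    const size_t strings_begin = offsets_begin + (count + 1) * W;
    if (strings_begin > buf.size()) return false;

    out->clear();
    out->reserve(count);
    // Each offset is read exactly once: entry i spans [prev, next).
    uint32_t prev = read_le<N>(p + offsets_begin);
    for (size_t i = 0; i < count; ++i) {
        const uint32_t next = read_le<N>(p + offsets_begin + (i + 1) * W);
        if (next < prev || strings_begin + next > buf.size()) return false;
        out->emplace_back(buf.data() + strings_begin + prev, next - prev);
        prev = next;
    }
    return true;
}
```

The views point into `buf`, so the cache is only valid while the underlying metadata buffer is alive.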

}

- StatusOr<std::string> VariantMetadata::get_key(uint32_t index) const {
+ StatusOr<std::string_view> VariantMetadata::get_key(uint32_t index) const {

No need to return a string; returning a string_view suffices.


if (is_sorted_unique) {
if (dict_sz > kBinarySearchThreshold) {
auto it = std::lower_bound(dict_strings.begin(), dict_strings.end(), key);

Use an off-the-shelf algorithm.
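
The lookup in the hunk above can be sketched as follows. `find_key` is a hypothetical helper, and the threshold is passed as a parameter because the actual value of `kBinarySearchThreshold` is not shown in this conversation. The sketch assumes the dictionary is sorted and unique, as in the `is_sorted_unique` branch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Returns the position of key in a sorted, unique dictionary, or -1 if absent.
// Small dictionaries use a linear scan; larger ones use std::lower_bound.
int find_key(const std::vector<std::string_view>& dict, std::string_view key,
             size_t binary_search_threshold) {
    if (dict.size() > binary_search_threshold) {
        auto it = std::lower_bound(dict.begin(), dict.end(), key);
        // lower_bound returns the first element >= key, so equality must be checked.
        if (it != dict.end() && *it == key) {
            return static_cast<int>(it - dict.begin());
        }
        return -1;
    }
    for (size_t i = 0; i < dict.size(); ++i) {
        if (dict[i] == key) return static_cast<int>(i);
    }
    return -1;
}
```

`std::string_view` compares lexicographically by bytes, so the binary search is only correct when the dictionary was sorted in that same order.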


- std::vector<uint32_t> VariantMetadata::get_index(std::string_view key) const {
+ Status VariantMetadata::_get_index(std::string_view key, void* _indexes, int hint) const {
+ KeyIndexVector& indexes = *static_cast<KeyIndexVector*>(_indexes);

Use a specialized index vector to avoid memory allocation.

- case VariantType::INT8:
- json_str << std::to_string(*variant.get_int8());
+ case VariantType::INT8: {
+ ASSIGN_OR_RETURN(int8_t value, variant.get_int8());

check status right here?

return str.status();
}

result.append(Slice(std::string(str.value())));

Use string_view instead of string to avoid an extra memory operation.

@dirtysalt dirtysalt enabled auto-merge (squash) December 27, 2025 10:49
Signed-off-by: yan zhang <[email protected]>
@alvin-celerdata

@cursor review

Signed-off-by: yan zhang <[email protected]>
@alvin-celerdata

@cursor review

@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[BE Incremental Coverage Report]

fail : 2 / 4 (50.00%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/exprs/cast_expr_map.cpp 2 4 50.00% [123, 124]

@alvin-celerdata alvin-celerdata merged commit 1b0af66 into StarRocks:main Dec 28, 2025
63 of 64 checks passed
@dirtysalt dirtysalt deleted the optimize-variant branch December 28, 2025 01:51