[perf][cp] Optimize deserialize UnsafeRows to RowVector#385

Open
guhaiyan0221 wants to merge 1 commit into bytedance:main from guhaiyan0221:fix_cp_opt_row_deserialize

Conversation

@guhaiyan0221
Collaborator

What problem does this PR solve?

Issue Number: close #191

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

Summary:
The old API extracts each row into a `std::string_view`, which is inefficient. The new approach aligns with CompactRow deserialization; benchmarks show speedups ranging from 0% to roughly 5x, depending on the row type. For backward compatibility, the old UnsafeRowDeserializer API delegates to the new API. Note that the benchmark is not fully stable: results vary with machine load.

```
/mnt/DP_disk1/code/velox/build/velox/row/benchmarks# ./velox_unsafe_row_serialize_benchmark
============================================================================
[...]marks/UnsafeRowSerializeBenchmark.cpp     relative  time/iter   iters/s
============================================================================
unsafe_deserialize_fixedWidth5                            112.69us     8.87K
unsafe_deserialize_fast_fixedWidth5             488.34%    23.08us    43.33K
unsafe_deserialize_fixedWidth10                           124.26us     8.05K
unsafe_deserialize_fast_fixedWidth10            175.44%    70.83us    14.12K
unsafe_deserialize_fixedWidth20                           182.74us     5.47K
unsafe_deserialize_fast_fixedWidth20            99.864%   182.98us     5.46K
decimalsSerialize                                          49.70us    20.12K
decimalsDeserialize                                        70.44us    14.20K
unsafe_deserialize_strings1                                61.31us    16.31K
unsafe_deserialize_fast_strings1                180.84%    33.90us    29.50K
unsafe_deserialize_strings5                               114.08us     8.77K
unsafe_deserialize_fast_strings5                113.19%   100.78us     9.92K
unsafe_deserialize_arrays                                 126.41us     7.91K
unsafe_deserialize_fast_arrays                  373.10%    33.88us    29.51K
unsafe_deserialize_nestedArrays                           350.44us     2.85K
unsafe_deserialize_fast_nestedArrays            259.99%   134.79us     7.42K
unsafe_deserialize_maps                                   536.18us     1.87K
unsafe_deserialize_fast_maps                    500.97%   107.03us     9.34K
unsafe_deserialize_structs                                 87.50us    11.43K
unsafe_deserialize_fast_structs                 147.26%    59.42us    16.83K
```

Corresponding PR: facebookincubator/velox#11936

However, facebookincubator/velox#11936 was later reverted by facebookincubator/velox#12978 because it introduced a core dump when the schema is `Row(MAP(INTEGER(), ROW({ARRAY(INTEGER()), INTEGER()})))` and the nested row is null: the null buffer must be checked first, and only then should the ARRAY's buffer be sliced.

Corresponding PR: facebookincubator/velox#12841

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).
  • Positive Impact: I have run benchmarks.
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Release Note:
- Optimized deserialization of UnsafeRows to RowVector; benchmarks show speedups of up to 5x depending on the row type.

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No tests needed, or verified by manual testing.

Breaking Changes

  • No

  • Yes (Description: ...)



Copilot AI left a comment


Pull request overview

This PR updates the UnsafeRow deserialization path to avoid extracting rows as std::string_view and instead pass raw row pointers into a new UnsafeRowFast::deserialize implementation, aiming to improve performance and fix a reported crash in certain nested-null schemas.

Changes:

  • Switch UnsafeRow vector serde and benchmarks to deserialize via UnsafeRowFast::deserialize(std::vector<char*>).
  • Introduce UnsafeRowFast::deserialize(...) and serializedRowSizes(...) APIs and implement a new multi-row deserializer in UnsafeRowFast.cpp.
  • Update/replace UnsafeRow tests and adjust build targets accordingly.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Summary per file:

  • bolt/serializers/UnsafeRowSerializer.cpp: Uses RowDeserializer<char*> and UnsafeRowFast::deserialize for vector serde deserialization.
  • bolt/serializers/RowSerializer.h: Extends RowDeserializer to emit either std::string_view or raw pointers.
  • bolt/row/tests/UnsafeRowTest.cpp: Reworks fuzz/roundtrip tests and adds a nested map/row regression test case.
  • bolt/row/tests/CMakeLists.txt: Renames the UnsafeRow test source in the test target.
  • bolt/row/benchmark/UnsafeRowSerializeBenchmark.cpp: Updates the benchmark to deserialize using UnsafeRowFast.
  • bolt/row/benchmark/DynamicRowVectorDeserializeBenchmark.cpp: Removes an older benchmark.
  • bolt/row/UnsafeRowFast.h: Adds new APIs for batch size computation and batch deserialization.
  • bolt/row/UnsafeRowFast.cpp: Implements the new batch deserializer logic for UnsafeRow into vectors.
  • bolt/row/UnsafeRowDeserializers.h: Replaces the legacy implementation with a compatibility wrapper over UnsafeRowFast.
  • bolt/row/CompactRow.cpp: Minor shared typedef alignment (TRowSize).
Comments suppressed due to low confidence (2)

bolt/row/tests/UnsafeRowTest.cpp:131

  • In testRoundTrip(), serialized.push_back(rawBuffer + offset) relies on implicit conversion from char* to T. If T is std::optional<std::string_view>, this will use the string_view(const char*) ctor and compute the length via strlen, which is incorrect for binary UnsafeRow data (and can read past the buffer). Construct the string_view with an explicit length (size) when testing the legacy UnsafeRowDeserializer path.

bolt/row/tests/UnsafeRowTest.cpp:81

  • doFuzzTest() currently generates randomized input vectors but never exercises serialization/deserialization or asserts equality (e.g., via testRoundTrip(inputVector)). As written, TEST_F(UnsafeRowTest, fast) can pass without validating anything, which weakens coverage for the new deserializer logic.


Comment on lines 178 to +183

```diff
   serializedBuffers.emplace_back(std::move(serializedBuffer));
-  serializedRows.push_back(std::string_view(
-      serializedBuffers.back().data(), serializedBuffers.back().size()));
+  if constexpr (std::is_same_v<SerializeView, std::string_view>) {
+    serializedRows.push_back(std::string_view(
+        serializedBuffers.back().data(), serializedBuffers.back().size()));
+  } else {
+    serializedRows.push_back(serializedBuffers.back().data());
```

Comment on lines +1073 to +1076

```diff
+  auto* serializedNulls = readNulls(data[row] + offsets[row]);
+  const auto isNull =
+      (rawNulls != nullptr && bits::isBitNull(rawNulls, row)) ||
+      bits::isBitSet(serializedNulls, i);
```

Comment on lines +67 to +72

```diff
+  std::vector<char*> dataPtrs;
+  dataPtrs.reserve(data.size());
+  for (const auto& row : data) {
+    const char* ptr = row.value().data();
+    dataPtrs.emplace_back(const_cast<char*>(ptr));
+  }
```

Comment on lines +69 to 74

```diff
+  for (const auto& row : data) {
+    const char* ptr = row.value().data();
+    dataPtrs.emplace_back(const_cast<char*>(ptr));
+  }
   return UnsafeRowFast::deserialize(dataPtrs, asRowType(type), pool);
 }
```

Comment on lines 72 to +73

```diff
   *result = std::dynamic_pointer_cast<RowVector>(
-      bolt::row::UnsafeRowDeserializer::deserialize(
-          serializedRows, type, pool));
+      bolt::row::UnsafeRowFast::deserialize(serializedRows, type, pool));
```
