Add PartitionedVector benchmark #1655
xin-zhang2 wants to merge 1 commit into IBM:optimized_partitionedoutput
void run(const RowVectorPtr& vector, int32_t numPartitions) {
  folly::BenchmarkSuspender suspender;
  // roundRobinPartitionFunction(vector, numPartitions, partitions_);
Remove the commented out line, or make the distribution a dimension as well
We can add a non-uniform distribution to test the skew case.
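For the skew case, a minimal standalone sketch of what a non-uniform partition assignment could look like. It is not taken from the PR: `skewedPartitions`, `partitionSizes`, the explicit `seed` parameter, and the Zipf-like weights 1 / (k + 1) are all hypothetical choices for illustration, and it works on plain row counts rather than a RowVector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper, not part of the PR: assigns each row a partition id
// drawn from a skewed (Zipf-like) distribution in which partition k has
// weight 1 / (k + 1), so low-numbered partitions receive more rows.
std::vector<uint32_t> skewedPartitions(
    size_t numRows,
    uint32_t numPartitions,
    uint32_t seed) {
  std::vector<double> weights(numPartitions);
  for (uint32_t k = 0; k < numPartitions; ++k) {
    weights[k] = 1.0 / (k + 1);
  }
  std::mt19937 gen(seed);
  std::discrete_distribution<uint32_t> dist(weights.begin(), weights.end());
  std::vector<uint32_t> partitions(numRows);
  for (auto& partition : partitions) {
    partition = dist(gen);
  }
  return partitions;
}

// Counts how many rows landed in each partition.
std::vector<size_t> partitionSizes(
    const std::vector<uint32_t>& partitions,
    uint32_t numPartitions) {
  std::vector<size_t> sizes(numPartitions, 0);
  for (uint32_t partition : partitions) {
    ++sizes[partition];
  }
  return sizes;
}
```

A fixed seed keeps the skewed runs repeatable, which may make results easier to compare across revisions of the PR.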
const auto vectorCopy = std::static_pointer_cast<RowVector>(
    BaseVector::copy(*vector, bm->getPool()));
suspender.dismiss();
bm->run(vectorCopy, 10000);
10000 partitions?
The number of partitions also needs to be a test dimension. The values can be {4, 8, 16, 32, 64, 128, 256, 512, 1024}.
Made the number of partitions a dimension; it currently uses {4, 16, 64, 256, 1024} as the values.
    rawEndPartitionOffsets[i] = offset;
  }
  endPartitionOffsets_->setSize(numPartitions * sizeof(vector_size_t));
}
Where did you set beginPartitionOffsets_?
beginPartitionOffsets_ is set by the initializeBeginPartitionOffsets method during the execution of PartitionedVector::create, so we don't set it here.
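To make the relationship between the two offset arrays concrete, here is a standalone sketch of one common way to derive them from per-row partition ids. It assumes that each end offset is the running total of partition sizes (exclusive end) and that each begin offset equals the previous partition's end. That layout is an assumption for illustration, not something confirmed by the PR, and `PartitionOffsets` and `computePartitionOffsets` are hypothetical names that use `std::vector` in place of the PR's buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container for the two offset arrays.
struct PartitionOffsets {
  std::vector<int32_t> begin; // first row index of each partition
  std::vector<int32_t> end; // one past the last row index of each partition
};

// Counts rows per partition, then turns the counts into begin/end offsets
// with a single running sum.
PartitionOffsets computePartitionOffsets(
    const std::vector<uint32_t>& partitions,
    int32_t numPartitions) {
  std::vector<int32_t> counts(numPartitions, 0);
  for (uint32_t partition : partitions) {
    ++counts[partition];
  }

  PartitionOffsets offsets;
  offsets.begin.resize(numPartitions);
  offsets.end.resize(numPartitions);
  int32_t offset = 0;
  for (int32_t i = 0; i < numPartitions; ++i) {
    offsets.begin[i] = offset;
    offset += counts[i];
    offsets.end[i] = offset;
  }
  return offsets;
}
```

For example, partition ids {0, 2, 1, 2, 0} over 3 partitions give sizes 2, 1, 2, so the begin offsets are {0, 2, 3} and the end offsets are {2, 3, 5}.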
BufferPtr topRowOffsets_;
BufferPtr beginPartitionOffsets_;
BufferPtr endPartitionOffsets_;
BufferPtr swappingBuffer_;
Never allocated or sized before use.
swappingBuffer_ is also sized in partitionFixedWidthValues during the execution of PartitionedVector::create
  folly::runBenchmarks();
  bm.reset();
  return 0;
}
Can you please add the run output as a comment at the end of the file?
Added the output in the comment of this PR.
}

void calculatePartitionOffsets(vector_size_t numRows, int32_t numPartitions) {
  ensureCapacity<vector_size_t>(topRowOffsets_, numRows, pool());

 protected:
  std::vector<uint32_t> partitions_;
  BufferPtr topRowOffsets_;

void calculatePartitionOffsets(vector_size_t numRows, int32_t numPartitions) {
  ensureCapacity<vector_size_t>(topRowOffsets_, numRows, pool());
  ensureCapacity<vector_size_t>(
      beginPartitionOffsets_, numPartitions, pool());
auto vector =
    bm->createTestVector(rowTypeGenerator, numRows, numColumns, isNullAt);
for (uint32_t i = 0; i < iterations; ++i) {
  const auto vectorCopy = std::static_pointer_cast<RowVector>(
suspender active for the copy. This is not supposed to be measured.
My understanding is that the copy is not measured, as it happens before suspender.dismiss(). Since the suspender is defined at the beginning of runBM, only the section between dismiss() and rehire() will be timed.
Please correct me if I’m misunderstanding how the suspender works.
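To reason about which part of the loop ends up in the measurement, here is a small standalone model of the dismiss()/rehire() behavior described in the reply above. It is not folly's implementation: `FakeClock`, `SuspenderSketch`, `measuredTicks`, and the tick costs (5 for the copy, 2 for the run) are hypothetical stand-ins, and the model assumes that timing is excluded from construction until dismiss() and again after rehire().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical manual clock so the example is deterministic.
struct FakeClock {
  int64_t now = 0;
};

// Simplified stand-in for a benchmark suspender (NOT folly's class):
// time is excluded from the measurement while the suspender is active.
class SuspenderSketch {
 public:
  explicit SuspenderSketch(const FakeClock& clock)
      : clock_(clock), suspendedAt_(clock.now) {}

  // Stops excluding time: everything after this call is measured.
  void dismiss() {
    if (active_) {
      suspended_ += clock_.now - suspendedAt_;
      active_ = false;
    }
  }

  // Starts excluding time again.
  void rehire() {
    if (!active_) {
      suspendedAt_ = clock_.now;
      active_ = true;
    }
  }

  // Total time excluded so far, including a still-open suspended span.
  int64_t suspendedTime() const {
    return suspended_ + (active_ ? clock_.now - suspendedAt_ : 0);
  }

 private:
  const FakeClock& clock_;
  int64_t suspendedAt_;
  int64_t suspended_ = 0;
  bool active_ = true;
};

// Simulates the benchmark loop: each iteration copies the vector (5 ticks)
// and runs the partitioning (2 ticks). Returns the ticks that get measured.
int64_t measuredTicks(int iterations, bool rehireEachIteration) {
  FakeClock clock;
  SuspenderSketch suspender(clock);
  for (int i = 0; i < iterations; ++i) {
    clock.now += 5; // vector copy
    suspender.dismiss();
    clock.now += 2; // timed section
    if (rehireEachIteration) {
      suspender.rehire();
    }
  }
  return clock.now - suspender.suspendedTime();
}
```

In this model, three iterations with a rehire() at the end of each one measure 6 ticks (only the runs), while three iterations without rehire() measure 16 ticks, because every copy after the first falls into the timed region.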
#include <numeric>

#include "dwio/common/tests/utils/BatchMaker.h"
#include "vector/VectorPrinter.h"

namespace {

auto gen_ = std::mt19937(std::random_device{}());
auto -> thread_local auto
Folly's benchmark framework can run benchmarks concurrently. The shared gen_ in randomPartitionFunction and mixedFlatTypeGenerator would cause data races.
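A minimal sketch of the suggested change, assuming the generator may be reached from more than one thread as the comment above describes. `localGen` and `randomPartition` are hypothetical names and do not match the PR's functions. Wrapping the generator in an accessor gives each thread its own lazily constructed engine, so no state is shared.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Returns a generator that is private to the calling thread. It is
// constructed and seeded the first time each thread calls this function.
std::mt19937& localGen() {
  thread_local std::mt19937 gen(std::random_device{}());
  return gen;
}

// Hypothetical usage: picks a uniformly random partition id in
// [0, numPartitions). Requires numPartitions > 0.
uint32_t randomPartition(uint32_t numPartitions) {
  std::uniform_int_distribution<uint32_t> dist(0, numPartitions - 1);
  return dist(localGen());
}
```

The trade-off is that each thread is seeded independently, so sequences are not reproducible across runs unless a fixed seed is used instead of std::random_device.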
BENCHMARK_SCALAR(HUGEINT);
BENCHMARK_SCALAR(REAL);
BENCHMARK_SCALAR(DOUBLE);
BENCHMARK_SCALAR(TIMESTAMP);
what about DATE and DECIMAL?
@xin-zhang2 After updating the PR, could you please update the results here? The previous values included vector copies and were not accurate.
@yingsu00
Add a benchmark for the creation of PartitionedVectors (flat vectors only for now).
The following benchmark data was collected on an Apple machine with:
Chip: Apple M3 Max
Total Number of Cores: 16 (12 Performance and 4 Efficiency)
Memory: 64 GB
OS: macOS 26.3 (25D125)
Benchmark Results