feat(joblist,sorting): MCOL-3758 new ORDER BY algo #3790
Conversation

/gemini review

/gemini review
Code Review
This pull request introduces new sorting algorithms, including PDQ sort and a heap-based merge sort, to enhance ORDER BY performance, particularly for large datasets. The changes are substantial and well-structured, with new sorting implementations, a factory method in TupleAnnexStep to select the appropriate algorithm, and supporting data structures. While this is a valuable feature, I've identified a couple of critical issues that need to be addressed before merging. Specifically, there's a memory leak in HeapOrderBy due to incorrect std::unique_ptr usage and a significant bug in the sorting key generation for unsigned integers that will lead to incorrect results. I have also included several medium-severity recommendations to improve code quality and maintainability.
```cpp
*pos++ = (!isAsc) ? static_cast<uint8_t>(!isNotNull) : static_cast<uint8_t>(isNotNull);
std::memcpy(pos, valueBuf, sizeof(StorageType));
StorageType* valPtr = reinterpret_cast<StorageType*>(pos);
*valPtr ^= 0x80;
```
The logic for creating a sortable key for unsigned integers is incorrect. The code performs a sign-bit flip (*valPtr ^= 0x80;), which is a technique for signed integers to make them comparable as unsigned. For unsigned integers, this will produce an incorrect sort order. For example, for uint8_t, 128 (binary 10000000) would become 0, and 0 would become 128, causing 128 to be sorted before 0 in an ascending sort. This line should be removed. This issue also exists for USMALLINT, UINT, and UBIGINT types in this file.
```cpp
joblist::OrderByKeysType jobListorderByRGColumnIDs_;
rowgroup::RowGroup rg_;
std::unique_ptr<joblist::MemManager> mm_;
std::unique_ptr<uint8_t> keyBuf_;
```
keyBuf_ is declared as std::unique_ptr<uint8_t>, but it's allocated as an array using new uint8_t[...] in heaporderby.cpp. This will lead to a memory leak because the default deleter for std::unique_ptr<T> calls delete instead of delete[], deallocating only the first element of the array. To fix this, you should declare keyBuf_ as std::unique_ptr<uint8_t[]>.
```diff
- std::unique_ptr<uint8_t> keyBuf_;
+ std::unique_ptr<uint8_t[]> keyBuf_;
```
```cpp
// CHAR_1BYTE,
// CHAR_2BYTES,
// CHAR_4BYTES,
// CHAR_8BYTES,
// VARCHAR_1BYTE,
// VARCHAR_2BYTES,
// VARCHAR_4BYTES,
// VARCHAR_8BYTES,
};
```
```cpp
{
  return s128Value == static_cast<int128_t>(x);
}
inline bool operator==(const TSInt128 x) const
```
```cpp
KeyType::KeyType(rowgroup::RowGroup& rg, const joblist::OrderByKeysType& colsAndDirection,
                 const sorting::PermutationType p, uint8_t* buf)
{
```
The KeyType constructor is very long and contains a lot of duplicated code for handling different integer types (e.g., TINYINT, SMALLINT, INT, BIGINT and their unsigned versions). This reduces maintainability. Consider refactoring this into a template function that can handle all integer types, parameterized by the type and size, to reduce code duplication.
```cpp
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed_seconds = end - start;
std::cout << "keys " + std::to_string(id) + " elapsed time: " << elapsed_seconds.count() << "s\n";
```
```cpp
utils::setThreadName(threadName.c_str());
// std::cout << "enter " << threadName << std::endl;
RGData rgDataOut;
sorting::PermutationVec perm;
```
The perm vector is populated inside a loop using perm.insert(). This can be inefficient as it may lead to multiple reallocations if the final size is large. It would be more performant to first calculate the total number of elements that will be inserted, call perm.reserve() to allocate the required memory at once, and then use perm.insert() or std::copy.
/gemini review
Code Review
This pull request introduces a new parallel sorting mechanism for ORDER BY operations, leveraging PDQOrderBy for initial per-thread sorting and HeapOrderBy for merging results. Key changes include adding cstdint for integral type definitions, refining decimal width constants, and implementing TSInt128 comparison. The core sorting logic in heaporderby.cpp and pdqorderby.cpp handles various data types, null values, and byte-swapping for correct ordering. The TupleAnnexStep is updated to integrate these new sorting strategies, dynamically choosing between single-threaded LimitedOrderBy, single-threaded PDQOrderBy, or a two-phase parallel PDQOrderBy and HeapOrderBy approach based on the number of sorting threads and data size. Review comments highlight:

- a critical memory leak in HeapOrderBy due to incorrect unique_ptr usage for arrays,
- potential complexity and bugs in ranges_ vector population,
- inconsistent formatting in template specializations,
- significant code duplication in integer type handling within heaporderby.cpp,
- inefficient use of std::mutex for a counter that could be atomic,
- performance concerns with perm.insert() in a loop.
```cpp
joblist::OrderByKeysType jobListorderByRGColumnIDs_;
rowgroup::RowGroup rg_;
std::unique_ptr<joblist::MemManager> mm_;
std::unique_ptr<uint8_t> keyBuf_;
```
This unique_ptr is managing a dynamically allocated array created with new[] in heaporderby.cpp. However, it is not using the array form of unique_ptr (std::unique_ptr<uint8_t[]>). This will result in delete being called instead of delete[], leading to a memory leak and undefined behavior. This is a critical issue that must be fixed.
```diff
- std::unique_ptr<uint8_t> keyBuf_;
+ std::unique_ptr<uint8_t[]> keyBuf_;
```
```cpp
ranges_.push_back({ImpossiblePermute});
ranges_.back().reserve(rangeSize);
if (rangeSize)
{
  ranges_.back().insert(ranges_.back().end(), srcPermBegin, srcPermEnd);
}
```
The logic for populating the ranges_ vector seems overly complex and potentially buggy. If rangeSize > 0, the PermutationVec will contain ImpossiblePermute at the beginning, followed by the actual permutations from srcPermBegin. This is likely not the intended behavior. A cleaner and more correct approach would be to create the vector with the correct elements from the start.
```cpp
if (rangeSize)
{
  ranges_.emplace_back(srcPermBegin, srcPermEnd);
}
else
{
  ranges_.push_back({ImpossiblePermute});
}
```

```cpp
// CHAR_1BYTE,
// CHAR_2BYTES,
// CHAR_4BYTES,
// CHAR_8BYTES,
// VARCHAR_1BYTE,
// VARCHAR_2BYTES,
// VARCHAR_4BYTES,
// VARCHAR_8BYTES,
```
These commented-out enum values seem to be for future work. Leaving commented-out code can be confusing for future maintenance. If these are not yet implemented, it would be better to remove them and add them back in a future PR when they are used. If they are intended as placeholders, a comment explaining their purpose would be helpful.
```cpp
struct ColDataTypeToIntegralType<SystemCatalog::TINYINT>
  : _ColDataTypeToIntegralType<SystemCatalog::TINYINT, int8_t>
{
};
template <>
struct ColDataTypeToIntegralType<SystemCatalog::SMALLINT>
  : _ColDataTypeToIntegralType<SystemCatalog::SMALLINT, int16_t>
{
};
template <>
struct ColDataTypeToIntegralType<SystemCatalog::MEDINT>
  : _ColDataTypeToIntegralType<SystemCatalog::MEDINT, int32_t>
{
};
template <>
struct ColDataTypeToIntegralType<SystemCatalog::INT> : _ColDataTypeToIntegralType<SystemCatalog::INT, int32_t>
{
};
```
The formatting of the template specializations is inconsistent. In some cases the base class is on a new line (e.g., line 585), while in others it is on the same line (e.g., line 599). Adopting a consistent style, such as placing the base class on a new line for all specializations, would improve readability.
```cpp
case execplan::CalpontSystemCatalog::TINYINT:
{
  using StorageType =
      datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::UTINYINT>::type;
  const uint8_t* valueBuf = rg.getColumnValueBuf(columnId, p.rowID);
  const uint8_t* nullValuePtr = reinterpret_cast<const uint8_t*>(&joblist::TINYINTNULL);
  bool isNotNull = memcmp(nullValuePtr, valueBuf, columnWidth) != 0;
  *pos++ = (!isAsc) ? static_cast<uint8_t>(!isNotNull) : static_cast<uint8_t>(isNotNull);
  std::memcpy(pos, valueBuf, sizeof(StorageType));
  StorageType* valPtr = reinterpret_cast<StorageType*>(pos);
  *valPtr ^= 0x80;
  *valPtr = (!isAsc) ? ~*valPtr : *valPtr;
  pos += columnWidth;
  break;
}
case execplan::CalpontSystemCatalog::SMALLINT:
{
  using StorageType =
      datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::SMALLINT>::type;
  const uint8_t* valueBuf = rg.getColumnValueBuf(columnId, p.rowID);
  const uint8_t* nullValuePtr = reinterpret_cast<const uint8_t*>(&joblist::SMALLINTNULL);
  bool isNotNull = memcmp(nullValuePtr, valueBuf, columnWidth) != 0;
  *pos++ = (!isAsc) ? static_cast<uint8_t>(!isNotNull) : static_cast<uint8_t>(isNotNull);
  std::memcpy(pos, valueBuf, sizeof(StorageType));
  StorageType* valPtr = reinterpret_cast<StorageType*>(pos);
  *valPtr ^= 0x8000;
  *valPtr = (!isAsc) ? ~htons(*valPtr) : htons(*valPtr);
  pos += columnWidth;
  break;
}
case execplan::CalpontSystemCatalog::INT:
{
  using StorageType = datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::INT>::type;
  const uint8_t* valueBuf = rg.getColumnValueBuf(columnId, p.rowID);
  const uint8_t* nullValuePtr = reinterpret_cast<const uint8_t*>(&joblist::INTNULL);
  bool isNotNull = memcmp(nullValuePtr, valueBuf, columnWidth) != 0;
  *pos++ = (!isAsc) ? static_cast<uint8_t>(!isNotNull) : static_cast<uint8_t>(isNotNull);
  std::memcpy(pos, valueBuf, sizeof(StorageType));
  StorageType* valPtr = reinterpret_cast<StorageType*>(pos);
  *valPtr ^= 0x80000000;
  *valPtr = (!isAsc) ? ~htonl(*valPtr) : htonl(*valPtr);
  pos += columnWidth;
  break;
}
case execplan::CalpontSystemCatalog::BIGINT:
{
  using StorageType =
      datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::BIGINT>::type;
  const uint8_t* valueBuf = rg.getColumnValueBuf(columnId, p.rowID);
  const uint8_t* nullValuePtr = reinterpret_cast<const uint8_t*>(&joblist::BIGINTNULL);
  bool isNotNull = memcmp(nullValuePtr, valueBuf, columnWidth) != 0;
  *pos++ = (!isAsc) ? static_cast<uint8_t>(!isNotNull) : static_cast<uint8_t>(isNotNull);
  std::memcpy(pos, valueBuf, sizeof(StorageType));
  StorageType* valPtr = reinterpret_cast<StorageType*>(pos);
  *valPtr ^= 0x8000000000000000;
  *valPtr = (!isAsc) ? ~htonll(*valPtr) : htonll(*valPtr);
  pos += columnWidth;
  break;
}
case execplan::CalpontSystemCatalog::UTINYINT:
{
  using StorageType =
      datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::UTINYINT>::type;
  const uint8_t* valueBuf = rg.getColumnValueBuf(columnId, p.rowID);
  const uint8_t* nullValuePtr = reinterpret_cast<const uint8_t*>(&joblist::UTINYINTNULL);
  bool isNotNull = memcmp(nullValuePtr, valueBuf, columnWidth) != 0;
  *pos++ = (!isAsc) ? static_cast<uint8_t>(!isNotNull) : static_cast<uint8_t>(isNotNull);
  std::memcpy(pos, valueBuf, sizeof(StorageType));
  StorageType* valPtr = reinterpret_cast<StorageType*>(pos);
  *valPtr ^= 0x80;
  *valPtr = (!isAsc) ? ~*valPtr : *valPtr;
  pos += columnWidth;
  break;
}
```

```cpp
case execplan::CalpontSystemCatalog::USMALLINT:
{
  using StorageType =
      datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::USMALLINT>::type;
  const uint8_t* valueBuf = rg.getColumnValueBuf(columnId, p.rowID);
  const uint8_t* nullValuePtr = reinterpret_cast<const uint8_t*>(&joblist::USMALLINTNULL);
  bool isNotNull = memcmp(nullValuePtr, valueBuf, columnWidth) != 0;
  *pos++ = (!isAsc) ? static_cast<uint8_t>(!isNotNull) : static_cast<uint8_t>(isNotNull);
  std::memcpy(pos, valueBuf, sizeof(StorageType));
  StorageType* valPtr = reinterpret_cast<StorageType*>(pos);
  *valPtr ^= 0x8000;
  *valPtr = (!isAsc) ? ~htons(*valPtr) : htons(*valPtr);
  pos += columnWidth;
  break;
}
case execplan::CalpontSystemCatalog::UINT:
{
  using StorageType = datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::UINT>::type;
  const uint8_t* valueBuf = rg.getColumnValueBuf(columnId, p.rowID);
  const uint8_t* nullValuePtr = reinterpret_cast<const uint8_t*>(&joblist::UINTNULL);
  bool isNotNull = memcmp(nullValuePtr, valueBuf, columnWidth) != 0;
  *pos++ = (!isAsc) ? static_cast<uint8_t>(!isNotNull) : static_cast<uint8_t>(isNotNull);
  std::memcpy(pos, valueBuf, sizeof(StorageType));
  StorageType* valPtr = reinterpret_cast<StorageType*>(pos);
  *valPtr ^= 0x80000000;
  *valPtr = (!isAsc) ? ~htonl(*valPtr) : htonl(*valPtr);
  pos += columnWidth;
  break;
}
case execplan::CalpontSystemCatalog::UBIGINT:
{
  using StorageType =
      datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::UBIGINT>::type;
  const uint8_t* valueBuf = rg.getColumnValueBuf(columnId, p.rowID);
  const uint8_t* nullValuePtr = reinterpret_cast<const uint8_t*>(&joblist::UBIGINTNULL);
  bool isNotNull = memcmp(nullValuePtr, valueBuf, columnWidth) != 0;
  *pos++ = (!isAsc) ? static_cast<uint8_t>(!isNotNull) : static_cast<uint8_t>(isNotNull);
  std::memcpy(pos, valueBuf, sizeof(StorageType));
  StorageType* valPtr = reinterpret_cast<StorageType*>(pos);
  *valPtr ^= 0x8000000000000000;
  *valPtr = (!isAsc) ? ~htonll(*valPtr) : htonll(*valPtr);
  pos += columnWidth;
  break;
}
```
There is significant code duplication in the switch statement for handling various integer types (TINYINT, SMALLINT, INT, BIGINT, and their unsigned counterparts). The logic for building the sort key for these types is nearly identical, differing only in type names, null values, and byte-swapping functions. This could be refactored into a template helper function to reduce duplication and improve maintainability.
```cpp
parallelOrderByMutex_.lock();
++fFinishedThreads;
if (fFinishedThreads == fMaxThreads)
{
  // WIP Replace this vector with vector of RGDatas?
  // INV: firstPhaseThreads ref scope is shorter than TAS scope.
  const sorting::SortingThreads& firstPhaseThreads = firstPhaseflatOrderBys_;

  size_t numberOfSortingWithData = std::accumulate(firstPhaseThreads.begin(), firstPhaseThreads.end(), 0,
                                                   [](size_t acc, const sorting::PDQSortingThread& sorting)
                                                   { return acc + !sorting->getRGDatas().empty(); });
  if (numberOfSortingWithData <= 1)
  {
    RGData rgDataOut;
    rowgroup::RowGroup rowGroupOut{fRowGroupOut};
    for (const sorting::PDQSortingThread& sorting : firstPhaseThreads)
    {
      if (!sorting->getRGDatas().empty())
      {
        while (sorting->getData(rgDataOut, firstPhaseflatOrderBys_) && !cancelled())
        {
          rowGroupOut.setData(&rgDataOut);
          auto rows = rowGroupOut.getRowCount();
          if (rows > 0)
          {
            fOutputDL->insert(rgDataOut);
          }
        }
      }
    }
    fOutputDL->endOfInput();
  }
  else
  {
    auto perThreadRangesMatrix = calculatePivots4phase2(firstPhaseThreads);
    assert(perThreadRangesMatrix.size() == fMaxThreads);
    // push 1st of the 2nd phase threads as HeapOrderBy with the first range
    fRunnersList.push_back(jobstepThreadPool.invoke(
        [this, perThreadRangesMatrix, &firstPhaseThreads]()
        { this->finalizeHeapOrderBy(1, perThreadRangesMatrix[0], firstPhaseThreads); }));
    for (uint32_t i = 2; i <= fMaxThreads; ++i)
    {
      fRunnersList.push_back(jobstepThreadPool.invoke(
          [this, i, perThreadRangesMatrix, &firstPhaseThreads]()
          { this->finalizePDQOrderBy(i, perThreadRangesMatrix, firstPhaseThreads); }));
    }
    // Merge finalizer threads' rgDatas from separate outputDLs into a common outputDL
    fRunnersList.push_back(jobstepThreadPool.invoke([this]() { this->joinOutputDLs(); }));
  }
}
parallelOrderByMutex_.unlock();
```
Using std::mutex to protect a simple counter (fFinishedThreads) is less efficient than using an atomic variable. The lock is held while calculatePivots4phase2 and other operations are performed, which could be a significant amount of time. Consider changing fFinishedThreads to a std::atomic<size_t> and using an atomic fetch-and-add operation to avoid the overhead of locking.
```cpp
// std::cout << threadName << " empty perm range" << std::endl;
continue;
}
perm.insert(perm.end(), srcPermBegin, srcPermEnd);
```
The perm.insert() call is inside a loop, which can lead to multiple reallocations and copies of the vector's contents, impacting performance. It would be more efficient to pre-calculate the total number of elements to be inserted and call perm.reserve() once before the loop to allocate the required memory upfront.
Code Review
This pull request introduces a new two-phase parallel sorting mechanism using a modified quicksort (PDQOrderBy) and a heap-based merge (HeapOrderBy) for handling large datasets and parallel execution. The changes include adding new PDQOrderBy and HeapOrderBy classes, integrating them into the TupleAnnexStep for query execution, and updating JobInfo to support the new sorting types. The TupleAnnexStep now dynamically selects between LimitedOrderBy, PDQOrderBy (single-threaded), or a two-phase parallel PDQOrderBy and HeapOrderBy based on the limit and thread configuration. Several critical issues were identified in the new sorting logic:

- incorrect type handling for string columns in HeapOrderBy and PDQOrderBy,
- potential out-of-bounds access in modified_pdqsort,
- uninitialized KeyType objects in HeapOrderBy,
- a double function call in modified_pdqsort.

Concurrency issues with fRowsReturned and fFinishedThreads were also noted, along with several WIP comments indicating incomplete functionality for string sorting, null handling, and LIMIT/OFFSET in the parallel sorting paths. Additionally, the test for PDQOrderBy was found to be incorrectly configured to test LimitedOrderBy instead.
```cpp
);
rowgroup::RowGroup jobInfoRG(inRG);
joblist::TupleAnnexStep tns = joblist::TupleAnnexStep(jobInfo);
tns.addOrderBy(new joblist::LimitedOrderBy());
```
The test file pqorderby-test.cpp is intended to test PDQOrderBy, but tns.addOrderBy(new joblist::LimitedOrderBy()); is being called, which initializes LimitedOrderBy. This means the test is not actually exercising the PDQOrderBy functionality as intended by the file name. This is a critical mismatch and needs to be corrected to test PDQOrderBy.
```cpp
}
else
  mod_sort3(begin + s2, begin, end - 1, perm_begin + s2, perm_begin, perm_end - 1, comp);
```
```cpp
  while (first < last && --perm_last != perm_last_2 && !comp(*--last, pivot))
    ;
else
  while (--perm_last != perm_last_2 && !comp(*--last, pivot))
```
The condition --perm_last != perm_last_2 allows perm_last to potentially go two elements past perm_end (in the reverse direction). This is an out-of-bounds access and will lead to undefined behavior. The loop condition should ensure perm_last stays within valid bounds, typically perm_last >= perm_begin.
```cpp
else if (!sorting::isDictColumn(columnType, columnWidth))
{
  switch (columnWidth)
  {
    case 1:
    {
      using StorageType =
          datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::TINYINT>::type;
      value = rg.getColumnValue<execplan::CalpontSystemCatalog::VARCHAR, StorageType, EncodedKeyType>(
          columnId, p.rowID);
      *pos++ = (!isAsc) ? 0 : 1;
      break;
    }

    case 2:
    {
      using StorageType =
          datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::SMALLINT>::type;
      value = rg.getColumnValue<execplan::CalpontSystemCatalog::VARCHAR, StorageType, EncodedKeyType>(
                  columnId, p.rowID)
                  .rtrimZero();
      *pos++ = (!isAsc) ? 0 : 1;
      break;
    };

    case 4:
    {
      using StorageType =
          datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::INT>::type;
      value = rg.getColumnValue<execplan::CalpontSystemCatalog::VARCHAR, StorageType, EncodedKeyType>(
                  columnId, p.rowID)
                  .rtrimZero();
      *pos++ = (!isAsc) ? 0 : 1;
      break;
    };

    case 8:
    {
      using StorageType =
          datatypes::ColDataTypeToIntegralType<execplan::CalpontSystemCatalog::BIGINT>::type;
      value = rg.getColumnValue<execplan::CalpontSystemCatalog::VARCHAR, StorageType, EncodedKeyType>(
                  columnId, p.rowID)
                  .rtrimZero();
      *pos++ = (!isAsc) ? 0 : 1;
      break;
    };
    default: idbassert(0);
```
This switch (columnWidth) block is within the VARCHAR, CHAR, TEXT case, but it attempts to use datatypes::ColDataTypeToIntegralType for these string types. This is fundamentally incorrect as VARCHAR, CHAR, and TEXT are string types, not integral types. This will lead to type mismatches, undefined behavior, or crashes. String types should be handled using string comparison functions, potentially with collation, not by reinterpreting their bytes as integers.
```cpp
// Find the first element strictly smaller than the pivot. We have to guard this search if
// there was no element before *first.
if (first - 1 == begin)
  while (first < last && --perm_last != perm_last_2 && !comp(*--last, pivot))
```
The condition --perm_last != perm_last_2 allows perm_last to potentially go two elements past perm_end (in the reverse direction). This is an out-of-bounds access and will lead to undefined behavior. The loop condition should ensure perm_last stays within valid bounds, typically perm_last >= perm_begin.
```cpp
size_t offset2stringStoreOffset =
    RowGroup::getHeaderSize() + rowID * getRowSize() + getOffsetArray()[columnID];
// bool isNull = strings->isNullValue(offset2stringStoreOffset);
// if (isNull)
//   return nullValue;
```
The commented-out code for null value handling (// bool isNull = strings->isNullValue(offset2stringStoreOffset); // if (isNull) // return nullValue;) suggests that null values might not be correctly handled for IsVariadicType. If null strings are possible, this logic should be re-enabled or an alternative null handling mechanism should be in place to prevent incorrect behavior.
```cpp
// bool isNull = strings->isNullValue(offset2stringStoreOffset);
// if (isNull)
//   return nullValue;
// check the out of bounds invariant somehow
```
```cpp
size_t offset2stringStoreOffset =
    RowGroup::getHeaderSize() + rowID * getRowSize() + getOffsets()[columnID];
// bool isNull = strings->isNullValue(offset2stringStoreOffset);
// if (isNull)
//   return nullValue;
```
The commented-out code for null value handling (// bool isNull = strings->isNullValue(offset2stringStoreOffset); // if (isNull) // return nullValue;) suggests that null values might not be correctly handled for IsShortString. If null strings are possible, this logic should be re-enabled or an alternative null handling mechanism should be in place to prevent incorrect behavior.
```cpp
// bool isNull = strings->isNullValue(offset2stringStoreOffset);
// if (isNull)
//   return nullValue;
// check the out of bounds invariant somehow
```
```cpp
assert(data);
size_t valueOffset = RowGroup::getHeaderSize() + getOffsets()[columnID] + rowID * getRowSize();
// check the out of bounds invariant somehow
const char* valuePtr = reinterpret_cast<const char*>(&data[valueOffset]);
```