[18343808662] MemBlock refactor #2792
base: master
  pos_ += type_size_;

- if (pos_ >= block_->bytes()) {
+ if (pos_ >= block_->logical_bytes()) {
Highlighting a small behavior change: `Column::Iterator` was previously broken for blocks with `extra_bytes > 0`.
This is only a refactor to prepare for parallel bool unpacking and should not change existing behavior.

It splits `MemBlock` into two separate classes that inherit a common interface, `IMemBlock`:
- `DynamicMemBlock` handles `AllocationType::DYNAMIC` and has a capacity that can be filled gradually through resizes. This is the type of memory block we use for in-memory operations with `ChunkedBuffer`s, e.g. when tick streaming or performing appends.
- `ExternalMemBlock` handles `AllocationType::DETACHABLE` and external pointers to views. It just holds a pointer to external memory, which it may or may not own. This is what we use either when returning Segments to Python or when receiving views from Python.

This change also adds the scaffolding needed for the packed bits buffer via `ExternalPackedMemBlock`. It will be used in a follow-up PR to parallelise bool unpacking.
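To make the split concrete, here is a minimal sketch of what the two classes might look like. It is not the code from this PR: the member list, the `EXTERNAL` enumerator, the constructors, and the choice to throw from `ExternalMemBlock::resize` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class MemBlockType { DYNAMIC, EXTERNAL };

// Hypothetical, simplified interface; the real IMemBlock has more members.
struct IMemBlock {
    virtual ~IMemBlock() = default;
    virtual MemBlockType get_type() const = 0;
    virtual std::size_t physical_bytes() const = 0;
    virtual std::size_t capacity() const = 0;
    virtual const std::uint8_t* data() const = 0;
    virtual void resize(std::size_t bytes) = 0;
};

// Owns a fixed-capacity allocation that is filled gradually via resize().
class DynamicMemBlock final : public IMemBlock {
public:
    explicit DynamicMemBlock(std::size_t capacity) : storage_(capacity), bytes_(0) {}
    MemBlockType get_type() const override { return MemBlockType::DYNAMIC; }
    std::size_t physical_bytes() const override { return bytes_; }
    std::size_t capacity() const override { return storage_.size(); }
    const std::uint8_t* data() const override { return storage_.data(); }
    void resize(std::size_t bytes) override {
        if (bytes > storage_.size()) throw std::length_error("resize beyond capacity");
        bytes_ = bytes;
    }
private:
    std::vector<std::uint8_t> storage_;
    std::size_t bytes_;
};

// Wraps memory allocated elsewhere; frees it only when `owning` is true.
// Assumption: owned memory was allocated with new[].
class ExternalMemBlock final : public IMemBlock {
public:
    ExternalMemBlock(std::uint8_t* ptr, std::size_t bytes, bool owning)
        : ptr_(ptr), bytes_(bytes), owning_(owning) {
        assert(ptr_ != nullptr || bytes_ == 0);
    }
    ExternalMemBlock(const ExternalMemBlock&) = delete;
    ExternalMemBlock& operator=(const ExternalMemBlock&) = delete;
    ~ExternalMemBlock() override { if (owning_) delete[] ptr_; }
    MemBlockType get_type() const override { return MemBlockType::EXTERNAL; }
    std::size_t physical_bytes() const override { return bytes_; }
    std::size_t capacity() const override { return bytes_; }
    const std::uint8_t* data() const override { return ptr_; }
    // Assumption: an external block has a fixed size and cannot be resized.
    void resize(std::size_t) override { throw std::logic_error("external blocks cannot be resized"); }
private:
    std::uint8_t* ptr_;
    std::size_t bytes_;
    bool owning_;
};
```

In this sketch a `DynamicMemBlock` starts empty and grows up to its capacity, while an `ExternalMemBlock` reports exactly the size it was constructed with.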
  auto input_block = input_blocks.at(idx);
  auto source_pos = idx == start_idx ? start_block_and_offset.offset_ : 0u;
- auto source_bytes = std::min(remaining_bytes, input_block->bytes() - source_pos);
+ auto source_bytes = std::min(remaining_bytes, input_block->logical_bytes() - source_pos);
I don't think this would be correct for an ExternalMemBlock? I think this will only be called at the moment with DYNAMIC mem blocks, so maybe use physical_bytes() here and add a util::check on entry to the function that the input mem blocks are dynamic?
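A rough sketch of the suggestion, assuming a simplified block record and a local `check` helper standing in for `util::check` (both hypothetical, as is the function name `copyable_bytes`): validate that every input block is dynamic on entry, then size each copy from `physical_bytes`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class MemBlockType { DYNAMIC, EXTERNAL };

// Hypothetical stand-in for a memory block: only the fields this sketch needs.
struct BlockInfo {
    MemBlockType type;
    std::size_t physical_bytes;
};

// Hypothetical stand-in for util::check: throws when the condition does not hold.
inline void check(bool condition, const std::string& message) {
    if (!condition) throw std::invalid_argument(message);
}

// Returns how many bytes would be copied, starting `start_offset` bytes into
// the first block and stopping after `requested_bytes` or at the end of input.
std::size_t copyable_bytes(const std::vector<BlockInfo>& blocks,
                           std::size_t start_offset,
                           std::size_t requested_bytes) {
    // Precondition checked once on entry, as proposed in the comment above.
    for (const auto& block : blocks) {
        check(block.type == MemBlockType::DYNAMIC, "expected only dynamic memory blocks");
    }
    std::size_t remaining = requested_bytes;
    for (std::size_t idx = 0; idx < blocks.size() && remaining > 0; ++idx) {
        const std::size_t source_pos = idx == 0 ? start_offset : 0u;
        check(source_pos <= blocks[idx].physical_bytes, "start offset is past the end of the block");
        const std::size_t source_bytes = std::min(remaining, blocks[idx].physical_bytes - source_pos);
        remaining -= source_bytes;
    }
    const std::size_t total = requested_bytes - remaining;
    assert(total <= requested_bytes);
    return total;
}
```

Checking the block type up front keeps the loop body free of per-type branches and turns an accidental call with external memory into an immediate, explicit failure.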
MemBlockType ExternalPackedMemBlock::get_type() const { return MemBlockType::EXTERNAL_PACKED; }

size_t ExternalPackedMemBlock::logical_bytes() const { return logical_size_; }
This is a bit confusing, as it is actually the number of logical bits in this case
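To illustrate why the bit/byte distinction matters for a packed buffer, here is a small sketch. It assumes bits are packed least-significant-bit first (the convention Arrow uses for bitmaps) and that unpacking produces one byte per bool; the helper names are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bytes needed to hold `bits` packed bits (rounds up).
constexpr std::size_t bytes_for_bits(std::size_t bits) { return (bits + 7) / 8; }

// Hypothetical unpacking helper: expands LSB-first packed bits into one byte
// per bool. `logical_bits` counts bits, not bytes.
std::vector<std::uint8_t> unpack_bits(const std::vector<std::uint8_t>& packed,
                                      std::size_t logical_bits) {
    assert(packed.size() >= bytes_for_bits(logical_bits));
    std::vector<std::uint8_t> out(logical_bits);
    for (std::size_t i = 0; i < logical_bits; ++i) {
        out[i] = static_cast<std::uint8_t>((packed[i / 8] >> (i % 8)) & 1u);
    }
    return out;
}
```

With 10 logical bits the packed buffer occupies 2 physical bytes but unpacks to 10 bytes, so a method named `logical_bytes()` that returns 10 only reads correctly under the one-byte-per-bool assumption.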
virtual void resize(size_t bytes) = 0;
virtual void check_magic() const = 0;
// External block specific methods
[[nodiscard]] virtual uint8_t* release() = 0;
virtual void abandon() = 0;
It's a bit of a code smell to have interface methods that only apply to specific subclasses. The alternative, though, is to have callers of these methods check the return value of `get_type` and do a `reinterpret_cast` as appropriate, which also feels clunky.
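For comparison, the alternative could be wrapped in a single helper so callers never cast by hand. This is only a sketch with simplified, hypothetical class definitions; it uses `static_cast` for the downcast, which is sufficient once `get_type` has confirmed the dynamic type.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>

enum class MemBlockType { DYNAMIC, EXTERNAL };

// Hypothetical slimmed-down interface without subclass-specific methods.
struct IMemBlock {
    virtual ~IMemBlock() = default;
    virtual MemBlockType get_type() const = 0;
};

struct DynamicMemBlock final : IMemBlock {
    MemBlockType get_type() const override { return MemBlockType::DYNAMIC; }
};

class ExternalMemBlock final : public IMemBlock {
public:
    explicit ExternalMemBlock(std::uint8_t* ptr) : ptr_(ptr) { assert(ptr_ != nullptr); }
    MemBlockType get_type() const override { return MemBlockType::EXTERNAL; }
    // Hands the pointer to the caller and leaves this block empty.
    [[nodiscard]] std::uint8_t* release() { return std::exchange(ptr_, nullptr); }
private:
    std::uint8_t* ptr_;
};

// Single place that performs the type check and the downcast.
[[nodiscard]] std::uint8_t* release_external(IMemBlock& block) {
    if (block.get_type() != MemBlockType::EXTERNAL) {
        throw std::logic_error("release() requires an external block");
    }
    return static_cast<ExternalMemBlock&>(block).release();
}
```

The trade-off is the one described above: the interface stays minimal, at the cost of a runtime type check wherever external-only operations are needed.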
- (*output.blocks_.rbegin())->copy_from(block->data(), block->bytes(), 0);
- (*output.blocks_.rbegin())->resize(block->bytes());
  output.add_block(block->capacity(), block->offset());
+ (*output.blocks_.rbegin())->resize(block->physical_bytes());
I guess this means clone() is never called with anything other than dynamic memory at the moment?
We could make it work for external memory, although the semantics of an owning external buffer mean that the clone wouldn't own it, and so wouldn't be identical to the source. I think just raising if someone tries to use clone() with external memory is fine for now, and if we need it in the future we can add a view() method or similar.
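A minimal sketch of the "raise for now" option, using a hypothetical `SourceBlock` record and a free function `clone_blocks` rather than the real buffer classes: dynamic blocks are deep-copied, and anything else is rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class MemBlockType { DYNAMIC, EXTERNAL };

// Hypothetical read-only description of a source block.
struct SourceBlock {
    MemBlockType type;
    const std::uint8_t* data;
    std::size_t physical_bytes;
};

// Deep-copies every dynamic block; refuses external memory, because a copy
// would not preserve the ownership semantics of the source.
std::vector<std::vector<std::uint8_t>> clone_blocks(const std::vector<SourceBlock>& blocks) {
    std::vector<std::vector<std::uint8_t>> out;
    out.reserve(blocks.size());
    for (const auto& block : blocks) {
        if (block.type != MemBlockType::DYNAMIC) {
            throw std::logic_error("clone() is only supported for dynamic blocks");
        }
        assert(block.data != nullptr || block.physical_bytes == 0);
        out.emplace_back(block.data, block.data + block.physical_bytes);
    }
    return out;
}
```

Failing loudly keeps the current behavior explicit; a separate `view()`-style method could be added later if non-owning copies of external memory are ever needed.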
- (*output.blocks_.rbegin())->resize(block->bytes());
  output.add_block(block->capacity(), block->offset());
+ (*output.blocks_.rbegin())->resize(block->physical_bytes());
+ (*output.blocks_.rbegin())->copy_from(block->data(), block->physical_bytes(), 0);
- (*output.blocks_.rbegin())->copy_from(block->data(), block->physical_bytes(), 0);
+ output.blocks_.back()->copy_from(block->data(), block->physical_bytes(), 0);
Worth running all the Arrow tests and a smoke-test subset of the non-Arrow tests under valgrind to make sure we haven't introduced any leaks.
Reference Issues/PRs
Monday ref: 18343808662
What does this implement or fix?
This is only a refactor to prepare for parallel bool unpacking and should not change existing behavior.
It splits `MemBlock` into two separate classes that inherit a common interface, `IMemBlock`:
- `DynamicMemBlock` handles `AllocationType::DYNAMIC` and has a capacity that can be filled gradually through resizes. This is the type of memory block we use for in-memory operations with `ChunkedBuffer`s, e.g. when tick streaming or performing appends.
- `ExternalMemBlock` handles `AllocationType::DETACHABLE` and external pointers to views. It just holds a pointer to external memory, which it may or may not own. This is what we use either when returning Segments to Python or when receiving views from Python.
This change also adds the scaffolding needed for the packed bits buffer via `ExternalPackedMemBlock`. It will be used in a follow-up PR to parallelise bool unpacking.
Any other comments?
Checklist
Checklist for code changes...