IvoDD (Collaborator) commented Dec 3, 2025

Reference Issues/PRs

Monday ref: 18343808662

What does this implement or fix?

This is only a refactor to prepare for parallel bool unpacking and should not change existing behavior.

It splits MemBlock into two separate classes that inherit from a common interface, IMemBlock:

  • DynamicMemBlock is responsible for AllocationType::DYNAMIC and has a capacity that is filled up gradually through resizes. This is the type of memory block we use for in-memory operations with ChunkedBuffers, e.g. when tick streaming or performing appends.
  • ExternalMemBlock is responsible for AllocationType::DETACHABLE and for external pointers to views. It just holds a pointer to external memory, which may or may not be owning. This is what we use either when returning Segments to Python or when receiving views from Python.
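For orientation, a heavily simplified sketch of the split described above. Only the class names, get_type(), resize(), release() and the logical/physical size distinction come from this PR; every other member, signature, and the assumption that owned external memory was allocated with malloc are hypothetical and do not reflect the actual ArcticDB code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <stdexcept>

// Hypothetical, simplified sketch; not the actual ArcticDB classes.
enum class MemBlockType { DYNAMIC, EXTERNAL };

struct IMemBlock {
    virtual ~IMemBlock() = default;
    [[nodiscard]] virtual MemBlockType get_type() const = 0;
    [[nodiscard]] virtual std::size_t logical_bytes() const = 0;   // user-visible data size
    [[nodiscard]] virtual std::size_t physical_bytes() const = 0;  // bytes actually stored
    [[nodiscard]] virtual const uint8_t* data() const = 0;
};

// Owns a fixed-capacity buffer whose in-use region grows via resize().
class DynamicMemBlock final : public IMemBlock {
public:
    explicit DynamicMemBlock(std::size_t capacity)
        : capacity_(capacity), buffer_(std::make_unique<uint8_t[]>(capacity)) {}
    MemBlockType get_type() const override { return MemBlockType::DYNAMIC; }
    // For a dynamic block the in-use size is both the logical and physical size.
    std::size_t logical_bytes() const override { return bytes_; }
    std::size_t physical_bytes() const override { return bytes_; }
    const uint8_t* data() const override { return buffer_.get(); }
    std::size_t capacity() const { return capacity_; }
    // The in-use region can never exceed the capacity.
    void resize(std::size_t bytes) {
        if (bytes > capacity_) throw std::out_of_range("resize beyond capacity");
        bytes_ = bytes;
    }
private:
    std::size_t capacity_;
    std::size_t bytes_ = 0;
    std::unique_ptr<uint8_t[]> buffer_;
};

// Wraps a pointer to external memory; frees it on destruction only if owning.
// Assumption: owned memory was allocated with std::malloc.
class ExternalMemBlock final : public IMemBlock {
public:
    ExternalMemBlock(uint8_t* ptr, std::size_t bytes, bool owning)
        : ptr_(ptr), bytes_(bytes), owning_(owning) {}
    ~ExternalMemBlock() override { if (owning_) std::free(ptr_); }
    ExternalMemBlock(const ExternalMemBlock&) = delete;
    ExternalMemBlock& operator=(const ExternalMemBlock&) = delete;
    MemBlockType get_type() const override { return MemBlockType::EXTERNAL; }
    std::size_t logical_bytes() const override { return bytes_; }
    std::size_t physical_bytes() const override { return bytes_; }
    const uint8_t* data() const override { return ptr_; }
    // Hands ownership to the caller (e.g. when returning data to Python).
    [[nodiscard]] uint8_t* release() { owning_ = false; return ptr_; }
private:
    uint8_t* ptr_;
    std::size_t bytes_;
    bool owning_;
};
```

The point of the split is that only DynamicMemBlock needs a capacity and resize semantics, while ExternalMemBlock only needs to track ownership of memory it did not allocate.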

This change also adds the scaffolding needed for the packed bits buffer via ExternalPackedMemBlock. It will be used in a follow-up PR to parallelise bool unpacking.

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

IvoDD added the patch (Small change, should increase patch version) and no-release-notes (This PR shouldn't be added to release notes.) labels Dec 3, 2025
IvoDD changed the title from "MemBlock refactor" to "[18343808662] MemBlock refactor" Dec 3, 2025
IvoDD force-pushed the mem-block-refactor branch 6 times, most recently from 3c5044d to de733b8, December 4, 2025 11:13
  pos_ += type_size_;

- if (pos_ >= block_->bytes()) {
+ if (pos_ >= block_->logical_bytes()) {
Collaborator Author

Highlighting a tiny behavior change: Column::Iterator was previously broken for blocks with extra_bytes > 0.
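A minimal, hypothetical illustration of why the bound matters, assuming that the old bytes() included the extra bytes: if a block's physical storage holds padding past the real data, an iterator bounded by the physical size walks into that padding. PaddedBlock and sum_int32 are illustrative names, not ArcticDB code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical block: real data followed by `extra_bytes` of padding.
struct PaddedBlock {
    std::vector<uint8_t> storage;
    std::size_t extra_bytes;
    std::size_t logical_bytes() const { return storage.size() - extra_bytes; }
    std::size_t physical_bytes() const { return storage.size(); }
};

// Steps through the block one int32 at a time and stops once `limit` is
// reached, mirroring `pos_ += type_size_; if (pos_ >= limit) ...`.
int64_t sum_int32(const PaddedBlock& block, std::size_t limit) {
    int64_t total = 0;
    for (std::size_t pos = 0; pos + sizeof(int32_t) <= limit; pos += sizeof(int32_t)) {
        int32_t value;
        std::memcpy(&value, block.storage.data() + pos, sizeof(value));
        total += value;
    }
    return total;
}
```

Bounding by logical_bytes() visits only the real values; bounding by physical_bytes() also reads whatever happens to sit in the padding.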

IvoDD force-pushed the mem-block-refactor branch 2 times, most recently from fd7c6a3 to 7822686, December 4, 2025 13:15
IvoDD marked this pull request as ready for review December 4, 2025 13:16
IvoDD force-pushed the mem-block-refactor branch from 7822686 to 21acd7d, December 4, 2025 14:24
IvoDD force-pushed the mem-block-refactor branch from 21acd7d to 1753b0d, December 8, 2025 08:08
  auto input_block = input_blocks.at(idx);
  auto source_pos = idx == start_idx ? start_block_and_offset.offset_ : 0u;
- auto source_bytes = std::min(remaining_bytes, input_block->bytes() - source_pos);
+ auto source_bytes = std::min(remaining_bytes, input_block->logical_bytes() - source_pos);
Collaborator

I don't think this would be correct for an ExternalMemBlock? I think this will only be called at the moment with DYNAMIC mem blocks, so maybe use physical_bytes() here and add a util::check on entry to the function that the input mem blocks are dynamic?
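A sketch of the reviewer's suggestion, under simplifying assumptions: the input blocks are validated once on entry, and the copy loop then uses physical_bytes(). Block, MemBlockType::EXTERNAL and copy_range are hypothetical stand-ins, and the plain throw stands in for ArcticDB's util::check.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins; ArcticDB's real block types and util::check differ.
enum class MemBlockType { DYNAMIC, EXTERNAL };

struct Block {
    MemBlockType type;
    std::vector<uint8_t> bytes;  // physical contents
    std::size_t physical_bytes() const { return bytes.size(); }
};

// Copies `remaining_bytes` into `out`, starting at `offset` within
// blocks[start_idx]. Assumes `offset` lies within that first block.
void copy_range(const std::vector<const Block*>& blocks, std::size_t start_idx,
                std::size_t offset, std::size_t remaining_bytes, uint8_t* out) {
    // Check on entry: physical_bytes() is only the right bound for DYNAMIC blocks.
    for (const Block* b : blocks) {
        if (b->type != MemBlockType::DYNAMIC)
            throw std::invalid_argument("copy_range expects DYNAMIC blocks only");
    }
    for (std::size_t idx = start_idx; idx < blocks.size() && remaining_bytes > 0; ++idx) {
        const Block* block = blocks.at(idx);
        std::size_t source_pos = idx == start_idx ? offset : 0u;
        std::size_t source_bytes = std::min(remaining_bytes, block->physical_bytes() - source_pos);
        std::memcpy(out, block->bytes.data() + source_pos, source_bytes);
        out += source_bytes;
        remaining_bytes -= source_bytes;
    }
}
```

Failing fast on entry keeps the per-block arithmetic simple and makes an accidental call with external memory an explicit error rather than a silent miscount.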


MemBlockType ExternalPackedMemBlock::get_type() const { return MemBlockType::EXTERNAL_PACKED; }

size_t ExternalPackedMemBlock::logical_bytes() const { return logical_size_; }
Collaborator

This is a bit confusing, as it is actually the number of logical bits in this case
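One possible way to make the distinction explicit, sketched under the assumption that the packed buffer stores one bit per bool (least-significant bit first) and unpacks to one byte per bool. Under that assumption the logical bit count equals the unpacked byte count, which is why a single counter can serve as "logical bytes"; naming both accessors separately may read more clearly. PackedBits and its methods are hypothetical, not the ExternalPackedMemBlock API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed-bits buffer: 1 bit per bool physically, 1 byte per bool unpacked.
class PackedBits {
public:
    explicit PackedBits(std::size_t num_bits)
        : num_bits_(num_bits), packed_((num_bits + 7) / 8, 0) {}

    // Bit i lives in byte i / 8 at position i % 8 (LSB-first, an assumption).
    void set(std::size_t i, bool value) {
        if (value) packed_[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
        else packed_[i / 8] &= static_cast<uint8_t>(~(1u << (i % 8)));
    }

    std::size_t logical_bits() const { return num_bits_; }
    // One output byte per bool, so this equals logical_bits().
    std::size_t unpacked_bytes() const { return num_bits_; }
    std::size_t physical_bytes() const { return packed_.size(); }

    // Unpacks bits [begin, end) into out[begin, end). Disjoint ranges write
    // disjoint outputs, so they could be unpacked by different threads.
    void unpack_range(std::size_t begin, std::size_t end, uint8_t* out) const {
        for (std::size_t i = begin; i < end; ++i)
            out[i] = static_cast<uint8_t>((packed_[i / 8] >> (i % 8)) & 1u);
    }

private:
    std::size_t num_bits_;
    std::vector<uint8_t> packed_;
};
```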

Comment on lines +35 to +39
virtual void resize(size_t bytes) = 0;
virtual void check_magic() const = 0;
// External block specific methods
[[nodiscard]] virtual uint8_t* release() = 0;
virtual void abandon() = 0;
Collaborator

It's a bit of a code smell to have interface methods for specific subclasses. The alternative is to have callers of these methods check the return value of get_type and do a reinterpret_cast as appropriate though, which also feels clunky
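A sketch of the alternative the comment mentions, assuming get_type() returns a tag: callers check the tag and then downcast. Note that for a checked downcast within a class hierarchy, static_cast is the appropriate cast; reinterpret_cast does not apply the base-to-derived pointer adjustment and is not guaranteed to yield a usable pointer. The minimal classes and the as_external helper are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical minimal hierarchy for illustrating the downcast alternative.
enum class MemBlockType { DYNAMIC, EXTERNAL };

struct IMemBlock {
    virtual ~IMemBlock() = default;
    virtual MemBlockType get_type() const = 0;
};

struct DynamicMemBlock final : IMemBlock {
    MemBlockType get_type() const override { return MemBlockType::DYNAMIC; }
};

// release() lives only on the subclass instead of on the shared interface.
struct ExternalMemBlock final : IMemBlock {
    explicit ExternalMemBlock(uint8_t* p) : ptr(p) {}
    MemBlockType get_type() const override { return MemBlockType::EXTERNAL; }
    uint8_t* release() { uint8_t* p = ptr; ptr = nullptr; return p; }
    uint8_t* ptr;
};

// Checked downcast: verify the tag first, then static_cast to the subclass.
ExternalMemBlock& as_external(IMemBlock& block) {
    if (block.get_type() != MemBlockType::EXTERNAL)
        throw std::logic_error("expected an ExternalMemBlock");
    return static_cast<ExternalMemBlock&>(block);
}
```

Centralising the check in one helper keeps the clunkiness in a single place, at the cost of callers needing to know which subclass they expect.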

(*output.blocks_.rbegin())->copy_from(block->data(), block->bytes(), 0);
(*output.blocks_.rbegin())->resize(block->bytes());
output.add_block(block->capacity(), block->offset());
(*output.blocks_.rbegin())->resize(block->physical_bytes());
Collaborator

I guess this means clone() is never called with anything other than dynamic memory at the moment?
We could make it work for external memory, although the semantics of an owning external buffer mean that the clone wouldn't own it, and so wouldn't be identical to the source. I think just raising if someone tries to use clone() with external memory is fine for now, and if we need it in the future we can add a view() method or similar.
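A minimal sketch of the approach suggested above: clone() deep-copies DYNAMIC blocks and raises for anything else. Block, Buffer and their members are hypothetical simplifications, not ArcticDB's ChunkedBuffer. The resize-before-copy_from ordering follows the updated diff, assuming copy_from bounds-checks against the block's in-use size. It also uses blocks_.back()->, which is equivalent to (*blocks_.rbegin())-> for a non-empty vector of pointers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical simplified buffer of blocks.
enum class MemBlockType { DYNAMIC, EXTERNAL };

struct Block {
    MemBlockType type;
    std::size_t capacity;
    std::size_t offset;
    std::vector<uint8_t> storage;  // sized to capacity
    std::size_t bytes = 0;         // bytes in use

    void resize(std::size_t n) {
        if (n > capacity) throw std::out_of_range("resize beyond capacity");
        bytes = n;
    }
    // Assumption: copies must stay within the in-use region.
    void copy_from(const uint8_t* src, std::size_t n, std::size_t pos) {
        if (pos + n > bytes) throw std::out_of_range("copy past end of block");
        std::memcpy(storage.data() + pos, src, n);
    }
};

struct Buffer {
    std::vector<std::unique_ptr<Block>> blocks_;
    void add_block(std::size_t capacity, std::size_t offset) {
        blocks_.push_back(std::make_unique<Block>(
            Block{MemBlockType::DYNAMIC, capacity, offset, std::vector<uint8_t>(capacity)}));
    }
};

// Deep-copies every block; external memory is rejected rather than aliased.
Buffer clone(const Buffer& input) {
    Buffer output;
    for (const auto& block : input.blocks_) {
        if (block->type != MemBlockType::DYNAMIC)
            throw std::logic_error("clone() is only supported for DYNAMIC blocks");
        output.add_block(block->capacity, block->offset);
        // Resize first so copy_from's bounds check sees the final size.
        output.blocks_.back()->resize(block->bytes);
        output.blocks_.back()->copy_from(block->storage.data(), block->bytes, 0);
    }
    return output;
}
```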

(*output.blocks_.rbegin())->resize(block->bytes());
output.add_block(block->capacity(), block->offset());
(*output.blocks_.rbegin())->resize(block->physical_bytes());
(*output.blocks_.rbegin())->copy_from(block->data(), block->physical_bytes(), 0);
Collaborator

Suggested change
- (*output.blocks_.rbegin())->copy_from(block->data(), block->physical_bytes(), 0);
+ output.blocks_.back()->copy_from(block->data(), block->physical_bytes(), 0);

alexowens90 (Collaborator)

Worth running all the Arrow tests and a smokescreen of non-Arrow tests with valgrind to make sure we haven't introduced any leaks.

3 participants