Adding ChunkBatch interface #181


Closed

Conversation

nirandaperera
Contributor

@nirandaperera nirandaperera commented Apr 8, 2025

This PR adds the ChunkBatch interface.

It serializes each chunk's data into a metadata buffer and a payload buffer. Additional information, such as the chunk batch ID, is also injected at the front of the metadata buffer.

Metadata buffer format:

| BatchHeader | [[MetadataMessageHeader, Metadata], ...] |

Payload buffer format:

| [Data, ...] |

waiting on #178

Closes #175

@nirandaperera nirandaperera requested a review from a team as a code owner April 8, 2025 07:40
@nirandaperera nirandaperera marked this pull request as draft April 8, 2025 07:40
@nirandaperera nirandaperera requested review from madsbk and wence- April 8, 2025 07:54
@nirandaperera nirandaperera added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Apr 8, 2025
@nirandaperera
Contributor Author

nirandaperera commented Apr 8, 2025

@madsbk @wence- this has the ChunkBatch outline. I will add the tests later today.

* @throws std::logic_error if copy violates the bounds of the destination buffer.
*/
[[nodiscard]] size_t copy_to(
Buffer& dest, size_t offset, rmm::cuda_stream_view stream
Contributor

This seems like we want to introduce a BufferView object that encapsulates the Buffer, offset pair.

Contributor Author

Yes. @madsbk and I discussed this and decided to punt it for later.

* size_t metadata_offset,
* Buffer const& payload_buf,
* size_t payload_offset)
* @param visitor visitor function
Contributor

question/thought: This looks like what you want is an iterator over the chunkbatch, so you can do something like:

for (Chunk c = batch.begin(); c != batch.end(); c++) {
   ...
}

To process chunks.

I think it should be possible to write such an iterator, and maybe it is more generic.
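A minimal sketch of such a forward iterator over a packed buffer (the per-chunk header and layout here are assumptions for illustration, not the PR's actual types) might look like:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed per-chunk header: a length prefix before each chunk's metadata.
struct ChunkHeader {
    uint64_t metadata_size;
};

// Forward iterator that walks |[ChunkHeader, metadata]...| sequentially.
class ChunkIterator {
  public:
    explicit ChunkIterator(uint8_t const* pos) : pos_(pos) {}

    // Read the header at the current position (memcpy avoids alignment issues).
    ChunkHeader header() const {
        ChunkHeader h;
        std::memcpy(&h, pos_, sizeof h);
        return h;
    }

    // Advance past the current header plus its metadata.
    ChunkIterator& operator++() {
        pos_ += sizeof(ChunkHeader) + header().metadata_size;
        return *this;
    }

    bool operator!=(ChunkIterator const& other) const { return pos_ != other.pos_; }

  private:
    uint8_t const* pos_;
};
```

Note that a forward iterator like this still only supports sequential traversal; random access needs extra index information in the header (see the prefix-sum discussion below in this thread).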

Member

Is this what #202 is doing?

Contributor Author

@pentschev yes, I added the iterator after @wence-'s comment.

/// chunk.
/// |BatchHeader|[[MetadataMessageHeader, Metadata], ...]|
///
/// TODO: change the format to have thhe MetadataMessageHeaders at the front (after
Contributor

Suggested change
/// TODO: change the format to have thhe MetadataMessageHeaders at the front (after
/// TODO: change the format to have the MetadataMessageHeaders at the front (after

Member

Still missing, also switch to /* ... */ to keep style.

Member

Still missing.

case MemoryType::DEVICE:
return std::make_unique<Buffer>(Buffer{
std::make_unique<rmm::device_buffer>(
static_cast<cuda::std::byte const*>(device()->data()) + offset,
Contributor

Nit: can just use std::byte here. Also, either way, IWYU.

Contributor Author

It was suggested to use cuda::std::byte in this slack thread.

Comment on lines 45 to 47
// We need at least (sizeof(MetadataMessageHeader) + metadata_size) amount of space
// from the offset
msg.resize(offset + sizeof(MetadataMessageHeader) + metadata_size);
Contributor

question: At least or exactly?

Also, this looks like a performance antipattern because if we're doing this to serialise a batch of chunks, we'll have quadratic reallocation behaviour.

Contributor Author

I thought about it too. If msg.size() is already larger than the resize count, it should return immediately as a no-op, shouldn't it?
For batching, the buffer is already preallocated, so resize should not do any reallocations.

Contributor

No

Complexity
Linear in the difference between the current size and count. Additional complexity possible due to reallocation if capacity is less than count.

Contributor Author

Yeah. My bad! 🙁

Contributor Author

I removed the resize now

size_t batch_metadata_size =
batch_header_size + chunk_metadata_header_size * chunks.size();
size_t batch_payload_size = 0;
MemoryType mem_type = chunks[0].gpu_data->mem_type;
Contributor

nit: UB if there are no chunks.

Contributor Author

Thanks @wence- . Good catch

size_t payload_offset = 0;
for (auto&& chunk : chunks) {
// copy metadata
metadata_offset += chunk.to_metadata_message(*metadata_buffer, metadata_offset);
Contributor

On line 44 we made the buffer the right size for all the chunks. But here we resize it every time to the "current" size. Which means we will (potentially) reallocate many times.

Contributor

It seems like we want to define a new metadata format for a batched chunk that would also allow random access into the serialised wire format. Right now it seems like I must iterate through all the metadata to pick out a given chunk.

What information do we need to send, how are we currently packing it, and what would be the best way to pack it?

Contributor Author

I think, for random access, we would need a prefix-sum array for both the metadata and the GPU data. Then we should be able to stride to the corresponding location. This could be part of the BatchHeader.
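A sketch of the prefix-sum idea (names are illustrative, not the PR's API): storing an exclusive prefix sum of per-chunk sizes in the batch header turns chunk i's offset and length into O(1) lookups instead of a scan.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Exclusive prefix sum of per-chunk sizes: offsets[i] is where chunk i
// starts, offsets.back() is the total size, and chunk i spans
// [offsets[i], offsets[i + 1]).
std::vector<uint64_t> chunk_offsets(std::vector<uint64_t> const& sizes) {
    std::vector<uint64_t> offsets(sizes.size() + 1, 0);
    std::partial_sum(sizes.begin(), sizes.end(), offsets.begin() + 1);
    return offsets;
}
```

One such array for the metadata buffer and one for the payload buffer would be enough for random access into both.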

@nirandaperera nirandaperera requested a review from wence- April 9, 2025 16:42
@nirandaperera nirandaperera marked this pull request as ready for review April 9, 2025 16:42
@nirandaperera nirandaperera requested a review from a team as a code owner April 10, 2025 22:52
Member

@pentschev pentschev left a comment

I haven't been able to review all of it yet and will complete tomorrow. Left a few comments for now.

Member

@pentschev pentschev left a comment

Left some more comments. However, I'm not really sure what ChunkBatch is planned to be used for. I think you have discussed this with Mads and Lawrence previously, so they may know the context already. Could you write down in the issue description what the planned goal for ChunkBatch is? A good way to deal with this is to write a proper description in issues instead of just a title, like #175.

Naively, I think the purpose here is just packing more data into a single message, but I don't know if that's correct. If it is, have you considered things like the increased memory pressure this will cause, since there will now be, at least temporarily, multiple copies of the data in device/host memory?

"unable to reserve gpu memory for batch"
);
payload_data = br->allocate(*mem_type, batch_payload_size, stream, reservation);
RAPIDSMP_EXPECTS(reservation.size() == 0, "didn't use all of the reservation");
Member

When is this supposed to happen?

Contributor Author

Comment on lines 112 to 133
// visit chunk data and verify if the given buffers adhere to the format
size_t visited_metadata_size = batch_header_size;
size_t visited_payload_size = 0;
batch.visit_chunk_data([&](Chunk::MetadataMessageHeader const* chunk_header,
auto const& /* metadata_buf */,
auto /* metadata_offset */,
auto const& /* payload_buf */,
auto /* payload_offset */) {
visited_metadata_size +=
(chunk_metadata_header_size + chunk_header->metadata_size);
visited_payload_size += chunk_header->gpu_data_size;
});
RAPIDSMP_EXPECTS(
visited_metadata_size == batch.metadata_buffer_->size(),
"visited metadata size doesn't match the metadata buffer size"
);
if (batch.payload_data_) {
RAPIDSMP_EXPECTS(
visited_payload_size == batch.payload_data_->size,
"visited payload size doesn't match the payload buffer size"
);
}
Member

Why is this needed, IOW when would they not adhere?

Contributor Author

This is to ensure that the two given buffers adhere to the format. For example, we could be given a valid metadata buffer while the payload buffer is empty (or zeroed). Both buffers are tied together, so they need to be validated against each other.

@nirandaperera
Contributor Author

@pentschev Thank you for the review. It was my bad, I should have added a detailed explanation under issue #175. Let me add some more information.

Comment on lines +39 to +43
/**
* @brief The size of the chunk metadata header in bytes.
*/
static constexpr std::ptrdiff_t chunk_metadata_header_size =
sizeof(Chunk::MetadataMessageHeader);
Member

Suggested change
/**
* @brief The size of the chunk metadata header in bytes.
*/
static constexpr std::ptrdiff_t chunk_metadata_header_size =
sizeof(Chunk::MetadataMessageHeader);
static constexpr std::ptrdiff_t chunk_metadata_header_size =
sizeof(Chunk::MetadataMessageHeader); ///< The size of the chunk metadata header in bytes.

Member

Actually, I think this is just an attribute and not a method; instead of @brief we probably want to treat it like other attributes, with ///<.

Comment on lines +55 to +56
/** @brief The size of the batch header in bytes. */
static constexpr std::ptrdiff_t batch_header_size = sizeof(BatchHeader);
Member

Suggested change
/** @brief The size of the batch header in bytes. */
static constexpr std::ptrdiff_t batch_header_size = sizeof(BatchHeader);
static constexpr std::ptrdiff_t batch_header_size = sizeof(BatchHeader); ///< The size of the batch header in bytes.

/// traversal pattern.
std::unique_ptr<std::vector<uint8_t>> metadata_buffer_;

/// GPU data buffer of the packed `cudf::table` associated with this chunk.
Member

Still missing.

/// chunk.
/// |BatchHeader|[[MetadataMessageHeader, Metadata], ...]|
///
/// TODO: change the format to have thhe MetadataMessageHeaders at the front (after
Member

Still missing.

Comment on lines +166 to +195
assert(metadata_buffer_);
assert(metadata_buffer_->size() >= batch_header_size);

std::ptrdiff_t metadata_offset = batch_header_size;
std::ptrdiff_t payload_offset = 0;

for (size_t i = 0; i < header()->num_chunks; ++i) {
assert(
std::ptrdiff_t(metadata_buffer_->size())
>= metadata_offset + chunk_metadata_header_size
);

auto const* chunk_header =
reinterpret_cast<Chunk::MetadataMessageHeader const*>(
metadata_buffer_->data() + metadata_offset
);
metadata_offset += chunk_metadata_header_size;

assert(
metadata_buffer_->size()
>= size_t(metadata_offset) + chunk_header->metadata_size
);

if (chunk_header->gpu_data_size > 0) {
assert(payload_data_);
assert(
payload_data_->size
>= size_t(payload_offset) + chunk_header->gpu_data_size
);
}
Member

Should we use RAPIDSMPF_EXPECTS instead of asserts here?

Contributor Author

I consciously added asserts here because I felt we can omit these checks in the release build.
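The trade-off in question, sketched with a hypothetical macro (RAPIDSMPF_EXPECTS belongs to the project and is not defined here; its real behavior may differ): assert() disappears under -DNDEBUG in release builds, while a throwing check is always evaluated.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical always-on check, in the spirit of RAPIDSMPF_EXPECTS: the
// condition is evaluated in every build type, unlike assert() under NDEBUG.
#define ALWAYS_EXPECTS(cond, msg)                 \
    do {                                          \
        if (!(cond)) throw std::logic_error(msg); \
    } while (0)

// Returns true if the check raised an exception for the given condition.
bool check_throws(bool cond) {
    try {
        ALWAYS_EXPECTS(cond, "precondition violated");
        return false;  // no exception raised
    } catch (std::logic_error const&) {
        return true;
    }
}
```

So asserts keep the release hot path free of these bounds checks, at the cost of not catching malformed batches in production.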

Comment on lines +284 to +285
* @brief Postfix increment of the iterator.
* @return Copy of the iterator before increment
Member

Suggested change
* @brief Postfix increment of the iterator.
* @return Copy of the iterator before increment
* @brief Postfix increment of the iterator.
*
* @return Copy of the iterator before increment

Comment on lines +290 to +291
* @brief Equality comparison of iterators.
* @param other The other iterator to compare with
Member

Suggested change
* @brief Equality comparison of iterators.
* @param other The other iterator to compare with
* @brief Equality comparison of iterators.
*
* @param other The other iterator to compare with

Comment on lines +306 to +324
/**
* @brief Inequality comparison of iterators.
* @param other The other iterator to compare with
* @return true if the iterators are not equal, false otherwise
*/
bool operator!=(ChunkForwardIterator const& other) const;

/**
* @brief Copy constructor.
* @param other The other iterator to copy from.
*/
ChunkForwardIterator(const ChunkForwardIterator& other) = default;

/**
* @brief Copy assignment operator.
* @param other The other iterator to copy from.
* @return Reference to the assigned iterator.
*/
ChunkForwardIterator& operator=(const ChunkForwardIterator& other) = default;
Member

Suggested change
/**
* @brief Inequality comparison of iterators.
* @param other The other iterator to compare with
* @return true if the iterators are not equal, false otherwise
*/
bool operator!=(ChunkForwardIterator const& other) const;
/**
* @brief Copy constructor.
* @param other The other iterator to copy from.
*/
ChunkForwardIterator(const ChunkForwardIterator& other) = default;
/**
* @brief Copy assignment operator.
* @param other The other iterator to copy from.
* @return Reference to the assigned iterator.
*/
ChunkForwardIterator& operator=(const ChunkForwardIterator& other) = default;
/**
* @brief Inequality comparison of iterators.
*
* @param other The other iterator to compare with
* @return true if the iterators are not equal, false otherwise
*/
bool operator!=(ChunkForwardIterator const& other) const;
/**
* @brief Copy constructor.
*
* @param other The other iterator to copy from.
*/
ChunkForwardIterator(const ChunkForwardIterator& other) = default;
/**
* @brief Copy assignment operator.
*
* @param other The other iterator to copy from.
* @return Reference to the assigned iterator.
*/
ChunkForwardIterator& operator=(const ChunkForwardIterator& other) = default;

Comment on lines +349 to +370
* @brief Unwrap the current chunk header.
* @return Chunk header ptr.
*/
inline Chunk::MetadataMessageHeader const* chunk_header() const {
return reinterpret_cast<Chunk::MetadataMessageHeader const*>(
batch_.metadata_buffer_->data() + metadata_offset_
);
}

/**
* @brief Check if current position contains a chunks.
* @return True, if metadata offset points to a valid chunk.
*/
inline bool has_chunk() const {
return batch_.size() > 0
&& (size_t(metadata_offset_) < batch_.metadata_buffer_->size());
}

/**
* @brief Make a chunk.
* @return Chunk wrapped in a shared ptr
*/
Member

Suggested change
* @brief Unwrap the current chunk header.
* @return Chunk header ptr.
*/
inline Chunk::MetadataMessageHeader const* chunk_header() const {
return reinterpret_cast<Chunk::MetadataMessageHeader const*>(
batch_.metadata_buffer_->data() + metadata_offset_
);
}
/**
* @brief Check if current position contains a chunks.
* @return True, if metadata offset points to a valid chunk.
*/
inline bool has_chunk() const {
return batch_.size() > 0
&& (size_t(metadata_offset_) < batch_.metadata_buffer_->size());
}
/**
* @brief Make a chunk.
* @return Chunk wrapped in a shared ptr
*/
* @brief Unwrap the current chunk header.
*
* @return Chunk header ptr.
*/
inline Chunk::MetadataMessageHeader const* chunk_header() const {
return reinterpret_cast<Chunk::MetadataMessageHeader const*>(
batch_.metadata_buffer_->data() + metadata_offset_
);
}
/**
* @brief Check if current position contains a chunks.
*
* @return True, if metadata offset points to a valid chunk.
*/
inline bool has_chunk() const {
return batch_.size() > 0
&& (size_t(metadata_offset_) < batch_.metadata_buffer_->size());
}
/**
* @brief Make a chunk.
*
* @return Chunk wrapped in a shared ptr
*/

Member

Additionally, move the implementations to a .cpp file?

Comment on lines +169 to +171
// std::vector<Chunk> const chunks = batch.get_chunks(stream);
// EXPECT_EQ(exp_chunks.size(), chunks.size());

Member

Suggested change
// std::vector<Chunk> const chunks = batch.get_chunks(stream);
// EXPECT_EQ(exp_chunks.size(), chunks.size());

Member

I think get_chunks was removed, and thus we could remove those lines.

@nirandaperera
Contributor Author

@wence- @pentschev @madsbk CI seems to have some trouble with this PR. I think it is due to the repo name change.
Please refer to this updated PR #231

@wence-
Contributor

wence- commented Apr 29, 2025

Closing in favour of #231.

@wence- wence- closed this Apr 29, 2025
Successfully merging this pull request may close these issues.

Add a chunk batch interface
4 participants