Add ChunkBatch interface to aggregate small messages before sending #231


Open

wants to merge 31 commits into base: branch-25.06
Conversation

nirandaperera
Contributor

@nirandaperera nirandaperera commented Apr 28, 2025

This PR adds the ChunkBatch interface.

It serializes each chunk's data into a metadata buffer and a payload buffer. Additional information, such as the chunk batch ID, is also injected at the front of the metadata buffer.

Metadata buffer format:

| BatchHeader | [[MetadataMessageHeader, Metadata], ...] |

Payload buffer format:

| [Data, ...] |
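The layout above can be sketched in plain C++. The struct fields and the packing function below are illustrative assumptions, not the actual rapidsmpf definitions:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical header layouts illustrating the buffer format above;
// field names are assumptions, not the real rapidsmpf structs.
struct BatchHeader {
    uint32_t batch_id;    // chunk batch ID
    uint32_t num_chunks;  // number of chunks that follow
};

struct MetadataMessageHeader {
    uint64_t metadata_size;  // bytes of per-chunk metadata that follow
    uint64_t gpu_data_size;  // bytes of this chunk's slice in the payload buffer
};

// Pack one batch: | BatchHeader | [[MetadataMessageHeader, Metadata], ...] |
std::vector<uint8_t> pack_metadata(
    uint32_t batch_id, std::vector<std::vector<uint8_t>> const& chunk_meta
) {
    std::vector<uint8_t> buf(sizeof(BatchHeader));
    BatchHeader hdr{batch_id, static_cast<uint32_t>(chunk_meta.size())};
    std::memcpy(buf.data(), &hdr, sizeof(hdr));
    for (auto const& meta : chunk_meta) {
        MetadataMessageHeader mh{static_cast<uint64_t>(meta.size()), 0};
        size_t off = buf.size();
        buf.resize(off + sizeof(mh) + meta.size());
        std::memcpy(buf.data() + off, &mh, sizeof(mh));
        std::memcpy(buf.data() + off + sizeof(mh), meta.data(), meta.size());
    }
    return buf;
}
```

The payload buffer would simply be the chunks' device data concatenated in the same order, so the per-chunk `gpu_data_size` fields describe how to slice it back apart.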

Closes #175

@nirandaperera nirandaperera requested review from a team as code owners April 28, 2025 22:29

copy-pr-bot bot commented Apr 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@nirandaperera nirandaperera added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Apr 28, 2025
@nirandaperera
Contributor Author

/ok to test


copy-pr-bot bot commented Apr 28, 2025

/ok to test

@nirandaperera, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@nirandaperera
Contributor Author

/ok to test bc2895a

Signed-off-by: niranda perera <[email protected]>
@nirandaperera
Contributor Author

/ok to test f9906cd

@wence- changed the title from "Adding ChunkBatch interface - Updated" to "Add ChunkBatch interface to aggregate small messages before sending" Apr 29, 2025
Contributor

@wence- wence- left a comment


Partial review

Comment on lines +155 to +167
/**
* @brief Copy the buffer to a destination buffer with a given offset.
*
* @param dest Destination buffer.
* @param offset Offset of the destination buffer.
* @param stream CUDA stream to use for the copy.
* @returns Number of bytes written to the destination buffer.
*
* @throws std::logic_error if copy violates the bounds of the destination buffer.
*/
[[nodiscard]] std::ptrdiff_t copy_to(
Buffer& dest, std::ptrdiff_t offset, rmm::cuda_stream_view stream
) const;
Contributor


Are these slice-based interfaces the best approach, or should we accept span-like things? That is, we could have a slice method that returns a Span over a Buffer, and then copy_to could accept a Span& dest; this would push validation of offsets to the Span construction, etc.
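A minimal host-side sketch of what that span-based alternative could look like; `Span`, `HostBuffer`, and this `copy_to` are hypothetical stand-ins, not the rapidsmpf API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical non-owning view over a region of a buffer.
struct Span {
    uint8_t* data;
    std::ptrdiff_t size;
};

struct HostBuffer {
    std::vector<uint8_t> storage;

    // Bounds are validated once, at Span construction.
    Span slice(std::ptrdiff_t offset, std::ptrdiff_t length) {
        if (offset < 0 || length < 0
            || offset + length > static_cast<std::ptrdiff_t>(storage.size())) {
            throw std::logic_error("slice out of bounds");
        }
        return {storage.data() + offset, length};
    }
};

// copy_to no longer needs a separate offset argument or its own bounds
// arithmetic: the destination span is already validated.
std::ptrdiff_t copy_to(Span dest, uint8_t const* src, std::ptrdiff_t n) {
    if (n > dest.size) {
        throw std::logic_error("destination span too small");
    }
    std::memcpy(dest.data, src, static_cast<size_t>(n));
    return n;
}
```

The design upside is that offset/length mistakes surface at the single `slice` call site rather than inside every copy routine.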

* @returns The number of bytes written to the message buffer.
*/
[[nodiscard]] std::ptrdiff_t to_metadata_message(
std::vector<uint8_t>& msg, std::ptrdiff_t offset
Contributor


Here if we accepted a cuda::std::span (C++20 but backported to C++14) over the message we wouldn't need offset argument separately, I think.

Comment on lines +178 to +180
reinterpret_cast<Chunk::MetadataMessageHeader const*>(
metadata_buffer_->data() + metadata_offset
);
Contributor


question: I think accessing the metadata through this pointer might be undefined behaviour if metadata_offset % alignof(Chunk::MetadataMessageHeader) is not zero. Can you assuage my doubts?
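For reference, an alignment-safe way to read such a header regardless of `metadata_offset` is to `memcpy` it into a properly aligned local object; the struct below is a stand-in for `Chunk::MetadataMessageHeader`:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for Chunk::MetadataMessageHeader; the fields are assumptions.
struct MetadataMessageHeader {
    uint64_t metadata_size;
    uint64_t gpu_data_size;
};

// Alignment-safe alternative to reinterpret_cast: copy the bytes into an
// aligned local object. Compilers typically lower this to a plain load
// when the source address happens to be aligned, so it costs nothing.
MetadataMessageHeader read_header(uint8_t const* buf, size_t offset) {
    MetadataMessageHeader hdr;
    std::memcpy(&hdr, buf + offset, sizeof(hdr));
    return hdr;
}
```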

* @param visitor visitor function
*/
template <typename VisitorFn>
void visit_chunk_data(VisitorFn visitor) const {
Contributor


I think I don't really like this interface and would prefer to only have the iterator-like access. WDYT?

/**
* @brief Forward iterator of chunks in the chunk batch.
*/
class ChunkForwardIterator {
Contributor


I think I would prefer to implement operator[] and .at on the ChunkBatch class so that we could just have a RandomAccessIterator. I think this might require a few changes to the metadata we store in a ChunkBatch (basically we need to keep the exclusive scan of the offsets around).
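A sketch of that bookkeeping, assuming we keep the exclusive scan of the per-chunk extents around so lookup is O(1); the names here are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical offset table for random access into a chunk batch:
// starts[i] is where chunk i begins, i.e. the exclusive prefix sum
// of the per-chunk sizes.
struct ChunkOffsets {
    std::vector<size_t> starts;

    explicit ChunkOffsets(std::vector<size_t> const& sizes)
        : starts(sizes.size()) {
        // exclusive scan: starts[i] = sizes[0] + ... + sizes[i-1]
        std::exclusive_scan(sizes.begin(), sizes.end(), starts.begin(), size_t{0});
    }

    // Bounds-checked O(1) lookup, suitable backing for operator[]/.at
    // and a RandomAccessIterator.
    size_t at(size_t i) const { return starts.at(i); }
};
```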

);

if (mem_type() == target) {
return copy_slice(offset, length, stream);
Contributor


nit: This is why I don't really like these "C-like" APIs. It's very easy to switch up offset and length since they are the same type.

Member

@madsbk madsbk left a comment


Consider implementing the Buffer slicing (or operator[]/.at support) in a standalone PR with its own testing?


/**
* @brief The structure of the batch header.
* @note This is allocated at the front of the the metadata buffer.
Member


Suggested change
* @note This is allocated at the front of the the metadata buffer.
* @note This is allocated at the front of the metadata buffer.

// visit chunk data and verify if the given buffers adhere to the format
size_t visited_metadata_size = batch_header_size;
size_t visited_payload_size = 0;
batch.visit_chunk_data([&](Chunk::MetadataMessageHeader const* chunk_header,
Member


Could we use ChunkForwardIterator here?

Comment on lines +36 to +37
// reserve for the chunks size
ret.reserve(ret.size() + chunks.size());
Contributor

@wence- wence- Apr 29, 2025


This is quadratic since you're not growing geometrically, so you end up doing num_pigeonhole_entries * num_chunks_per_pigeonhole allocations. Either compute the total size up-front in a separate pass, or else don't bother at all and just rely on the stdlib's geometric allocation growth.
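A sketch of the first option, totalling the sizes in a separate pass so a single reserve suffices; the function and its names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Compute the total size up-front, reserve once, then fill.
// This avoids the repeated exact-size reserves that defeat the
// stdlib's geometric growth.
std::vector<int> concat_chunks(std::vector<std::vector<int>> const& pigeonholes) {
    size_t total = std::accumulate(
        pigeonholes.begin(), pigeonholes.end(), size_t{0},
        [](size_t acc, std::vector<int> const& v) { return acc + v.size(); }
    );
    std::vector<int> ret;
    ret.reserve(total);  // single allocation for the whole result
    for (auto const& v : pigeonholes) {
        ret.insert(ret.end(), v.begin(), v.end());
    }
    return ret;
}
```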

* b. Host data
*/

// Parametarized test for MemoryType and types of chunks
Contributor


Suggested change
// Parametarized test for MemoryType and types of chunks
// Parametrized test for MemoryType and types of chunks

chunks.emplace_back(5, 5, 101);
chunks.emplace_back(4, 4, 0, len, copy_metadata(), copy_data());
} else {
RAPIDSMPF_EXPECTS(chunks_type == "empty", "unkown chunk type " + chunks_type);
Contributor


Suggested change
RAPIDSMPF_EXPECTS(chunks_type == "empty", "unkown chunk type " + chunks_type);
RAPIDSMPF_EXPECTS(chunks_type == "empty", "unknown chunk type " + chunks_type);

chunk_payload = payload_buf.copy_slice(
payload_offset, std::ptrdiff_t(chunk_header->gpu_data_size), stream
);
}
Contributor


Is there a reason this iterator doesn't produce ChunkViews and then the consumer can decide whether to copy or not?
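A minimal sketch of what such a ChunkView could look like; the type and its fields are assumptions for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view the iterator could yield: it points into
// the batch's payload buffer, and the consumer decides whether to
// materialize an owning copy.
struct ChunkView {
    uint8_t const* payload;  // points into the batch's payload buffer
    size_t size;             // bytes of this chunk's payload

    // Copy only on demand, when the consumer actually needs ownership.
    std::vector<uint8_t> copy() const { return {payload, payload + size}; }
};
```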

Labels
improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Development

Successfully merging this pull request may close these issues.

Add a chunk batch interface
3 participants