
Chunk with multiple messages #251


Merged

Conversation

nirandaperera
Contributor

@nirandaperera nirandaperera commented May 6, 2025

Chunk with multiple messages. This PR only moves the existing Chunk class to the new impl and it would only have 1 message in it.

This class has two buffers:

  • metadata_: The metadata buffer that contains information about the messages in the chunk and the concatenated metadata of the messages.
  • data_: The data buffer that contains the concatenated data of the messages in the chunk.

All the chunk information will be encoded to the metadata_ buffer as follows.
The metadata_ buffer uses the following format:

  • chunk_id: uint64_t, ID of the chunk
  • n_elements: size_t, Number of messages in the chunk
  • [partition_ids]: vector<PartID>, Partition IDs of the messages, size = n_elements
  • [expected_num_chunks]: vector<size_t>, Expected number of chunks of the messages, size = n_elements
  • [meta_offsets]: vector<uint32_t>, Offsets (excluding 0) of the metadata sizes of the messages, size = n_elements
  • [data_offsets]: vector<uint64_t>, Offsets (excluding 0) of the data sizes of the messages, size = n_elements
  • [concat_metadata]: vector<uint8_t>, Concatenated metadata of the messages, size = meta_offsets[n_elements - 1]

For a chunk with N messages and M bytes of concatenated metadata, the size of the metadata_ buffer is sizeof(ChunkID) + sizeof(size_t) + N * (sizeof(PartID) + sizeof(size_t) + sizeof(uint32_t) + sizeof(uint64_t)) + M = 16 + 24N + M bytes.

For a chunk with a single control message, the size of the metadata_ buffer is sizeof(ChunkID) + sizeof(PartID) + 2*sizeof(size_t) + sizeof(uint32_t) + sizeof(uint64_t) = 40 bytes.

For a chunk with a single message with M bytes of metadata, the size of the metadata_ buffer is sizeof(ChunkID) + sizeof(PartID) + 2*sizeof(size_t) + sizeof(uint32_t) + sizeof(uint64_t) + M = 40 + M bytes.
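
As an illustration (not part of this PR), the size formula above can be written out directly in code; the ChunkID and PartID widths below are assumptions inferred from the byte counts in these formulas:

#include <cstddef>
#include <cstdint>

// Illustrative only: type widths assumed from the size formulas above.
using ChunkID = std::uint64_t;  // 8 bytes
using PartID  = std::uint32_t;  // 4 bytes

// Size of the metadata_ buffer for a chunk of `n` messages whose concatenated
// metadata occupies `concat_meta_bytes` bytes.
constexpr std::size_t metadata_buffer_size(std::size_t n, std::size_t concat_meta_bytes) {
    return sizeof(ChunkID) + sizeof(std::size_t)                  // chunk_id, n_elements
         + n * (sizeof(PartID) + sizeof(std::size_t)              // partition_ids, expected_num_chunks
                + sizeof(std::uint32_t) + sizeof(std::uint64_t))  // meta_offsets, data_offsets
         + concat_meta_bytes;                                     // concatenated metadata
}

// On an LP64 platform (sizeof(size_t) == 8):
static_assert(metadata_buffer_size(1, 0) == 40, "single control message");
static_assert(metadata_buffer_size(1, 100) == 140, "single message with 100 B of metadata");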

Signed-off-by: niranda perera <[email protected]>
@pentschev
Member

If you're merging branch-25.08 into your PR you need to retarget it to branch-25.08 as well. Was that intended @nirandaperera ?

@nirandaperera nirandaperera force-pushed the multi-packed-data-chunk branch from fd59be4 to aba10c0 May 6, 2025 21:01
@nirandaperera
Contributor Author

If you're merging branch-25.08 into your PR you need to retarget it to branch-25.08 as well. Was that intended @nirandaperera ?

Thanks @pentschev. It was a mistake. I force-pushed the changes now.

Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
@nirandaperera nirandaperera added breaking Introduces a breaking change improvement Improves an existing functionality labels May 6, 2025
@nirandaperera
Contributor Author

@wence- @madsbk this PR has the "new" Chunk API, with scaffolding for housing multiple messages. I didn't rename it to Chunk ATM, because I felt the API is cleaner to review like this. I will replace Chunk once this comes out of draft.

Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
/**
* @brief Chunk with multiple messages.
*
* This class will have two buffers:
Contributor

Suggested change
* This class will have two buffers:
* This class has two buffers:

* - data_: The data buffer that contains the concatenated data of the messages in the
* chunk.
*
* The metadata_ buffer will have the following format:
Contributor

Suggested change
* The metadata_ buffer will have the following format:
* The metadata_ buffer has the following format:

Comment on lines 45 to 47
* - [psum_meta]: std::vector<uint32_t>, Prefix sums (excluding 0) of the metadata
* sizes of the messages, size = n_elements
* - [psum_data]: std::vector<uint64_t>, Prefix sums (excluding 0) of the data sizes of
Contributor

Can we call this something like metadata_offsets and data_offsets respectively?
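
For context, a minimal sketch (not from the PR) of how per-message sizes fall out of such offsets, using the member names discussed here:

// Sketch only: with offsets stored as prefix sums that omit the leading 0,
// message i's metadata/data size is the difference of adjacent entries.
inline uint32_t metadata_size(size_t i) const {
    return meta_offsets_[i] - (i == 0 ? 0 : meta_offsets_[i - 1]);
}

inline uint64_t data_size(size_t i) const {
    return data_offsets_[i] - (i == 0 ? 0 : data_offsets_[i - 1]);
}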

* @return The number of messages in the chunk.
*/
inline size_t n_messages() const {
    return *reinterpret_cast<size_t*>(metadata_->data() + sizeof(ChunkID));
Contributor

All these reinterpret_cast type-punning approaches break strict-aliasing rules unfortunately.

What we should do is:

  1. (C++ 20) use std::bit_cast (but it's messy because we're carrying around std::byte and bit_cast takes values, not pointer + size).
  2. Use memcpy (the compiler will optimise this)

So, this would be (for example):

inline size_t n_messages() const {
    size_t result;
    memcpy(&result, metadata_->data() + sizeof(ChunkID), sizeof(result));
    return result;
}
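
The same pattern generalizes to any header field; a small helper (a sketch, not code from this PR) keeps the memcpy in one place:

#include <cstddef>
#include <cstring>
#include <type_traits>

// Sketch: read a trivially copyable T at `offset` in a byte buffer without
// type punning; compilers typically lower this memcpy to a single load.
// Assumes the buffer is carried around as std::byte, as mentioned above.
template <typename T>
T read_at(std::byte const* buf, std::size_t offset) {
    static_assert(std::is_trivially_copyable_v<T>);
    T value;
    std::memcpy(&value, buf + offset, sizeof(T));
    return value;
}

// e.g. auto n = read_at<std::size_t>(metadata_->data(), sizeof(ChunkID));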

Contributor Author

I see. TIL. Sure, I will change to memcpy.

Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
@nirandaperera
Contributor Author

/ok to test

Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
@nirandaperera nirandaperera requested a review from wence- May 8, 2025 22:46
Member

@madsbk madsbk left a comment

Overall looks good!
@nirandaperera, let's prioritize getting full chunk support. I think it will have a significant impact!

@nirandaperera nirandaperera marked this pull request as ready for review May 9, 2025 15:16
@nirandaperera nirandaperera requested a review from a team as a code owner May 9, 2025 15:16
Signed-off-by: niranda perera <[email protected]>
nirandaperera and others added 3 commits May 13, 2025 13:52
@nirandaperera nirandaperera requested a review from madsbk May 13, 2025 20:53
Comment on lines 160 to 161
assert(!data_offsets_.empty());
assert(!is_control_message(i));
Contributor

question: Should we use RAPIDSMPF_EXPECTS here?

Contributor Author

I was previously thinking about populating data_offsets_ only when there are data messages, but ended up populating data_offsets_ and meta_offsets_ for all messages. So, we can remove these asserts.

Contributor

@wence- wence- left a comment

I think we can tidy some things up and remove TODOs by ensuring Chunks obey their invariants by construction.

inline size_t concat_metadata_size() const {
    assert(metadata_);
    assert(!meta_offsets_.empty());
    assert(meta_offsets_[n_messages() - 1] == metadata_->size());
Contributor

These invariants are (or should be) enforced by the constructor, so why do we need to assert them here?

Contributor Author

fair point. I was mostly using the asserts to prevent me from shooting myself in the foot 😉 I'll remove these

);

ChunkID const chunk_id_; ///< The ID of the chunk.
size_t const n_messages_; ///< The number of messages in the chunk.
Contributor

question: this information needs to be sent over the wire, but I think it redundantly encodes meta_offsets_.size(), is that right? If yes, should we remove it?

Contributor Author

I agree. We can remove Chunk::n_messages_ and use the size of either of these vectors.

       std::vector<PartID> part_ids,
        std::vector<size_t> expected_num_chunks,
        std::vector<uint32_t> meta_offsets,
        std::vector<uint64_t> data_offsets,

Comment on lines 25 to 32
    : chunk_id_{chunk_id},
      n_messages_{n_messages},
      part_ids_{std::move(part_ids)},
      expected_num_chunks_{std::move(expected_num_chunks)},
      meta_offsets_{std::move(meta_offsets)},
      data_offsets_{std::move(data_offsets)},
      metadata_{std::move(metadata)},
      data_{std::move(data)} {}
Contributor

suggestions: Let us do all the validation of the format of the Chunk here. Then we don't need assertions scattered around the rest of the code, since it will not be possible to construct an invalid chunk.
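
A rough sketch of what construction-time validation could check (illustrative names; in the real code RAPIDSMPF_EXPECTS, mentioned earlier in this review, would likely replace the throws):

#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sketch only: the invariants a Chunk constructor could enforce up front so
// that accessors don't need scattered asserts.
inline void validate_chunk_format(std::size_t n_part_ids,
                                  std::size_t n_expected_num_chunks,
                                  std::vector<std::uint32_t> const& meta_offsets,
                                  std::vector<std::uint64_t> const& data_offsets,
                                  std::size_t concat_metadata_size) {
    auto const n = meta_offsets.size();
    if (n == 0)
        throw std::invalid_argument("a chunk must contain at least one message");
    if (n_part_ids != n || n_expected_num_chunks != n || data_offsets.size() != n)
        throw std::invalid_argument("per-message vectors must have equal length");
    if (!std::is_sorted(meta_offsets.begin(), meta_offsets.end())
        || !std::is_sorted(data_offsets.begin(), data_offsets.end()))
        throw std::invalid_argument("offsets must be non-decreasing prefix sums");
    if (meta_offsets.back() != concat_metadata_size)
        throw std::invalid_argument("metadata offsets must end at the concatenated metadata size");
}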

Comment on lines 131 to 132
* the message is a data message, the buffers will be moved to the new ChunkBatch.
* Otherwise a new ChunkBatch will be created by copying data.
Contributor

nit: Not a ChunkBatch any more.

Comment on lines 171 to 174
// For each message, validate the metadata and data sizes
auto const* psum_meta = reinterpret_cast<uint32_t const*>(
    serialized_buf.data() + sizeof(ChunkID) + sizeof(size_t)
    + n_messages * (sizeof(PartID) + sizeof(size_t))
Contributor

And this.

);
auto const* psum_data = reinterpret_cast<uint64_t const*>(psum_meta + n_messages);
Contributor

And this.
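
In line with the earlier memcpy suggestion, these arrays could be copied out of the serialized buffer rather than aliased in place (a sketch reusing the names from the excerpt above; not the PR's code):

// Sketch: copy the offset arrays instead of reinterpret_cast'ing into the
// serialized buffer, keeping strict aliasing intact.
std::vector<uint32_t> meta_offsets(n_messages);
std::vector<uint64_t> data_offsets(n_messages);

auto const* base = serialized_buf.data() + sizeof(ChunkID) + sizeof(size_t)
                   + n_messages * (sizeof(PartID) + sizeof(size_t));
std::memcpy(meta_offsets.data(), base, n_messages * sizeof(uint32_t));
std::memcpy(data_offsets.data(), base + n_messages * sizeof(uint32_t),
            n_messages * sizeof(uint64_t));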

@@ -14,9 +14,16 @@ namespace rapidsmpf::shuffler::detail {

template <typename KeyType>
void PostBox<KeyType>::insert(Chunk&& chunk) {
    // check if all partition IDs in the chunk map to the same key
    KeyType key = key_map_fn_(chunk.part_id(0));
Contributor

question: I guess by construction a chunk always contains at least one piece?

Contributor Author

yes

if (chunk.gpu_data) {
statistics_->add_bytes_stat("shuffle-payload-send", chunk.gpu_data->size);
statistics_->add_bytes_stat("shuffle-payload-recv", chunk.gpu_data->size);
// TODO: Guarantee that all messages in the chunk map to the same key (rank).
Contributor

As above, do this in the constructor of Chunk then you don't need to litter the rest of the code with checks.

Contributor Author

Are you suggesting that we add a mapping function to the Chunk ctor?
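
For illustration only (this wasn't settled in the review), passing a mapping function at construction could look roughly like this, with hypothetical names:

#include <stdexcept>
#include <vector>

// Hypothetical sketch: verify at construction time that every message's
// partition maps to the same key/rank under the supplied mapping function.
// Assumes part_ids is non-empty, per the "at least one message" invariant above.
template <typename PartID, typename KeyMapFn>
void check_single_key(std::vector<PartID> const& part_ids, KeyMapFn&& key_map_fn) {
    auto const key = key_map_fn(part_ids.front());
    for (auto const& pid : part_ids) {
        if (key_map_fn(pid) != key)
            throw std::invalid_argument("all messages in a chunk must map to the same key");
    }
}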

@nirandaperera nirandaperera requested a review from wence- May 14, 2025 23:21
Member

@madsbk madsbk left a comment

Final suggestions

@madsbk
Member

madsbk commented May 30, 2025

@wence- do you have anything else?

Contributor

@wence- wence- left a comment

I think there are still perhaps a few places where the invariants of chunks (e.g. that all partitions in a chunk map to the same ID) could be enforced at construction. But I think that can be done in a followup, since it requires pushing various additional information like the partition->rank mapping function through.

@nirandaperera nirandaperera changed the base branch from branch-25.06 to branch-25.08 May 30, 2025 17:38
@nirandaperera
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 9ea7d2a into rapidsai:branch-25.08 May 30, 2025
41 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jun 25, 2025
This PR adds multi-packed data for the shuffler. 

Closes #145 

Depends on #251 #271

## Perf Analysis 

Weak scaling analysis of a 4 GB/rank shuffle (pre-hash-partitioned data, on the PDX cluster). Concatenation shows a significant performance improvement, despite creating multiple copies of data.

| out_parts | in_parts | nranks | global throughput GiB/s (new) | global throughput GiB/s (old) | local throughput GiB/s (new) | local throughput GiB/s (old) | time s (new) | time s (old) | time change % | local throughput change % | global throughput change % |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 128 | 8 | 2 | 174.20 | 123.78 | 87.10 | 61.89 | 0.0500 | 0.0650 | 23.07 | 40.74 | 40.73 |
| 128 | 8 | 4 | 318.13 | 136.83 | 79.53 | 34.21 | 0.0500 | 0.1175 | 57.44 | 132.49 | 132.49 |
| 128 | 8 | 8 | 440.81 | 271.68 | 55.10 | 33.96 | 0.0737 | 0.1187 | 37.89 | 62.24 | 62.25 |

[notebook link](https://colab.research.google.com/drive/1tgtn-dTw_YB9yfBQNs5RN-EJccAwBIm5?authuser=1#scrollTo=YeM0d1PVQ4UR)

![image](https://github.com/user-attachments/assets/9c4e7802-2591-4fa9-9eec-6554b0a9f051)

![image](https://github.com/user-attachments/assets/9407af38-4ecd-4dd4-a6d4-18fab8b895b2)

Authors:
  - Niranda Perera (https://github.com/nirandaperera)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #291