Support transfers for layouts with multiple id-indexed fields #284

2dm · 2025-09-03T00:19:43Z

This patch adds a fast‐path for multi‐field copies when source and destination instances store fields consecutively in memory with strictly increasing IDs. Instead of the generic per‐field iterator, Realm switches to an IDIndexedIterator and issues the entire copy as a single batched DMA operation.

Fast‐Path Mechanics

Layout Detection: During instance construction, Realm identifies when field IDs form a consecutive sequence and marks the instances for the optimized path.
IDIndexedIterator: Attaches a single FieldBlock (allocated from the replicated heap) to the address list, skipping per‐field offset lookups.
Affine Collapse: Treats the multi‐field operation as one affine rectangle with N attached fields, collapsing what would be N separate copies into a single batched command.
Flow‐Control Splitting: The DMA engine automatically splits the batch according to internal flow‐control limits and IB fragment sizes (for cross‐node transfers).
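
As a rough illustration of the splitting step, here is a minimal sketch that breaks a batch of equally sized fields into chunks that fit under a byte limit. The names `Chunk`, `split_batch`, and `max_bytes` are hypothetical and are not taken from the patch. The sketch also assumes splits fall on field boundaries, which the real DMA engine may not require.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor for one piece of a batched multi-field copy.
struct Chunk {
  std::size_t first_field; // index of the first field in this chunk
  std::size_t num_fields;  // number of consecutive fields in this chunk
};

// Splits a batch of num_fields equally sized fields into chunks whose total
// size does not exceed max_bytes. Assumption: a single field always fits.
std::vector<Chunk> split_batch(std::size_t num_fields, std::size_t field_bytes,
                               std::size_t max_bytes)
{
  assert(field_bytes > 0 && max_bytes >= field_bytes);
  const std::size_t per_chunk = max_bytes / field_bytes;
  std::vector<Chunk> chunks;
  for(std::size_t f = 0; f < num_fields; f += per_chunk) {
    chunks.push_back({f, std::min(per_chunk, num_fields - f)});
  }
  return chunks;
}
```

With 10 fields of 64 bytes and a 256-byte limit, this yields two chunks of four fields followed by one chunk of two.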

Testing

Integration Test: Verifies end-to-end multi-field copy behavior on real instances.
Unit Tests: Cover iterator construction, address-list setup, FieldBlock allocation, and DMA splitting logic.

src_inst{ 0: [..field_size..], 1: [..field_size..], 2: [..field_size..], N-1: [..field_size..] }
dst_inst{ 0: [..field_size..], 1: [..field_size..], 2: [..field_size..], N-1: [..field_size..] }
index_space.copy(src_fids[32, 16, 612, 0..], dst_fids[17, 3, 999, 2..])

Authored by @apryakhin

codecov · 2025-09-03T00:20:22Z

Codecov Report

❌ Patch coverage is 40.77670% with 183 lines in your changes missing coverage. Please review.
✅ Project coverage is 27.37%. Comparing base (06e5ba4) to head (9a0df71).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/realm/transfer/transfer.cc	24.88%	132 Missing and 28 partials ⚠️
src/realm/inst_layout.inl	65.11%	15 Missing ⚠️
src/realm/runtime_impl.cc	0.00%	3 Missing ⚠️
src/realm/transfer/channel.h	0.00%	2 Missing ⚠️
tests/unit_tests/idindexed_fields_iterator_test.cc	95.74%	2 Missing ⚠️
src/realm/transfer/channel.cc	0.00%	1 Missing ⚠️

Additional details and impacted files

@@                     Coverage Diff                      @@
##           mdery/addr_list_refactor     #284      +/-   ##
============================================================
+ Coverage                     27.22%   27.37%   +0.14%     
============================================================
  Files                           189      190       +1     
  Lines                         39321    39574     +253     
  Branches                      14364    14287      -77     
============================================================
+ Hits                          10707    10833     +126     
+ Misses                        28240    27405     -835     
- Partials                        374     1336     +962


apryakhin · 2025-09-03T15:47:04Z

@2dm Thanks for adding this PR to GitHub. What's our strategy here? Are you proposing to carry it through the review process as is, or do we plan to split it up?

apryakhin · 2025-09-03T15:50:12Z

src/realm/transfer/address_list.h

 namespace Realm {

+  template <typename FieldID>
+  struct FieldBlockBase {


@2dm I think the address list changes alone (.h and .cc) are somewhat cumbersome. Can we split them out (with the test that we already have) into an independent PR?

apryakhin · 2025-09-03T15:50:59Z

src/realm/transfer/transfer.cc

+    }
+  }
+
+  /*template <int N, typename T>


This block probably needs to be deleted

apryakhin · 2025-09-03T15:52:18Z

src/realm/transfer/transfer_utils.h

+   */
+  template <int N, typename T>
+  inline int
+  compact_affine_dims(const AffineLayoutPiece<N, T> *affine, const Rect<N, T> &subrect,


We can split this change out as well (it also comes with an independent test).

The new address_list API uses its output, so I think it makes sense to keep them together.

Okay I am fine with it

apryakhin · 2025-09-03T15:57:07Z

src/realm/inst_layout.h

-    std::map<FieldID, FieldLayout> fields;
+    using FieldMap = std::map<FieldID, FieldLayout>;
+    FieldMap fields;
+    bool idindexed_fields{false};


That will perhaps be a non-functional split, but I'd like us to consider splitting out any instance-related changes as well. For example, the logic to detect the layout can be reviewed separately. It doesn't have to be consumed together with the actual DMA plumbing path.

Some folks may have opinions on how we do this "detection" etc.
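
For reference while discussing the detection, here is a minimal sketch of one way such a check could look. It is not the code in this PR. `FieldLayout` here is a simplified stand-in, `detect_idindexed_fields` is a hypothetical name, and the criterion (consecutive IDs with back-to-back storage) is an assumption drawn from the PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Simplified stand-ins for illustration; the real Realm types carry more state.
using FieldID = int;
struct FieldLayout {
  std::size_t rel_offset;    // byte offset of the field within the instance
  std::size_t size_in_bytes; // size of one field element
};

// Returns true when field IDs are consecutive (id, id+1, ...) and each field
// starts exactly where the previous one ends.
bool detect_idindexed_fields(const std::map<FieldID, FieldLayout> &fields)
{
  if(fields.size() < 2)
    return false; // a single field gains nothing from batching
  auto prev = fields.begin();
  for(auto it = std::next(prev); it != fields.end(); ++it) {
    const bool consecutive_id = (it->first == prev->first + 1);
    const bool adjacent = (it->second.rel_offset ==
                           prev->second.rel_offset + prev->second.size_in_bytes);
    if(!consecutive_id || !adjacent)
      return false;
    prev = it;
  }
  return true;
}
```

Because `std::map` iterates in key order, a single pass is enough, and the result can be cached in a flag such as `idindexed_fields` at layout creation.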

apryakhin · 2025-09-03T16:00:08Z

src/realm/cuda/cuda_internal.cc

-                                            GPU *in_gpu, AddressListCursor &out_alc,
-                                            uintptr_t out_base, GPU *out_gpu,
-                                            size_t bytes_left)
+    size_t GPUXferDes::read_address_entry(AffineCopyInfo<3> &copy_infos,


I think for these changes in cuda_internal.cc we have no choice but to make them together with everything under transfer.cc. But please see the other comments: we can at least review and check in the address list, utils, and instance layout changes separately, with appropriate tests.

I am somewhat worried about bugs in read_address_entry and overall whether it's designed/implemented in the best possible way.

eddy16112 · 2025-09-03T16:11:39Z

src/realm/transfer/transfer.cc

              xdn.target_node = path_info.xd_channels[j]->node;
+
+              bool enable_multi_field = false;
+              bool success = get_runtime()->get_module_config("core")->get_property(


It returns a RealmStatus, not a bool; I think this causes the error in CI.
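
A minimal sketch of the pattern being suggested, using a hypothetical `Status` enum and a hypothetical `get_flag` lookup rather than Realm's actual types or signatures: the status is kept in its own type and compared explicitly against the success value instead of being stored in a `bool`.

```cpp
#include <cassert>
#include <string>

// Hypothetical status type for illustration; not Realm's actual definition.
enum class Status { Success, NotFound };

// Hypothetical lookup that reports its outcome through a status code and
// writes the property into `value` only on success.
Status get_flag(const std::string &name, bool &value)
{
  if(name == "enable_multi_field") {
    value = true;
    return Status::Success;
  }
  return Status::NotFound;
}

// Reads a flag, falling back to `fallback` when the lookup fails.
bool read_flag_or(const std::string &name, bool fallback)
{
  bool value = fallback;
  // `bool ok = get_flag(name, value);` would not compile with a scoped enum,
  // so the status is compared against Success explicitly.
  const Status st = get_flag(name, value);
  return (st == Status::Success) ? value : fallback;
}
```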

apryakhin · 2025-09-03T16:12:43Z

@2dm The OG gitlab CI was passing for all tests (except clang-format):

https://gitlab.com/StanfordLegion/legion/-/pipelines/1865577933

It's either some regression with the additional changes we make to reduce copy launch overhead or there are different settings/tests in github CI compared to gitlab.

eddy16112 · 2025-09-03T16:14:19Z

src/realm/cuda/cuda_internal.cc

        unsigned bw = src_gpu->info->logical_peer_bandwidth[_src_gpu->info->index];
        unsigned latency = src_gpu->info->logical_peer_latency[_src_gpu->info->index];
-        unsigned frag_overhead = 2000; // HACK - estimate at 2 us
+        unsigned frag_overhead = 200; // HACK - estimate at 2 us


Why change it to 200?

eddy16112 · 2025-09-03T16:15:41Z

src/realm/cuda/cuda_internal.h

+          serdez_subclass;
+    };
+
+    class GPURemoteChannel : public RemoteChannel {


Could you please explain why we need a GPURemoteChannel now?

eddy16112 · 2025-09-03T16:28:29Z

src/realm/inst_layout.inl


+    // Compute preferred dimension ordering for idindexed_fields
+    if(layout->idindexed_fields) {
+      layout->preferred_dim_order.clear();


What is preferred_dim_order used for? What is the relationship between the function's dim_order and preferred_dim_order?

preferred_dim_order holds a pre-computed order that is decided at layout creation. dim_order is set to preferred_dim_order after filtering out any dimensions that the transfer makes trivial.
The goal is to move the expensive computation out of the transfer for this layout (layout->idindexed_fields).
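
The filtering step described above could be sketched as follows. The function name `filter_trivial_dims` and the use of plain vectors are assumptions for illustration and not the code in inst_layout.inl. A dimension is treated as trivial here when the transfer's extent along it is 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keeps the precomputed order but drops dimensions whose extent in the
// current transfer is 1, since those contribute nothing to the iteration.
std::vector<int> filter_trivial_dims(const std::vector<int> &preferred_dim_order,
                                     const std::vector<std::size_t> &extents)
{
  std::vector<int> dim_order;
  for(int d : preferred_dim_order) {
    assert(d >= 0 && static_cast<std::size_t>(d) < extents.size());
    if(extents[d] > 1)
      dim_order.push_back(d);
  }
  return dim_order;
}
```

The per-transfer cost is then a single linear pass over a short vector, while the ordering decision itself is made once per layout.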


2dm requested a review from apryakhin September 3, 2025 00:19

github-actions bot added the chore label Sep 3, 2025

2dm force-pushed the mdery/apriakhin-uniform-fields branch 2 times, most recently from a1c10a5 to ed1820b Compare September 3, 2025 02:11


2dm force-pushed the mdery/apriakhin-uniform-fields branch 5 times, most recently from 917b308 to 3bca93c Compare September 12, 2025 00:44

muraj force-pushed the main branch from a39ff4c to 9181321 Compare September 19, 2025 20:15

2dm changed the base branch from main to mdery/addr_list_refactor September 21, 2025 20:01

2dm force-pushed the mdery/apriakhin-uniform-fields branch from 3bca93c to 9e84214 Compare September 21, 2025 20:01

2dm force-pushed the mdery/addr_list_refactor branch from c032b92 to 06e5ba4 Compare September 21, 2025 20:05

2dm and others added 4 commits September 21, 2025 23:10

Multi-field iterator and transfer domain logic

8f30db3

Co-authored-by: apriakhin <[email protected]>

CUDA: Add multi-field optimized kernels and transfer logic

e60ed11

Co-authored-by: apriakhin <[email protected]>

Add multi-field channel support

b31f5b6

Co-authored-by: apriakhin <[email protected]>

Configuration, integration and testing

9a0df71

Co-authored-by: apriakhin <[email protected]>

2dm force-pushed the mdery/apriakhin-uniform-fields branch from 9e84214 to 9a0df71 Compare September 21, 2025 20:12
