@2dm (Collaborator) commented Sep 3, 2025

This patch adds a fast‐path for multi‐field copies when source and destination instances store fields consecutively in memory with strictly increasing IDs. Instead of the generic per‐field iterator, Realm switches to an IDIndexedIterator and issues the entire copy as a single batched DMA operation.
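The following is a rough, host-side illustration of why collapsing the per-field copies helps. It is a minimal sketch that assumes each field in a batch occupies a fixed-size piece whose start advances by a constant stride. `BatchedFieldCopy`, `copy_per_field`, and `copy_batched` are hypothetical names, `std::memcpy` stands in for the DMA engine, and none of this is Realm's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical descriptor for one batched multi-field copy: `num_fields`
// pieces of `field_bytes` each; piece i starts at base + i * stride.
struct BatchedFieldCopy {
  std::size_t src_base, src_stride;
  std::size_t dst_base, dst_stride;
  std::size_t field_bytes;
  std::size_t num_fields;
};

// Generic path: one copy per field, each with its own offset lookup.
inline void copy_per_field(const char *src, char *dst,
                           const std::vector<std::size_t> &src_offsets,
                           const std::vector<std::size_t> &dst_offsets,
                           std::size_t field_bytes) {
  assert(src_offsets.size() == dst_offsets.size());
  for(std::size_t i = 0; i < src_offsets.size(); i++)
    std::memcpy(dst + dst_offsets[i], src + src_offsets[i], field_bytes);
}

// Fast path: one descriptor covers all fields. When pieces are packed back
// to back on both sides, the whole batch is a single contiguous copy.
inline void copy_batched(const char *src, char *dst, const BatchedFieldCopy &c) {
  if(c.src_stride == c.field_bytes && c.dst_stride == c.field_bytes) {
    std::memcpy(dst + c.dst_base, src + c.src_base, c.num_fields * c.field_bytes);
    return;
  }
  // Otherwise it is one 2-D (strided) copy; a DMA engine could accept it as
  // a single command, emulated here with a loop.
  for(std::size_t i = 0; i < c.num_fields; i++)
    std::memcpy(dst + c.dst_base + i * c.dst_stride,
                src + c.src_base + i * c.src_stride, c.field_bytes);
}
```

Both paths move the same bytes. The batched form simply replaces N offset lookups with one base, stride, and count.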

Fast‐Path Mechanics

  • Layout Detection: During instance construction, Realm identifies when field IDs form a consecutive sequence and marks the instances for the optimized path.
  • IDIndexedIterator: Attaches a single FieldBlock (allocated from the replicated heap) to the address list, skipping per‐field offset lookups.
  • Affine Collapse: Treats the multi‐field operation as one affine rectangle with N attached fields, collapsing what would be N separate copies into a single batched command.
  • Flow‐Control Splitting: The DMA engine automatically splits the batch according to internal flow‐control limits and IB fragment sizes (for cross‐node transfers).
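The flow-control splitting step can be sketched as follows. This sketch assumes a single per-chunk byte limit (`max_bytes`) stands in for both the flow-control limit and the IB fragment size. `Chunk` and `split_batch` are hypothetical, and the DMA engine's actual splitting rules may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One piece of a split batch: `field_count` fields starting at
// `first_field`, copying `bytes` bytes from `offset` within each field.
struct Chunk {
  std::size_t first_field;
  std::size_t field_count;
  std::size_t offset;
  std::size_t bytes;
};

// Splits `num_fields` fields of `field_bytes` each so that no chunk moves
// more than `max_bytes` (hypothetical stand-in for flow-control/IB limits).
inline std::vector<Chunk> split_batch(std::size_t num_fields, std::size_t field_bytes,
                                      std::size_t max_bytes) {
  assert(field_bytes > 0 && max_bytes > 0);
  std::vector<Chunk> chunks;
  if(field_bytes <= max_bytes) {
    // Whole fields fit: group as many fields per chunk as the limit allows.
    std::size_t per_chunk = max_bytes / field_bytes;
    for(std::size_t f = 0; f < num_fields; f += per_chunk)
      chunks.push_back({f, std::min(per_chunk, num_fields - f), 0, field_bytes});
  } else {
    // A single field exceeds the limit: fragment each field separately.
    for(std::size_t f = 0; f < num_fields; f++)
      for(std::size_t off = 0; off < field_bytes; off += max_bytes)
        chunks.push_back({f, 1, off, std::min(max_bytes, field_bytes - off)});
  }
  return chunks;
}
```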

Testing

  • Integration Test: Verifies end-to-end multi-field copy behavior on real instances.
  • Unit Tests: Covers iterator construction, address-list setup, FieldBlock allocation, and DMA splitting logic.

```
src_inst{ 0: [..field_size..], 1: [..field_size..], 2: [..field_size..], N-1: [..field_size..] }
dst_inst{ 0: [..field_size..], 1: [..field_size..], 2: [..field_size..], N-1: [..field_size..] }
index_space.copy(src_fids[32, 16, 612, 0..], dst_fids[17, 3, 999, 2..])
```

Authored by @apryakhin

@2dm requested a review from @apryakhin September 3, 2025 00:19
github-actions bot added the chore label Sep 3, 2025

codecov bot commented Sep 3, 2025

Codecov Report

❌ Patch coverage is 40.77670% with 183 lines in your changes missing coverage. Please review.
✅ Project coverage is 27.37%. Comparing base (06e5ba4) to head (9a0df71).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/realm/transfer/transfer.cc | 24.88% | 132 Missing and 28 partials ⚠️ |
| src/realm/inst_layout.inl | 65.11% | 15 Missing ⚠️ |
| src/realm/runtime_impl.cc | 0.00% | 3 Missing ⚠️ |
| src/realm/transfer/channel.h | 0.00% | 2 Missing ⚠️ |
| tests/unit_tests/idindexed_fields_iterator_test.cc | 95.74% | 2 Missing ⚠️ |
| src/realm/transfer/channel.cc | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@                     Coverage Diff                      @@
##           mdery/addr_list_refactor     #284      +/-   ##
============================================================
+ Coverage                     27.22%   27.37%   +0.14%     
============================================================
  Files                           189      190       +1     
  Lines                         39321    39574     +253     
  Branches                      14364    14287      -77     
============================================================
+ Hits                          10707    10833     +126     
+ Misses                        28240    27405     -835     
- Partials                        374     1336     +962     
```


@2dm force-pushed the mdery/apriakhin-uniform-fields branch 2 times, most recently from a1c10a5 to ed1820b, September 3, 2025 02:11

@apryakhin (Contributor) commented:

@2dm Thanks for adding this PR to GitHub. What's our strategy here? Are you proposing to carry it through the review process as is, or do we plan to split it up?

```cpp
namespace Realm {

template <typename FieldID>
struct FieldBlockBase {
```
@apryakhin (Contributor) commented Sep 3, 2025:

@2dm I think the address list changes (.h and .cc) alone are somewhat cumbersome. Can we split them out, together with the test we already have, into an independent PR?

```cpp
}
}

/*template <int N, typename T>
```
Contributor:

This block probably needs to be deleted.

```cpp
*/
template <int N, typename T>
inline int
compact_affine_dims(const AffineLayoutPiece<N, T> *affine, const Rect<N, T> &subrect,
```
Contributor:

We can split this change out as well (it also comes with an independent test).

Collaborator Author:

The new address_list API uses its output, so I think it makes sense to keep them together.

Contributor:

Okay, I am fine with it.
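For readers unfamiliar with the idea behind compact_affine_dims, here is a minimal sketch of affine dimension compaction. It assumes dims are listed innermost first with strides in elements, folds a dimension into the previous kept one when it starts exactly where that one ends, and drops extent-1 dims. `compact_dims` is a hypothetical stand-in; it does not reproduce the signature or behavior of Realm's compact_affine_dims.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical dimension compaction. Dim d is folded into the previous kept
// dim when stride[d] == prev_stride * prev_extent (i.e. it is contiguous);
// extent-1 dims are dropped. Returns the number of kept dims (0 if every
// extent is 1, i.e. a single element).
template <std::size_t N>
std::size_t compact_dims(const std::array<long, N> &stride,
                         const std::array<long, N> &extent,
                         std::array<long, N> &out_stride,
                         std::array<long, N> &out_extent) {
  std::size_t n = 0;
  for(std::size_t d = 0; d < N; d++) {
    assert(extent[d] >= 1);
    if(extent[d] == 1)
      continue; // trivial dim: contributes nothing to the copy shape
    if(n > 0 && stride[d] == out_stride[n - 1] * out_extent[n - 1]) {
      out_extent[n - 1] *= extent[d]; // contiguous with previous: merge
    } else {
      out_stride[n] = stride[d]; // start a new compacted dim
      out_extent[n] = extent[d];
      n++;
    }
  }
  return n;
}
```

For example, a 4x4x8 block with strides {1, 4, 16} collapses into one dimension of 128 elements, which is what lets a fast path issue fewer, larger copies.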

```diff
-std::map<FieldID, FieldLayout> fields;
+using FieldMap = std::map<FieldID, FieldLayout>;
+FieldMap fields;
+bool idindexed_fields{false};
```
Contributor:

That will perhaps be a non-functional split, but I'd like us to consider splitting out any instance-related changes as well. For example, the logic to detect the layout can be reviewed separately; it doesn't have to be consumed together with the actual DMA plumbing path.

Some folks may have opinions on how we do this "detection", etc.
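One way such layout detection could work is sketched below. It assumes "consecutive" means each field ID is the previous ID plus one, all fields share an element size, and field offsets advance by one constant positive stride. `FieldLayoutSketch` and `detect_idindexed` are hypothetical names, not the PR's code, and requiring at least two fields is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical per-field layout: offset of the field's data within the
// instance and the size of one element of that field.
struct FieldLayoutSketch {
  std::size_t rel_offset;
  std::size_t size_in_bytes;
};

// True when IDs are consecutive, element sizes match, and offsets advance
// by a single constant positive stride. std::map keeps IDs sorted.
template <typename FieldID>
bool detect_idindexed(const std::map<FieldID, FieldLayoutSketch> &fields) {
  if(fields.size() < 2)
    return false;
  auto prev = fields.begin();
  auto cur = std::next(prev);
  if(cur->second.rel_offset <= prev->second.rel_offset)
    return false;
  const std::size_t stride = cur->second.rel_offset - prev->second.rel_offset;
  for(; cur != fields.end(); prev = cur, ++cur) {
    if(cur->first != prev->first + 1)
      return false; // gap in the ID sequence
    if(cur->second.size_in_bytes != prev->second.size_in_bytes)
      return false; // mixed element sizes
    if(cur->second.rel_offset != prev->second.rel_offset + stride)
      return false; // non-uniform spacing in memory
  }
  return true;
}
```

Because this check depends only on the instance layout, it could run once at instance creation and be tested on its own, independent of the DMA path.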

```cpp
GPU *in_gpu, AddressListCursor &out_alc,
uintptr_t out_base, GPU *out_gpu,
size_t bytes_left)
size_t GPUXferDes::read_address_entry(AffineCopyInfo<3> &copy_infos,
```
Contributor:

I think we have no choice but to make these changes in cuda_internal.cc together with everything under transfer.cc. But, as noted in my other comments, we can at least review and check in the address list, utils, and instance layout changes separately, with appropriate tests.

Contributor:

I am somewhat worried about bugs in read_address_entry, and more generally about whether it is designed and implemented in the best possible way.

```cpp
xdn.target_node = path_info.xd_channels[j]->node;

bool enable_multi_field = false;
bool success = get_runtime()->get_module_config("core")->get_property(
```
Contributor:

It returns a RealmStatus, not a bool; I think this causes the error in CI.
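A minimal sketch of how storing a status code in a `bool` can go wrong, assuming (hypothetically) that the success code is 0, as is common for C-style status enums. `SketchStatus` and `get_property_sketch` are invented for illustration and are not Realm's definitions. With an unscoped enum the conversion compiles and silently inverts the check; with a scoped `enum class`, the same initialization would fail to compile instead.

```cpp
#include <cassert>

// Hypothetical status codes; success is 0, as in many C-style APIs.
enum SketchStatus { SKETCH_SUCCESS = 0, SKETCH_NOT_FOUND = 1 };

// Hypothetical getter: writes `value` and returns a status code.
inline SketchStatus get_property_sketch(bool present, bool &value) {
  if(!present)
    return SKETCH_NOT_FOUND;
  value = true;
  return SKETCH_SUCCESS;
}

// Buggy: the implicit enum-to-bool conversion maps SKETCH_SUCCESS (0) to
// false and any error code to true, inverting the check.
inline bool lookup_buggy(bool present, bool &value) {
  bool success = get_property_sketch(present, value);
  return success;
}

// Fixed: compare against the success code explicitly.
inline bool lookup_fixed(bool present, bool &value) {
  return get_property_sketch(present, value) == SKETCH_SUCCESS;
}
```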

@apryakhin (Contributor) commented:

@2dm The original GitLab CI was passing for all tests (except clang-format).

It's either a regression from the additional changes we made to reduce copy launch overhead, or the GitHub CI uses different settings/tests than GitLab.

```diff
 unsigned bw = src_gpu->info->logical_peer_bandwidth[_src_gpu->info->index];
 unsigned latency = src_gpu->info->logical_peer_latency[_src_gpu->info->index];
-unsigned frag_overhead = 2000; // HACK - estimate at 2 us
+unsigned frag_overhead = 200; // HACK - estimate at 2 us
```
Contributor:

Why change it to 200?

```cpp
serdez_subclass;
};

class GPURemoteChannel : public RemoteChannel {
```
Contributor:

Could you please explain why we need a GPURemoteChannel now?


```cpp
// Compute preferred dimension ordering for idindexed_fields
if(layout->idindexed_fields) {
layout->preferred_dim_order.clear();
```
Contributor:

What is preferred_dim_order used for? What is the relationship between the function's dim_order and preferred_dim_order?

Collaborator Author:

preferred_dim_order holds a precomputed dimension order that is decided when the layout is created. During a transfer, dim_order is set to preferred_dim_order after filtering out any dimensions that the transfer makes trivial.
The goal is to move this expensive computation out of the transfer path for layouts with layout->idindexed_fields set.
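A sketch of the split described above: the ordering is computed once, and only the trivial-dimension filtering runs per transfer. It assumes "trivial" means extent 1; `filter_dim_order` is a hypothetical helper, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// `preferred` is an order computed once at layout creation; `extents` are
// the per-dimension sizes of one transfer. Dims with extent 1 are dropped,
// and the precomputed relative order of the remaining dims is preserved.
inline std::vector<int> filter_dim_order(const std::vector<int> &preferred,
                                         const std::vector<long> &extents) {
  std::vector<int> dim_order;
  dim_order.reserve(preferred.size());
  for(int d : preferred) {
    assert(d >= 0 && static_cast<std::size_t>(d) < extents.size());
    if(extents[d] > 1)
      dim_order.push_back(d);
  }
  return dim_order;
}
```

The per-transfer cost is a single linear pass over the precomputed order rather than re-deriving the order each time.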

@2dm force-pushed the mdery/apriakhin-uniform-fields branch 5 times, most recently from 917b308 to 3bca93c, September 12, 2025 00:44
@2dm changed the base branch from main to mdery/addr_list_refactor September 21, 2025 20:01
@2dm force-pushed the mdery/apriakhin-uniform-fields branch from 3bca93c to 9e84214, September 21, 2025 20:01
@2dm force-pushed the mdery/addr_list_refactor branch from c032b92 to 06e5ba4, September 21, 2025 20:05
@2dm force-pushed the mdery/apriakhin-uniform-fields branch from 9e84214 to 9a0df71, September 21, 2025 20:12
