-
Notifications
You must be signed in to change notification settings - Fork 10
Support transfers for layouts with multiple id-indexed fields #284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: mdery/addr_list_refactor
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## mdery/addr_list_refactor #284 +/- ##
============================================================
+ Coverage 27.22% 27.37% +0.14%
============================================================
Files 189 190 +1
Lines 39321 39574 +253
Branches 14364 14287 -77
============================================================
+ Hits 10707 10833 +126
+ Misses 28240 27405 -835
- Partials 374 1336 +962 ☔ View full report in Codecov by Sentry. |
a1c10a5
to
ed1820b
Compare
@2dm Thanks for adding this PR to the github. What's our strategy here? Are you proposing to carry it through the review process or do we plan to split it up? |
namespace Realm { | ||
|
||
template <typename FieldID> | ||
struct FieldBlockBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@2dm I think the address list (.h and .cc) changes only are somewhat cumbersome. Can we split this out (with the test that we already have) into an independent PR?
src/realm/transfer/transfer.cc
Outdated
} | ||
} | ||
|
||
/*template <int N, typename T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block probably needs to be deleted
*/ | ||
template <int N, typename T> | ||
inline int | ||
compact_affine_dims(const AffineLayoutPiece<N, T> *affine, const Rect<N, T> &subrect, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can split this change out as well (it also comes with an independent test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the new address_list api is using the output of it. I think it makes sense to keep it together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I am fine with it
std::map<FieldID, FieldLayout> fields; | ||
using FieldMap = std::map<FieldID, FieldLayout>; | ||
FieldMap fields; | ||
bool idindexed_fields{false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's will perhaps be a non-functional split but I'd like us to consider splitting any instance related changes as well. For example, the logic to detect the layout can be reviewed separately. It doesn't have to be consumed together with the actual DMA plumbing path.
Some folks may have opinions on how we do this "detection" etc.
GPU *in_gpu, AddressListCursor &out_alc, | ||
uintptr_t out_base, GPU *out_gpu, | ||
size_t bytes_left) | ||
size_t GPUXferDes::read_address_entry(AffineCopyInfo<3> ©_infos, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this changes in cuda_internal.cc we have no choice but to make them together with everything under transfer.cc. But please see other comments we can at least review and check-in separately address list...utils and instance layout changes...with appropriate tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am somewhat worried about bugs in read_address_entry and overall whether it's designed/implemented in the best possible way.
src/realm/transfer/transfer.cc
Outdated
xdn.target_node = path_info.xd_channels[j]->node; | ||
|
||
bool enable_multi_field = false; | ||
bool success = get_runtime()->get_module_config("core")->get_property( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns an RealmStatus, not bool, I think this causes the error in CI.
@2dm The OG gitlab CI was passing for all tests (except clang-format): It's either some regression with the additional changes we make to reduce copy launch overhead or there are different settings/tests in github CI compared to gitlab. |
unsigned bw = src_gpu->info->logical_peer_bandwidth[_src_gpu->info->index]; | ||
unsigned latency = src_gpu->info->logical_peer_latency[_src_gpu->info->index]; | ||
unsigned frag_overhead = 2000; // HACK - estimate at 2 us | ||
unsigned frag_overhead = 200; // HACK - estimate at 2 us |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change it to 200?
serdez_subclass; | ||
}; | ||
|
||
class GPURemoteChannel : public RemoteChannel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain why we need a GPURemoteChannel now?
|
||
// Compute preferred dimension ordering for idindexed_fields | ||
if(layout->idindexed_fields) { | ||
layout->preferred_dim_order.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does preferred_dim_order used for? What is the relationship between the dim_order of the function and preferred_dim_order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preferred_dim_order holds a pre-computed order. It is decided on layout creation. dim_order will be set to preferred_dim_order after filtering out trivial dims if the transfer makes those.
The goal is to move the expensive computation out of the transfer for this layout (layout->idindexed_fields)
917b308
to
3bca93c
Compare
3bca93c
to
9e84214
Compare
c032b92
to
06e5ba4
Compare
Co-authored-by: apriakhin <[email protected]>
Co-authored-by: apriakhin <[email protected]>
Co-authored-by: apriakhin <[email protected]>
Co-authored-by: apriakhin <[email protected]>
9e84214
to
9a0df71
Compare
This patch adds a fast‐path for multi‐field copies when source and destination instances store fields consecutively in memory with strictly increasing IDs. Instead of the generic per‐field iterator, Realm switches to an IDIndexedIterator and issues the entire copy as a single batched DMA operation.
Fast‐Path Mechanics
Testing
Authored by @apryakhin