-
Notifications
You must be signed in to change notification settings - Fork 415
Initial refactoring of edge storage #1085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
28c4331
to
42693d6
Compare
iterator operator--() { | ||
value_ -= 1; | ||
return *this; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the corresponding operators (also !=
below)? They don't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry what? This comment doesn't make a lot of sense.
vpr/src/route/rr_node_storage.h
Outdated
t_edge_size num_edges(const RRNodeId& id) const { | ||
auto first_id = first_edge_[id]; | ||
auto second_id = (&first_edge_[id])[1]; | ||
return (size_t)second_id - (size_t)first_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dangerous to use auto
followed by a cast. I'd prefer an explicit type here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't actually a cast, it's a operator size_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could consider using size_t(second_id) - size_t(first_id)
to make this more explicit. That's how its done elsewhere in VPR.
e6267aa
to
87d0ef8
Compare
There are many uses of |
Sanity check with Before:
After:
This jump in ~1.2 GiB is basically the cost of storing the source rr node explicitly. I'll noddle over alternative strategies. |
87d0ef8
to
25f946f
Compare
So in an unexpected turn, while the max_rss after create device is higher than before, because the heap is not thrashed, the final max_rss is much better: Before:
After:
I'll need to do a full |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial thoughts after looking through this is that its looking good!
It is interesting that some parts of it are looking structurally more like the proposed RRGraph in #1046 (e.g. SoA form edge storage). I think ultimately moving towards that style of interface (pass an ID to get an attribute from an element in the graph) is where we want to go.
I have a couple more detailed comments below including a possible approach to avoid storing the edge_src_nodes_ after edge partitioning but still allowing (moderatly fast) access to an edge's source node.
vpr/src/route/rr_node_storage.h
Outdated
vtr::vector<RRNodeId, RREdgeId> first_edge_; | ||
vtr::vector<RRNodeId, t_edge_size> fan_in_; | ||
|
||
vtr::vector<RREdgeId, RRNodeId> edge_src_node_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you noted, the additional storage of the source node is non-trivial (since there are more edges than nodes).
It's nice to have this information, but it isn't currently used (e.g. in the router). So if we really want to focus on memory usage, we could consider dropping it from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more, given the sorted structure of the edges and that we know the start/end of each node's out-going edges, it seems like we should be able to get the source node for a given edge ID in log(num_nodes) time via binary search against the (sorted) t_rr_node_data::first_edge_
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary reason for the edge_src_node_
data is to sort the underlying data. After that is complete, we can discard the array.
vpr/src/route/rr_node_storage.h
Outdated
t_edge_size num_edges(const RRNodeId& id) const { | ||
auto first_id = first_edge_[id]; | ||
auto second_id = (&first_edge_[id])[1]; | ||
return (size_t)second_id - (size_t)first_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could consider using size_t(second_id) - size_t(first_id)
to make this more explicit. That's how its done elsewhere in VPR.
vpr/src/route/rr_node_storage.h
Outdated
@@ -71,17 +59,25 @@ struct t_rr_node_data { | |||
int16_t track_num; | |||
int16_t class_num; | |||
} ptc_; | |||
t_edge_size fan_in_ = 0; | |||
|
|||
uint16_t capacity_ = 0; | |||
}; | |||
|
|||
// RR node and edge storage class. | |||
class t_rr_node_storage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point this struct is really now representing the RR graph itself. Perhaps it should be renamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
vpr/src/route/rr_node_storage.cpp
Outdated
return; | ||
} | ||
|
||
edges_read_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on clear on what this variable is for, and how it differs from partitioned_. Likely need comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments added
vpr/src/route/rr_node_storage.cpp
Outdated
const auto& device_ctx = g_vpr_ctx.device(); | ||
std::stable_sort( | ||
edge_sort_iterator(this, 0), | ||
edge_sort_iterator(this, edge_src_node_.size()), | ||
edge_compare_src_node_and_configurable_first(device_ctx.rr_switch_inf)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need some comments describing what's going on here.
I think its accomplishing:
- Ordering edges by src_node (so edges with a shared source node are contiguous)
- Partitioning the edges for each src_node by configurable/non-configurable
with a single sort. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your understanding is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added
vpr/src/route/rr_node_storage.h
Outdated
t_rr_node_data& get(const RRNodeId& id) { | ||
return storage_[id]; | ||
} | ||
const t_rr_node_data& get(const RRNodeId& id) const { | ||
return storage_[id]; | ||
} | ||
|
||
// Take allocated edges in edge_src_node_/ edge_dest_node_ / edge_switch_ | ||
// sort, and assign the first edge for each | ||
void assign_edges(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename assign_first_edges()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
vpr/src/route/rr_node_storage.cpp
Outdated
* the edge data is directly usable for each node by simply slicing the arrays. | ||
* | ||
* */ | ||
struct edge_swapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice way to keep the swapping clean in Struct-of-Arrays form!
vpr/src/route/rr_node_storage.h
Outdated
RREdgeId edge_id(const RRNodeId& id, t_edge_size iedge) const { | ||
RREdgeId first_edge = first_edge_[id]; | ||
RREdgeId ret((size_t)first_edge + iedge); | ||
VTR_ASSERT_SAFE(ret < (&first_edge_[id])[1]); | ||
return ret; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That we can do this easily given the edge ordering is a really nice feature, which I expect will make porting/refactoring downstream code much easier.
I think I've identified a way to regain some of the lost CPU performance. I've started a vtr_reg_weekly QoR run, but it won't return results for a day or two. I'll push a new branch today with a rebase onto of master, along with fixes to review comments. |
25f946f
to
aca9c27
Compare
I believe all feedback has been addressed. I've kicked off a QoR run, results will be ready in a couple days. Preliminary results show the CPU cost is gone, with the 4% memory increase remaining. #1096 resolves the memory increase, and changes it to a 35-60% memory reduction. |
2b055d6
to
bbf1d04
Compare
Signed-off-by: Keith Rothman <[email protected]>
This should have a negliable performance impact, but this enables future changes to modify how rr nodes and rr edges are storaged. Signed-off-by: Keith Rothman <[email protected]>
This changes edge storage from an allocation array of struct per node to struct of array for all edge data. Several algorithms over edges that were previous per node per edge, but were actually just iteration over edges are now part of rr_node_storage. Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
This enables 16-byte alignment (4 nodes per cache line). Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
Signed-off-by: Keith Rothman <[email protected]>
bbf1d04
to
e96a3ba
Compare
Description
This changes edge storage from an allocation array of struct per node to
struct of array for all edge data.
Several algorithms over edges that were previous per node per edge, but
were actually just iteration over edges are now part of rr_node_storage.
This PR is built on top #1084 .
Related Issue
#1079
#1081
#1084
Motivation and Context
This changes the number of heap allocations on preallocated array (e.g. loading rr graph from file) from
Nnodes
to3
.This changes reduces the number of heap allocations during rr graph building from
Nnodes
to3 * ceil(log2(Nedges) - log2(10*Nodes))
.When the number of edges is known the max memory usage for edges is now
Nedge * (2*sizeof(int)+sizeof(short))
. When the number of edges is not known, the max memory usage for edges is now:How Has This Been Tested?
Types of changes
Checklist: