Initial refactoring of edge storage #1085


Closed
litghost wants to merge 11 commits from the refactor_edges2 branch

Conversation

@litghost (Collaborator) commented Jan 27, 2020

Description

This changes edge storage from a per-node array-of-structs allocation to a struct-of-arrays for all edge data.

Several algorithms over edges that previously iterated per node, per edge, but were really just iterating over all edges, are now part of rr_node_storage.

This PR is built on top of #1084.

Related Issue

#1079
#1081
#1084

Motivation and Context

For a preallocated edge array (e.g. loading the rr graph from file), this changes the number of heap allocations from Nnodes to 3.

During rr graph building, this change reduces the number of heap allocations from Nnodes to 3 * ceil(log2(Nedges) - log2(10 * Nnodes)).

When the number of edges is known, the max memory usage for edges is now Nedges * (2*sizeof(int) + sizeof(short)). When the number of edges is not known, the max memory usage for edges is now:

min(10 * Nnodes, Nedges) * (2*sizeof(int) + sizeof(short)) + sizeof(int) * (Nedges / 2)
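
To make the layout change concrete, here is a minimal sketch of the struct-of-arrays idea (an illustration, not the PR's exact class), using std::vector and plain indices in place of VTR's vtr::vector and strong ID types. The three per-edge arrays account for the 2*sizeof(int) + sizeof(short) per-edge footprint above:

#include <cstddef>
#include <vector>

// Sketch: struct-of-arrays edge storage. One entry per node in first_edge_,
// one entry per edge in each of the three parallel edge arrays, so all edge
// data needs 3 heap allocations total instead of one array per node.
struct edge_storage_sketch {
    // first_edge_[n] is the index of node n's first out-going edge; with a
    // sentinel entry at the end, first_edge_[n + 1] marks one past the last.
    std::vector<size_t> first_edge_;

    std::vector<int> edge_src_node_;   // sizeof(int) per edge
    std::vector<int> edge_dest_node_;  // sizeof(int) per edge
    std::vector<short> edge_switch_;   // sizeof(short) per edge

    size_t num_edges(size_t node) const {
        return first_edge_[node + 1] - first_edge_[node];
    }

    int edge_sink_node(size_t node, size_t iedge) const {
        return edge_dest_node_[first_edge_[node] + iedge];
    }
};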

How Has This Been Tested?

  • Travis CI is green
  • Nightly and weekly QoR metrics are acceptable

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@probot-autolabeler bot added the lang-cpp (C/C++ code) and VPR (VPR FPGA Placement & Routing Tool) labels Jan 27, 2020
@litghost litghost requested a review from kmurray January 27, 2020 22:15
@litghost litghost force-pushed the refactor_edges2 branch 2 times, most recently from 28c4331 to 42693d6 on January 27, 2020 22:19
@litghost litghost requested a review from vaughnbetz January 27, 2020 22:20
iterator operator--() {
    value_ -= 1;
    return *this;
}
Contributor:

Why not use the corresponding operators (also != below)? Do they not exist?

Collaborator (author):

Sorry what? This comment doesn't make a lot of sense.

t_edge_size num_edges(const RRNodeId& id) const {
    auto first_id = first_edge_[id];
    auto second_id = (&first_edge_[id])[1];
    return (size_t)second_id - (size_t)first_id;
}
Contributor:

It's dangerous to use auto followed by a cast. I'd prefer an explicit type here.

Collaborator (author):

This isn't actually a cast, it's an operator size_t.

Contributor:

We could consider using size_t(second_id) - size_t(first_id) to make this more explicit. That's how it's done elsewhere in VPR.
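
As background for "operator size_t": VTR's strong ID types wrap an integer and expose an explicit conversion operator, so both (size_t)id and size_t(id) invoke that operator rather than performing a raw integer cast. A minimal, hypothetical sketch of the pattern (simplified from VTR's actual vtr::StrongId template):

#include <cstddef>

// Hypothetical simplified strong ID: the explicit conversion operator is
// what (size_t)id and size_t(id) call, rather than a raw cast.
class EdgeIdSketch {
  public:
    explicit EdgeIdSketch(size_t value) : value_(value) {}
    explicit operator size_t() const { return value_; }

  private:
    size_t value_;
};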

@litghost litghost force-pushed the refactor_edges2 branch 2 times, most recently from e6267aa to 87d0ef8 on January 27, 2020 22:41
@HackerFoo (Contributor) commented Jan 27, 2020

There are many uses of first and second that seem to be used as placeholder names, like x and y.

@litghost litghost changed the title Initial refactoring of edge storage WIP: Initial refactoring of edge storage Jan 27, 2020
@litghost (Collaborator Author) commented

Sanity check with gsm_switch_stratixiv_arch_timing.blif (from weekly titan tests)

Before:

## Build routing resource graph took 212.56 seconds (max_rss 7950.6 MiB, delta_rss +1698.9 MiB)
  RR Graph Nodes: 20014774
  RR Graph Edges: 159049032
# Create Device took 214.84 seconds (max_rss 7950.6 MiB, delta_rss +1698.9 MiB)

After:

## Build routing resource graph took 159.88 seconds (max_rss 9165.3 MiB, delta_rss +2842.8 MiB)
  RR Graph Nodes: 20014774
  RR Graph Edges: 159049032
# Create Device took 161.22 seconds (max_rss 9165.3 MiB, delta_rss +2842.8 MiB)

This jump of ~1.2 GiB is basically the cost of storing the source rr node explicitly. I'll noodle over alternative strategies.

@litghost (Collaborator Author) commented Jan 28, 2020

In an unexpected turn: while the max_rss after Create Device is higher than before (per the previous comment), the final max_rss is much better because the heap is not thrashed:

Before:

The entire flow of VPR took 13322.02 seconds (max_rss 16152.4 MiB)

After:

The entire flow of VPR took 7068.79 seconds (max_rss 10713.4 MiB)

I'll need to do a full vtr_reg_weekly to confirm the results.

@kmurray (Contributor) left a comment

My initial thought after looking through this is that it's looking good!

It is interesting that some parts of it are looking structurally more like the proposed RRGraph in #1046 (e.g. SoA form edge storage). I think ultimately moving towards that style of interface (pass an ID to get an attribute from an element in the graph) is where we want to go.

I have a couple more detailed comments below, including a possible approach to avoid storing edge_src_node_ after edge partitioning while still allowing (moderately fast) access to an edge's source node.

vtr::vector<RRNodeId, RREdgeId> first_edge_;
vtr::vector<RRNodeId, t_edge_size> fan_in_;

vtr::vector<RREdgeId, RRNodeId> edge_src_node_;
Contributor:

As you noted, the additional storage of the source node is non-trivial (since there are more edges than nodes).

It's nice to have this information, but it isn't currently used (e.g. in the router). So if we really want to focus on memory usage, we could consider dropping it from here.

Contributor:

Thinking about this a bit more: given the sorted structure of the edges, and that we know the start/end of each node's out-going edges, it seems like we should be able to get the source node for a given edge ID in log(num_nodes) time via binary search against the (sorted) first_edge_ values.
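
A minimal sketch of that lookup (an illustration under the assumption that first_edge_ is sorted ascending, with a sentinel entry past the last node):

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: recover an edge's source node in O(log num_nodes) via
// binary search over the sorted first_edge array, instead of permanently
// storing edge_src_node_. Requires first_edge[n] <= edge < first_edge[n + 1]
// exactly when node n owns the edge.
size_t edge_source_node(const std::vector<size_t>& first_edge, size_t edge) {
    // First entry strictly greater than edge; the owning node sits just
    // before it, which also skips over nodes with zero out-going edges.
    auto it = std::upper_bound(first_edge.begin(), first_edge.end(), edge);
    return static_cast<size_t>(std::distance(first_edge.begin(), it)) - 1;
}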

Collaborator (author):

The primary reason for the edge_src_node_ data is to sort the underlying data. After that is complete, we can discard the array.

@@ -71,17 +59,25 @@ struct t_rr_node_data {
        int16_t track_num;
        int16_t class_num;
    } ptc_;
    t_edge_size fan_in_ = 0;

    uint16_t capacity_ = 0;
};

// RR node and edge storage class.
class t_rr_node_storage {
Contributor:

At this point this struct is really now representing the RR graph itself. Perhaps it should be renamed?

Collaborator (author):

Renamed

return;
}

edges_read_ = true;
Contributor:

I'm not clear on what this variable is for, or how it differs from partitioned_. It likely needs comments.

Collaborator (author):

Comments added

Comment on lines 418 to 422
const auto& device_ctx = g_vpr_ctx.device();
std::stable_sort(
    edge_sort_iterator(this, 0),
    edge_sort_iterator(this, edge_src_node_.size()),
    edge_compare_src_node_and_configurable_first(device_ctx.rr_switch_inf));
Contributor:

We'll need some comments describing what's going on here.

I think it's accomplishing:

  • Ordering edges by src_node (so edges with a shared source node are contiguous)
  • Partitioning the edges for each src_node by configurable/non-configurable

with a single sort. Is that correct?

Collaborator (author):

Your understanding is correct.
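
For illustration, a comparator with that combined effect might look like the following sketch (hypothetical types; the PR's edge_compare_src_node_and_configurable_first works on edge IDs through edge_sort_iterator and derives configurability from rr_switch_inf):

#include <tuple>

// Hypothetical flattened edge record, for illustration only.
struct edge_record {
    int src_node;
    bool configurable; // in the PR, derived from the edge's switch type
};

// Orders by source node first, then configurable-before-non-configurable
// within each node, so one stable_sort both groups each node's edges and
// partitions them by configurability.
bool src_node_then_configurable_first(const edge_record& lhs,
                                      const edge_record& rhs) {
    return std::make_tuple(lhs.src_node, !lhs.configurable)
         < std::make_tuple(rhs.src_node, !rhs.configurable);
}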

Collaborator (author):

Comment added

t_rr_node_data& get(const RRNodeId& id) {
    return storage_[id];
}
const t_rr_node_data& get(const RRNodeId& id) const {
    return storage_[id];
}

// Take the allocated edges in edge_src_node_ / edge_dest_node_ / edge_switch_,
// sort them, and assign the first edge for each node.
void assign_edges();
Contributor:

Maybe rename to assign_first_edges()?

Collaborator (author):

Done

 * the edge data is directly usable for each node by simply slicing the arrays.
 */
struct edge_swapper {
Contributor:

This is a nice way to keep the swapping clean in Struct-of-Arrays form!
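
For contrast, the more common way to sort parallel arrays is to sort an index permutation and then gather, at the cost of an extra index array plus copies. A minimal sketch of that alternative (not the PR's in-place edge_swapper approach):

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Alternative sketch: reorder parallel edge arrays via a sorted permutation.
// Costs O(num_edges) extra memory for the index array and gather copies,
// which the proxy-swap approach (edge_swapper) avoids.
void sort_edges_by_src(std::vector<int>& src,
                       std::vector<int>& dest,
                       std::vector<short>& sw) {
    std::vector<size_t> order(src.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](size_t a, size_t b) { return src[a] < src[b]; });

    // Apply the same permutation to each parallel array.
    auto gather = [&](auto& v) {
        auto sorted = v; // temporary copy to gather into
        for (size_t i = 0; i < order.size(); ++i)
            sorted[i] = v[order[i]];
        v = std::move(sorted);
    };
    gather(src);
    gather(dest);
    gather(sw);
}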

Comment on lines 241 to 246
RREdgeId edge_id(const RRNodeId& id, t_edge_size iedge) const {
    RREdgeId first_edge = first_edge_[id];
    RREdgeId ret((size_t)first_edge + iedge);
    VTR_ASSERT_SAFE(ret < (&first_edge_[id])[1]);
    return ret;
}
Contributor:

That we can do this easily given the edge ordering is a really nice feature, which I expect will make porting/refactoring downstream code much easier.

@litghost litghost mentioned this pull request Feb 3, 2020
@litghost (Collaborator Author) commented Feb 3, 2020

I think I've identified a way to regain some of the lost CPU performance. I've started a vtr_reg_weekly QoR run, but it won't return results for a day or two.

I'll push a new branch today with a rebase onto master, along with fixes for the review comments.

@litghost (Collaborator Author) commented Feb 3, 2020

I believe all feedback has been addressed. I've kicked off a QoR run; results will be ready in a couple of days. Preliminary results show the CPU cost is gone, with the 4% memory increase remaining. #1096 resolves the memory increase and turns it into a 35-60% memory reduction.

@litghost litghost changed the title WIP: Initial refactoring of edge storage Initial refactoring of edge storage Feb 3, 2020
@litghost litghost requested a review from kmurray February 3, 2020 22:53
This should have a negligible performance impact, but it enables future changes to how rr nodes and rr edges are stored.

Signed-off-by: Keith Rothman <[email protected]>
This changes edge storage from a per-node array-of-structs allocation to a struct-of-arrays for all edge data.

Several algorithms over edges that previously iterated per node, per edge, but were really just iterating over all edges, are now part of rr_node_storage.

Signed-off-by: Keith Rothman <[email protected]>
This enables 16-byte alignment (4 nodes per cache line).

Signed-off-by: Keith Rothman <[email protected]>
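
As a compile-time guard on that property, something like the following sketch could be used (hypothetical field layout; the real t_rr_node_data has its own members, but the point is that a 16-byte record gives exactly four nodes per 64-byte cache line):

#include <cstdint>

// Hypothetical 16-byte node record standing in for t_rr_node_data.
struct node_data_sketch {
    int8_t type_;
    int8_t side_;
    int16_t ptc_;
    uint16_t fan_in_;
    uint16_t capacity_;
    int16_t xlow_, ylow_, xhigh_, yhigh_;
};
static_assert(sizeof(node_data_sketch) == 16,
              "16-byte nodes: 4 per 64-byte cache line");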
@litghost litghost closed this Mar 4, 2020
@litghost litghost deleted the refactor_edges2 branch March 4, 2020 01:02
Labels
lang-cpp (C/C++ code), libvtrutil, VPR (VPR FPGA Placement & Routing Tool)
3 participants