Commit 571d3f4

[FGParallelRouter] Updated Barrier to C++20 Std Barrier
The fine-grained parallel router was originally built before VTR upgraded to C++20, so we had to roll our own barrier. We originally had two barriers: a spin barrier (threads spin in a busy-wait loop while waiting) and a "mutex" barrier (threads wait on a condition variable and may be put to sleep). Through experimentation, we found that the choice of barrier implementation made no significant difference; however, the standard barrier provides slight performance improvements for very long routes and has a much cleaner interface. Moved the FG parallel router to the standard barrier. The old implementations are left in as classes in case C++20 is not preferred by some users. Also added a QoR script to make parsing FG parallel router runs easier.
1 parent 8386eac commit 571d3f4
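The spin barrier's `wait()` appears only in fragments in the diff below: a thread-local `local_sense_` flag that is flipped on every call, and an arrival counter incremented with `count_.fetch_add(1) + 1`. Those fragments match the classic sense-reversing centralized barrier. The following is a minimal sketch of that technique, assuming the parts the diff does not show (a shared `sense_` flag, and the last arriver resetting the counter before releasing everyone). The class name `spin_barrier_sketch` and the helper `count_phase_errors` are hypothetical and are not part of VPR.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch of a sense-reversing spin barrier; not the VPR source.
class spin_barrier_sketch {
  public:
    explicit spin_barrier_sketch(std::size_t num_threads)
        : num_threads_(num_threads) {
        assert(num_threads > 0);
    }

    // Each thread resets its own sense flag before first use.
    void init() { local_sense_ = false; }

    void wait() {
        // Flip this thread's sense: the value the shared flag will take
        // once every thread has arrived in the current round.
        bool s = !local_sense_;
        local_sense_ = s;
        std::size_t num_arrivals = count_.fetch_add(1) + 1;
        if (num_arrivals == num_threads_) {
            // Last arriver: reset the counter for the next round, then
            // release everyone by publishing the new sense.
            count_.store(0);
            sense_.store(s);
        } else {
            // Busy-wait until the last arriver flips the shared sense.
            // yield() is an assumption added for oversubscribed machines.
            while (sense_.load() != s) {
                std::this_thread::yield();
            }
        }
    }

  private:
    std::size_t num_threads_;
    std::atomic<std::size_t> count_{0};
    std::atomic<bool> sense_{false};
    // One flag per thread (shared by all instances of this class).
    inline static thread_local bool local_sense_ = false;
};

// Hypothetical check: every thread increments a shared counter once per
// phase. After the first wait() all increments of that phase must be
// visible; the second wait() stops anyone from starting the next phase early.
int count_phase_errors(std::size_t num_threads, int num_phases) {
    spin_barrier_sketch barrier(num_threads);
    std::atomic<int> counter{0};
    std::atomic<int> errors{0};
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            barrier.init();
            for (int p = 0; p < num_phases; ++p) {
                counter.fetch_add(1);
                barrier.wait();
                if (counter.load() != static_cast<int>(num_threads) * (p + 1)) {
                    errors.fetch_add(1);
                }
                barrier.wait();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return errors.load();
}
```

Reversing the sense on every call is what lets the same barrier be reused back to back: a fast thread that re-enters `wait()` spins on the new sense value and cannot be released by the previous round's flag.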

2 files changed, +56 -7 lines changed

vpr/src/route/parallel_connection_router.h

Lines changed: 46 additions & 7 deletions
@@ -7,6 +7,7 @@
 #include "multi_queue_d_ary_heap.h"

 #include <atomic>
+#include <barrier>
 #include <thread>
 #include <mutex>
 #include <condition_variable>
@@ -48,7 +49,6 @@ class spin_lock_t {
  * condition variable to coordinate thread synchronization.
  */
 class barrier_mutex_t {
-// FIXME: Try std::barrier (since C++20) to replace this mutex barrier
 std::mutex mutex_;
 std::condition_variable cv_;
 size_t count_;
@@ -61,17 +61,22 @@ class barrier_mutex_t {
  * @param num_threads Number of threads that must call wait() before
  * any thread is allowed to proceed
  */
-explicit barrier_mutex_t(size_t num_threads)
+explicit inline barrier_mutex_t(size_t num_threads)
 : count_(num_threads)
 , max_count_(num_threads) {}

+/**
+ * Initialization method goes unused by this barrier implementation.
+ */
+inline void init() {}
+
 /**
  * @brief Blocks the calling thread until all threads have called wait()
  *
  * When the specified number of threads have called this method, all
  * threads are unblocked and the barrier is reset for the next use.
  */
-void wait() {
+inline void wait() {
 std::unique_lock<std::mutex> lock{mutex_};
 size_t gen = generation_;
 if (--count_ == 0) {
@@ -111,13 +116,13 @@ class barrier_spin_t {
  * @param num_threads Number of threads that must call wait() before
  * any thread is allowed to proceed
  */
-explicit barrier_spin_t(size_t num_threads) { num_threads_ = num_threads; }
+explicit inline barrier_spin_t(size_t num_threads) { num_threads_ = num_threads; }

 /**
  * @brief Initializes the thread-local sense flag
  * @note Should be called by each thread before first using the barrier.
  */
-void init() {
+inline void init() {
 local_sense_ = false;
 }

@@ -128,7 +133,7 @@ class barrier_spin_t {
  * to arrive unblocks all waiting threads. This method avoids using locks or
  * condition variables, making it potentially more efficient for short waits.
  */
-void wait() {
+inline void wait() {
 bool s = !local_sense_;
 local_sense_ = s;
 size_t num_arrivals = count_.fetch_add(1) + 1;
@@ -142,7 +147,41 @@ class barrier_spin_t {
 }
 };

-using barrier_t = barrier_spin_t; // Using the spin-based thread barrier
+/**
+ * @brief Thread barrier implementation using std::barrier
+ *
+ * It ensures all participating threads reach a synchronization point
+ * before any are allowed to proceed further.
+ */
+class standard_barrier_t {
+/// @brief Internal barrier implementation.
+std::barrier<> barrier_;
+
+public:
+/**
+ * @brief Constructs a barrier for a specific number of threads
+ *
+ * @param num_threads
+ * Number of threads that must call wait() before any thread is allowed
+ * to proceed.
+ */
+explicit inline standard_barrier_t(size_t num_threads)
+: barrier_(num_threads) {}
+
+/**
+ * Initialization method goes unused by this barrier implementation.
+ */
+inline void init() {}
+
+/**
+ * @brief Blocks the calling thread until all threads have called wait()
+ */
+inline void wait() {
+barrier_.arrive_and_wait();
+}
+};
+
+using barrier_t = standard_barrier_t; // Using the standard thread barrier

 /**
  * @class ParallelConnectionRouter implements the MultiQueue-based parallel connection
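The diff shows only the opening lines of `barrier_mutex_t::wait()`: a lock on `mutex_`, a snapshot of `generation_`, and the `--count_ == 0` test. Those names suggest a generation-counting condition-variable barrier. Below is a minimal sketch of that technique under that assumption; the release logic after the `if`, the class name `mutex_barrier_sketch`, and the test helper `count_phase_errors` are hypothetical. For comparison, `standard_barrier_t` above delegates all of this to `std::barrier<>::arrive_and_wait()`.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch of a condition-variable barrier; not the VPR source.
class mutex_barrier_sketch {
  public:
    explicit mutex_barrier_sketch(std::size_t num_threads)
        : count_(num_threads)
        , max_count_(num_threads) {
        assert(num_threads > 0);
    }

    // Nothing to initialize per thread (mirrors the no-op init() in the diff).
    void init() {}

    void wait() {
        std::unique_lock<std::mutex> lock{mutex_};
        std::size_t gen = generation_;
        if (--count_ == 0) {
            // Last arriver: start a new generation, re-arm the count,
            // and wake every sleeping thread.
            ++generation_;
            count_ = max_count_;
            cv_.notify_all();
        } else {
            // Sleep until the generation we arrived in has completed;
            // the predicate also guards against spurious wakeups.
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

  private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t count_;
    std::size_t max_count_;
    std::size_t generation_ = 0;
};

// Hypothetical check: one shared increment per thread per phase, verified
// between two wait() calls so no thread can run ahead into the next phase.
int count_phase_errors(std::size_t num_threads, int num_phases) {
    mutex_barrier_sketch barrier(num_threads);
    std::atomic<int> counter{0};
    std::atomic<int> errors{0};
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            barrier.init();
            for (int p = 0; p < num_phases; ++p) {
                counter.fetch_add(1);
                barrier.wait();
                if (counter.load() != static_cast<int>(num_threads) * (p + 1)) {
                    errors.fetch_add(1);
                }
                barrier.wait();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return errors.load();
}
```

The generation counter is what makes this barrier reusable: because `count_` is re-armed before anyone wakes, a woken thread can only tell that its round is over by seeing `generation_` change, not by inspecting the count.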
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# This collects QoR data that is interesting for the Fine-Grained Parallel
+# Router running on a fixed channel width.
+
+vpr_status;output.txt;vpr_status=(.*)
+crit_path_delay;vpr.out;Critical path: (.*) ns
+post_route_wirelength;vpr.out;\s*Total wirelength: (\d+)
+total_connection_pathsearch_time;vpr.out;.*Time spent on path search: (.*) seconds.
+route_runtime;vpr.out;Routing took (.*) seconds
+total_runtime;vpr.out;The entire flow of VPR took (.*) seconds
+magic_cookie;vpr.out;Serial number \(magic cookie\) for the routing is: (.*)
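Each non-comment line of the new QoR file appears to follow a `metric;log file;regex` layout, where the regex's first capture group holds the value. As a hedged illustration, this C++ sketch splits one such line and applies its regex to log text with `std::regex_search`. The three-field layout and the "first matching line wins" rule are assumptions, and VTR's own parsing scripts may behave differently; `QorMetric`, `parse_spec_line`, and `extract_metric` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical representation of one "name;file;regex" line.
struct QorMetric {
    std::string name;
    std::string file;
    std::string pattern;
};

// Splits a spec line at its first two ';'; everything after the second ';'
// is treated as the regex. Blank lines and '#' comments yield nullopt.
std::optional<QorMetric> parse_spec_line(const std::string& line) {
    if (line.empty() || line[0] == '#') {
        return std::nullopt;
    }
    const std::size_t first = line.find(';');
    if (first == std::string::npos) {
        return std::nullopt;
    }
    const std::size_t second = line.find(';', first + 1);
    if (second == std::string::npos) {
        return std::nullopt;
    }
    return QorMetric{line.substr(0, first),
                     line.substr(first + 1, second - first - 1),
                     line.substr(second + 1)};
}

// Returns capture group 1 from the first log line matching the pattern.
std::optional<std::string> extract_metric(const QorMetric& metric,
                                          const std::string& log_text) {
    const std::regex re(metric.pattern);  // ECMAScript grammar by default
    assert(re.mark_count() >= 1 && "pattern needs a capture group");
    std::istringstream log(log_text);
    std::string line;
    std::smatch match;
    while (std::getline(log, line)) {
        if (std::regex_search(line, match, re)) {
            return match[1].str();
        }
    }
    return std::nullopt;
}
```

The patterns in this file use only `\s`, `\d`, `\(`, and `(.*)`, all of which `std::regex`'s default ECMAScript grammar accepts, so the lines can be used unchanged in this sketch.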
