Skip to content

Allow router to report successful routing even if RCV fails to resolve hold #1540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

dpbaines
Copy link
Contributor

Description

Makes it so that RCV will report a successful routing regardless of whether hold has been resolved. This is just a simple check that allows it to report a successful routing if routing is feasible, and the maximum iteration count is reached. This doesn't fix the underlying problem of why RCV fails to resolve hold for some circuits (LU8PEEng and LU32PEEng) but it's a quick patch.

I want to figure out the underlying issue and also address that in this PR so I don't want to merge quite yet.

Related Issue

Closes #1533

Motivation and Context

Make RCV more robust and versatile on multiple circuits.

How Has This Been Tested?

I tested this on LU32PEEng and LU8PEEng circuits with harsh hold constraints and RCV enabled, and verified that it reports a successful routing even though hold isn't resolved. I don't see a point in running the full qor tests as this isn't a possibly breaking change yet.

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@probot-autolabeler probot-autolabeler bot added lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool labels Sep 13, 2020
@dpbaines
Copy link
Contributor Author

I don't want to merge this yet, as I'm hoping to get around to figuring out why the LU circuits are failing, and fix those.

@mithro
Copy link
Contributor

mithro commented Sep 14, 2020

@dpbaines -- Any chance we could fail by default (and instead add a flag which allows things to report success even if there are hold violations)? Designs with hold violations are broken. Reporting that PnR is successful when the design is broken violates the contract that a successful exit code means a successfully place and routed design which will work.

@vaughnbetz
Copy link
Contributor

@mithro I think we should report routing success in this case; the hold violation will still be reported, so reporting routing success does not imply timing success.

This matches what commercial tools do -- Quartus will report a successful routing but failed hold (or setup) timing constraints if any failed constraints are present. You can still analyze routability, look at the routing, get a timing report etc. Simply stopping makes it harder to look at the output (don't get a full analysis), and precludes programming a board (useful as hold margins usually have guardband, so you could still test to see if it works on a board. If the failure is small, it usually will work).

@@ -1987,9 +1987,11 @@ bool is_iteration_complete(bool routing_is_feasible, const t_router_opts& router
//in addition to routing being legal and the correct budgeting algorithm being set.

if (routing_is_feasible) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would re-order this if
Test the non-RCV (routing_budgets_algorithm != YOYO) first, so someone who is not thinking of RCV has less to read. Then the rest of the code.

if (router_opts.routing_budgets_algorithm != YOYO) {
return true;
} else if (router_opts.routing_budgets_algorithm == YOYO && (timing_info->hold_worst_negative_slack() == 0 || rcv_finished) && itry != 1) {
} else if (router_opts.routing_budgets_algorithm == YOYO && (timing_info->hold_worst_negative_slack() == 0 || can_router_finish) && itry != 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the test for hold_worst_negative_slack be >= 0? Seems a lot safer than a test for precisely 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree, but worst_negative_slack is equal to zero if worst hold slack is positive. Therefore they would equate to the same thing, but I could change it in case that changes in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, lets do >= for safety and readability though.

@vaughnbetz
Copy link
Contributor

I think a cleaner fix is possible.

  1. is_iteration_complete is not a very clear name for this function. The header file also doesn't have a function description. A better name would be consider_routing_legal ()

  2. A larger restructuring (maybe more than you want to do right now) would be:

  • replace the call to is_iteration_complete with a call to is_feasible_routing
  • make sure the is_better_quality_routing routine checks for hold and store the routing that has the best timing (first priority = hold if hold slack < 0, second priority = setup slack / current code).
  • if you can resolve more hold violations with more iteration, then you'd restructure some later code to also check a new hold_not_legal variable (not just how many convergences there are).

I think this would be cleaner and more powerful, but it would take more testing.

  1. I think code that says the routing is legal, even if Th is not resolved in the end, should go in no matter what. No algorithm is perfect and sometimes hold violations are not resolvable without running out of wiring. So even if you can resolve LU8 and LU32's hold problems, code to say the routing is legal should be added.

@dpbaines
Copy link
Contributor Author

@vaughnbetz Your suggestions make sense, I'll try to take a better look at them this week at some point.

@dpbaines
Copy link
Contributor Author

@vaughnbetz I've implemented your suggestions so far. I've eliminated the is_iteration_complete function, and instead modified the is_better_quality_routing function. If RCV is enabled the is_better_quality_routing function will prioritize routing with improved hold slack, and then setup slack. Otherwise it uses the old code path.

It will also indicate that routing was successful, however it won't allow the router to break early unless either hold slack >= 0, or RCV detects it can finish early, as exists in the current code. I haven't had a lot of time to figure out why LU8 and LU32 struggle to resolve, so I wanna keep picking away at it.

Let me know what you think of the current code. Or if you think is_better_quality should always prioritize hold even in the regular router (which would clean up code).

I have tested this on a few key circuits and it seems to be working well. For LU8 and LU32 it returns the routing that had the lowest negative hold slack.

@@ -2007,35 +2006,66 @@ bool should_setup_lower_bound_connection_delays(int itry, const t_router_opts& /
static bool is_better_quality_routing(const vtr::vector<ClusterNetId, t_traceback>& best_routing,
const RoutingMetrics& best_routing_metrics,
const WirelengthInfo& wirelength_info,
std::shared_ptr<const SetupHoldTimingInfo> timing_info) {
std::shared_ptr<const SetupHoldTimingInfo> timing_info,
const t_router_opts& router_opts) {
if (best_routing.empty()) {
return true; //First legal routing
}

//Rank first based on sWNS, followed by other timing metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't seem to match the code below.

} else if (timing_info->setup_total_negative_slack() < best_routing_metrics.sTNS) {
return false;
}
if (timing_info->hold_worst_negative_slack() > best_routing_metrics.hWNS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the RCV off case? If so, why is it making decisions based on hold?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that check was there as a secondary quality check if the setup slacks were tied, to allow the router to pick routing with improved hold. I get that this would be unlikely though, so I guess it makes sense to remove this check for non-RCV routing.

}
if (timing_info->hold_worst_negative_slack() > best_routing_metrics.hWNS) {
return true;
} else if (timing_info->hold_worst_negative_slack() > best_routing_metrics.hWNS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the if and else if condition seem to be the same? Copy and paste error?

} else if (timing_info->hold_worst_negative_slack() > best_routing_metrics.hWNS) {
return false;
}
if (timing_info->hold_total_negative_slack() > best_routing_metrics.hTNS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a comment as this looks like a repeat of the prior 5 lines of code until you look very closely (total negative slack instead of worst negative slack).

return true;
} else if (timing_info->hold_total_negative_slack() > best_routing_metrics.hTNS) {
return false;
if (timing_info->setup_worst_negative_slack() > best_routing_metrics.sWNS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could reduce code duplication and make the code more readable by making helper functions: setup_comparison and hold_comparison that encapsulate the (1) test worst case slack first and then check total negative slack, and return better, worse or tied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those routines would also give you a place to comment that we don't care about positive slack (consider it a tie) and only check to see if negative slacks have improved.

|| early_reconvergence_exit_heuristic(router_opts, itry_since_last_convergence, timing_info, best_routing_metrics)) {
if ((legal_convergence_count >= router_opts.max_convergence_count
|| router_iteration_stats.connections_routed == 0
|| early_reconvergence_exit_heuristic(router_opts, itry_since_last_convergence, timing_info, best_routing_metrics))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could move hold_legal into the early_reconvergence_exit_heuristic routine, which I think would be easier to read by simplifying this condition and giving you a place to comment why you're not saying you're exiting early.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that would be easier to read, but because the default max_convergence_count is 1, the router would stop early regardless. What I could do is to move all of these conditions into a seperate function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still outstanding

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When RCV is selected you could change the legal_convergence_count default to some larger number (we can have dependent option defaults; take a look at the option parsing code and you should find an example). Then if the early_reconvergence_exit_heuristic understands hold and rcv it can do the right thing. The if statement won't look as long.

You could also move the whole if into a control routine; that would also work. But I think changing legal_convergence_count makes sense no matter what, as I believe you've found RCV wants a higher value for this setting. Better to do that by actually changing the value than writing if statements to work around it.

@vaughnbetz
Copy link
Contributor

Thanks David. I have some questions and suggestions above. One is major: I think the code is backwards, and others are useful for readability.

@@ -579,11 +588,17 @@ bool try_timing_driven_route_tmpl(const t_router_opts& router_opts,
pres_fac = update_pres_fac(router_opts.initial_pres_fac);
}

if (budgeting_inf.if_set()) {
// If total negative slack is greater than or equal to 0, or RCV is finished, allow router to complete
hold_legal = timing_info->hold_total_negative_slack() >= 0 || rcv_finished_count == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this variable doesn't really mean hold is legal (it seems to mean RCV should stop trying) then you should give it a different name.

} else if (timing_info->hold_worst_negative_slack() > best_routing_metrics.hWNS) {
return false;
}
if (hold_comparison > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code could be shortened by checking if YOYO is enabled, and if so, checking hold and returning whatever is better, or doing nothing in a tie. Then check setup (no matter what), then wirelength.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I should have seen this! Will fix thanks.

@vaughnbetz
Copy link
Contributor

The compare_setup and compare_hold routines have cleaned up some code. I left some more comments on ways to shorten / clarify further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RCV should report a successful routing even if hold isn't completely resolved
3 participants