Skip to content

chore: Tune Boogie encoding for better solver performance #6194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

fabiomadge
Copy link
Collaborator

@fabiomadge fabiomadge commented Apr 15, 2025

This PR contains three changes to our Boogie encoding to improve performance of Z3 when solving Dafny generated problems:

  • A higher weight for an axiom defining $Unbox and $Box as each other's inverse
  • A higher weight on the generated axioms defining the rank order on inductive datatypes
  • Lower eager_threshold and linear scaling of existing weights

I arrived at those by benchmarking on a set of code bases, most of which are closed source. They have been pseudonymized (A.0, A.1, B.old, and B.new). A.0 and A.1 share some code and talk about the same problem. B.old and B.new are snapshots of the some code base, but changes have been made to improve proof stability. The open source one, are the Dafny standard libraries. To ensure the changes don't overfit, I added another code base (C) on the final evaluation and tested on a more recent version of Z3 (4.14.1 vs. 4.12.1). The table below shows the changes to the runtime of Z3 on those code bases when making the proposed changes.

Z3 libraries A.0 A.1 B.old B.new C
4.12.1 -27,43 % -20,82 % -6,07 % -13,62 % -3,99 % -21,53 %
4.14.1 -14,42 % -27,63 % -21,63 % -1,45 % 5,36 % 9,17 %

Below, there are more detailed statistics.

Z3 v4.12.1

Comparing libraries...
Analyzed 4027 queries. Of those, 4019 lined up and 2 stopped working, while 6 started working.
The overall runtime improved by 27.43 %.
Of the matching queries 1030 (25.63 %, 41.13 % runtime share) significantly improved, 14 (0.35 %, 0.96 % runtime share) experienced a marked slowdown, and 2975 (74.02 %) stayed roughly the same.
Comparing A.0...
Analyzed 575 queries. Of those, 573 lined up and 0 stopped working, while 2 started working.
The overall runtime improved by 20.82 %.
Of the matching queries 167 (29.14 %, 34.97 % runtime share) significantly improved, 3 (0.52 %, 1.59 % runtime share) experienced a marked slowdown, and 403 (70.33 %) stayed roughly the same.
Comparing A.1...
Analyzed 1208 queries. Of those, 1201 lined up and 4 stopped working, while 3 started working.
The overall runtime improved by 6.07 %.
Of the matching queries 126 (10.49 %, 14.73 % runtime share) significantly improved, 13 (1.08 %, 8.27 % runtime share) experienced a marked slowdown, and 1062 (88.43 %) stayed roughly the same.
Comparing B.old...
Analyzed 12690 queries. Of those, 12689 lined up and 0 stopped working, while 1 started working.
The overall runtime improved by 13.62 %.
Of the matching queries 80 (0.63 %, 12.05 % runtime share) significantly improved, 10 (0.08 %, 0.13 % runtime share) experienced a marked slowdown, and 12599 (99.29 %) stayed roughly the same.
Comparing B.new...
Analyzed 18009 queries. Of those, 18007 lined up and 0 stopped working, while 2 started working.
The overall runtime improved by 3.99 %.
Of the matching queries 124 (0.69 %, 4.18 % runtime share) significantly improved, 111 (0.62 %, 0.58 % runtime share) experienced a marked slowdown, and 17772 (98.69 %) stayed roughly the same.
Comparing C...
Analyzed 25266 queries. Of those, 25250 lined up and 6 stopped working, while 10 started working.
The overall runtime improved by 21.53 %.
Of the matching queries 8495 (33.64 %, 43.30 % runtime share) significantly improved, 25 (0.10 %, 0.91 % runtime share) experienced a marked slowdown, and 16730 (66.26 %) stayed roughly the same.

Z3 v4.14.1

Comparing libraries...
Analyzed 4009 queries. Of those, 3995 lined up and 12 stopped working, while 2 started working.
The overall runtime improved by 14.42 %.
Of the matching queries 21 (0.53 %, 28.82 % runtime share) significantly improved, 39 (0.98 %, 2.88 % runtime share) experienced a marked slowdown, and 3935 (98.50 %) stayed roughly the same.
Comparing A.0...
Analyzed 582 queries. Of those, 575 lined up and 2 stopped working, while 5 started working.
The overall runtime improved by 27.63 %.
Of the matching queries 10 (1.74 %, 49.48 % runtime share) significantly improved, 13 (2.26 %, 18.34 % runtime share) experienced a marked slowdown, and 552 (96.00 %) stayed roughly the same.
Comparing A.1...
Analyzed 1213 queries. Of those, 1209 lined up and 0 stopped working, while 4 started working.
The overall runtime improved by 21.36 %.
Of the matching queries 17 (1.41 %, 32.39 % runtime share) significantly improved, 20 (1.65 %, 3.90 % runtime share) experienced a marked slowdown, and 1172 (96.94 %) stayed roughly the same.
Comparing B.old...
Analyzed 12679 queries. Of those, 12675 lined up and 4 stopped working, while 0 started working.
The overall runtime improved by 1.45 %.
Of the matching queries 21 (0.17 %, 11.38 % runtime share) significantly improved, 83 (0.65 %, 4.66 % runtime share) experienced a marked slowdown, and 12571 (99.18 %) stayed roughly the same.
Comparing B.new...
Analyzed 18011 queries. Of those, 17894 lined up and 116 stopped working, while 1 started working.
The overall runtime worsened by 5.36 %.
Of the matching queries 60 (0.34 %, 0.54 % runtime share) significantly improved, 154 (0.86 %, 2.73 % runtime share) experienced a marked slowdown, and 17680 (98.80 %) stayed roughly the same.
Comparing C...
Analyzed 25247 queries. Of those, 25223 lined up and 17 stopped working, while 7 started working.
The overall runtime worsened by 9.17 %.
Of the matching queries 64 (0.25 %, 6.73 % runtime share) significantly improved, 154 (0.61 %, 8.29 % runtime share) experienced a marked slowdown, and 25005 (99.14 %) stayed roughly the same.

@fabiomadge fabiomadge closed this Apr 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant