Skip to content

Conversation

@samuelchassot
Copy link
Contributor

What was changed?

Add a new axiom for the reflexivity of $HeapSucc in the prelude, fixing #1461.

How has this been tested?

A new test in IntegrationTests/TestFiles/LitTests/LitTest/dafny4/git-issue1461.dfy.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

@samuelchassot samuelchassot changed the title Sam/fix1461 Fix #1461 Add a reflexivity axiom for $HeapSucc Jun 4, 2025
robin-aws
robin-aws previously approved these changes Jun 4, 2025
Copy link
Member

@robin-aws robin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@robin-aws robin-aws enabled auto-merge (squash) June 4, 2025 19:38
auto-merge was automatically disabled June 4, 2025 21:09

Head branch was pushed to by a user without write access

@samuelchassot
Copy link
Contributor Author

@robin-aws Could reapprove the workflows please? Some tests failed

@robin-aws
Copy link
Member

Gladly, but I see a unit test failure you still need to fix too first: https://github.com/dafny-lang/dafny/actions/runs/15448207454/job/43492837619

@samuelchassot
Copy link
Contributor Author

Gladly, but I see a unit test failure you still need to fix too first: https://github.com/dafny-lang/dafny/actions/runs/15448207454/job/43492837619

I think we're good to go. I still have a few tests that fail locally, but they seem unrelated to me. So I think the CI can run and will see.

I'm also trying to fix my setup to that these tests pass locally, that'd help for the future.

@samuelchassot
Copy link
Contributor Author

samuelchassot commented Jun 4, 2025

I fixed the amount of resource in the expected output for SchorrWaite.dfy.
For the others (namely server/counterexample_commandline.dfy and wishlist/git-issue-6158.dfy)
https://github.com/dafny-lang/dafny/actions/runs/15453393190/job/43500878839?pr=6260

I don't know yet why they are failing given what changed. I'll investigate.

For counterexample_commandlin.dfy, the diff is

- assume this != null && |this.p| > 0 && |s| > 0 && 1 == |s| && '\0' == s[0] && 1 == |this.p| && '?' == this.p[0];
+ assume this != null && |this.p| > 0 && |s| > 0 && 1 == |s| && '\U{0001}' == s[0] && 1 == |this.p| && '?' == this.p[0];

which seems weird as why the axiom can change this. Maybe the new axiom changes the context which changes what counterexample is found. The one we get now seems correct so I updated the expected output.

@samuelchassot
Copy link
Contributor Author

And the other test which is failing starts with

// WISH: The following example should NOT verify.

// The following program shows an unsoundness in the Dafny 4.10 / Boogie / Z3 4.12.1 tool stack.
// Trouble-shooting investigations point fingers at Z3 4.12.1, because minute changes to the
// generated Boogie or changing to newer versions of Z3 no longer show signs of problems.

// The example is added to test suite for now, so we can keep track of it. It lives in the
// Test/wishlist folder to indicate that something is wrong. If a change to Dafny (e.g., a
// change to a newer version of Z3) causes this example to verify, then we should move this
// file to the Test/git-issues folder and indicate that we expect an error.

and now it does not verify anymore.

As this seems to be desirable, I'm moving the test + changing the expected output. This is open for discussion, but the diffs in the PR will help in that regard

@fabiomadge
Copy link
Collaborator

Do you have a reason for putting the new test into dafny4?

@samuelchassot
Copy link
Contributor Author

samuelchassot commented Jun 5, 2025

Do you have a reason for putting the new test into dafny4?

No, I didn't the git-issues folder when I created this test. Let me move that to git-issues.

@samuelchassot
Copy link
Contributor Author

samuelchassot commented Jun 5, 2025

The failing test git-issue6158.dfy is fixed.

The SchorrWaite test is fixed too (EDIT: I didn't realize that there were 2 expected output files for this one, hence my confusion about the resource usage).

@samuelchassot
Copy link
Contributor Author

Could someone authorize the workflows? Now, only the std library should fail, I'm working on it!

@samuelchassot
Copy link
Contributor Author

samuelchassot commented Jun 10, 2025

After adding a weight annotation to {:weight 1}, I get this performance:

EDIT: so weight = 1 is the default, so this should not affect anything. So this data shows what I obtained while comparing the 2 branches, without weights.

Found 4 projects: cedar_spec, cedar_symeval, libraries, B.new
Comparing cedar_spec...
Analyzed 591 queries. Of those, 568 lined up and 13 stopped working, while 10 started working.
The overall runtime improved by 4.07 %.
Of the matching queries 7 (1.23 %, 38.60 % runtime share) significantly improved, 128 (22.54 %, 23.94 % runtime share) experienced a marked slowdown, and 433 (76.23 %) stayed roughly the same.
Comparing cedar_symeval...
Analyzed 1220 queries. Of those, 1198 lined up and 10 stopped working, while 12 started working.
The overall runtime improved by 4.56 %.
Of the matching queries 95 (7.93 %, 16.27 % runtime share) significantly improved, 41 (3.42 %, 8.29 % runtime share) experienced a marked slowdown, and 1062 (88.65 %) stayed roughly the same.
Comparing libraries...
Analyzed 4079 queries. Of those, 3916 lined up and 82 stopped working, while 81 started working.
The overall runtime worsened by 2.17 %.
Of the matching queries 147 (3.75 %, 11.18 % runtime share) significantly improved, 172 (4.39 %, 8.55 % runtime share) experienced a marked slowdown, and 3597 (91.85 %) stayed roughly the same.
Comparing B.new...
Analyzed 19878 queries. Of those, 19620 lined up and 129 stopped working, while 129 started working.
The overall runtime improved by 14.56 %.
Of the matching queries 8016 (40.86 %, 38.92 % runtime share) significantly improved, 28 (0.14 %, 0.27 % runtime share) experienced a marked slowdown, and 11576 (59.00 %) stayed roughly the same.

@samuelchassot
Copy link
Contributor Author

samuelchassot commented Jun 10, 2025

I think that's more manageable than what you got @fabiomadge on the previous version. The differences seem attributable to noise here, or actually better with the new axiom.

WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants