Skip to content

Conversation

@luisacicolini
Copy link
Contributor

@luisacicolini luisacicolini commented Jun 20, 2025

This PR adds a minimal dockerfile for the artifact evaluation. #898 had a rather large diff with main and included a few changes non strictly docker-related. However, for the time being, I'd like to start adding this minimal docker, which I will then update step-by-step as we finalize the structure of our evaluation.

@github-actions
Copy link
Contributor

Alive Statistics: 90 / 93 (3 failed)

Copy link
Collaborator

@bollu bollu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as the first version 😄

@github-actions
Copy link
Contributor

bitwuzla and leanSAT provided counterexample for theorem 3 in file gapinthcast_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 6 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 40 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 51 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 53 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 12 in file gnarrowhmath_proof.lean
bitwuzla proved and leanSAT failed theorem 9 in file gapinthsub_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 13 in file gapinthselect_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 15 in file gapinthselect_proof.lean
bitwuzla proved and leanSAT failed theorem 1 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 1 in file gapinthdiv2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 5 in file gapinthdiv2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 7 in file gapinthdiv2_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gadd_or_sub_proof.lean
bitwuzla proved and leanSAT failed theorem 15 in file gshifthshift_proof.lean
bitwuzla proved and leanSAT failed theorem 7 in file gtrunchinseltpoison_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file gshlhfactor_proof.lean
bitwuzla proved and leanSAT failed theorem 21 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 22 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 23 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 24 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 6 in file gadd2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 9 in file gapinthadd_proof.lean
bitwuzla proved and leanSAT failed theorem 5 in file g2012h08h28hudiv_ashl_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 9 in file gapinthand_proof.lean
bitwuzla proved and leanSAT failed theorem 13 in file gsubhxorhcmp_proof.lean
In total we found 7976 goals.
leanSAT and Bitwuzla solved: 7912
leanSAT and Bitwuzla provided 8 counterexamples
leanSAT and Bitwuzla both failed on 31 theorems
leanSAT failed and Bitwuzla succeeded on 25 theorems
leanSAT succeeded and Bitwuzla failed on 0 theorems
There were 0 inconsistencies
Errors raised: 112
ran rg 'Bitwuzla failed' | wc -l, expected 31, found 0, FAIL
ran rg 'LeanSAT failed' | wc -l, expected 56, found 0, FAIL
ran rg 'LeanSAT provided a counter' | wc -l, expected 8, found 0, FAIL
ran rg 'Bitwuzla provided a counter' | wc -l, expected 8, found 0, FAIL
ran rg 'LeanSAT proved' | wc -l, expected 7912, found 0, FAIL
ran rg 'Bitwuzla proved' | wc -l, expected 7937, found 0, FAIL
error (kernel) deep recursion detected was raised 25 times
error The SAT solver timed out while solving the problem. was raised 56 times
error The SMT solver timed out while solving the problem. was raised 31 times

- docker cp temp-container:/code/lean-mlir/bv-evaluation/raw-data ./docker-results
- docker rm temp-container
$ docker load -i opencompl-ssa.tar
$ docker run -it siddudruid/opencompl-ssa
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have siddudruid here? In fact, why do we change this file at all? Should we maybe name it artifact-conference-.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@tobiasgrosser tobiasgrosser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks amazing. Thank you, @luisacicolini!

@tobiasgrosser
Copy link
Collaborator

Thank you @luisacicolini for the artefact. Thank you @bollu for the review. I am happy to see this go in. Consider my comments suggestions but no blockers.

@github-actions
Copy link
Contributor

Alive Statistics: 90 / 93 (3 failed)

@github-actions
Copy link
Contributor

Alive Statistics: 90 / 93 (3 failed)

1 similar comment
@github-actions
Copy link
Contributor

Alive Statistics: 90 / 93 (3 failed)

@github-actions
Copy link
Contributor

Alive Statistics: 90 / 93 (3 failed)

@github-actions
Copy link
Contributor

Alive Statistics: 90 / 93 (3 failed)

1 similar comment
@github-actions
Copy link
Contributor

Alive Statistics: 90 / 93 (3 failed)

@luisacicolini luisacicolini enabled auto-merge June 22, 2025 22:01
@github-actions
Copy link
Contributor

Alive Statistics: 90 / 93 (3 failed)

@luisacicolini luisacicolini added this pull request to the merge queue Jun 22, 2025
Merged via the queue into main with commit ea9017b Jun 22, 2025
4 checks passed
@github-actions
Copy link
Contributor

bitwuzla and leanSAT provided counterexample for theorem 3 in file gapinthcast_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 6 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 19 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 40 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 51 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 53 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 12 in file gnarrowhmath_proof.lean
bitwuzla proved and leanSAT failed theorem 9 in file gapinthsub_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 13 in file gapinthselect_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 15 in file gapinthselect_proof.lean
bitwuzla proved and leanSAT failed theorem 1 in file gsdivhcanonicalize_proof.lean
bitwuzla proved and leanSAT failed theorem 1 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 1 in file gapinthdiv2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 5 in file gapinthdiv2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 7 in file gapinthdiv2_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gadd_or_sub_proof.lean
bitwuzla proved and leanSAT failed theorem 15 in file gshifthshift_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file gshlhfactor_proof.lean
bitwuzla proved and leanSAT failed theorem 21 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 22 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 23 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 24 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 6 in file gadd2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 9 in file gapinthadd_proof.lean
bitwuzla proved and leanSAT failed theorem 5 in file g2012h08h28hudiv_ashl_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 9 in file gapinthand_proof.lean
bitwuzla proved and leanSAT failed theorem 13 in file gsubhxorhcmp_proof.lean
In total we found 7976 goals.
leanSAT and Bitwuzla solved: 7912
leanSAT and Bitwuzla provided 8 counterexamples
leanSAT and Bitwuzla both failed on 30 theorems
leanSAT failed and Bitwuzla succeeded on 26 theorems
leanSAT succeeded and Bitwuzla failed on 0 theorems
There were 0 inconsistencies
Errors raised: 111
ran rg 'Bitwuzla failed' | wc -l, expected 30, found 0, FAIL
ran rg 'LeanSAT failed' | wc -l, expected 56, found 0, FAIL
ran rg 'LeanSAT provided a counter' | wc -l, expected 8, found 0, FAIL
ran rg 'Bitwuzla provided a counter' | wc -l, expected 8, found 0, FAIL
ran rg 'LeanSAT proved' | wc -l, expected 7912, found 0, FAIL
ran rg 'Bitwuzla proved' | wc -l, expected 7938, found 0, FAIL
error (kernel) deep recursion detected was raised 25 times
error The SAT solver timed out while solving the problem. was raised 56 times
error The SMT solver timed out while solving the problem. was raised 30 times

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants