Skip to content

Conversation

@alexkeizer
Copy link
Collaborator

This PR changes the evaluation script to return a non-zero exitcode whenever the ripgrep checks find a mismatched number (i.e., whenever it currently prints FAIL to the log). This exitcode should cause the CI to fail, meaning we no longer have to manually check the logs to confirm the evaluation actually got the right numbers.

It does mean that now whenever we do expect the numbers to change, we have to change the expected numbers in the script before CI can pass, but forcing us to keep these numbers in sync with reality seems more like a feature than a bug.

…match

This PR changes the evaluation script to return a non-zero exitcode
whenever the ripgrep checks find a mismatched number (i.e., whenever it
currently prints FAIL to the log). This exitcode should cause the CI to
fail, meaning we no longer have to manually check the logs to confirm
the evaluation actually got the right numbers.

It does mean that now whenever we do expect the numbers to change, we
have to change the expected numbers in the script before CI can pass,
but forcing us to keep these numbers in sync with reality seems more
like a feature than a bug.
@github-actions
Copy link
Contributor

Alive Statistics: 90 / 93 (3 failed)

@alexkeizer alexkeizer marked this pull request as draft June 25, 2025 12:26
@alexkeizer
Copy link
Collaborator Author

This PR is in principle ready for review: I've drafted it because I felt like I should test it properly on CI, to confirm it does actually fail on mismatch, but CI seems to not work at all atm

@github-actions
Copy link
Contributor

Alive Statistics: 90 / 93 (3 failed)

@github-actions
Copy link
Contributor

bitwuzla and leanSAT provided counterexample for theorem 3 in file gapinthcast_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 6 in file gexact_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 40 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 51 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 53 in file gdivhshift_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gdistribute_proof.lean
bitwuzla proved and leanSAT failed theorem 12 in file gnarrowhmath_proof.lean
bitwuzla proved and leanSAT failed theorem 9 in file gapinthsub_proof.lean
bitwuzla proved and leanSAT failed theorem 17 in file grem_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 13 in file gapinthselect_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 15 in file gapinthselect_proof.lean
bitwuzla proved and leanSAT failed theorem 1 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 3 in file gadd4_proof.lean
bitwuzla proved and leanSAT failed theorem 7 in file gtrunc_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2008h02h23hMulSub_proof.lean
bitwuzla proved and leanSAT failed theorem 0 in file g2010h11h23hDistributed_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 1 in file gapinthdiv2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 5 in file gapinthdiv2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 7 in file gapinthdiv2_proof.lean
bitwuzla proved and leanSAT failed theorem 4 in file gadd_or_sub_proof.lean
bitwuzla proved and leanSAT failed theorem 15 in file gshifthshift_proof.lean
bitwuzla proved and leanSAT failed theorem 7 in file gtrunchinseltpoison_proof.lean
bitwuzla proved and leanSAT failed theorem 14 in file gshlhfactor_proof.lean
bitwuzla proved and leanSAT failed theorem 21 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 22 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 23 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 24 in file greassociatehnuw_proof.lean
bitwuzla proved and leanSAT failed theorem 6 in file gadd2_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 9 in file gapinthadd_proof.lean
bitwuzla proved and leanSAT failed theorem 5 in file g2012h08h28hudiv_ashl_proof.lean
bitwuzla and leanSAT provided counterexample for theorem 9 in file gapinthand_proof.lean
bitwuzla proved and leanSAT failed theorem 13 in file gsubhxorhcmp_proof.lean
In total we found 7978 goals.
leanSAT and Bitwuzla solved: 7914
leanSAT and Bitwuzla provided 8 counterexamples
leanSAT and Bitwuzla both failed on 29 theorems
leanSAT failed and Bitwuzla succeeded on 27 theorems
leanSAT succeeded and Bitwuzla failed on 0 theorems
There were 0 inconsistencies
Errors raised: 85
ran rg 'Bitwuzla failed' | wc -l, expected 29, found 0, FAIL
ran rg 'LeanSAT failed' | wc -l, expected 56, found 0, FAIL
ran rg 'LeanSAT provided a counter' | wc -l, expected 8, found 0, FAIL
ran rg 'Bitwuzla provided a counter' | wc -l, expected 8, found 0, FAIL
ran rg 'LeanSAT proved' | wc -l, expected 7914, found 0, FAIL
ran rg 'Bitwuzla proved' | wc -l, expected 7941, found 0, FAIL
error The SAT solver timed out while solving the problem. was raised 56 times
error The SMT solver timed out while solving the problem. was raised 29 times

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants