feat: make (fixed-width) evaluation return a non-zero exitcode on mismatch #1365
base: main
…match
This PR changes the evaluation script to return a non-zero exit code whenever the ripgrep checks find a mismatched number (i.e., whenever it currently prints FAIL to the log). This exit code should cause CI to fail, so we no longer have to check the logs manually to confirm that the evaluation actually produced the right numbers.
The trade-off is that whenever we do expect the numbers to change, we have to update the expected numbers in the script before CI can pass. Forcing us to keep these numbers in sync with reality seems more like a feature than a bug, though.
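As a rough illustration of the behaviour described above, here is a minimal Python sketch of a count check that reports PASS or FAIL per pattern and turns any mismatch into a non-zero exit code. It is not the project's actual evaluation script: it uses Python's `re` module in place of ripgrep, and the function names (`find_mismatches`, `exit_code`) and the example patterns are hypothetical.

```python
import re


def find_mismatches(log_text, expected):
    """Count regex matches in log_text and compare them with expected counts.

    `expected` maps a regex pattern to the number of matching lines we expect.
    Returns a list of (pattern, expected_count, actual_count) for each mismatch.
    """
    mismatches = []
    for pattern, want in expected.items():
        # re.MULTILINE lets ^ anchor at the start of every log line.
        got = len(re.findall(pattern, log_text, flags=re.MULTILINE))
        status = "PASS" if got == want else "FAIL"
        print(f"{status}: {pattern!r} expected {want}, found {got}")
        if got != want:
            mismatches.append((pattern, want, got))
    return mismatches


def exit_code(log_text, expected):
    """Return 0 when every count matches and 1 otherwise."""
    return 1 if find_mismatches(log_text, expected) else 0


# Hypothetical log and expected counts, for illustration only.
example_log = "ok: theorem 1\nok: theorem 2\nfailed: theorem 3\n"
example_expected = {r"^ok:": 2, r"^failed:": 1}
status = exit_code(example_log, example_expected)
```

A script built this way would end with `sys.exit(status)`, so that a CI job running it fails as soon as any expected number drifts from what the evaluation produced.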
Alive Statistics: 90 / 93 (3 failed)
This PR is in principle ready for review. I've marked it as a draft because I wanted to test it properly on CI first, to confirm that it actually fails on a mismatch, but CI does not seem to be working at all at the moment.
Alive Statistics: 90 / 93 (3 failed)
bitwuzla and leanSAT provided counterexample for theorem 3 in file gapinthcast_proof.lean |