[codex] Tighten verification cache identity#1
Merged
Conversation
Add a `complexity_score()` method to `Spec` that estimates verification difficulty based on weighted contributions from postconditions, edge cases, description length, preconditions, and invariants. Returns a float in [0, 1]. Includes tests for boundary conditions and weight dominance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cache successful verification results keyed by SHA-256 of (canonical_spec + backend + provider). Subsequent runs with identical inputs return the cached result immediately, skipping the LLM and proof-assistant pipeline. Add --no-cache flag to the CLI verify command. Tests use an autouse fixture to isolate the cache per test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All three backends (Lean4, Dafny, Verus) now raise ProofCompilationError with structured fields (backend_name, source_file, error_lines, raw_output) instead of returning VerificationResult with success=False. The ProofEngine catches these exceptions and converts them back to VerificationResult for the refinement loop. Legacy keyword arguments are preserved for backward compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a --progress flag to the batch CLI command that shows a rich progress bar with percentage and elapsed time. Add ProgressCallback support to verify() so callers can receive stage notifications (setup, generating, verified) during pipeline execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expand the getting started guide with a complete end-to-end walkthrough of verifying a sorting function: writing the YAML spec, running CLI verification, using the Python API, checking complexity scores, inspecting proof certificates, and using the verification cache. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR hardens cache correctness in
vericode.The earlier cache work correctly separated results by language and by user-supplied implementation, but the cache key still ignored generation settings that can materially change the resulting proof artifact. In practice that meant runs with different temperatures or token budgets could incorrectly reuse one another's cached result.
This change includes those generation settings in the cache identity and expands the cache test suite to prove that language, existing code, temperature, and max-token changes all produce distinct cache entries.
Validation performed locally:
uv run pytest tests/test_cache.py -quv run pytest -quv run ruff check src testsuv run mypy srcResult: full suite green with clean lint and typing.