[codex] Tighten verification cache identity by sushaan-k · Pull Request #1 · sushaan-k/leancode

sushaan-k · 2026-03-31T02:29:58Z

This PR hardens cache correctness in vericode.

The earlier cache work correctly separated results by language and by user-supplied implementation, but the cache key still ignored generation settings that can materially change the resulting proof artifact. In practice, this meant that runs with different temperatures or token budgets could incorrectly reuse each other's cached results.

This change adds those generation settings to the cache identity and expands the cache test suite to check that changing the language, the existing code, the temperature, or the max-token limit each produces a distinct cache entry.
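A minimal sketch of the widened cache identity, assuming the key is a SHA-256 digest over a canonical JSON encoding of every input that can change the proof. The `CacheIdentity` dataclass, its field names, and the `cache_key` helper are hypothetical; only the set of inputs (spec, language, provider, existing code, temperature, max tokens) comes from this PR and the earlier cache commit.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CacheIdentity:
    """Hypothetical bundle of every input that can change the proof artifact."""

    canonical_spec: str
    language: str  # e.g. "lean4", "dafny", "verus"
    provider: str
    existing_code: Optional[str]  # user-supplied implementation, if any
    temperature: float
    max_tokens: int


def cache_key(identity: CacheIdentity) -> str:
    """SHA-256 over a canonical JSON encoding of all identity fields."""
    # sort_keys and fixed separators make the encoding byte-for-byte stable.
    payload = json.dumps(asdict(identity), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Each single-field change below should yield a distinct key.
base = CacheIdentity("spec", "lean4", "example-provider", None, 0.0, 4096)
variants = [
    base,
    replace(base, language="dafny"),
    replace(base, existing_code="def sort(xs): ..."),
    replace(base, temperature=0.7),
    replace(base, max_tokens=8192),
]
assert len({cache_key(v) for v in variants}) == len(variants)
```

Hashing a serialized record rather than concatenating raw strings avoids ambiguity between adjacent fields, and adding a field later changes every key, which naturally invalidates entries written under the older identity.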

Validation performed locally:

uv run pytest tests/test_cache.py -q
uv run pytest -q
uv run ruff check src tests
uv run mypy src

Result: full suite green with clean lint and typing.

Add a `complexity_score()` method to `Spec` that estimates verification difficulty based on weighted contributions from postconditions, edge cases, description length, preconditions, and invariants. Returns a float in [0, 1]. Includes tests for boundary conditions and weight dominance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
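A minimal sketch of a weighted, saturating score, assuming each component is normalized to [0, 1] and the weights sum to 1 so the total stays in [0, 1]. The `Spec` fields, the weights, and the saturation caps are all hypothetical; the commit message names the five inputs but not how they are weighted.

```python
from dataclasses import dataclass, field

# Hypothetical weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {
    "postconditions": 0.35,
    "edge_cases": 0.25,
    "description": 0.15,
    "preconditions": 0.15,
    "invariants": 0.10,
}
# Hypothetical saturation points: at or above these sizes a component maxes out.
CAPS = {
    "postconditions": 10,
    "edge_cases": 10,
    "description": 500,  # characters
    "preconditions": 10,
    "invariants": 5,
}


def _saturate(value: int, cap: int) -> float:
    """Map a non-negative size onto [0, 1], clamping at the cap."""
    return min(value / cap, 1.0)


@dataclass
class Spec:
    description: str = ""
    preconditions: list[str] = field(default_factory=list)
    postconditions: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    invariants: list[str] = field(default_factory=list)

    def complexity_score(self) -> float:
        sizes = {
            "postconditions": len(self.postconditions),
            "edge_cases": len(self.edge_cases),
            "description": len(self.description),
            "preconditions": len(self.preconditions),
            "invariants": len(self.invariants),
        }
        score = sum(WEIGHTS[k] * _saturate(sizes[k], CAPS[k]) for k in WEIGHTS)
        # Clamp to guard against floating-point drift just outside [0, 1].
        return min(max(score, 0.0), 1.0)
```

Saturation keeps one very large component (say, a long description) from pushing the score past its weight, which is what makes a "weight dominance" test meaningful.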

Cache successful verification results keyed by SHA-256 of (canonical_spec + backend + provider). Subsequent runs with identical inputs return the cached result immediately, skipping the LLM and proof-assistant pipeline. Add --no-cache flag to the CLI verify command. Tests use an autouse fixture to isolate the cache per test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
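A minimal sketch of success-only caching, assuming one JSON file per key in a cache directory and assuming that `--no-cache` bypasses both the lookup and the write. `VerificationCache`, `make_key`, `cached_verify`, and the result dictionary shape are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def make_key(canonical_spec: str, backend: str, provider: str) -> str:
    """SHA-256 of the three inputs named in the commit message."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    joined = "\0".join((canonical_spec, backend, provider))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class VerificationCache:
    """Hypothetical on-disk cache storing one JSON file per key."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def get(self, key: str) -> Optional[dict]:
        path = self.root / f"{key}.json"
        return json.loads(path.read_text()) if path.exists() else None

    def put(self, key: str, result: dict) -> None:
        (self.root / f"{key}.json").write_text(json.dumps(result))


def cached_verify(
    spec: str,
    backend: str,
    provider: str,
    run_pipeline: Callable[[], dict],
    cache: VerificationCache,
    no_cache: bool = False,
) -> dict:
    key = make_key(spec, backend, provider)
    if not no_cache:
        hit = cache.get(key)
        if hit is not None:
            return hit  # cache hit: skip the LLM and proof assistant
    result = run_pipeline()
    # Only successful results are stored, so failures are retried next run.
    if result.get("success") and not no_cache:
        cache.put(key, result)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = VerificationCache(Path(tmp))
        first = cached_verify("spec", "lean4", "example-provider", lambda: {"success": True}, cache)
        second = cached_verify("spec", "lean4", "example-provider", lambda: {"success": False}, cache)
        print(first == second)  # True: the second call is served from the cache
```

Pointing `VerificationCache` at a fresh temporary directory for each test plays roughly the same role as the autouse fixture the commit describes.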

All three backends (Lean4, Dafny, Verus) now raise ProofCompilationError with structured fields (backend_name, source_file, error_lines, raw_output) instead of returning VerificationResult with success=False. The ProofEngine catches these exceptions and converts them back to VerificationResult for the refinement loop. Legacy keyword arguments are preserved for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
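A minimal sketch of the raise-then-convert pattern, assuming a backend is any callable that raises on failure. The four exception fields come from the commit message; the `VerificationResult` fields shown and the `check_with_backend` helper are hypothetical stand-ins for the `ProofEngine` logic, and the legacy keyword arguments are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable


class ProofCompilationError(Exception):
    """Structured backend failure; field names follow the commit message."""

    def __init__(self, backend_name: str, source_file: str, error_lines: list[str], raw_output: str) -> None:
        super().__init__(f"{backend_name} failed to compile {source_file}")
        self.backend_name = backend_name
        self.source_file = source_file
        self.error_lines = error_lines
        self.raw_output = raw_output


@dataclass
class VerificationResult:
    """Hypothetical result shape consumed by the refinement loop."""

    success: bool
    errors: list[str] = field(default_factory=list)
    raw_output: str = ""


def check_with_backend(backend: Callable[[str], None], source_file: str) -> VerificationResult:
    """Hypothetical engine step: turn a raised error back into a result."""
    try:
        backend(source_file)
    except ProofCompilationError as exc:
        # The refinement loop expects a VerificationResult, so the error is
        # converted here instead of propagating to the caller.
        return VerificationResult(success=False, errors=exc.error_lines, raw_output=exc.raw_output)
    return VerificationResult(success=True)
```

Raising keeps each backend's failure path explicit and typed, while converting at the engine boundary lets the refinement loop feed `errors` back to the model without handling exceptions itself.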

Add a --progress flag to the batch CLI command that shows a rich progress bar with percentage and elapsed time. Add ProgressCallback support to verify() so callers can receive stage notifications (setup, generating, verified) during pipeline execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
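A minimal sketch of the callback plumbing, assuming a callback is any callable that takes a stage name. The stage names come from the commit message; the `verify` signature, the `progress` parameter name, the placeholder success check, and emitting "verified" only on success are hypothetical.

```python
from typing import Optional, Protocol


class ProgressCallback(Protocol):
    """Anything callable as callback(stage) satisfies this protocol."""

    def __call__(self, stage: str) -> None: ...


def verify(spec: str, progress: Optional[ProgressCallback] = None) -> bool:
    """Hypothetical pipeline that reports the stages named in the commit."""

    def notify(stage: str) -> None:
        # Callers that pass no callback pay nothing for progress reporting.
        if progress is not None:
            progress(stage)

    notify("setup")
    # ... parse the spec and choose a backend ...
    notify("generating")
    # ... ask the LLM for an implementation and proof ...
    proof_ok = bool(spec)  # placeholder for the real proof-assistant check
    if proof_ok:
        notify("verified")  # assumption: only reported on success
    return proof_ok
```

A batch command could pass a callback that advances a progress bar, while library callers can simply pass `list.append` to record the stages.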

Expand the getting started guide with a complete end-to-end walkthrough of verifying a sorting function: writing the YAML spec, running CLI verification, using the Python API, checking complexity scores, inspecting proof certificates, and using the verification cache.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot encountered an error and was unable to review this pull request.

sushaan-k and others added 10 commits March 30, 2026 20:22

tighten verification cache identity · f6f964e
apply ci formatting · bf6273b
Add cache inspection command · f53c930
Add cache entry listing · 4cd3f56
Preserve cache stats JSON shape · a457480

sushaan-k marked this pull request as ready for review May 18, 2026 20:34

Copilot AI review requested due to automatic review settings May 18, 2026 20:34

sushaan-k merged commit 36148c5 into main May 18, 2026
4 checks passed

Copilot started reviewing on behalf of sushaan-k May 18, 2026 20:35

Copilot AI reviewed May 18, 2026

sushaan-k merged 10 commits into main from codex/vericode-cache-identity
