
[codex] Tighten verification cache identity #1

Merged
sushaan-k merged 10 commits into main from codex/vericode-cache-identity on May 18, 2026

Conversation

@sushaan-k
Owner

This PR hardens cache correctness in vericode.

The earlier cache work correctly separated results by language and by user-supplied implementation, but the cache key still ignored generation settings that can materially change the resulting proof artifact. In practice that meant runs with different temperatures or token budgets could incorrectly reuse one another's cached result.

This change includes those generation settings in the cache identity and expands the cache test suite to prove that language, existing code, temperature, and max-token changes all produce distinct cache entries.
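The widened cache identity can be sketched roughly as follows. This is a minimal illustration, not vericode's actual code: the `cache_key` helper, its parameter names, and the canonical JSON encoding are all assumptions made for the example.

```python
import hashlib
import json
from typing import Optional


def cache_key(
    canonical_spec: str,
    backend: str,
    provider: str,
    language: str,
    existing_code: Optional[str],
    temperature: float,
    max_tokens: int,
) -> str:
    """Return a SHA-256 hex digest over every input that can change the result.

    Hypothetical helper: the parameter names are assumptions for illustration.
    """
    payload = {
        "spec": canonical_spec,
        "backend": backend,
        "provider": provider,
        "language": language,
        "existing_code": existing_code,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # sort_keys plus fixed separators gives a stable encoding, so identical
    # inputs always hash to the same key.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


# Illustrative inputs only; none of these values come from the PR.
base = dict(
    canonical_spec="sort: returns a sorted permutation of the input",
    backend="dafny",
    provider="example-provider",
    language="python",
    existing_code=None,
    temperature=0.2,
    max_tokens=4096,
)

# Changing only a generation setting must produce a different cache entry.
assert cache_key(**base) != cache_key(**{**base, "temperature": 0.8})
assert cache_key(**base) != cache_key(**{**base, "max_tokens": 1024})
```

Hashing a canonical serialization of all inputs, rather than concatenating a hand-picked subset, makes it harder to forget a field the next time a result-affecting setting is added.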

Validation performed locally:

  • uv run pytest tests/test_cache.py -q
  • uv run pytest -q
  • uv run ruff check src tests
  • uv run mypy src

Result: full suite green with clean lint and typing.

sushaan-k and others added 10 commits March 30, 2026 20:22
Add a `complexity_score()` method to `Spec` that estimates verification
difficulty based on weighted contributions from postconditions, edge
cases, description length, preconditions, and invariants. Returns a
float in [0, 1]. Includes tests for boundary conditions and weight
dominance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
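
One way a weighted score of this kind might be computed is sketched below. The `Spec` fields mirror the components named in the commit message, but the specific weights and saturation caps are assumptions chosen for illustration and are not taken from the repository.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    description: str = ""
    preconditions: list[str] = field(default_factory=list)
    postconditions: list[str] = field(default_factory=list)
    invariants: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)

    def complexity_score(self) -> float:
        """Estimate verification difficulty as a float in [0, 1]."""
        # Each entry: (weight, observed amount, amount at which it saturates).
        # Weights and caps are illustrative assumptions; the weights sum to 1.
        components = [
            (0.35, len(self.postconditions), 5),
            (0.25, len(self.edge_cases), 5),
            (0.15, len(self.description), 500),
            (0.15, len(self.preconditions), 5),
            (0.10, len(self.invariants), 5),
        ]
        score = sum(w * min(n / cap, 1.0) for w, n, cap in components)
        # Clamp to guard against floating-point drift just outside [0, 1].
        return min(max(score, 0.0), 1.0)


empty = Spec()
only_post = Spec(postconditions=["result is sorted"])
only_inv = Spec(invariants=["length is preserved"])
```

With these assumed weights, an empty spec scores 0.0 and a single postcondition contributes more than a single invariant, which is the kind of boundary and weight-dominance behaviour the commit's tests are described as covering.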
Cache successful verification results keyed by SHA-256 of
(canonical_spec + backend + provider). Subsequent runs with identical
inputs return the cached result immediately, skipping the LLM and
proof-assistant pipeline. Add --no-cache flag to the CLI verify command.
Tests use an autouse fixture to isolate the cache per test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
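
The lookup flow described in this commit could look something like the sketch below. `ResultCache`, the one-file-per-key layout, and the `use_cache` parameter (standing in for the `--no-cache` flag) are hypothetical. Whether `--no-cache` also skips writing to the cache is an assumption here.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


class ResultCache:
    """Tiny file-backed cache: one JSON file per key (hypothetical layout)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def key(canonical_spec: str, backend: str, provider: str) -> str:
        # NUL separators keep ("ab", "c") and ("a", "bc") from colliding.
        raw = "\0".join((canonical_spec, backend, provider))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[dict]:
        path = self.root / f"{key}.json"
        return json.loads(path.read_text()) if path.exists() else None

    def put(self, key: str, result: dict) -> None:
        (self.root / f"{key}.json").write_text(json.dumps(result))


def verify(spec: str, backend: str, provider: str, cache: ResultCache,
           run_pipeline: Callable[[], dict], use_cache: bool = True) -> dict:
    key = cache.key(spec, backend, provider)
    if use_cache:
        hit = cache.get(key)
        if hit is not None:
            return hit  # skip the LLM and proof-assistant pipeline
    result = run_pipeline()
    if use_cache and result.get("success"):
        cache.put(key, result)  # only successful results are cached
    return result


calls = []


def fake_pipeline() -> dict:
    # Stand-in for the real generation and proof-checking pipeline.
    calls.append(1)
    return {"success": True, "proof": "placeholder"}


# A fresh temporary directory keeps this run isolated, much like a per-test fixture.
cache = ResultCache(Path(tempfile.mkdtemp()))
verify("spec", "lean4", "provider", cache, fake_pipeline)   # miss: runs pipeline
verify("spec", "lean4", "provider", cache, fake_pipeline)   # hit: pipeline skipped
verify("spec", "lean4", "provider", cache, fake_pipeline, use_cache=False)  # bypass
```

After these three calls the stand-in pipeline has run twice: once on the initial miss and once when the cache was bypassed.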
All three backends (Lean4, Dafny, Verus) now raise ProofCompilationError
with structured fields (backend_name, source_file, error_lines,
raw_output) instead of returning VerificationResult with success=False.
The ProofEngine catches these exceptions and converts them back to
VerificationResult for the refinement loop. Legacy keyword arguments
are preserved for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
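
The raise-then-convert pattern described here can be sketched as below. The four exception fields come from the commit message; their types, the `errors` field on `VerificationResult`, and the `run_backend` wrapper are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable


class ProofCompilationError(Exception):
    """Structured failure raised by a backend (field types are assumed)."""

    def __init__(self, backend_name: str, source_file: str,
                 error_lines: list[int], raw_output: str) -> None:
        super().__init__(f"{backend_name}: compilation failed for {source_file}")
        self.backend_name = backend_name
        self.source_file = source_file
        self.error_lines = error_lines
        self.raw_output = raw_output


@dataclass
class VerificationResult:
    success: bool
    errors: list[str] = field(default_factory=list)  # hypothetical field


def run_backend(compile_proof: Callable[[], None]) -> VerificationResult:
    """Convert a backend exception back into a result the refinement loop can use."""
    try:
        compile_proof()
    except ProofCompilationError as exc:
        return VerificationResult(success=False, errors=[exc.raw_output])
    return VerificationResult(success=True)


def failing_compile() -> None:
    raise ProofCompilationError(
        backend_name="dafny",
        source_file="sort.dfy",
        error_lines=[12],
        raw_output="sort.dfy(12,4): Error: postcondition might not hold",
    )


failed = run_backend(failing_compile)
passed = run_backend(lambda: None)
```

Raising inside the backend keeps the structured details in one place, while the conversion at the engine boundary lets the refinement loop keep working with plain result objects.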
Add a --progress flag to the batch CLI command that shows a rich
progress bar with percentage and elapsed time. Add ProgressCallback
support to verify() so callers can receive stage notifications
(setup, generating, verified) during pipeline execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
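
A callback hook of this kind might be wired up as in the following sketch. The stage names come from the commit message; the `ProgressCallback` signature, the `on_progress` parameter name, and the stubbed pipeline body are assumptions.

```python
from typing import Callable, Optional

# Assumed shape: the callback receives the stage name as a string.
ProgressCallback = Callable[[str], None]


def verify(spec: str, on_progress: Optional[ProgressCallback] = None) -> bool:
    def notify(stage: str) -> None:
        if on_progress is not None:
            on_progress(stage)

    notify("setup")
    notify("generating")
    # Placeholder: the real pipeline would generate and check a proof here.
    notify("verified")
    return True


stages: list[str] = []
ok = verify("sort spec", on_progress=stages.append)
```

Making the callback optional means existing callers are unaffected, and a CLI progress bar can be driven by the same hook that library users receive.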
Expand the getting started guide with a complete end-to-end walkthrough
of verifying a sorting function: writing the YAML spec, running CLI
verification, using the Python API, checking complexity scores,
inspecting proof certificates, and using the verification cache.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sushaan-k sushaan-k marked this pull request as ready for review May 18, 2026 20:34
Copilot AI review requested due to automatic review settings May 18, 2026 20:34
@sushaan-k sushaan-k merged commit 36148c5 into main May 18, 2026
4 checks passed

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
