Skip to content

Commit 36148c5

Browse files
authored
Merge pull request #1 from sushaan-k/codex/vericode-cache-identity
[codex] Tighten verification cache identity
2 parents 2a6f803 + a457480 commit 36148c5

18 files changed

Lines changed: 1602 additions & 114 deletions

docs/getting_started.md

Lines changed: 188 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,8 @@
77
3. a machine-check result
88
4. a certificate binding the spec, code, and proof bundle together
99

10-
This guide covers the shortest path from install to a verified run.
10+
This guide covers the shortest path from install to a verified run, then
11+
walks through a complete example verifying a sorting function.
1112

1213
## Install
1314

@@ -59,6 +60,167 @@ export DEEPSEEK_API_KEY=...
5960
If you do not pass a provider explicitly in Python, the top-level `verify()`
6061
path defaults to the Anthropic provider.
6162

63+
## Walkthrough: Verifying a Sorting Function
64+
65+
This section walks through verifying a sorting function end-to-end, from
66+
writing the spec to inspecting the proof certificate.
67+
68+
### Step 1: Write the specification
69+
70+
Create a file called `sort_spec.yaml`:
71+
72+
```yaml
73+
description: >
74+
Sort a list of integers in non-decreasing order.
75+
The output must be a permutation of the input.
76+
The function must handle empty lists and single-element lists.
77+
function_name: sort
78+
input_types:
79+
lst: list[int]
80+
output_type: list[int]
81+
preconditions: []
82+
postconditions:
83+
- "is_sorted(result)"
84+
- "is_permutation(result, lst)"
85+
edge_cases:
86+
- "lst == []"
87+
- "len(lst) == 1"
88+
```
89+
90+
The specification tells `vericode` exactly what the function must satisfy.
91+
Postconditions are the properties the formal proof will encode.
92+
93+
### Step 2: Run verification from the CLI
94+
95+
```bash
96+
vericode verify --spec sort_spec.yaml --lang python --backend lean4
97+
```
98+
99+
`vericode` will:
100+
101+
1. Parse the YAML spec into a structured `Spec` object.
102+
2. Send the spec to the LLM, which generates both a Python implementation and
103+
a Lean 4 proof in a single pass.
104+
3. Compile the proof with the Lean 4 toolchain.
105+
4. If the proof fails, feed the compiler errors back to the LLM for refinement.
106+
5. Repeat up to `--max-iterations` times (default 5).
107+
108+
On success you will see output like:
109+
110+
```
111+
vericode
112+
Function: sort
113+
Language: python
114+
Backend: lean4
115+
Provider: anthropic
116+
117+
Verification successful!
118+
Iterations: 2
119+
120+
Implementation
121+
def sort(lst: list[int]) -> list[int]:
122+
...
123+
124+
Proof
125+
theorem sort_correct :
126+
...
127+
128+
Proof Receipt
129+
{
130+
"spec_hash": "a1b2c3...",
131+
"code_hash": "d4e5f6...",
132+
"proof_hash": "789abc...",
133+
"backend": "lean4",
134+
"timestamp": "2026-03-30T12:00:00+00:00",
135+
"verified": true
136+
}
137+
```
138+
139+
### Step 3: Run the same verification from Python
140+
141+
```python
142+
import asyncio
143+
from vericode import Spec, verify
144+
145+
146+
async def main() -> None:
147+
spec = Spec(
148+
description=(
149+
"Sort a list of integers in non-decreasing order. "
150+
"The output must be a permutation of the input."
151+
),
152+
function_name="sort",
153+
input_types={"lst": "list[int]"},
154+
output_type="list[int]",
155+
postconditions=[
156+
"is_sorted(result)",
157+
"is_permutation(result, lst)",
158+
],
159+
edge_cases=["lst == []", "len(lst) == 1"],
160+
)
161+
162+
result = await verify(
163+
spec,
164+
language="python",
165+
backend="lean4",
166+
)
167+
168+
print(f"Verified: {result.verified}")
169+
print(f"Iterations: {result.iterations}")
170+
171+
if result.verified:
172+
print(f"\n--- Implementation ---\n{result.code}")
173+
print(f"\n--- Proof ---\n{result.proof}")
174+
print(f"\n--- Certificate ---\n{result.certificate.to_json()}")
175+
else:
176+
for err in result.errors:
177+
print(f"Error: {err}")
178+
179+
180+
asyncio.run(main())
181+
```
182+
183+
### Step 4: Check the complexity score
184+
185+
Before running verification, you can estimate how hard the spec is to verify
186+
using `complexity_score()`:
187+
188+
```python
189+
score = spec.complexity_score()
190+
print(f"Complexity: {score:.2f}")
191+
# A score above 0.5 suggests the spec may need more refinement iterations.
192+
```
193+
194+
### Step 5: Inspect the proof certificate
195+
196+
The certificate binds the spec, implementation, and proof together via SHA-256
197+
hashes. You can re-verify it later:
198+
199+
```python
200+
from vericode.verifier import ProofCertificate
201+
202+
valid = ProofCertificate.verify_certificate(
203+
result.certificate,
204+
spec,
205+
result.code,
206+
result.proof,
207+
)
208+
print(f"Certificate valid: {valid}")
209+
```
210+
211+
If anyone changes the code or the spec after verification, the certificate will
212+
no longer validate.
213+
214+
### Step 6: Verify existing code with `prove`
215+
216+
If you already have an implementation and just want a proof:
217+
218+
```bash
219+
vericode prove --code sort.py --spec "output is sorted and is a permutation of input"
220+
```
221+
222+
This tells the LLM to keep the implementation fixed and only generate a proof.
223+
62224
## Fastest CLI Path
63225

64226
Verify a natural-language prompt:
@@ -85,6 +247,12 @@ Batch a directory of YAML specs:
85247
vericode batch --specs specs/ --output verified/
86248
```
87249

250+
Batch with a progress bar:
251+
252+
```bash
253+
vericode batch --specs specs/ --output verified/ --progress
254+
```
255+
88256
`batch` defaults the implementation language from the backend unless `--lang`
89257
is supplied:
90258

@@ -168,13 +336,31 @@ postconditions:
168336
- `code`: generated or preserved implementation
169337
- `proof`: backend-specific proof text
170338
- `verified`: final machine-check status
171-
- `iterations`: refinement rounds taken by the proof engine
339+
- `iterations`: refinement rounds taken by the proof engine (0 if cached)
172340
- `backend`: backend name used for the run
173341
- `certificate`: `ProofCertificate` binding the spec, code, and proof bundle
174342

175343
The certificate is designed to be machine-checkable later; it stores hashes of
176344
the canonicalized spec, implementation, and bound proof source.
177345

346+
## Verification Cache
347+
348+
Successful verification results are cached by default, keyed by a SHA-256 hash
349+
of the spec, backend, and provider. Subsequent runs with identical inputs skip
350+
the LLM and proof-assistant pipeline entirely.
351+
352+
To bypass the cache:
353+
354+
```bash
355+
vericode verify "sort a list" --no-cache
356+
```
357+
358+
Or in Python:
359+
360+
```python
361+
result = await verify(spec, backend="lean4", use_cache=False)
362+
```
363+
178364
## `verify` vs `prove`
179365

180366
- `verify` starts from a natural-language or YAML spec and asks the model to

src/vericode/backends/dafny.py

Lines changed: 26 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
from pathlib import Path
1414

1515
from vericode.backends.base import VerificationBackend, VerificationResult
16+
from vericode.exceptions import ProofCompilationError
1617

1718
logger = logging.getLogger(__name__)
1819

@@ -72,32 +73,43 @@ async def verify(self, proof_source: str) -> VerificationResult:
7273
proc.communicate(), timeout=self.timeout
7374
)
7475
except FileNotFoundError:
75-
return VerificationResult(
76-
success=False,
77-
compiler_output="",
78-
errors=["dafny binary not found on PATH"],
79-
backend=self.name,
80-
)
76+
raise ProofCompilationError(
77+
"dafny binary not found on PATH",
78+
backend_name=self.name,
79+
source_file=str(tmp_path),
80+
error_lines=["dafny binary not found on PATH"],
81+
raw_output="",
82+
) from None
8183
except TimeoutError:
8284
proc.kill()
8385
await proc.wait()
84-
return VerificationResult(
85-
success=False,
86-
compiler_output="",
87-
errors=[f"dafny verification timed out after {self.timeout}s"],
88-
backend=self.name,
89-
)
86+
raise ProofCompilationError(
87+
f"dafny verification timed out after {self.timeout}s",
88+
backend_name=self.name,
89+
source_file=str(tmp_path),
90+
error_lines=[f"dafny verification timed out after {self.timeout}s"],
91+
raw_output="",
92+
) from None
9093
finally:
9194
tmp_path.unlink(missing_ok=True)
9295

9396
elapsed = time.monotonic() - start
9497
output = (stdout_bytes.decode() + "\n" + stderr_bytes.decode()).strip()
9598
errors = _parse_dafny_errors(output)
9699

100+
if proc.returncode != 0 or len(errors) > 0:
101+
raise ProofCompilationError(
102+
f"Dafny proof compilation failed with {len(errors)} error(s)",
103+
backend_name=self.name,
104+
source_file=str(tmp_path),
105+
error_lines=errors,
106+
raw_output=output,
107+
)
108+
97109
return VerificationResult(
98-
success=proc.returncode == 0 and len(errors) == 0,
110+
success=True,
99111
compiler_output=output,
100-
errors=errors,
112+
errors=[],
101113
elapsed_seconds=elapsed,
102114
backend=self.name,
103115
)

src/vericode/backends/lean4.py

Lines changed: 26 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
from pathlib import Path
1414

1515
from vericode.backends.base import VerificationBackend, VerificationResult
16+
from vericode.exceptions import ProofCompilationError
1617

1718
logger = logging.getLogger(__name__)
1819

@@ -71,32 +72,43 @@ async def verify(self, proof_source: str) -> VerificationResult:
7172
proc.communicate(), timeout=self.timeout
7273
)
7374
except FileNotFoundError:
74-
return VerificationResult(
75-
success=False,
76-
compiler_output="",
77-
errors=["lean binary not found on PATH"],
78-
backend=self.name,
79-
)
75+
raise ProofCompilationError(
76+
"lean binary not found on PATH",
77+
backend_name=self.name,
78+
source_file=str(tmp_path),
79+
error_lines=["lean binary not found on PATH"],
80+
raw_output="",
81+
) from None
8082
except TimeoutError:
8183
proc.kill()
8284
await proc.wait()
83-
return VerificationResult(
84-
success=False,
85-
compiler_output="",
86-
errors=[f"lean verification timed out after {self.timeout}s"],
87-
backend=self.name,
88-
)
85+
raise ProofCompilationError(
86+
f"lean verification timed out after {self.timeout}s",
87+
backend_name=self.name,
88+
source_file=str(tmp_path),
89+
error_lines=[f"lean verification timed out after {self.timeout}s"],
90+
raw_output="",
91+
) from None
8992
finally:
9093
tmp_path.unlink(missing_ok=True)
9194

9295
elapsed = time.monotonic() - start
9396
output = (stdout_bytes.decode() + "\n" + stderr_bytes.decode()).strip()
9497
errors = _parse_lean_errors(output)
9598

99+
if proc.returncode != 0 or len(errors) > 0:
100+
raise ProofCompilationError(
101+
f"Lean 4 proof compilation failed with {len(errors)} error(s)",
102+
backend_name=self.name,
103+
source_file=str(tmp_path),
104+
error_lines=errors,
105+
raw_output=output,
106+
)
107+
96108
return VerificationResult(
97-
success=proc.returncode == 0 and len(errors) == 0,
109+
success=True,
98110
compiler_output=output,
99-
errors=errors,
111+
errors=[],
100112
elapsed_seconds=elapsed,
101113
backend=self.name,
102114
)

0 commit comments

Comments
 (0)