Skip to content
190 changes: 188 additions & 2 deletions docs/getting_started.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,8 @@
3. a machine-check result
4. a certificate binding the spec, code, and proof bundle together

This guide covers the shortest path from install to a verified run.
This guide covers the shortest path from install to a verified run, then
walks through a complete example verifying a sorting function.

## Install

Expand Down Expand Up @@ -59,6 +60,167 @@ export DEEPSEEK_API_KEY=...
If you do not pass a provider explicitly in Python, the top-level `verify()`
path defaults to the Anthropic provider.

## Walkthrough: Verifying a Sorting Function

This section walks through verifying a sorting function end-to-end, from
writing the spec to inspecting the proof certificate.

### Step 1: Write the specification

Create a file called `sort_spec.yaml`:

```yaml
description: >
Sort a list of integers in non-decreasing order.
The output must be a permutation of the input.
The function must handle empty lists and single-element lists.
function_name: sort
input_types:
lst: list[int]
output_type: list[int]
preconditions: []
postconditions:
- "is_sorted(result)"
- "is_permutation(result, lst)"
edge_cases:
- "lst == []"
- "len(lst) == 1"
```

The specification tells `vericode` exactly what the function must satisfy.
Postconditions are the properties the formal proof will encode.

### Step 2: Run verification from the CLI

```bash
vericode verify --spec sort_spec.yaml --lang python --backend lean4
```

`vericode` will:

1. Parse the YAML spec into a structured `Spec` object.
2. Send the spec to the LLM, which generates both a Python implementation and
a Lean 4 proof in a single pass.
3. Compile the proof with the Lean 4 toolchain.
4. If the proof fails, feed the compiler errors back to the LLM for refinement.
5. Repeat up to `--max-iterations` times (default 5).

On success you will see output like:

```
vericode
Function: sort
Language: python
Backend: lean4
Provider: anthropic

Verification successful!
Iterations: 2

Implementation
def sort(lst: list[int]) -> list[int]:
...

Proof
theorem sort_correct :
...

Proof Receipt
{
"spec_hash": "a1b2c3...",
"code_hash": "d4e5f6...",
"proof_hash": "789abc...",
"backend": "lean4",
"timestamp": "2026-03-30T12:00:00+00:00",
"verified": true
}
```

### Step 3: Run the same verification from Python

```python
import asyncio
from vericode import Spec, verify


async def main() -> None:
spec = Spec(
description=(
"Sort a list of integers in non-decreasing order. "
"The output must be a permutation of the input."
),
function_name="sort",
input_types={"lst": "list[int]"},
output_type="list[int]",
postconditions=[
"is_sorted(result)",
"is_permutation(result, lst)",
],
edge_cases=["lst == []", "len(lst) == 1"],
)

result = await verify(
spec,
language="python",
backend="lean4",
)

print(f"Verified: {result.verified}")
print(f"Iterations: {result.iterations}")

if result.verified:
print(f"\n--- Implementation ---\n{result.code}")
print(f"\n--- Proof ---\n{result.proof}")
print(f"\n--- Certificate ---\n{result.certificate.to_json()}")
else:
for err in result.errors:
print(f"Error: {err}")


asyncio.run(main())
```

### Step 4: Check the complexity score

Before running verification, you can estimate how hard the spec is to verify
using `complexity_score()`:

```python
score = spec.complexity_score()
print(f"Complexity: {score:.2f}")
# A score above 0.5 suggests the spec may need more refinement iterations.
```

### Step 5: Inspect the proof certificate

The certificate binds the spec, implementation, and proof together via SHA-256
hashes. You can re-verify it later:

```python
from vericode.verifier import ProofCertificate

valid = ProofCertificate.verify_certificate(
result.certificate,
spec,
result.code,
result.proof,
)
print(f"Certificate valid: {valid}")
```

If anyone changes the code or the spec after verification, the certificate will
no longer validate.

### Step 6: Verify existing code with `prove`

If you already have an implementation and just want a proof:

```bash
vericode prove --code sort.py --spec "output is sorted and is a permutation of input"
```

This tells the LLM to keep the implementation fixed and only generate a proof.

## Fastest CLI Path

Verify a natural-language prompt:
Expand All @@ -85,6 +247,12 @@ Batch a directory of YAML specs:
vericode batch --specs specs/ --output verified/
```

Batch with a progress bar:

```bash
vericode batch --specs specs/ --output verified/ --progress
```

`batch` defaults the implementation language from the backend unless `--lang`
is supplied:

Expand Down Expand Up @@ -168,13 +336,31 @@ postconditions:
- `code`: generated or preserved implementation
- `proof`: backend-specific proof text
- `verified`: final machine-check status
- `iterations`: refinement rounds taken by the proof engine
- `iterations`: refinement rounds taken by the proof engine (0 if cached)
- `backend`: backend name used for the run
- `certificate`: `ProofCertificate` binding the spec, code, and proof bundle

The certificate is designed to be machine-checkable later; it stores hashes of
the canonicalized spec, implementation, and bound proof source.

## Verification Cache

Successful verification results are cached by default, keyed by a SHA-256 hash
of the spec, backend, and provider. Subsequent runs with identical inputs skip
the LLM and proof-assistant pipeline entirely.

To bypass the cache:

```bash
vericode verify "sort a list" --no-cache
```

Or in Python:

```python
result = await verify(spec, backend="lean4", use_cache=False)
```

## `verify` vs `prove`

- `verify` starts from a natural-language or YAML spec and asks the model to
Expand Down
40 changes: 26 additions & 14 deletions src/vericode/backends/dafny.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
from pathlib import Path

from vericode.backends.base import VerificationBackend, VerificationResult
from vericode.exceptions import ProofCompilationError

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -72,32 +73,43 @@ async def verify(self, proof_source: str) -> VerificationResult:
proc.communicate(), timeout=self.timeout
)
except FileNotFoundError:
return VerificationResult(
success=False,
compiler_output="",
errors=["dafny binary not found on PATH"],
backend=self.name,
)
raise ProofCompilationError(
"dafny binary not found on PATH",
backend_name=self.name,
source_file=str(tmp_path),
error_lines=["dafny binary not found on PATH"],
raw_output="",
) from None
except TimeoutError:
proc.kill()
await proc.wait()
return VerificationResult(
success=False,
compiler_output="",
errors=[f"dafny verification timed out after {self.timeout}s"],
backend=self.name,
)
raise ProofCompilationError(
f"dafny verification timed out after {self.timeout}s",
backend_name=self.name,
source_file=str(tmp_path),
error_lines=[f"dafny verification timed out after {self.timeout}s"],
raw_output="",
) from None
finally:
tmp_path.unlink(missing_ok=True)

elapsed = time.monotonic() - start
output = (stdout_bytes.decode() + "\n" + stderr_bytes.decode()).strip()
errors = _parse_dafny_errors(output)

if proc.returncode != 0 or len(errors) > 0:
raise ProofCompilationError(
f"Dafny proof compilation failed with {len(errors)} error(s)",
backend_name=self.name,
source_file=str(tmp_path),
error_lines=errors,
raw_output=output,
)

return VerificationResult(
success=proc.returncode == 0 and len(errors) == 0,
success=True,
compiler_output=output,
errors=errors,
errors=[],
elapsed_seconds=elapsed,
backend=self.name,
)
Expand Down
40 changes: 26 additions & 14 deletions src/vericode/backends/lean4.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
from pathlib import Path

from vericode.backends.base import VerificationBackend, VerificationResult
from vericode.exceptions import ProofCompilationError

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -71,32 +72,43 @@ async def verify(self, proof_source: str) -> VerificationResult:
proc.communicate(), timeout=self.timeout
)
except FileNotFoundError:
return VerificationResult(
success=False,
compiler_output="",
errors=["lean binary not found on PATH"],
backend=self.name,
)
raise ProofCompilationError(
"lean binary not found on PATH",
backend_name=self.name,
source_file=str(tmp_path),
error_lines=["lean binary not found on PATH"],
raw_output="",
) from None
except TimeoutError:
proc.kill()
await proc.wait()
return VerificationResult(
success=False,
compiler_output="",
errors=[f"lean verification timed out after {self.timeout}s"],
backend=self.name,
)
raise ProofCompilationError(
f"lean verification timed out after {self.timeout}s",
backend_name=self.name,
source_file=str(tmp_path),
error_lines=[f"lean verification timed out after {self.timeout}s"],
raw_output="",
) from None
finally:
tmp_path.unlink(missing_ok=True)

elapsed = time.monotonic() - start
output = (stdout_bytes.decode() + "\n" + stderr_bytes.decode()).strip()
errors = _parse_lean_errors(output)

if proc.returncode != 0 or len(errors) > 0:
raise ProofCompilationError(
f"Lean 4 proof compilation failed with {len(errors)} error(s)",
backend_name=self.name,
source_file=str(tmp_path),
error_lines=errors,
raw_output=output,
)

return VerificationResult(
success=proc.returncode == 0 and len(errors) == 0,
success=True,
compiler_output=output,
errors=errors,
errors=[],
elapsed_seconds=elapsed,
backend=self.name,
)
Expand Down
Loading
Loading