Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions book/src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,9 @@
- [Operations](./operations.md)
- [Simple example](./simpleexample.md)
- [Custom statements and custom operations](./custom.md)
- [Version 1](./custom1.md)
- [Version 1, written again](./custom1b.md)
- [Defining custom predicates](./custompred.md)
- [Custom statement example](./customexample.md)
- [How to hash a custom statement](./customhash.md)
- [POD types](./podtypes.md)
- [SignedPOD](./signedpod.md)
- [MainPOD](./mainpod.md)
Expand Down
40 changes: 40 additions & 0 deletions book/src/backendtypes.md
Original file line number Diff line number Diff line change
@@ -1 +1,41 @@
# Backend types

On the backend, there is only a single type: `Value`.

A `Value` is simply a tuple of field elements. With the plonky2 backend, a `Value` is a tuple of 4 field elements. In general, the backend will expose a constant `VALUE_SIZE`, and a `Value` will be a tuple of `VALUE_SIZE` field elements.

## Integers and booleans

The backend encoding stores integers in such a way that arithmetic operations (addition, multiplication, comparison) are inexpensive to verify in-circuit.

In the case of the Plonky2 backend, an integer $x$ is decomposed as
$$x = x_0 + x_1 \cdot 2^{32}$$
with $0 \leq x_0, x_1 < 2^{32}$ and represented as
$$\texttt{map}\ \iota\ [x_0, x_1, 0, 0],$$
where $\iota:\mathbb{N}\cup\{0\}\rightarrow\texttt{GoldilocksField}$ is the canonical projection.

On the backend, a boolean is stored as an integer, either 0 or 1; so logical operations on booleans are also inexpensive.

## Strings

The backend encoding stores strings as hashes, using a hash function that might not be zk-friendly. For this reason, string operations (substrings, accessing individual characters) are hard to verify in-circuit. The POD2 system does not provide methods for manipulating strings.

In other words: As POD2 sees it, two strings are either equal or not equal. There are no other relationships between strings.

In the case of the Plonky2 backend, a string is converted to a sequence of bytes with the byte `0x01` appended as padding, then the bytes are split into 7-byte chunks starting from the left, these chunks then being interpreted as integers in little-endian form, each of which is naturally an element of `GoldilocksField`, whence the resulting sequence may be hashed via the Poseidon hash function. Symbolically, given a string $s$, its hash is defined by

$$\texttt{poseidon}(\texttt{map}\ (\iota\circ\jmath_\texttt{le-bytes->int})\ \texttt{chunks}_7(\jmath_\texttt{string->bytes}(s)\ \texttt{++}\ [\texttt{0x01}])),$$

where `poseidon` is the Poseidon instance used by Plonky2, $\iota$ is as above, $\texttt{chunks}_{n}:[\texttt{u8}]\rightarrow [[\texttt{u8}]]$ is defined such that[^aux]

$$\texttt{chunks}_n(v) = \textup{if}\ v = [\ ]\ \textup{then}\ [\ ]\ \textup{else}\ [\texttt{take}_n v]\ \texttt{++}\ \texttt{chunks}_n(\texttt{drop}_n v),$$

the mapping $\jmath_\texttt{le-bytes->int}: [u8] \rightarrow{N}\cup\{0\}$ is given by

$$[b_0,\dots,b_{N-1}]\mapsto \sum_{i=0}^{N-1} b_i \cdot 2^{8i},$$

and $\jmath_\texttt{string->bytes}$ is the canonical mapping of a string to its UTF-8 representation.

## Compound types

The three front-end compound types (`Dictionary`, `Array`, `Set`) are all represented as Merkle roots on the backend. The details of the representation are explained on a separate [Merkle tree](./merkletree.md) page.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to define (not necessarily now in this PR) where the Dictionary,Array,Set belong: here they appear in the Backend section referencing the merkletree.md file, on the same time they are described at the Frontend subsection values.md file; and in the code they are implemented in the middleware.

28 changes: 26 additions & 2 deletions book/src/custom.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,33 @@ At the backend level, every definition of a predicate is either a conjunction or

On the backend, custom predicates are defined in _groups_. A group can contain one or more custom predicates and their associated sub-predicates. Recursive definition is only possible within a group: the definition of a predicate in a group can only depend on previously existing predicates, itself, and other predicates in the same group.

## Custom predicates and their IDs
## Arguments of custom predicates

The definition of a custom predicate might also be called an _operation_ or _deduction rule_. It includes two (or, potentially, say, five) statement templates as conditions. The arguments to the statement templates are decomposed as (origin, key) pairs: if statements are allowed to have arity at most 4, then the statement templates in a deduction rule will have at most 8 arguments (4 origins and 4 keys). The same holds for the output statement.

Each argument (origin or key) to an statement template is either a wildcard or a literal. In the backend, the wildcard arguments will be identified as *1, *2, *3, ....

## Examples

See [examples](./customexample.md)

## Hashing and predicate IDs

Each custom predicate is defined as part of a _group_ of predicates. The definitions of all statements in the group are laid out consecutively (see [examples](./customexample.md)) and hashed. For more details, see the pages on [hashing custom statements](./customhash.md) and [custom predicates](./custompred.md).

## How to prove an application of an operation

The POD proof format is inspired by "two-column proofs" (for an example, see [Wikipedia](https://en.wikipedia.org/wiki/Mathematical_proof)). A POD contains a "tabular proof", in which each row includes a "statement" and an "operation". The "operation" is the "reason" that justifies the statement: it is everything the circuit needs as a witness to verify the statement.

For a custom statement, the "reason" includes the following witnesses and verifications:
- the definition of the statement, serialized (see [examples](./customexample.md))
- if the statement is part of a group, the definition of the full group, serialized
- verify that the hash of the definition is the statement ID
- the definition will have some number of "wildcards" (*1, *2, ...) as arguments to statement templates; a value for each wildcard must be provided as a witness (each will be either an origin ID or key)
- the circuit must substitute the claimed values for the wildcards, and the resulting statements (true statements with origins and keys) will appear as witnesses
- the circuit must verify that all the input statement templates (with origins and keys) appear in the previous statements (in higher rows of the table)
- the circuit also substitutes the claimed values for the wildcards in the output statement, and verifies that it matches the claimed output statement

A custom predicate, like a built-in predicate, is identified by a _name_ on the front end and an _identifier_ on the back end. The identifier is a cryptographic hash of the definition of the group.


[^inherit]: What to call this? One POD "inherits" from another?
Expand Down
36 changes: 0 additions & 36 deletions book/src/custom1b.md

This file was deleted.

20 changes: 20 additions & 0 deletions book/src/customhash.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# How to hash a custom predicate

Every predicate, native or custom, is identified on the backend by a predicate ID.

The native predicates are numbered with small integers, sequentially. The ID of a custom predicate is a hash of its definition; this guarantees that two different predicates cannot have the same ID (aside from the miniscule probability of a hash collision).

This document explains in some detail how the definition of a custom predicate is serialized and hashed.

Custom predicates are defined in _groups_ (also known as _batches_); see an [example](./customexample.md). The definition of a custom predicate in a group involves other predicates, which may include:
- native predicates
- previously-defined custom predicates
- other predicates in the same group.

Predicate hashing is recursive: in order to hash a group of custom predicates, we need to know IDs for all the previously-defined custom predicates it depends on.

The definition of the whole group of custom predicates is serialized (as explained below), and that serialization is hashed (using a zk-friendly hash -- in the case of the plonky2 backend, Poseidon) to give a _group ID_. Each predicate in the group is then referenced by
```
predicate_ID = (group_ID, idx)
```
(here `idx` is simply the index of the predicate in the group).
2 changes: 1 addition & 1 deletion book/src/custom1.md → book/src/custompred.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Custom operations (or: how to define a custom predicate): VERSION 1
# Defining custom predicates

(Note: At the moment, we consider a "custom operation" to be exactly the same thing as the "definition of a custom predicate.")

Expand Down
17 changes: 15 additions & 2 deletions book/src/front_and_back.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,17 @@
# Frontend and backend

The frontend is what we want the user to see.
The backend is what we want the circuit to see.
The POD2 system consists of a frontend and a backend, connected by a middleware. This page outlines some design principles for deciding which components go where.

```
user -- frontend -- middleware -- backend -- ZK circuit
```

The frontend is what we want the user to see; the backend is what we want the circuit to see.

## Circuit and proving system

The first implementation of POD2 uses Plonky2 as its proving system. In principle, a future implementation could use some other proving system. The frontend and middleware should not be aware of what proving system is in use: anything specific to the proving system belongs to the backend.

## User-facing types versus in-circuit types

The frontend type system exposes human-readable types to POD developers: strings, ints, bools, and so forth. On the backend, all types are build out of field elements. The middleware should handle the conversion.
7 changes: 7 additions & 0 deletions book/src/values.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
# POD value types
From the frontend perspective, POD values may be one of the following[^type] types: two atomic types
- `Integer`
- `Bool`
- `String`
- `Raw`

and three compound types
- `Dictionary`
Expand All @@ -24,6 +26,8 @@ with $0 \leq x_0, x_1 < 2^{32}$ and representing it as
$$\texttt{map}\ \iota\ [x_0, x_1, 0, 0],$$
where $\iota:\mathbb{N}\cup\{0\}\rightarrow\texttt{GoldilocksField}$ is the canonical projection.

## `Bool`
In the frontend, this is a simple bool. In the backend, it will have the same encoding as an `Integer` `0` (for `false`) or `1` (for `true`).

## `String`
In the frontend, this type corresponds to the usual `String`. In the backend, the string will be mapped to a sequence of field elements and hashed with the hash function employed there, thus being represented by its hash.
Expand All @@ -42,7 +46,10 @@ $$[b_0,\dots,b_{N-1}]\mapsto \sum_{i=0}^{N-1} b_i \cdot 2^{8i},$$

and $\jmath_\texttt{string->bytes}$ is the canonical mapping of a string to its UTF-8 representation.

## `Raw`
"Raw" is short for "raw value". A `Raw` exposes a backend value on the frontend.

With the plonky2 backend, a `Raw` is a tuple of 4 elements of the Goldilocks field.
Comment on lines +29 to +32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the Raw in this section equivalent to the Value type at the section backendtypes.md?


## Dictionary, array, set

Expand Down
Loading