preamble.tex: 1 change (1 addition & 0 deletions)
@@ -278,6 +278,7 @@
\newcommand*{\Csegmentfootprint}{\mathsf{W}_F}
\newcommand*{\Csegmentsize}{\mathsf{W}_G}
\newcommand*{\Cmaxpackageimports}{\mathsf{W}_M}
\newcommand*{\Cecoriginalshards}{\mathsf{W}_O}
\newcommand*{\Csegmentecpieces}{\mathsf{W}_P}
\newcommand*{\Cmaxreportvarsize}{\mathsf{W}_R}
\newcommand*{\Cmemosize}{\mathsf{W}_T}
text/definitions.tex: 1 change (1 addition & 0 deletions)
@@ -281,6 +281,7 @@ \subsubsection{Constants}
\item[$\Csegmentsize = \Csegmentecpieces\Cecpiecesize = 4104$] The size of a segment in octets. See section \ref{sec:segments}.
\item[$\Csegmentfootprint = \Csegmentsize + 32\ceil{\log_2(\Cmaxpackageimports)} = 4488$] The additional footprint in the Audits DA of a single imported segment. See equation \ref{eq:segmentfootprint}.
\item[$\Cmaxpackageimports = 3,072$] The maximum number of imports in a work-package. See equation \ref{eq:limitworkpackagebandwidth}.
\item[$\Cecoriginalshards = \nicefrac{\Cecpiecesize}{2} = 342$] The number of required original data shards in an erasure coding scheme. See equation \ref{eq:erasurecoding}.
\item[$\Csegmentecpieces = 6$] The number of erasure-coded pieces in a segment.
\item[$\Cmaxreportvarsize = 48\cdot2^{10}$] The maximum total size of all unbounded blobs in a work-report, in octets. See equation \ref{eq:limitworkreportsize}.
\item[$\Cmemosize = 128$] The size of a transfer memo in octets. See equation \ref{eq:defxfer}.
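As an illustrative expansion of the arithmetic above (not an additional definition), note that $\ceil{\log_2(\Cmaxpackageimports)} = \ceil{\log_2(3072)} = 12$, so
% Illustrative only; the value follows from the constants defined above.
\begin{equation*}
\Csegmentfootprint = \Csegmentsize + 32\cdot12 = 4104 + 384 = 4488\,.
\end{equation*}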
text/erasure_coding.tex: 22 changes (11 additions & 11 deletions)
@@ -4,17 +4,17 @@ \section{Erasure Coding}
\newcommand{\join}{\text{join}}
\newcommand{\spl}[1]{\text{split}_{#1}}

The foundation of the data-availability and distribution system of \Jam is a systematic Reed-Solomon erasure coding function in \textsc{gf}($2^{16}$) of rate 342:1023, the same transform as done by the algorithm of \cite{lin2014novel}. We use a little-endian $\blob[2]$ form of the 16-bit \textsc{gf} points with a functional equivalence given by $\fnencode[2]$. From this we may assume the encoding function $\fnerasurecode: \sequence[342]{\blob[2]} \to \sequence[1023]{\blob[2]}$ and the recovery function $\fnecrecover: \protoset{\tuple{\blob[2], \Nmax{1023}}}_{342} \to \sequence[342]{\blob[2]}$. Encoding is done by extrapolating a data blob of size 684 octets (provided in $\fnerasurecode$ here as 342 octet pairs) into 1,023 octet pairs. Recovery is done by collecting together any distinct 342 octet pairs, together with their indices, and transforming this into the original sequence of 342 octet pairs.
The foundation of the data-availability and distribution system of \Jam is a systematic Reed-Solomon erasure coding function in \textsc{gf}($2^{16}$) of rate $\Cecoriginalshards$:$\Cvalcount$, the same transform as done by the algorithm of \cite{lin2014novel}. We use a little-endian $\blob[2]$ form of the 16-bit \textsc{gf} points with a functional equivalence given by $\fnencode[2]$. From this we may assume the encoding function $\fnerasurecode: \sequence[\Cecoriginalshards]{\blob[2]} \to \sequence[\Cvalcount]{\blob[2]}$ and the recovery function $\fnecrecover: \protoset{\tuple{\blob[2], \Nmax{\Cvalcount}}}_{\Cecoriginalshards} \to \sequence[\Cecoriginalshards]{\blob[2]}$. Encoding is done by extrapolating a data blob of size $\Cecpiecesize$ octets (provided in $\fnerasurecode$ here as $\Cecoriginalshards$ octet pairs) into $\Cvalcount$ octet pairs. Recovery is done by collecting together any distinct $\Cecoriginalshards$ octet pairs, together with their indices, and transforming this into the original sequence of $\Cecoriginalshards$ octet pairs.

Practically speaking, this allows for the efficient encoding and recovery of data whose size is a multiple of 684 octets. Data whose length is not divisible by 684 must be padded (we pad with zeroes). We use this erasure-coding in two contexts within the \Jam protocol; one where we encode variable sized (but typically very large) data blobs for the Audit \textsc{da} and block-distribution system, and the other where we encode much smaller fixed-size data \emph{segments} for the Import \textsc{da} system.
Practically speaking, this allows for the efficient encoding and recovery of data whose size is a multiple of $\Cecpiecesize$ octets. Data whose length is not divisible by $\Cecpiecesize$ must be padded (we pad with zeroes). We use this erasure-coding in two contexts within the \Jam protocol: one where we encode variable-sized (but typically very large) data blobs for the Audit \textsc{da} and block-distribution system, and the other where we encode much smaller fixed-size data \emph{segments} for the Import \textsc{da} system.

For the Import \textsc{da} system, we deal with an input size of 4,104 octets resulting in data-parallelism of order six. We may attain a greater degree of data parallelism if encoding or recovering more than one segment at a time though for recovery, we may be restricted to requiring each segment to be formed from the same set of indices (depending on the specific algorithm).
For the Import \textsc{da} system, we deal with an input size of $\Csegmentsize = 4104$ octets, resulting in data-parallelism of order $\Csegmentecpieces$. We may attain a greater degree of data parallelism if encoding or recovering more than one segment at a time, though for recovery we may be restricted to requiring each segment to be formed from the same set of indices (depending on the specific algorithm).
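For concreteness (an illustration of what the padding rule above implies, rather than an additional normative rule): a blob of length $l$ is zero-padded up to the next multiple of $\Cecpiecesize$ octets, giving
% Illustrative only; follows from the padding rule and the encoding types below.
\begin{equation*}
k = \ceil{\nicefrac{l}{\Cecpiecesize}}\,,\qquad \text{padded length} = \Cecpiecesize k\,,
\end{equation*}
and each of the $\Cvalcount$ resulting chunks then occupies $2k$ octets.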

\subsection{Blob Encoding and Recovery}

We assume some data blob $\mathbf{d} \in \blob[684k], k \in \N$. This blob is split into a whole number of $k$ pieces, each a sequence of 342 octet pairs. Each piece is erasure-coded using $\fnerasurecode$ as above to give 1,023 octet pairs per piece.
We assume some data blob $\mathbf{d} \in \blob[\Cecpiecesize{}k], k \in \N$. This blob is split into a whole number of $k$ pieces, each a sequence of $\Cecoriginalshards$ octet pairs. Each piece is erasure-coded using $\fnerasurecode$ as above to give $\Cvalcount$ octet pairs per piece.

The resulting matrix is grouped by its pair-index and concatenated to form 1,023 \emph{chunks}, each of $k$ octet-pairs. Any 342 of these chunks may then be used to reconstruct the original data $\mathbf{d}$.
The resulting matrix is grouped by its pair-index and concatenated to form $\Cvalcount$ \emph{chunks}, each of $k$ octet-pairs. Any $\Cecoriginalshards$ of these chunks may then be used to reconstruct the original data $\mathbf{d}$.

Formally we begin by defining two utility functions for splitting some large sequence into a number of equal-sized sub-sequences and for reconstituting such subsequences back into a single large sequence:
\begin{align}
@@ -27,21 +27,21 @@ \subsection{Blob Encoding and Recovery}
{}^\text{T}\sq{\sq{\mathbf{x}_{0, 0}, \mathbf{x}_{0, 1}, \mathbf{x}_{0, 2}, \dots}, \sq{\mathbf{x}_{1, 0}, \mathbf{x}_{1, 1}, \dots}, \dots} \equiv \sq{\sq{\mathbf{x}_{0, 0}, \mathbf{x}_{1, 0}, \mathbf{x}_{2, 0}, \dots}, \sq{\mathbf{x}_{0, 1}, \mathbf{x}_{1, 1}, \dots}, \dots}
\end{equation}
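As a small illustration of these utilities (assuming, per their use below, that the subscript of $\spl{n}$ gives the length of each sub-sequence):
% Illustrative toy example only; not part of the formal definitions.
\begin{align*}
\spl{2}(\sq{a, b, c, d, e, f}) &= \sq{\sq{a, b}, \sq{c, d}, \sq{e, f}}\,,\\
{}^\text{T}\sq{\sq{a, b}, \sq{c, d}, \sq{e, f}} &= \sq{\sq{a, c, e}, \sq{b, d, f}}\,,\\
\join(\sq{\sq{a, c, e}, \sq{b, d, f}}) &= \sq{a, c, e, b, d, f}\,.
\end{align*}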

We may then define our erasure-code chunking function which accepts an arbitrary sized data blob whose length divides wholly into 684 octets and results in a sequence of 1,023 smaller blobs:
We may then define our erasure-code chunking function, which accepts a data blob of any size that is a whole multiple of $\Cecpiecesize$ octets and results in a sequence of $\Cvalcount$ smaller blobs:
\begin{equation}\label{eq:erasurecoding}
\fnerasurecode_{k \in \N}\colon\abracegroup{
\blob[684k] &\to \sequence[1023]{\blob[2k]} \\
\blob[\Cecpiecesize{}k] &\to \sequence[\Cvalcount]{\blob[2k]} \\
\mathbf{d} &\mapsto \join^\#({}^{\text{T}}\sq{\build{\erasurecode{\mathbf{p}}}{\mathbf{p} \orderedin {}^\text{T}\spl{2}^\#(\spl{2k}(\mathbf{d}))}})
}
\end{equation}
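Tracing the types through this definition may aid the reader (an informal reading; nothing here is additionally normative): $\mathbf{d}$ is cut into $\Cecoriginalshards$ pieces of $2k$ octets, regrouped into $k$ sequences of $\Cecoriginalshards$ octet pairs, each of which is encoded by $\fnerasurecode$, and the results are transposed back and joined:
% Illustrative type trace of the expression above.
\begin{align*}
\spl{2k}(\mathbf{d}) &\in \sequence[\Cecoriginalshards]{\blob[2k]}\,, &
{}^\text{T}\spl{2}^\#(\spl{2k}(\mathbf{d})) &\in \sequence[k]{\sequence[\Cecoriginalshards]{\blob[2]}}\,,\\
\sq{\build{\erasurecode{\mathbf{p}}}{\mathbf{p} \orderedin \dots}} &\in \sequence[k]{\sequence[\Cvalcount]{\blob[2]}}\,, &
\join^\#({}^\text{T}\sq{\dots}) &\in \sequence[\Cvalcount]{\blob[2k]}\,.
\end{align*}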

The original data may be reconstructed with any 342 of the 1,023 resultant items (along with their indices). If the original 342 items are known then reconstruction is just their concatenation.
The original data may be reconstructed with any $\Cecoriginalshards$ of the $\Cvalcount$ resultant items (along with their indices). If the original $\Cecoriginalshards$ items (those with indices in $\Nmax{\Cecoriginalshards}$) are known, then reconstruction is simply their concatenation.
\begin{equation}
\label{eq:erasurecodinginv}
\fnecrecover_{k \in \N}\colon\abracegroup{
\protoset{\tuple{\blob[2k], \Nmax{1023}}}_{342} &\to \blob[684k] \\
\protoset{\tuple{\blob[2k], \Nmax{\Cvalcount}}}_{\Cecoriginalshards} &\to \blob[\Cecpiecesize{}k] \\
\mathbf{c} &\mapsto \begin{cases}
\encode{\sq{\build{\mathbf{x}}{\tup{\mathbf{x}, i} \orderedin \sqorderby{i}{\tup{\mathbf{x}, i} \in \mathbf{c}}}}} &\when \set{\build{i}{\tup{\mathbf{x}, i} \in \mathbf{c}}} = \Nmax{342}\\
\encode{\sq{\build{\mathbf{x}}{\tup{\mathbf{x}, i} \orderedin \sqorderby{i}{\tup{\mathbf{x}, i} \in \mathbf{c}}}}} &\when \set{\build{i}{\tup{\mathbf{x}, i} \in \mathbf{c}}} = \Nmax{\Cecoriginalshards}\\
\join(\join^\#({}^\text{T}\sq{
\build{
\ecrecover{{\set{\build{
@@ -59,7 +59,7 @@



Segment encoding/decoding may be done using the same functions albeit with a constant $k = 6$.
Segment encoding/decoding may be done using the same functions albeit with a constant $k = \Csegmentecpieces$.
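Concretely (an illustrative consequence of the definitions above, not an additional rule): a segment of $\Csegmentsize$ octets is encoded into $\Cvalcount$ chunks of $2k = 12$ octets each, and any $\Cecoriginalshards$ of those chunks suffice to recover it:
% Illustrative only; values follow from the constants defined above.
\begin{equation*}
\Csegmentsize = \Csegmentecpieces\cdot\Cecpiecesize = 4104\,,\qquad 2k = 2\cdot\Csegmentecpieces = 12\,,\qquad \Cecoriginalshards\cdot 2k = 342\cdot12 = 4104\,.
\end{equation*}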

\subsection{Code Word representation}

text/work_packages_and_reports.tex: 6 changes (3 additions & 3 deletions)
@@ -28,7 +28,7 @@ \subsection{Honest Behavior}

\subsection{Segments and the Manifest}

Our basic erasure-coding segment size is $\Cecpiecesize = 684$ octets, derived from the fact we wish to be able to reconstruct even should almost two-thirds of our 1023 participants be malicious or incapacitated, the 16-bit Galois field on which the erasure-code is based and the desire to efficiently support encoding data of close to, but no less than, 4\textsc{kb}.
Our basic erasure-coding segment size is $\Cecpiecesize = 684$ octets, derived from three considerations: the requirement that we be able to reconstruct even should almost two-thirds of our $\Cvalcount = 1023$ participants be malicious or incapacitated; the 16-bit Galois field on which the erasure-code is based; and the desire to efficiently support encoding data of close to, but no less than, 4\textsc{kb}.
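The arithmetic behind this may be made explicit (an informal expansion of the reasoning above, using only the constants already defined): each \textsc{gf}($2^{16}$) point occupies two octets, reconstruction requires only $\Cecoriginalshards$ of the $\Cvalcount$ chunks, and $\Csegmentecpieces$ such pieces form a segment of just over 4\textsc{kb}:
% Illustrative only; values follow from the constants defined above.
\begin{equation*}
\Cecpiecesize = 2\cdot\Cecoriginalshards = 684\,,\qquad \Cvalcount - \Cecoriginalshards = 681 \approx \nicefrac{2}{3}\cdot1023\,,\qquad \Csegmentecpieces\cdot\Cecpiecesize = 4104 \geq 4096\,.
\end{equation*}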

Work-packages are generally small to ensure guarantors need not invest a lot of bandwidth in order to discover whether they can get paid for their evaluation into a work-report. Rather than having much data inline, they instead \emph{reference} data through commitments. The simplest commitments are extrinsic data.

Expand Down Expand Up @@ -270,9 +270,9 @@ \subsection{Computation of Work-Report}\label{sec:computeworkreport}

The Is-Authorized logic it references must be executed first in order to ensure that the work-package warrants the needed core-time. Next, the guarantor should ensure that all segment-tree roots which form imported segment commitments are known and have not expired. Finally, the guarantor should ensure that they can fetch all preimage data referenced as the commitments of extrinsic segments.

Once done, then imported segments must be reconstructed. This process may in fact be lazy as the Refine function makes no usage of the data until the \emph{fetch} host-call is made. Fetching generally implies that, for each imported segment, erasure-coded chunks are retrieved from enough unique validators (342, including the guarantor) and is described in more depth in appendix \ref{sec:erasurecoding}. (Since we specify systematic erasure-coding, its reconstruction is trivial in the case that the correct 342 validators are responsive.) Chunks must be fetched for both the data itself and for justification metadata which allows us to ensure that the data is correct.
Once this is done, the imported segments must be reconstructed. This process may in fact be lazy, as the Refine function makes no use of the data until the \emph{fetch} host-call is made. Fetching generally implies that, for each imported segment, erasure-coded chunks are retrieved from enough unique validators ($\Cecoriginalshards$, including the guarantor); this process is described in more depth in appendix \ref{sec:erasurecoding}. (Since we specify systematic erasure-coding, reconstruction is trivial in the case that the correct $\Cecoriginalshards$ validators are responsive.) Chunks must be fetched both for the data itself and for justification metadata which allows us to ensure that the data is correct.

Validators, in their role as availability assurers, should index such chunks according to the index of the segments-tree whose reconstruction they facilitate. Since the data for segment chunks is so small at 12 octets, fixed communications costs should be kept to a bare minimum. A good network protocol (out of scope at present) will allow guarantors to specify only the segments-tree root and index together with a Boolean to indicate whether the proof chunk need be supplied. Since we assume at least 341 other validators are online and benevolent, we can assume that the guarantor can compute $\importsegmentdata$ and $\justifysegmentdata$ above with confidence, based on the general availability of data committed to with $\mathbf{s}^\clubsuit$, which is specified below.
Validators, in their role as availability assurers, should index such chunks according to the index of the segments-tree whose reconstruction they facilitate. Since the data for segment chunks is so small at 12 octets, fixed communications costs should be kept to a bare minimum. A good network protocol (out of scope at present) will allow guarantors to specify only the segments-tree root and index together with a Boolean to indicate whether the proof chunk need be supplied. Since we assume at least $\Cecoriginalshards - 1$ other validators are online and benevolent, we can assume that the guarantor can compute $\importsegmentdata$ and $\justifysegmentdata$ above with confidence, based on the general availability of data committed to with $\mathbf{s}^\clubsuit$, which is specified below.

\subsubsection{Availability Specifier}\label{sec:availabiltyspecifier}
We define the availability specifier function $\newavailabilityspecifier$, which creates an availability specifier from the package hash, an octet sequence of the audit-friendly work-package bundle (comprising the work-package itself, the extrinsic data and the concatenated import segments along with their proofs of correctness), and the sequence of exported segments: