Update EIP-7594: Polish EIP, expand rationale #9588

Open. Wants to merge 6 commits into base: master.
83 changes: 70 additions & 13 deletions EIPS/eip-7594.md
eip: 7594
title: PeerDAS - Peer Data Availability Sampling
description: Introducing simple DAS utilizing gossip distribution and peer requests
author: Danny Ryan (@djrtwo), Dankrad Feist (@dankrad), Francesco D'Amato (@fradamt), Hsiao-Wei Wang (@hwwhww), Alex Stokes (@ralexstokes)
discussions-to: https://ethereum-magicians.org/t/eip-7594-peerdas-peer-data-availability-sampling/18215
status: Review
type: Standards Track
category: Core
created: 2024-01-12
requires: 4844
---

## Abstract

PeerDAS (Peer Data Availability Sampling) is a networking protocol that allows nodes to perform data availability sampling (DAS) to ensure that blob data has been made available while downloading only a subset of the data. PeerDAS utilizes gossip for distribution, discovery for finding peers of particular data custody, and peer requests for sampling.

## Motivation

Providing additional data availability helps bring scale to Ethereum users in th…

## Specification

We extend the blobs introduced in EIP-4844 using a one-dimensional erasure coding extension. Each row consists of the blob data combined with its erasure code. It is subdivided into cells, which are the smallest units that can be authenticated with their respective blob's KZG commitments. Each column, associated with a specific gossip subnet, consists of the cells from all rows for a specific index. Each node is responsible for maintaining a deterministic set of column subnets and custodying their data as a function of their node ID.
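As a rough illustration of how a custody set can be derived deterministically from a node ID (this sketch is hypothetical; the normative derivation, `get_custody_groups`, lives in the consensus specs, and the column count here is illustrative):

```python
from hashlib import sha256

# Illustrative parameter; the consensus specs define the normative value.
NUMBER_OF_COLUMNS = 128

def custody_columns(node_id: int, custody_count: int) -> set[int]:
    """Hypothetical sketch: hash (node_id, counter) pairs until
    `custody_count` distinct column indices have been drawn."""
    columns: set[int] = set()
    counter = 0
    while len(columns) < custody_count:
        digest = sha256(
            node_id.to_bytes(32, "big") + counter.to_bytes(8, "big")
        ).digest()
        columns.add(int.from_bytes(digest[:8], "big") % NUMBER_OF_COLUMNS)
        counter += 1
    return columns
```

Because the derivation is a pure function of the node ID, peers can compute each other's custody assignments locally, without any extra communication.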

Nodes find and maintain a diverse peer set and sample columns from their peers to perform DAS every slot.

A node can reconstruct the entire data matrix if it acquires at least 50% of all the columns. If a node has less than 50%, it can request the necessary columns from its peer nodes.
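To make the 50% threshold concrete, here is a toy sketch in which a small prime field and Lagrange interpolation stand in for the production KZG/BLS machinery: data cells are evaluations of a low-degree polynomial, the erasure extension doubles the evaluation domain, and any half of the extended points recovers the original data.

```python
P = 65537  # toy prime field; the real protocol works over the BLS12-381 scalar field

def interpolate_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate the unique polynomial through `points` at `x`, mod P."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

data = [11, 5, 33, 2]                 # four original "cells"
orig_points = list(zip(range(4), data))
# 1D erasure extension: evaluate the degree-3 interpolant on four more points
extended = data + [interpolate_eval(orig_points, x) for x in range(4, 8)]

# Any 4 of the 8 extended cells (50%) determine the degree-3 polynomial,
# so the original data can be reconstructed from an arbitrary such subset.
sample = [(x, extended[x]) for x in (1, 3, 5, 6)]
recovered = [interpolate_eval(sample, x) for x in range(4)]
assert recovered == data
```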

### Consensus layer

The detailed consensus layer specifications are on [ethereum/consensus-specs](https://github.com/ethereum/consensus-specs/tree/9d377fd53d029536e57cfda1a4d2c700c59f86bf/specs/fulu/).

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 and RFC 8174.

### Execution layer

#### Networking

This EIP introduces cell KZG proofs, which are used to prove that a KZG commitment opens to a cell at the given index. This allows downloading only specific cells from a blob, while still ensuring data integrity with respect to the corresponding KZG commitment, and is therefore a key component of data availability sampling. However, computing the cell proofs for a blob is an expensive operation, which a block producer would have to repeat for many blobs. Since proof verification is much cheaper than proof computation, and the proof size is negligible compared to cell size, we instead require blob transaction senders to compute the proofs themselves and include them in the EIP-4844 transaction pool wrapper for blob transactions.

`cell_proofs = [cell_proof_0, cell_proof_1, ...]`

The `tx_payload_body`, `blobs` and `commitments` are as in EIP-4844, while the `proofs` field is replaced by `cell_proofs`, and a `wrapper_version` is added. These are defined as follows:

- `wrapper_version` - one byte indicating which version of the wrapper is used. For the current version, it is set to `1`.
- `cell_proofs` - list of cell proofs for all `blobs`, including the proofs for the extension indices, for a total of `CELLS_PER_EXT_BLOB` proofs per blob (`CELLS_PER_EXT_BLOB` is the number of cells for an extended blob, defined [in the consensus specs](https://github.com/ethereum/consensus-specs/tree/9d377fd53d029536e57cfda1a4d2c700c59f86bf/specs/fulu/polynomial-commitments-sampling.md#cells))

Note that, while `cell_proofs` contain the proofs for all cells, including the extension cells, the blobs themselves are sent without being extended (`CELLS_PER_EXT_BLOB / 2` cells per blob). This is to avoid sending redundant data, which can quickly be computed by the receiving node.
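A minimal sketch of the wrapper layout (the field names come from this EIP; the Python class itself is illustrative, not a normative encoding, and the constant value is taken from the consensus specs):

```python
from dataclasses import dataclass

CELLS_PER_EXT_BLOB = 128  # defined in the consensus specs

@dataclass
class BlobTransactionWrapper:
    """Illustrative container mirroring the wrapper fields described above."""
    wrapper_version: int      # one byte; `1` for the current version
    tx_payload_body: bytes    # as in EIP-4844
    blobs: list[bytes]        # sent unextended: CELLS_PER_EXT_BLOB // 2 cells each
    commitments: list[bytes]  # one KZG commitment per blob
    cell_proofs: list[bytes]  # flat list, CELLS_PER_EXT_BLOB proofs per blob

    def expected_proof_count(self) -> int:
        # proofs cover the extension cells too, hence the full count per blob
        return CELLS_PER_EXT_BLOB * len(self.blobs)
```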
The node MUST validate `tx_payload_body` and verify the wrapped data against it.
- There are an equal number of `tx_payload_body.blob_versioned_hashes`, `blobs` and `commitments`.
- `cell_proofs` contains exactly `CELLS_PER_EXT_BLOB * len(blobs)` cell proofs
- The KZG `commitments` hash to the versioned hashes, i.e. `kzg_to_versioned_hash(commitments[i]) == tx_payload_body.blob_versioned_hashes[i]`
- The KZG `commitments` match the corresponding `blobs` and `cell_proofs`. This requires computing the extension cells for all `blobs` (e.g. via `compute_cells`), and verifying all `cell_proofs`. (Note: all cell proofs can be batch verified at once, e.g. via `verify_cell_kzg_proof_batch`)
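The checks above can be sketched as follows. The cryptographic steps are injected as callables because they require a KZG backend (the consensus specs provide `compute_cells` and `verify_cell_kzg_proof_batch`); the parameter names used here are placeholders, not a normative API.

```python
CELLS_PER_EXT_BLOB = 128  # defined in the consensus specs

def validate_blob_wrapper(versioned_hashes, blobs, commitments, cell_proofs,
                          kzg_to_versioned_hash, verify_cell_proofs) -> bool:
    """Structural checks from the list above; crypto delegated to callables."""
    # Equal counts of versioned hashes, blobs, and commitments
    if not (len(versioned_hashes) == len(blobs) == len(commitments)):
        return False
    # Exactly CELLS_PER_EXT_BLOB proofs per blob, extension cells included
    if len(cell_proofs) != CELLS_PER_EXT_BLOB * len(blobs):
        return False
    # Commitments must hash to the transaction's versioned hashes
    if any(kzg_to_versioned_hash(c) != h
           for c, h in zip(commitments, versioned_hashes)):
        return False
    # Extend blobs and batch-verify all cell proofs in one batched check
    return verify_cell_proofs(blobs, commitments, cell_proofs)
```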

## Rationale

### Why use DAS to scale the DA layer?

PeerDAS is a DAS scheme that requires nodes to only download a small amount of data to satisfy a local availability check. The amount of data required grows sublinearly with the total amount of data (i.e. blobs in a block) so it constitutes a secure scaling scheme while supporting the decentralization of the network.

### What does peer sampling provide?

PeerDAS takes the set of peers of a node on the network as a primitive to build a DAS scheme around. A focus on peers allows for redundancy in the mechanism (a node generally has many peers, and peer count can be cheaply increased), which helps both the theoretical security of the scheme, as detailed in the "Security Considerations" section, and the practical security of the implementation (e.g. if a single peer fails, a node can likely use another peer for the same sampling task).

PeerDAS also has the useful property that any given node may voluntarily custody more of the data than the bare minimum, which increases the performance of the mechanism. Alternative schemes do not readily support this "transparent" scaling property.

### Why are these parameters chosen?

The parameters of PeerDAS given in the specs support network security while keeping node requirements sufficiently low. See the security argument below for further details.

### Why do validators have an additional custody requirement beyond full nodes?

Validators are assumed to have marginally higher requirements to participate on the network. PeerDAS introduces a custody requirement that scales with the validator count so that nodes with more resources can contribute to a more stable backbone that makes the global network more robust.

### Column sampling vs row sampling

PeerDAS defines a sample as a "column" which is a cross-section across _all_ blobs, rather than a "row" which would be a full blob.
The sampling scheme could be defined over rows, but then any reconstruction strategy would need to work over "extension" blobs that do not a priori exist on the network.
Reconstruction becomes much more tractable by working over columns as nodes can be assumed to have much more of the complete data by default (e.g. because most/all of the blobs are in the public mempool).

## Backwards Compatibility

This EIP is fully backwards compatible with [EIP-4844](./EIP-4844.md).

## Test Cases

Refer to the consensus and execution spec tests for testing of this EIP.

## Security Considerations

The primary failure mode of a DAS scheme is a "data withholding" attack, where a block producer attempts to convince the network some data is available even when the block producer fails to provide the associated data.
PeerDAS mitigates withholding attacks with a (pseudo)randomized sampling scheme: the probability of a successful attack decreases as the size of the network grows, while each node must download only a sublinear amount of data.

This intuition can be formalized as follows:

Letting `n` be the total number of sampling nodes (i.e. the size of the network), `m` be the total number of samples possible (cf. `NUMBER_OF_CUSTODY_GROUPS` in the specs) and `k` be the minimum number of samples that a node must download (cf. `SAMPLES_PER_SLOT` in the specs), we have the following bound on the probability of convincing a fraction $`\epsilon`$ of the nodes that some data is available when it is withheld:

```math
\mathbb{P}(\text{tricking } n\epsilon \text{ nodes}) \le \binom{n}{n\epsilon}\binom{m}{\frac{m}{2}-1}2^{-kn\epsilon}
```

The first term is the number of possible ways to choose a subset of $n\epsilon$ nodes whose sampling queries should be satisfied (i.e. the nodes to be tricked).
The second term is the number of ways to choose a maximally large subset of samples to be made available to satisfy the sampling queries of the $n\epsilon$ nodes without allowing reconstruction of the full data.
Finally, for any such choices, the third term is the probability of success, i.e. the probability that the sampling queries of all chosen $n\epsilon$ nodes are satisfied by the chosen subset up to the reconstruction threshold.

For the mainnet parameters given in the specs and assuming 10,000 nodes on the network, we can compute upper bounds on the probability of a successful attack for various fractions of tricked nodes.

| $\epsilon$ | $n\epsilon$ (nodes) | Upper bound on $\mathbb{P}$ |
|:-----------:|:---------------------:|:---------------------:|
| 0.0 | 0 | 2.36×10^37 |
| 0.1 | 1 000 | 10^-960.9 |
| 0.2 | 2 000 | 10^-2607.9 |
| 0.3 | 3 000 | 10^-4536.5 |
| 0.4 | 4 000 | 10^-6674.8 |
| 0.5 | 5 000 | 10^-8995.6 |
| 0.6 | 6 000 | 10^-11491.3 |
| 0.7 | 7 000 | 10^-14169.4 |
| 0.8 | 8 000 | 10^-17057.3 |
| 0.9 | 9 000 | 10^-20226.8 |
| 1.0 | 10 000 | 10^-24045.0 |

The table shows that the probability of a successful attack quickly drops to a negligible level, so PeerDAS is considered secure against data withholding attacks.
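For reference, the table can be reproduced numerically. The sketch below evaluates the bound in log space with `n = 10,000`, `m = 128`, and `k = 8` (the custody-group and sampling values in the consensus specs at the time of writing; treat them as assumptions of this sketch):

```python
from math import lgamma, log, log10

def log10_comb(n: int, r: int) -> float:
    """log10 of the binomial coefficient C(n, r), computed via log-gamma."""
    return (lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)) / log(10)

def log10_bound(eps: float, n: int = 10_000, m: int = 128, k: int = 8) -> float:
    """log10 of the bound C(n, n*eps) * C(m, m/2 - 1) * 2^(-k*n*eps)."""
    tricked = round(n * eps)  # number of nodes to be tricked
    return (log10_comb(n, tricked)
            + log10_comb(m, m // 2 - 1)
            - k * tricked * log10(2))

for eps in (0.1, 0.5, 1.0):
    print(f"eps = {eps}: upper bound 10^{log10_bound(eps):.1f}")
```

Working in log10 avoids overflow: the raw binomial coefficients have thousands of digits, but their logarithms are cheap to combine.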

## Copyright
