Skip to content

peerDAS: Public methods must accept raw bytes #3579

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
69 changes: 50 additions & 19 deletions specs/_features/eip7594/polynomial-commitments-sampling.md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you ensure that all the docstrings have proper punctuation? Lots of the *_polynomialcoeff methods and other funcs, like verify_kzg_proof_multi_impl, are missing punctuation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the documentation of the entire file is a bit lacking to the point that I would suggest we do it in another PR since it requires quite a bit of work.

(PS: That also made me look at the 4844 documentation, and it also seems severely incomplete or outdated)

Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@
- [Preset](#preset)
- [Cells](#cells)
- [Helper functions](#helper-functions)
- [BLS12-381 helpers](#bls12-381-helpers)
- [`bytes_to_cell`](#bytes_to_cell)
- [Linear combinations](#linear-combinations)
- [`g2_lincomb`](#g2_lincomb)
- [FFTs](#ffts)
Expand Down Expand Up @@ -81,6 +83,18 @@ Cells are the smallest unit of blob data that can come with their own KZG proofs

## Helper functions

### BLS12-381 helpers

#### `bytes_to_cell`

```python
def bytes_to_cell(cell_bytes: Vector[Bytes32, FIELD_ELEMENTS_PER_CELL]) -> Cell:
"""
Convert untrusted bytes into a Cell.
"""
return [bytes_to_bls_field(element) for element in cell_bytes]
```

### Linear combinations

#### `g2_lincomb`
Expand Down Expand Up @@ -242,7 +256,7 @@ def interpolate_polynomialcoeff(xs: Sequence[BLSFieldElement], ys: Sequence[BLSF
summand, [(- int(weight_adjustment) * int(xs[j])) % BLS_MODULUS, weight_adjustment]
)
r = add_polynomialcoeff(r, summand)

return r
```

Expand Down Expand Up @@ -330,7 +344,7 @@ def verify_kzg_proof_multi_impl(commitment: KZGCommitment,
#### `coset_for_cell`

```python
def coset_for_cell(cell_id: int) -> Cell:
def coset_for_cell(cell_id: CellID) -> Cell:
Copy link
Member

@jtraglia jtraglia Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think cell_id should be column_index (or column_id) here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, column_id and cell_id can be used interchangeably in this function, since each cell corresponds to a column in a one-to-one fashion.

I think it's better for us to choose which notion is more intuitive in a separate PR.

"""
Get the coset for a given ``cell_id``
"""
Expand Down Expand Up @@ -385,7 +399,7 @@ def compute_cells(blob: Blob) -> Vector[Cell, CELLS_PER_BLOB]:
polynomial = blob_to_polynomial(blob)
polynomial_coeff = polynomial_eval_to_coeff(polynomial)

extended_data = fft_field(polynomial_coeff + [0] * FIELD_ELEMENTS_PER_BLOB,
extended_data = fft_field(polynomial_coeff + [0] * FIELD_ELEMENTS_PER_BLOB,
compute_roots_of_unity(2 * FIELD_ELEMENTS_PER_BLOB))
extended_data_rbo = bit_reversal_permutation(extended_data)
return [extended_data_rbo[i * FIELD_ELEMENTS_PER_CELL:(i + 1) * FIELD_ELEMENTS_PER_CELL]
Expand All @@ -397,30 +411,37 @@ def compute_cells(blob: Blob) -> Vector[Cell, CELLS_PER_BLOB]:
#### `verify_cell_proof`

```python
def verify_cell_proof(commitment: KZGCommitment,
cell_id: int,
cell: Cell,
proof: KZGProof) -> bool:
def verify_cell_proof(commitment_bytes: Bytes48,
cell_id: CellID,
cell_bytes: Vector[Bytes32, FIELD_ELEMENTS_PER_CELL],
proof_bytes: Bytes48) -> bool:
"""
Check a cell proof

Public method.
"""
coset = coset_for_cell(cell_id)

return verify_kzg_proof_multi_impl(commitment, coset, cell, proof)
return verify_kzg_proof_multi_impl(
bytes_to_kzg_commitment(commitment_bytes),
coset,
bytes_to_cell(cell_bytes),
bytes_to_kzg_proof(proof_bytes))
```

#### `verify_cell_proof_batch`

```python
def verify_cell_proof_batch(row_commitments: Sequence[KZGCommitment],
row_ids: Sequence[int],
column_ids: Sequence[int],
cells: Sequence[Cell],
proofs: Sequence[KZGProof]) -> bool:
def verify_cell_proof_batch(row_commitments_bytes: Sequence[Bytes48],
row_ids: Sequence[uint64],
column_ids: Sequence[uint64],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be CellID too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not IMO. The ID of a row or column is semantically different from a cell ID, even if the underlying type is the same.

Specifically, we expect to have just a few rows. We also expect the amount of columns to be equal to the amount of cells per blob.

However, due to the semantic difference, I would be hesitant in reusing CellID here. IMO another approach would be to define a RowID and ColumnID type for these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree now, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related commit that will cause conflicts: 665e6fa

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went address it in #3574 -> RowIndex and ColumnIndex. IMO "index" implies they are consecutive numbers but ID is not.

We can fix conflict later whichever merge late.

cells_bytes: Sequence[Vector[Bytes32, FIELD_ELEMENTS_PER_CELL]],
proofs_bytes: Sequence[Bytes48]) -> bool:
"""
Check multiple cell proofs. This function implements the naive algorithm of checking every cell
Verify a set of cells, given their corresponding proofs and their coordinates (row_id, column_id) in the blob
matrix. The list of all commitments is also provided in row_commitments_bytes.

This function implements the naive algorithm of checking every cell
individually; an efficient algorithm can be found here:
https://ethresear.ch/t/a-universal-verification-equation-for-data-availability-sampling/13240

Expand All @@ -430,10 +451,16 @@ def verify_cell_proof_batch(row_commitments: Sequence[KZGCommitment],

Public method.
"""
assert len(cells_bytes) == len(proofs_bytes) == len(row_ids) == len(column_ids)

# Get commitments via row IDs
commitments = [row_commitments[row_id] for row_id in row_ids]

commitments_bytes = [row_commitments_bytes[row_id] for row_id in row_ids]

# Get objects from bytes
commitments = [bytes_to_kzg_commitment(commitment_bytes) for commitment_bytes in commitments_bytes]
cells = [bytes_to_cell(cell_bytes) for cell_bytes in cells_bytes]
proofs = [bytes_to_kzg_proof(proof_bytes) for proof_bytes in proofs_bytes]

return all(
verify_kzg_proof_multi_impl(commitment, coset_for_cell(column_id), cell, proof)
for commitment, column_id, cell, proof in zip(commitments, column_ids, cells, proofs)
Expand All @@ -445,7 +472,8 @@ def verify_cell_proof_batch(row_commitments: Sequence[KZGCommitment],
### `recover_polynomial`

```python
def recover_polynomial(cell_ids: Sequence[CellID], cells: Sequence[Cell]) -> Polynomial:
def recover_polynomial(cell_ids: Sequence[CellID],
cells_bytes: Sequence[Vector[Bytes32, FIELD_ELEMENTS_PER_CELL]]) -> Polynomial:
"""
Recovers a polynomial from 2 * FIELD_ELEMENTS_PER_CELL evaluations, half of which can be missing.

Expand All @@ -455,7 +483,10 @@ def recover_polynomial(cell_ids: Sequence[CellID], cells: Sequence[Cell]) -> Pol

Public method.
"""
assert len(cell_ids) == len(cells)
assert len(cell_ids) == len(cells_bytes)

cells = [bytes_to_cell(cell_bytes) for cell_bytes in cells_bytes]

assert len(cells) >= CELLS_PER_BLOB // 2
missing_cell_ids = [cell_id for cell_id in range(CELLS_PER_BLOB) if cell_id not in cell_ids]
roots_of_unity_reduced = compute_roots_of_unity(CELLS_PER_BLOB)
Expand Down Expand Up @@ -504,7 +535,7 @@ def recover_polynomial(cell_ids: Sequence[CellID], cells: Sequence[Cell]) -> Pol

eval_shifted_extended_evaluation = fft_field(shifted_extended_evaluation, roots_of_unity_extended)
eval_shifted_zero_poly = fft_field(shifted_zero_poly, roots_of_unity_extended)

eval_shifted_reconstructed_poly = [
div(a, b)
for a, b in zip(eval_shifted_extended_evaluation, eval_shifted_zero_poly)
Expand Down
2 changes: 1 addition & 1 deletion specs/deneb/polynomial-commitments.md
Original file line number Diff line number Diff line change
Expand Up @@ -578,7 +578,7 @@ def verify_blob_kzg_proof_batch(blobs: Sequence[Blob],
"""

assert len(blobs) == len(commitments_bytes) == len(proofs_bytes)

commitments, evaluation_challenges, ys, proofs = [], [], [], []
for blob, commitment_bytes, proof_bytes in zip(blobs, commitments_bytes, proofs_bytes):
assert len(blob) == BYTES_PER_BLOB
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,10 @@
from eth2spec.utils.bls import BLS_MODULUS


def field_element_bytes(x):
return int.to_bytes(x % BLS_MODULUS, 32, "big")


@with_eip7594_and_later
@spec_test
@single_phase
Expand All @@ -34,10 +38,13 @@ def test_verify_cell_proof(spec):
blob = get_sample_blob(spec)
commitment = spec.blob_to_kzg_commitment(blob)
cells, proofs = spec.compute_cells_and_proofs(blob)

cells_bytes = [[field_element_bytes(element) for element in cell] for cell in cells]

cell_id = 0
assert spec.verify_cell_proof(commitment, cell_id, cells[cell_id], proofs[cell_id])
assert spec.verify_cell_proof(commitment, cell_id, cells_bytes[cell_id], proofs[cell_id])
cell_id = 1
assert spec.verify_cell_proof(commitment, cell_id, cells[cell_id], proofs[cell_id])
assert spec.verify_cell_proof(commitment, cell_id, cells_bytes[cell_id], proofs[cell_id])


@with_eip7594_and_later
Expand All @@ -47,13 +54,16 @@ def test_verify_cell_proof_batch(spec):
blob = get_sample_blob(spec)
commitment = spec.blob_to_kzg_commitment(blob)
cells, proofs = spec.compute_cells_and_proofs(blob)
cells_bytes = [[field_element_bytes(element) for element in cell] for cell in cells]

assert len(cells) == len(proofs)

assert spec.verify_cell_proof_batch(
row_commitments=[commitment],
row_ids=[0],
column_ids=[0, 1],
cells=cells[0:1],
proofs=proofs,
row_commitments_bytes=[commitment],
row_ids=[0, 0],
column_ids=[0, 4],
cells_bytes=[cells_bytes[0], cells_bytes[4]],
proofs_bytes=[proofs[0], proofs[4]],
)


Expand All @@ -73,21 +83,21 @@ def test_recover_polynomial(spec):

# Extend data with Reed-Solomon and split the extended data in cells
cells = spec.compute_cells(blob)
cells_bytes = [[field_element_bytes(element) for element in cell] for cell in cells]

# Compute the cells we will be recovering from
cell_ids = []
known_cells = []
# First figure out just the indices of the cells
for i in range(N_SAMPLES):
j = rng.randint(0, spec.CELLS_PER_BLOB)
while j in cell_ids:
j = rng.randint(0, spec.CELLS_PER_BLOB)
cell_ids.append(j)
# Now the cells themselves
known_cells = [cells[cell_id] for cell_id in cell_ids]
known_cells_bytes = [cells_bytes[cell_id] for cell_id in cell_ids]

# Recover the data
recovered_data = spec.recover_polynomial(cell_ids, known_cells)
recovered_data = spec.recover_polynomial(cell_ids, known_cells_bytes)

# Check that the original data match the non-extended portion of the recovered data
assert original_polynomial == recovered_data[:len(recovered_data) // 2]
Expand Down