Retrieval Attestation

### Checklist

- [X] This is **not** a new feature or an enhancement to the Filecoin protocol. If it is, please open an [FIP issue](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0001.md).
- [X] This is **not** brainstorming ideas. If you have an idea you'd like to discuss, please open a new discussion on [the Boost forum](https://github.com/filecoin-project/boost/discussions/categories/ideas) and select the category as `Ideas`.
- [X] I **have** a specific, actionable, and well motivated feature request to propose.

### Boost component

- [ ] boost daemon - storage providers
- [ ] boost client
- [ ] boost UI
- [ ] boost data-transfer
- [ ] boost index-provider
- [X] Other

### What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.

In Filecoin Station, we are building SPARK - a module that periodically checks retrievability of content from Filecoin Storage Providers. At the moment, we are adding FIL rewards for performing these checks. In order to combat fraud, we would like Boost to provide retrieval attestations that will allow 3rd parties to verify that a client performed a retrieval request from a particular provider.

You can learn more in [SPARK Content retrieval attestation](https://www.notion.so/SPARK-Content-retrieval-attestation-a6df8198d5a248ceac4c004ee971759c?pvs=21) and [Meridian Design Doc 03: Evaluation dissected](https://www.notion.so/Meridian-Design-Doc-03-Evaluation-dissected-52803c22ee564e2ab8a86756fffa2693?pvs=21).

In short:

- We want the SPARK MERidian evaluation service to be able to verify that the SPARK module performed the retrieval check as the SPARK orchestrator defined it.
- We also want a generic solution that can be used by other projects retrieving content from Filecoin and IPFS. This is, in particular, important to prevent retrieval servers (e.g. SPs) from being able to distinguish SPARK retrieval requests from requests made by other clients.


### Describe the solution you'd like

(1)
The retrieval client performing a retrieval request includes a new field in the request metadata - `retrieval_id` containing a string value. We recommend clients send a SHA-384 hash of the actual identifier.

(2)
The retrieval server returns an attestation signed with the server’s private key - the same key as used for libp2p peer identity. The attestation payload includes the following metadata:
 - `retrieval_id` supplied by the client,
 - `cid` being requested, and
 - `protocol` used (bitswap, graphsync, http).
 - (Possibly more in the future.)

This is a high-level proposal that intentionally excludes details. I’d like us to first agree whether this feature is feasible at the high level, before we dive deeper into details.


### Describe alternatives you've considered

_No response_

### Additional context

**What kind of feedback I am looking for**

- Is this feasible to implement in Boost? Is the design, as detailed below, compatible with the project’s vision? Can you suggest a better approach?
- What are the next steps to make this happen? Our timeline is a bit tight: we would like to get this rolled out to at least some SPs by the end of September 2023.
- Who are the best people to help us clarify implementation details and lay out a plan to get this done & deployed to SPs running Boost?
- We will need retrieval clients like Lassie to support this new feature, too. For example, Lassie should forward the retrieval id from the client to the SP and then forward the attestation token back from SP to the client. Do you have any suggestions on how to get this done?

**How this helps SPARK** 

- In our current design, each retrieval job is defined by a centralised orchestrator service, which assigns a unique id to each job. (In the future, we want to move to a decentralised orchestrator based on DRAND. I believe such a design will still give us a way to deterministically derive a unique retrieval id.)
- The SPARK module will deterministically derive `retrieval_id` from this job id, perform the retrieval check and finally send the attestation returned by SP alongside other retrieval statistics.
- The fraud detection service will verify that the retrieval attestation reported by the client matches the retrieval job as it was defined by the orchestrator.
- Dishonest clients will be unable to create fake attestations. The client must send a new retrieval request to the designated SP for each job. (Unless the client and the SP are colluding and the client has access to SP’s private key, but we have other measures to minimise the impact of that.)
    - Clients cannot reuse an attestation from a different job because the retrieval id in the attestation would not match the retrieval id derived from the job id.
    - Clients must send the retrieval request to the SP defined by the job. Otherwise, the signature in the attestation would not match the public key in SP’s multiaddr.
    - Clients must retrieve the given CID using the given protocol. Otherwise, the attestation payload would show different values than expected by the verifier.
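
The checks above can be sketched roughly as follows. This is only an illustration in Python, not the fraud detection service's actual code: the `job` dictionary layout, the function names, and the rule "retrieval id = SHA-384 of the job id" are assumptions made for the example, and the payload field names are the ones proposed later in this issue. Signature verification against the SP's public key is assumed to have happened before these checks.

```python
import hashlib
from typing import List


def derive_retrieval_id(job_id: str) -> str:
    # Assumption: the retrieval id is the hex SHA-384 digest of the job id.
    return hashlib.sha384(job_id.encode("utf-8")).hexdigest()


def check_attestation_claims(payload: dict, job: dict) -> List[str]:
    """Return the reasons the payload does not match the job (empty if it matches).

    The attestation signature is assumed to have been verified already.
    """
    expected = {
        "iss": job["provider_peer_id"],
        "retrv_rid": derive_retrieval_id(job["job_id"]),
        "retrv_cid": job["cid"],
        "retrv_proto": job["protocol"],
    }
    return [
        f"{field}: expected {want!r}, got {payload.get(field)!r}"
        for field, want in expected.items()
        if payload.get(field) != want
    ]


# Hypothetical job definition, as the orchestrator might describe it.
job = {
    "job_id": "job-1",
    "provider_peer_id": "12D3KooWRH71QRJe5vrMp6zZXoH4K7z5MDSWwTXXPriG9dK8HQXk",
    "cid": "bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni",
    "protocol": "graphsync",
}

# Payload of an attestation obtained for this job.
honest = {
    "iss": job["provider_peer_id"],
    "retrv_rid": derive_retrieval_id("job-1"),
    "retrv_cid": job["cid"],
    "retrv_proto": "graphsync",
}

# Payload reused from a different job: only the retrieval id differs.
reused = dict(honest, retrv_rid=derive_retrieval_id("job-0"))

print(check_attestation_claims(honest, job))
print(check_attestation_claims(reused, job))
```

Returning the list of mismatches, rather than a single boolean, makes it easier to report why a particular measurement was rejected.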

**How this can help other retrieval clients**

I feel the proposal is generic enough to support different usages. Since the retrieval id is a hash of arbitrary data, it’s possible to pack literally anything into the retrieval id, get the SP to sign that, and later verify that the SP signed the expected field values.

Ideally, we would like to have “Proof of Retrieval”. Unfortunately, such proof is still an open problem. We think that Retrieval Attestation can get us somewhat closer to that ideal.

For example, I can imagine that a browser service worker retrieving content from a dCDN like Saturn could use the retrieval attestation to attribute credit to the specific SP that provided the content, allowing content providers to reward SPs based on how many retrievals they helped to serve.

The proposed format based on JWT can be extended to support signature chains, e.g. the outer attestation token created by an untrusted gateway can wrap an inner attestation token produced by the SP from which the gateway retrieved the content.

**Technical details: retrieval id**

The implementation should support arbitrary formats of retrieval ids. However, we recommend all clients use a SHA-384 hash of the original retrieval identifier.

- Why a hash:
    - We want requests coming from different clients to be indistinguishable from each other. This way, storage providers cannot prioritise specific classes of clients (like SPARK and Reputation Bot) to provide better performance for the clients and artificially improve their reputation scores.
    - In decentralised networks, mapping a retrieval id (like GUID) to request properties is often not feasible. Instead, we must compose the retrieval id from the properties needed for subsequent verification. For example, if we want to verify that a retrieval was performed by a peer with a given `peer_id` using the DRAND seed from the epoch `N`, we can compose the retrieval id as `N;peer_id`, e.g. `539;12D3KooWRH71QRJe5vrMp6zZXoH4K7z5MDSWwTXXPriG9dK8HQXk`. Now if we send this string as the retrieval id, then the remote party can inspect the format of the string to guess what software is making the request. Additionally, the payload can be too large for the underlying protocol. Hashing the original id solves both issues.
- Why SHA-384:
    - SHA-256 is vulnerable to [length extension attacks](https://en.wikipedia.org/wiki/Length_extension_attack). I don’t have any particular attack vector in mind; just being cautious.
    - [Blake3](https://github.com/BLAKE3-team/BLAKE3) would be a great solution, but it’s a hot new thing and thus not supported natively by browsers yet.
    - SHA-384 seems to be a good compromise - it’s not vulnerable to length extension attacks, and it’s widely supported: Go, Rust and [WebCrypto API](https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest#supported_algorithms) in browsers and Node.js.
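
As a rough illustration of the recommendation, here is how a client could derive the retrieval id in Python. The function name and the choice of lowercase hex output are assumptions made for this sketch; the proposal itself only recommends SHA-384.

```python
import hashlib


def derive_retrieval_id(original_id: str) -> str:
    """Return the lowercase hex SHA-384 digest of the original identifier."""
    return hashlib.sha384(original_id.encode("utf-8")).hexdigest()


# Identifier composed as `N;peer_id`, as in the example above.
original = "539;12D3KooWRH71QRJe5vrMp6zZXoH4K7z5MDSWwTXXPriG9dK8HQXk"
retrieval_id = derive_retrieval_id(original)

# The digest is always 96 hex characters (384 bits), regardless of the
# length or structure of the original identifier.
print(len(retrieval_id))  # 96
```

The server only ever sees the fixed-length digest, so it cannot tell from the format which software composed the original identifier.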

**Technical details: attestation string**

I propose using [JWT](https://jwt.io/introduction) for the attestation string. JWT is a widely used format with good support in many programming languages. It’s used by other projects in the Web3 space, too - most notably [UCAN](https://ucan.xyz/).

In its compact form, JSON Web Tokens consist of three parts separated by dots (`.`), which are:

- Header
- Payload
- Signature

Therefore, a JWT typically looks like this: `Header.Payload.Signature`
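
To make the three-part structure concrete, here is a minimal Python sketch of how a client or verifier could split a compact token into its parts. The helper names are made up for this example, and the sketch deliberately does not verify the signature; it only decodes the segments.

```python
import base64
import json
from typing import Tuple


def b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def split_compact_jwt(token: str) -> Tuple[dict, dict, bytes]:
    """Split `Header.Payload.Signature` and decode each part.

    Raises ValueError if the token does not have exactly three segments.
    """
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload, b64url_decode(signature_b64)
```

Note that the signature covers the first two segments exactly as transmitted, so a verifier should check it against the original `Header.Payload` string rather than against re-encoded JSON.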

**JWT Header**

```json
{
  "alg": "EdDSA",
  "typ": "JWT",
  "rav": "0.1.0"
}
```

This is a standard JWT header, plus the extra `rav` field.

- `alg` - the signature algorithm used to create the attestation
- `typ` - the type of the token; this will always be `JWT`
- `rav` - “Retrieval Attestation version”, so we can track which version of the format the attestation was issued in

**JWT Payload**

```json
{
  "iss": "12D3KooWRH71QRJe5vrMp6zZXoH4K7z5MDSWwTXXPriG9dK8HQXk",
  "retrv_rid": "38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b",
  "retrv_cid": "bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni",
  "retrv_proto": "graphsync"
}
```

- `iss` - the issuer, i.e. the party that created the attestation, identified by the public key from the libp2p identity of the peer serving the retrieval. This field is defined by the JWT standard.
- `retrv_rid` - the retrieval id provided by the client
- `retrv_cid` - the CID retrieved
- `retrv_proto` - the protocol used: `graphsync`, `bitswap` or `http`

We expect more fields will be added in the future. For example, when a retrieval request specifies an IPLD selector, the attestation payload can include a `retrv_selector` field describing what subset of the Merkle tree was requested.

For the initial version, we want to introduce only the fields needed by SPARK.

**JWT Signature**

Quoting from [JWT Introduction](https://jwt.io/introduction):

> To create the signature part you have to take the encoded header, the encoded payload, a secret, the algorithm specified in the header, and sign that.
>
> For example if you want to use the HMAC SHA256 algorithm, the signature will be created in the following way:
> 

```jsx
HMACSHA256(
  base64UrlEncode(header) + "." +
  base64UrlEncode(payload),
  secret)
```

> The signature is used to verify the message wasn't changed along the way, and, in the case of tokens signed with a private key, it can also verify that the sender of the JWT is who it says it is.
> 

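Of course, we will not use HMAC SHA256: it relies on a shared secret, whereas the attestation must be signed with the server’s private key so that third parties can verify it. Ed25519 (`EdDSA`, as in the header example above) is a likely candidate, but the algorithm will most likely depend on the algorithm used by the libp2p identity key-pair.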
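
A minimal Python sketch of how the server side could assemble the token, whatever algorithm is chosen. The signing step is passed in as a `sign` callable because it is an assumption of this example, not part of the proposal: in a real implementation it would sign with the libp2p identity private key, which Python's standard library cannot do for Ed25519. The function names and `placeholder_sign` are hypothetical.

```python
import base64
import json
from typing import Callable


def b64url_encode(data: bytes) -> str:
    # JWT uses URL-safe base64 with the trailing padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_attestation(header: dict, payload: dict,
                      sign: Callable[[bytes], bytes]) -> str:
    """Return `Header.Payload.Signature` in JWT compact form.

    `sign` receives the ASCII bytes of `Header.Payload` and returns the raw
    signature bytes.
    """
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = sign(signing_input.encode("ascii"))
    return signing_input + "." + b64url_encode(signature)


def placeholder_sign(data: bytes) -> bytes:
    # NOT a real signature: stands in for signing with the libp2p private key.
    return b"\x00" * 64


token = build_attestation(
    {"alg": "EdDSA", "typ": "JWT", "rav": "0.1.0"},
    {"retrv_proto": "graphsync"},
    placeholder_sign,
)
print(token.count("."))  # 2
```

Keeping the signer pluggable means the same assembly code works regardless of which key type the libp2p identity uses.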

_Tagging @juliangruber, @patrickwoodhead and @willscott for visibility._
