Commit 131b29d

ipip-445: rename to skip-raw-blocks URL param
+ basic editorials
1 parent c1e121e commit 131b29d

2 files changed

+131
-59
lines changed

src/http-gateways/trustless-gateway.md

+26-24
Original file line number | Diff line number | Diff line change
@@ -183,6 +183,28 @@ returned:
183183
returned to the client, the HTTP status code has already been sent to the
184184
client.
185185

186+
### :dfn[skip-raw-blocks] (request query parameter)
187+
188+
The optional `skip-raw-blocks` parameter is available only for CAR requests.
189+
190+
It specifies whether blocks with the multicodec `raw` `0x55` MUST be present in
191+
the CAR response.
192+
193+
It accepts two values:
194+
- `y`: Blocks with `raw` multicodec MUST NOT be returned.
195+
- `n`, or missing (unspecified): no-op, no special handling of `raw` blocks.
196+
197+
When not specified, a gateway implementation MUST assume `n`.
198+
199+
:::note Notes for implementers
200+
201+
A `skip-raw-blocks=y` request for a content path with `raw` root CID does not
202+
make sense and SHOULD NOT be sent by clients.
203+
204+
A gateway SHOULD return HTTP error 400 Bad Request in that case.
205+
206+
:::
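
A minimal sketch of how a gateway might interpret this parameter, written in Python for illustration only. The function name `skip_raw_blocks_status` and the choice to answer `400` for values other than `y` or `n` are assumptions; the text above only defines `y`, `n`, the missing case, and the `raw` root CID case.

```python
from typing import Tuple
from urllib.parse import parse_qs

RAW_CODEC = 0x55  # multicodec code for `raw`


def skip_raw_blocks_status(query: str, root_codec: int) -> Tuple[int, bool]:
    """Return (http_status, skip_raw) for the query string of a CAR request.

    `root_codec` is the multicodec of the CID the content path resolves to.
    """
    values = parse_qs(query, keep_blank_values=True).get("skip-raw-blocks", [])
    value = values[-1] if values else "n"  # missing parameter means `n`
    if value not in ("y", "n"):
        # Assumption: values outside the two defined ones are rejected.
        return 400, False
    skip = value == "y"
    if skip and root_codec == RAW_CODEC:
        # The note above: a `raw` root with skip-raw-blocks=y makes no sense.
        return 400, False
    return 200, skip
```

With a `dag-pb` (`0x70`) root, `skip_raw_blocks_status("format=car&skip-raw-blocks=y", 0x70)` yields `(200, True)`, while the same query against a `raw` root yields `(400, False)`.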
207+
186208
# HTTP Response
187209

188210
Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway].
@@ -212,10 +234,10 @@ The Body hash MUST match the Multihash from the requested CID.
212234

213235
# CAR Responses (application/vnd.ipld.car)
214236

215-
A CAR stream for the requested
216-
[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
217-
content type (with optional `order`, `dups` and `skip-leaves` params), path and optional
218-
`dag-scope` and `entity-bytes` URL parameters.
237+
A CAR stream ([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
238+
with optional `order` and `dups` content type parameters) for the requested
239+
content path (and optional `dag-scope`, `entity-bytes` and/or `skip-raw-blocks`
240+
URL parameters).
219241

220242
## CAR version
221243

@@ -301,26 +323,6 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as
301323
the raw data is already present in the parent block that links to the identity
302324
CID.
303325

304-
## CAR `skip-leaves` (content type parameter)
305-
306-
The `skip-leaves` parameter specifies whether blocks with the multicodec `raw`
307-
`0x55` must be sent.
308-
309-
It accepts two values:
310-
- `y`: Blocks with `raw` multicodec MUST NOT be sent.
311-
- `n`, or unspecified: Blocks with `raw` multicodec MUST be sent.
312-
313-
A gateway MUST NOT assume this field is `y` if unspecified.
314-
When not specified it always MUST be understood as `n`.
315-
316-
:::note Notes for implementers
317-
318-
A request which is rooted at a `raw` block and has `skip-leaves=y` does not
319-
make sense and SHOULD NOT be sent by clients, it is fair for servers to
320-
error in this situation.
321-
322-
:::
323-
324326
## CAR format parameters and determinism
325327

326328
The default header and block order in a CAR format is not specified by IPLD specifications.

src/ipips/ipip-0445.md

+105-35
Original file line number | Diff line number | Diff line change
@@ -1,14 +1,20 @@
11
---
2-
title: "IPIP-0445: trustless gateway skip-leaves option"
2+
title: "IPIP-0445: Option to Skip Raw Blocks in Gateway Responses"
33
date: 2023-10-09
44
ipip: open
55
editors:
6-
- name: Hugo VALTIER
6+
- name: Hugo Valtier
77
github: Jorropo
88
url: https://jorropo.net/
99
affiliation:
1010
name: Protocol Labs
1111
url: https://protocol.ai/
12+
- name: Marcin Rataj
13+
github: lidel
14+
url: https://lidel.org/
15+
affiliation:
16+
name: Protocol Labs
17+
url: https://protocol.ai/
1218
relatedIssues:
1319
- https://github.com/ipfs/specs/issues/444
1420
order: 445
@@ -17,88 +23,152 @@ tags: ['ipips']
1723

1824
## Summary
1925

20-
Introduce `skip-leaves` flag for the :cite[trustless-gateway].
26+
Introduce `skip-raw-blocks` flag for the :cite[trustless-gateway].
2127

2228
## Motivation
2329

2430
Allow clients to read a stream which only contains proofs in a bottom-heavy
2531
graph using the `raw` codec for its leaves.
2632

27-
Usefull with unixfs for features like webseeds [#444](https://github.com/ipfs/specs/issues/444).
33+
Useful with UnixFS for features like webseeds
34+
([ipfs/specs#444](https://github.com/ipfs/specs/issues/444)), where metadata
35+
about a DAG is fetched from a trustless gateway, but the actual raw data can be
36+
fetched from any source that supports either trustless gateway specification,
37+
or plain HTTP Range Requests, allowing for trustless and verifiable data
38+
retrieval from plain HTTP (non-IPFS) data sources.
2839

2940
## Detailed design
3041

31-
The `skip-leaves` CAR Content-Type parameter on :cite[trustless-gateway]
42+
The `skip-raw-blocks` URL query parameter on :cite[trustless-gateway]
3243
allows clients to download an entity except blocks with the multicodec
3344
`raw` (`0x55`).
3445

3546
- When set to `y`, the parameter instructs the gateway not to transmit
36-
blocks tagged with the `raw` multicodec.
37-
- If set to `n`, or left unspecified, the gateway MUST transmit `raw`
38-
multicodec blocks.
47+
blocks referenced with a CID with the `raw` multicodec.
48+
- If set to `n`, or left unspecified, there is no special handling of `raw`
49+
multicodec blocks (the existing default behavior remains the same).
3950

4051
Importantly, unless explicitly specified as `y`, the default operational
41-
mode of the gateway MUST assume the value of `skip-leaves` to be `n`.
52+
mode of the gateway MUST assume the value of `skip-raw-blocks` to be `n`.
4253

4354
## Design rationale
4455

4556
### User Benefit
4657

47-
Implementing the `skip-leaves` parameter offers several benefits to users:
58+
Implementing the `skip-raw-blocks` parameter offers several benefits to users:
4859

4960
1. **Verification Flexibility:** Clients can verify out-of-band (OOB) received
5061
files in their deserialized form without necessitating the transmission of
5162
raw blocks from the gateway.
63+
5264
2. **Incremental Download:** Clients can incrementally download files in
5365
deserialized form from non-IPFS servers, allowing applications to share
54-
distribution for IPFS and non IPFS clients.
55-
3. **Efficient Block Discovery:** With the `skip-leaves` option enabled,
66+
distribution for IPFS and non-IPFS clients.
67+
68+
3. **Efficient Block Discovery:** With the `skip-raw-blocks` option enabled,
5669
clients can quickly discover numerous candidate blocks without being
5770
bottlenecked by the gateway's transmission of raw blocks.
5871

72+
4. **Non-IPFS HTTP Mirrors Become Useful:** Legacy data that is already exposed
73+
over HTTP in deserialized form can now act as sources for specific block
74+
byte ranges, without having to support any IPFS specific APIs. Plain HTTP
75+
Range Requests can be used for fetching remaining raw block data, and the
76+
metadata read via `skip-raw-blocks=y` is enough for a client to verify the
77+
remaining raw block byte ranges fetched from a non-IPFS system match the expected
78+
CIDs.
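
The verification step in benefit 4 can be sketched as follows, in Python. This is an illustration under stated assumptions: the expected CID is a CIDv1 with the `raw` codec, a sha2-256 multihash, and base32 multibase encoding. The helper names `raw_cid_v1` and `matches_raw_cid` are hypothetical, and fetching the byte range itself (for example with an HTTP Range Request) is left out.

```python
import base64
import hashlib


def raw_cid_v1(data: bytes) -> str:
    """Compute the base32 CIDv1 (raw codec, sha2-256) of a block's bytes."""
    digest = hashlib.sha256(data).digest()
    # CIDv1 layout: version 0x01, codec 0x55 (raw),
    # multihash = 0x12 (sha2-256) + 0x20 (digest length 32) + digest.
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # "b" is the multibase prefix for lowercase base32


def matches_raw_cid(data: bytes, expected_cid: str) -> bool:
    """Check bytes fetched from any source against a CID taken from the proofs."""
    return raw_cid_v1(data) == expected_cid
```

A client would take `expected_cid` from a link in the non-raw blocks returned with `skip-raw-blocks=y`, and `data` from the matching byte range served by the plain HTTP mirror.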
79+
5980
### Compatibility
6081

61-
Setting the default value of the `skip-leaves` parameter to `n` ensures
82+
Setting the default value of the `skip-raw-blocks` parameter to `n` ensures
6283
backward compatibility with existing clients and systems that are unaware
6384
of this new flag.
6485

65-
### Prevention of Amplification Attacks and Efficient Server Operation
86+
### Alternatives
6687

67-
By utilizing the `raw` (`0x55`) codec servers can trivially determine whether
68-
to fetch or skip a block without having to learn any new information.
69-
Although more limited and not able to handle unixfs file using dag-pb for their
70-
leaves, it allows both the client and server to trivially verify a block
71-
must not be fetched. Preventing issues of Amplification where a server could
72-
need to fetch multiple orders more data than the client when executing the
73-
request.
88+
An alternative approach would be to request blocks individually.
89+
However, it adds extra round trips and more per-request HTTP overhead
90+
and thus is undesirable.
7491

75-
### Why not `dag-scope=skip-leaves` ?
92+
#### Why not `dag-scope=skip-raw-blocks` ?
7693

77-
The `dag-scope` parameter determines the overall range of blocks to retrieve,
78-
while `skip-leaves` selectively filters specific blocks within that range.
94+
The existing `dag-scope` parameter determines the overall range of blocks to retrieve,
95+
while `skip-raw-blocks` selectively filters specific blocks across all scopes and ranges.
7996
Combining them under one parameter would restrict their combined utility.
8097

8198
For example:
82-
- A client is streaming a video from a webseed and the user seeked through the
99+
- A client is streaming a video from a webseed and the user seeks through the
83100
video, then the client would send `dag-scope=entity&entity-bytes=42:1337`
84-
with `skip-leaves=y` to download the proofs for the required section of the
85-
video.
86-
- A client is verifying an OOB transfered directory in deserialized form,
87-
then `dag-scope=all` with `skip-leaves=y` makes sense.
101+
with `skip-raw-blocks=y` to download the proofs for the required section of the
102+
video, and then fetches remaining raw data byte ranges from a faster CDN.
103+
- A client is verifying an OOB transferred directory in deserialized form,
104+
then `dag-scope=all` with `skip-raw-blocks=y` makes sense.
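
The two requests above could be built like this. It is a sketch in Python: the gateway host, the CID string, and the function name `proof_only_car_url` are placeholders, and `format=car` is used to select a CAR response instead of an `Accept` header.

```python
from typing import Optional
from urllib.parse import urlencode


def proof_only_car_url(gateway: str, cid: str, dag_scope: str,
                       entity_bytes: Optional[str] = None) -> str:
    """Build a trustless gateway URL that asks for a CAR without raw blocks."""
    params = {"format": "car", "dag-scope": dag_scope}
    if entity_bytes is not None:
        params["entity-bytes"] = entity_bytes
    params["skip-raw-blocks"] = "y"
    # Keep ":" readable in the entity-bytes value instead of percent-encoding it.
    return f"{gateway}/ipfs/{cid}?{urlencode(params, safe=':')}"


# Video seek: proofs for bytes 42..1337 of the entity only.
seek_url = proof_only_car_url(
    "https://gateway.example", "bafyexamplecid", "entity", "42:1337")

# Verifying an OOB transferred directory: proofs for the whole DAG.
verify_url = proof_only_car_url(
    "https://gateway.example", "bafyexamplecid", "all")
```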
88105

89-
### Alternatives
106+
#### Why not CAR content type parameter ?
90107

91-
An alternative approach would be to request blocks individually.
92-
However it adds extra round trips and more per HTTP request overhead
93-
and thus is undesireable.
108+
CAR content type's
109+
([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car))
110+
optional parameters like `order` and `dups` impact the way data is represented
111+
when returned as a CAR stream, but do not modify the scope of the data itself.
112+
They neither add nor subtract data from the response.
113+
114+
The scope of the data is controlled by URL content path and optional
115+
`dag-scope`, `entity-bytes` URL parameters. This is where `skip-raw-blocks`
116+
belongs.
117+
118+
This is not just a matter of aesthetics: the URL path and query parameters
119+
allow for caching of different subsets of a DAG in a way that is interoperable
120+
with existing HTTP tools and clients, and minimize the risk of caching an incomplete DAG
121+
response due to HTTP cache misconfiguration. Thanks to `skip-raw-blocks` being
122+
in the URL query, we ensure CAR responses without `raw` blocks will be cached
123+
under a different key than full responses (just like the already existing `dag-scope`
124+
and `entity-bytes`).
125+
126+
#### Why not generic `skip-leaves` that skips all leaves, not just `raw` blocks?
127+
128+
Prevention of amplification attacks and efficient server operation.
129+
130+
By utilizing the `raw` (`0x55`) codec, servers can trivially determine whether
131+
to fetch or skip a block without having to fetch it to learn any new
132+
information.
133+
134+
If we framed this feature around skipping all leaf nodes, that would require
135+
the server to fetch the leaves to learn if they have any child nodes. This would
136+
force the server to fetch data that is never returned to the client.
137+
138+
Although `skip-raw-blocks` is more limited and not able to handle UnixFS files
139+
chunked without `--raw-leaves` option, it allows both the client and server to
140+
trivially verify that a block must not be fetched. This prevents
141+
amplification, where a server could need to fetch orders of magnitude more data than
142+
the client when executing the request.
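
To illustrate why the `raw` codec makes this decision cheap, here is a sketch in Python that reads the codec straight from a CID string, without fetching the block it names. It assumes base32 CIDv1 strings only; the helper names `read_varint`, `cid_codec`, and `links_to_fetch` are hypothetical.

```python
import base64
from typing import List, Tuple

RAW_CODEC = 0x55  # multicodec code for `raw`


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def cid_codec(cid: str) -> int:
    """Return the multicodec of a base32 CIDv1 string."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding base32 decoding needs
    decoded = base64.b32decode(body)
    version, pos = read_varint(decoded, 0)
    if version != 1:
        raise ValueError("expected CIDv1")
    codec, _ = read_varint(decoded, pos)
    return codec


def links_to_fetch(links: List[str], skip_raw: bool) -> List[str]:
    """Drop links to `raw` blocks when skip-raw-blocks=y, using the CID alone."""
    return [c for c in links if not (skip_raw and cid_codec(c) == RAW_CODEC)]
```

A generic "skip all leaves" rule has no equivalent shortcut: a `dag-pb` CID does not reveal whether its block has children, so the block would have to be fetched first.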
94143

95144
## Security
96145

97-
None.
146+
This IPIP does not impact the security model of the trustless gateway.
98147

99148
## Test fixtures
100149

101-
TODO
150+
:::issue
151+
152+
TODO: update below section with CIDs or CARs from conformance tests
153+
154+
Scenarios we should check:
155+
- [ ] reuse existing UnixFS DAG that has raw-leaves, request it with
156+
`skip-raw-blocks=n`, confirm the response includes expected raw leaves' CIDs
157+
- [ ] create a new CAR fixture that only has non-raw blocks. Request it with
158+
`skip-raw-blocks=y`, confirm the response includes expected CIDs and does not
159+
include raw blocks referenced by parents.
160+
- important part is creating the CAR fixture by hand, and ensuring the raw blocks are
161+
NEVER announced anywhere (generate fixture with random data, add to ipfs
162+
with raw-leaves option, then export DAG without `raw` blocks using go-car's
163+
[`filter`](https://github.com/ipld/go-car/tree/master/cmd/car#readme) or
164+
similar)
165+
- Why? This goes the extra mile, but ensures every conformant gateway
166+
implementation is not doing useless work of fetching raw blocks which are
167+
not required for fulfilling `skip-raw-blocks=y` requests. We did
168+
a similar thing for `entity-bytes` and it was the only way we could show
169+
bugs in Saturn project's cache implementation at the time.
170+
171+
:::
102172

103173
### Copyright
104174
