1
1
---
2
- title : " IPIP-0445: trustless gateway skip-leaves option "
2
+ title : " IPIP-0445: Option to Skip Raw Blocks in Gateway Responses "
3
3
date : 2023-10-09
4
4
ipip : open
5
5
editors :
6
- - name : Hugo VALTIER
6
+ - name : Hugo Valtier
7
7
github : Jorropo
8
8
url : https://jorropo.net/
9
9
affiliation :
10
10
name : Protocol Labs
11
11
url : https://protocol.ai/
12
+ - name : Marcin Rataj
13
+ github : lidel
14
+ url : https://lidel.org/
15
+ affiliation :
16
+ name : Protocol Labs
17
+ url : https://protocol.ai/
12
18
relatedIssues :
13
19
- https://github.com/ipfs/specs/issues/444
14
20
order : 445
@@ -17,88 +23,152 @@ tags: ['ipips']
17
23
18
24
## Summary
19
25
20
- Introduce ` skip-leaves ` flag for the : cite [ trustless-gateway] .
26
+ Introduce ` skip-raw-blocks ` flag for the : cite [ trustless-gateway] .
21
27
22
28
## Motivation
23
29
24
30
Allow clients to read a stream that contains only proofs in a bottom-heavy
25
31
graph using the ` raw ` codec for its leaves.
26
32
27
- Usefull with unixfs for features like webseeds [ #444 ] ( https://github.com/ipfs/specs/issues/444 ) .
33
+ Useful with UnixFS for features like webseeds
34
+ ([ ipfs/specs #444 ] ( https://github.com/ipfs/specs/issues/444 ) ), where metadata
35
+ about a DAG is fetched from a trustless gateway, but the actual raw data can be
36
+ fetched from any source that supports either the trustless gateway specification
37
+ or plain HTTP Range Requests, allowing for trustless and verifiable data
38
+ retrieval from plain HTTP (non-IPFS) data sources.
28
39
29
40
## Detailed design
30
41
31
- The ` skip-leaves ` CAR Content-Type parameter on : cite [ trustless-gateway]
42
+ The ` skip-raw-blocks ` URL query parameter on : cite [ trustless-gateway]
32
43
allows clients to download an entity except blocks with the multicodec
33
44
` raw ` (` 0x55 ` ).
34
45
35
46
- When set to ` y ` , the parameter instructs the gateway not to transmit
36
- blocks tagged with the ` raw ` multicodec.
37
- - If set to ` n ` , or left unspecified, the gateway MUST transmit ` raw `
38
- multicodec blocks.
47
+ blocks referenced with a CID with the ` raw ` multicodec.
48
+ - If set to ` n ` , or left unspecified, there is no special handling of ` raw `
49
+ multicodec blocks (the existing default behavior remains the same) .
39
50
40
51
Importantly, unless explicitly specified as ` y ` , the default operational
41
- mode of the gateway MUST assume the value of ` skip-leaves ` to be ` n ` .
52
+ mode of the gateway MUST assume the value of ` skip-raw-blocks ` to be ` n ` .
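
The request and its interpretation can be sketched as follows. This is a minimal illustration, not part of the specification: the helper names `build_car_url` and `wants_skip_raw_blocks`, the host `gateway.example`, and the placeholder CID are hypothetical. Treating anything other than a single `y` value as `n` is an assumption derived from the default rule above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_car_url(gateway, cid, skip_raw_blocks=False,
                  dag_scope=None, entity_bytes=None):
    """Build a trustless gateway CAR request URL (hypothetical helper)."""
    params = {"format": "car"}
    if dag_scope:
        params["dag-scope"] = dag_scope
    if entity_bytes:
        params["entity-bytes"] = entity_bytes
    if skip_raw_blocks:
        # Omitting the parameter is equivalent to skip-raw-blocks=n.
        params["skip-raw-blocks"] = "y"
    # Keep ':' unescaped so entity-bytes stays readable.
    return f"{gateway.rstrip('/')}/ipfs/{cid}?{urlencode(params, safe=':')}"


def wants_skip_raw_blocks(url):
    """Gateway-side check: only an explicit single 'y' enables skipping."""
    values = parse_qs(urlsplit(url).query).get("skip-raw-blocks", [])
    return values == ["y"]


url = build_car_url("https://gateway.example", "bafyexample",
                    skip_raw_blocks=True, dag_scope="entity",
                    entity_bytes="42:1337")
print(url)
print(wants_skip_raw_blocks(url))
```

Because the flag lives in the query string, a request that omits it produces exactly the URL existing clients already send.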
42
53
43
54
## Design rationale
44
55
45
56
### User Benefit
46
57
47
- Implementing the ` skip-leaves ` parameter offers several benefits to users:
58
+ Implementing the ` skip-raw-blocks ` parameter offers several benefits to users:
48
59
49
60
1 . ** Verification Flexibility:** Clients can verify out-of-band (OOB) received
50
61
files in their deserialized form without necessitating the transmission of
51
62
raw blocks from the gateway.
63
+
52
64
2 . ** Incremental Download:** Clients can incrementally download files in
53
65
deserialized form from non-IPFS servers, allowing applications to share
54
- distribution for IPFS and non IPFS clients.
55
- 3 . ** Efficient Block Discovery:** With the ` skip-leaves ` option enabled,
66
+ distribution for IPFS and non-IPFS clients.
67
+
68
+ 3 . ** Efficient Block Discovery:** With the ` skip-raw-blocks ` option enabled,
56
69
clients can quickly discover numerous candidate blocks without being
57
70
bottlenecked by the gateway's transmission of raw blocks.
58
71
72
+ 4 . ** Non-IPFS HTTP Mirrors Become Useful:** Legacy data that is already exposed
73
+ over HTTP in deserialized form can now act as sources for specific block
74
+ byte ranges, without having to support any IPFS-specific APIs. Plain HTTP
75
+ Range Requests can be used for fetching remaining raw block data, and the
76
+ metadata read via ` skip-raw-blocks=y ` is enough for a client to verify the
77
+ remaining raw block byte ranges fetched from non-IPFS systems match the expected
78
+ CIDs.
79
+
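
The verification step mentioned in the list above can be sketched like this. It is an illustration under stated assumptions: the leaf is addressed by a CIDv1 with the `raw` codec (`0x55`), a sha2-256 multihash, and base32 multibase; the helper names `raw_cid_for` and `verify_range` are hypothetical. A real client would parse the expected CID and support whatever hash function it declares, rather than compare strings.

```python
import base64
import hashlib

RAW_CODEC = 0x55  # multicodec code for raw
SHA2_256 = 0x12   # multihash code for sha2-256


def raw_cid_for(data: bytes) -> str:
    """Compute the CIDv1 (raw, sha2-256, base32) string for a byte range."""
    digest = hashlib.sha256(data).digest()
    # <version><codec><hash code><digest length><digest>; every value here
    # is below 0x80, so each varint fits in a single byte.
    cid_bytes = bytes([0x01, RAW_CODEC, SHA2_256, len(digest)]) + digest
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # 'b' is the multibase prefix for lowercase base32


def verify_range(expected_cid: str, fetched: bytes) -> bool:
    """Check bytes fetched from a plain HTTP source against the expected CID."""
    return raw_cid_for(fetched) == expected_cid


chunk = b"bytes fetched with an HTTP Range Request"
expected = raw_cid_for(chunk)  # in practice taken from the proof-only CAR
print(verify_range(expected, chunk))
print(verify_range(expected, chunk + b"tampered"))
```

The byte range to request for each leaf would come from the file layout described by the non-raw blocks in the `skip-raw-blocks=y` response; that part is omitted here.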
59
80
### Compatibility
60
81
61
- Setting the default value of the ` skip-leaves ` parameter to ` n ` ensures
82
+ Setting the default value of the ` skip-raw-blocks ` parameter to ` n ` ensures
62
83
backward compatibility with existing clients and systems that are unaware
63
84
of this new flag.
64
85
65
- ### Prevention of Amplification Attacks and Efficient Server Operation
86
+ ### Alternatives
66
87
67
- By utilizing the ` raw ` (` 0x55 ` ) codec servers can trivially determine whether
68
- to fetch or skip a block without having to learn any new information.
69
- Although more limited and not able to handle unixfs file using dag-pb for their
70
- leaves, it allows both the client and server to trivially verify a block
71
- must not be fetched. Preventing issues of Amplification where a server could
72
- need to fetch multiple orders more data than the client when executing the
73
- request.
88
+ An alternative approach would be to request blocks individually.
89
+ However, it adds extra round trips and more per-request HTTP overhead
90
+ and thus is undesirable.
74
91
75
- ### Why not ` dag-scope=skip-leaves ` ?
92
+ #### Why not ` dag-scope=skip-raw-blocks ` ?
76
93
77
- The ` dag-scope ` parameter determines the overall range of blocks to retrieve,
78
- while ` skip-leaves ` selectively filters specific blocks within that range .
94
+ The existing ` dag-scope ` parameter determines the overall range of blocks to retrieve,
95
+ while ` skip-raw-blocks ` selectively filters specific blocks across all scopes and ranges .
79
96
Combining them under one parameter would restrict their combined utility.
80
97
81
98
For example:
82
- - A client is streaming a video from a webseed and the user seeked through the
99
+ - A client is streaming a video from a webseed and the user seeks through the
83
100
video, then the client would send ` dag-scope=entity&entity-bytes=42:1337 `
84
- with ` skip-leaves =y ` to download the proofs for the required section of the
85
- video.
86
- - A client is verifying an OOB transfered directory in deserialized form,
87
- then ` dag-scope=all ` with ` skip-leaves =y ` makes sense.
101
+ with ` skip-raw-blocks =y ` to download the proofs for the required section of the
102
+ video, and then fetches remaining raw data byte ranges from a faster CDN .
103
+ - A client is verifying an OOB transferred directory in deserialized form,
104
+ then ` dag-scope=all ` with ` skip-raw-blocks =y ` makes sense.
88
105
89
- ### Alternatives
106
+ #### Why not a CAR content type parameter?
90
107
91
- An alternative approach would be to request blocks individually.
92
- However it adds extra round trips and more per HTTP request overhead
93
- and thus is undesireable.
108
+ CAR content type's
109
+ ([ application/vnd.ipld.car] ( https://www.iana.org/assignments/media-types/application/vnd.ipld.car ) )
110
+ optional parameters like ` order ` and ` dups ` impact the way data is represented
111
+ when returned as a CAR stream, but do not modify the scope of the data itself:
112
+ they neither add data to nor remove data from the response.
113
+
114
+ The scope of the data is controlled by the URL content path and the optional
115
+ ` dag-scope ` , ` entity-bytes ` URL parameters. This is where ` skip-raw-blocks `
116
+ belongs.
117
+
118
+ This is not just a matter of aesthetics: the URL path and query parameters
119
+ allow for caching of different subsets of a DAG in a way that is interoperable
120
+ with existing HTTP tools and clients, and minimize the risk of caching an incomplete DAG
121
+ response due to HTTP cache misconfiguration. Thanks to ` skip-raw-blocks ` being
122
+ in the URL query, we ensure CAR responses without ` raw ` blocks will be cached
123
+ under a different key than full responses (just like the already existing ` dag-scope `
124
+ and ` entity-bytes ` ).
125
+
126
+ #### Why not generic ` skip-leaves ` that skips all leaves, not just ` raw ` blocks?
127
+
128
+ Prevention of amplification attacks and efficient server operation.
129
+
130
+ By utilizing the ` raw ` (` 0x55 ` ) codec servers can trivially determine whether
131
+ to fetch or skip a block without having to fetch it to learn any new
132
+ information.
133
+
134
+ If we framed this feature around skipping all leaf nodes, that would require
135
+ the server to fetch the leaves to learn if they have any child nodes. This would
136
+ force the server to fetch data that is never returned to the client.
137
+
138
+ Although ` skip-raw-blocks ` is more limited and not able to handle UnixFS files
139
+ chunked without ` --raw-leaves ` option, it allows both the client and server to
140
+ trivially verify that a block must not be fetched. This prevents
141
+ amplification, where a server could need to fetch orders of magnitude more data than
142
+ the client receives when executing the request.
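
The difference can be illustrated with a toy traversal. This is a sketch under assumptions, not gateway code: `fetch_links` stands in for a hypothetical blockstore lookup that returns a block's child CIDs, `make_cid` builds CIDv1 base32 sha2-256 strings for the example only, and only CIDv0 and base32 CIDv1 strings are handled.

```python
import base64
import hashlib

RAW = 0x55     # multicodec code for raw
DAG_PB = 0x70  # multicodec code for dag-pb


def make_cid(codec: int, data: bytes) -> str:
    """Build a CIDv1 (base32, sha2-256) string; codec must be < 0x80 here."""
    digest = hashlib.sha256(data).digest()
    cid = bytes([0x01, codec, 0x12, len(digest)]) + digest
    return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned varint, returning (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def cid_codec(cid: str) -> int:
    """Read the codec from the CID itself, without touching block data."""
    if cid.startswith("Qm") and len(cid) == 46:
        return DAG_PB  # CIDv0 is always dag-pb
    if not cid.startswith("b"):
        raise ValueError("this sketch handles only CIDv0 and base32 CIDv1")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode expects
    raw = base64.b32decode(body)
    _version, pos = read_varint(raw, 0)
    codec, _ = read_varint(raw, pos)
    return codec


def collect_blocks(root: str, fetch_links, skip_raw_blocks: bool) -> list[str]:
    """Depth-first walk listing the CIDs that would be sent to the client."""
    sent, stack = [], [root]
    while stack:
        cid = stack.pop()
        if skip_raw_blocks and cid_codec(cid) == RAW:
            continue  # decided from the CID alone: no fetch is needed
        children = fetch_links(cid)  # the only place blocks are fetched
        sent.append(cid)
        stack.extend(reversed(children))
    return sent


root = make_cid(DAG_PB, b"root")
leaf_a, leaf_b = make_cid(RAW, b"a"), make_cid(RAW, b"b")
dag = {root: [leaf_a, leaf_b], leaf_a: [], leaf_b: []}
fetched = []


def fetch_links(cid):
    fetched.append(cid)
    return dag[cid]


print(collect_blocks(root, fetch_links, skip_raw_blocks=True), len(fetched))
```

With a generic `skip-leaves`, the `continue` branch could not be taken before calling `fetch_links`, because leaf status is only known after decoding the block.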
94
143
95
144
## Security
96
145
97
- None .
146
+ This IPIP does not impact the security model of the trustless gateway.
98
147
99
148
## Test fixtures
100
149
101
- TODO
150
+ ::: issue
151
+
152
+ TODO: update below section with CIDs or CARs from conformance tests
153
+
154
+ Scenarios we should check:
155
+ - [ ] reuse existing UnixFS DAG that has raw-leaves, request it with
156
+ ` skip-raw-blocks=n ` , confirm the response includes expected raw leaves' CIDs
157
+ - [ ] create a new CAR fixture that only has non-raw blocks. Request it with
158
+ ` skip-raw-blocks=y ` , confirm the response includes expected CIDs and does not
159
+ include raw blocks referenced by parents.
160
+ - the important part is creating the CAR fixture by hand, and ensuring the raw blocks are
161
+ NEVER announced anywhere (generate fixture with random data, add to ipfs
162
+ with raw-leaves option, then export DAG without ` raw ` blocks (use go-car's
163
+ [ ` filter ` ] ( https://github.com/ipld/go-car/tree/master/cmd/car#readme ) or
164
+ similar)
165
+ - Why? This goes the extra mile, but ensures every conformant gateway
166
+ implementation is not doing useless work of fetching raw blocks which are
167
+ not required for fulfilling ` skip-raw-blocks=y ` requests). We did
168
+ a similar thing for ` entity-bytes ` and it was the only way we could show
169
+ bugs in Saturn project's cache implementation at the time.
170
+
171
+ :::
102
172
103
173
### Copyright
104
174