Commit 4c1c5d7

Merge pull request #337 from ipfs/feat/delegated-routing-http-api
IPIP-337: Delegated Content Routing HTTP API
2 parents 5079c34 + 573417e commit 4c1c5d7

2 files changed: +293 -0 lines changed

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
# IPIP-337: Delegated Content Routing HTTP API

- Start Date: 2022-10-18
- Related Issues:
  - https://github.com/ipfs/specs/pull/337

## Summary

This IPIP specifies an HTTP API for delegated content routing.

## Motivation

Idiomatic and first-class HTTP support for delegated routing is an important requirement for large content routing providers,
and supporting large content providers is a key strategy for driving down IPFS content routing latency.
These providers must handle high volumes of traffic and support many users, so leveraging industry-standard tools and services
such as HTTP load balancers, CDNs, reverse proxies, etc. is a requirement.
To maximize compatibility with standard tools, IPFS needs an HTTP API specification that uses standard HTTP idioms and payload encoding.
The [Reframe spec](https://github.com/ipfs/specs/blob/main/reframe/REFRAME_PROTOCOL.md) for delegated content routing is an experimental attempt at this,
but it has resulted in a very unidiomatic HTTP API which is difficult to implement and is incompatible with many existing tools.
The cost of properly redesigning, implementing, and maintaining Reframe is too high relative to the urgency of having a delegated content routing HTTP API.

Note that this does not supplant or deprecate Reframe. Ideally in the future, Reframe and its implementation would receive the resources needed to map the IDL to idiomatic HTTP,
and implementations of this spec could then be rewritten in the IDL, maintaining backwards compatibility.

We expect this API to be extended beyond "content routing" in the future, so additional IPIPs may rename this to something more general such as "Delegated Routing HTTP API".

## Detailed design

See the [Delegated Content Routing HTTP API spec](../routing/DELEGATED_CONTENT_ROUTING_HTTP.md) included with this IPIP.

## Design rationale

To understand the design rationale, it is important to consider the concrete Reframe limitations that we know about:

- Reframe [method types](../reframe/REFRAME_KNOWN_METHODS.md) using the HTTP transport are encoded inside IPLD-encoded messages
  - This prevents URL-based pattern matching on methods, which makes it hard and expensive to do basic HTTP scaling and optimizations:
    - Configuring different caching strategies for different methods
    - Configuring reverse proxies on a per-method basis
    - Routing methods to specific backends
    - Method-specific reverse proxy config such as timeouts
  - Developer UX is poor as a result, e.g. for CDN caching you must encode the entire request message and pass it as a query parameter
    - This was initially done by URL-escaping the raw bytes
      - Not possible to consume correctly using standard JavaScript (see [edelweiss#61](https://github.com/ipld/edelweiss/issues/61))
      - Shipped in Kubo 0.16
    - Packing a CID into a struct, encoding it with DAG-CBOR, multibase-encoding that, percent-encoding that, and then passing it in a URL, rather than merely passing the CID in the URL, is needlessly complex from a user's perspective, and has already made it difficult to manually construct requests or interpret logs
- Added complexity of "Cacheable" methods supporting both POSTs and GETs
- The required streaming support and message groups add a lot of implementation complexity, but streaming does not currently work for cacheable methods sent over HTTP
  - For example, for FindProviders the response is buffered anyway for ETag calculation
  - There are no limits on response sizes, and no way to impose limits or paginate
  - Streaming is useful for routers that have highly variable resolution time, to send results as soon as possible, but this is not a use case we are focusing on right now and we can add it later
- The Identify method is not implemented because it is not currently useful
  - This is because Reframe's ambition is to be a generic catch-all bag of methods across protocols, while the delegated routing use case only requires a subset of its methods.
- Client and server implementations are difficult to write correctly, because of the non-standard wire formats and conventions
  - Example: [bug reported by implementer](https://github.com/ipld/edelweiss/issues/62), and [another one](https://github.com/ipld/edelweiss/issues/61)
  - The Go implementation is [complex](https://github.com/ipfs/go-delegated-routing/blob/main/gen/proto/proto_edelweiss.go) and [brittle](https://github.com/ipfs/go-delegated-routing/blame/main/client/provide.go#L51-L100), and is currently maintained by IPFS Stewards who are already over-committed with other priorities
- Only the HTTP transport has been designed and implemented, so it's unclear if the existing design will work for other transports, and what their use cases and requirements are
  - This means Reframe can't be trusted to be transport-agnostic until there is at least a second transport implemented (e.g. as a reframe-over-libp2p protocol)
- There's naming confusion around "Reframe, the protocol" and "Reframe, the set of methods"

So this API proposal makes the following changes:

- The Delegated Content Routing API is defined using HTTP semantics, and can be implemented without introducing Reframe concepts or IPLD
- There is a clear distinction between the RPC protocol (HTTP) and the API (Delegated Content Routing)
- "Method names" and cache-relevant parameters are pushed into the URL path
- Streaming support is removed, and default response size limits are added
  - We will add streaming support in a subsequent IPIP, but we are trying to minimize the scope of this IPIP to what is immediately useful
- Bodies are encoded using idiomatic JSON, instead of using IPLD codecs, and are compatible with OpenAPI specifications
  - The JSON uses human-readable string encodings of common data types
  - CIDs are encoded as CIDv1 strings with a multibase prefix (e.g. base32), for consistency with CLIs, browsers, and [gateway URLs](https://docs.ipfs.io/how-to/address-ipfs-on-web/)
  - Multiaddrs use the [human-readable format](https://github.com/multiformats/multiaddr#specification) that is used in existing tools and Kubo CLI commands such as `ipfs id` or `ipfs swarm peers`
  - Byte array values, such as signatures, are multibase-encoded strings (with an `m` prefix indicating Base64)
- The "Identify" method and "message groups" are not included
- The "GetIPNS" and "PutIPNS" methods are not included

### User benefit

The cost of building and operating content routing services will be much lower, as developers will be able to maximally reuse existing industry-standard tooling.
Users will not need to learn a new RPC protocol and tooling to consume or expose the API.
This will result in more content routing providers, each providing a better experience for users, driving down content routing latency across the IPFS network
and increasing data availability.

### Compatibility

#### Backwards Compatibility

IPFS Stewards will implement this API in [go-delegated-routing](https://github.com/ipfs/go-delegated-routing), using breaking changes in a new minor version.
Because the existing Reframe spec can't be safely used in JavaScript and we won't be investing time and resources into changing the wire format implemented in edelweiss to fix it,
the experimental support for Reframe in Kubo will be deprecated in the next release and delegated content routing will subsequently use this HTTP API.
We may decide to re-add Reframe support in the future once these issues have been resolved.

#### Forwards Compatibility

Standard HTTP mechanisms for forward compatibility are used:

- The API is versioned using a version number prefix in the path
- The `Accept` and `Content-Type` headers are used for content type negotiation, allowing for backwards-compatible additions of new MIME types, hypothetically such as:
  - `application/cbor` for binary-encoded responses
  - `application/x-ndjson` for streamed responses
  - `application/octet-stream` if the content router can provide the content/block directly
- New paths+methods can be introduced in a backwards-compatible way
- Parameters can be added using either new query parameters or new fields in the request/response body
- Provider records are both opaque and versioned to allow evolution of schemas and semantics for the same transfer protocol

As a proof-of-concept, the tests for the initial implementation of this HTTP API were successfully run over a libp2p transport using [libp2p/go-libp2p-http](https://github.com/libp2p/go-libp2p-http), demonstrating that this API is also viable over libp2p.

### Security

- All CID requests are sent to a central HTTPS endpoint as plain text, with TLS being the only protection against third-party observation.
- While privacy is not a concern in the current version, plans are underway to add a separate endpoint that prioritizes lookup privacy. Follow the progress in related pre-work in [IPIP-272 (double hashed DHT)](https://github.com/ipfs/specs/pull/373/) and [ipni#5 (reader privacy in indexers)](https://github.com/ipni/specs/pull/5).
- The usual JSON parsing rules apply. To prevent potential Denial of Service (DoS) attacks, clients should ignore responses containing more than 100 providers and introduce a byte size limit that is applicable to their use case.

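The client-side limits described in the last bullet could be sketched roughly as follows. This is an illustration only: the function name `parse_bounded_response` and the 64 KiB byte cap are assumptions made for the example, while the 100-provider limit and the `Providers` field come from the accompanying spec.

```python
import json
from typing import Any, Dict, List, Optional

MAX_PROVIDERS = 100         # limit named in the Security section
MAX_BODY_BYTES = 64 * 1024  # assumption: pick a cap that suits your use case


def parse_bounded_response(
    body: bytes,
    max_bytes: int = MAX_BODY_BYTES,
    max_providers: int = MAX_PROVIDERS,
) -> Optional[List[Dict[str, Any]]]:
    """Return the provider list, or None if the response should be ignored."""
    # Reject oversized bodies before handing them to the JSON parser.
    if len(body) > max_bytes:
        return None
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return None
    providers = payload.get("Providers", [])
    # Ignore responses that carry more records than the spec allows.
    if not isinstance(providers, list) or len(providers) > max_providers:
        return None
    return providers
```

A real client would also cap how many bytes it reads from the connection, since checking the length after the whole body is in memory only protects the parser.
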
### Alternatives

- Reframe (general-purpose RPC) was evaluated; see the "Design rationale" section for why it was not selected.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
# Delegated Content Routing HTTP API

![reliable](https://img.shields.io/badge/status-reliable-green.svg?style=flat-square) Delegated Content Routing HTTP API

**Author(s)**:

- Gus Eggert

**Maintainer(s)**:

* * *

**Abstract**

"Delegated content routing" is a mechanism that lets IPFS implementations offload content routing to another process or server. This spec describes an HTTP API for delegated content routing.

## API Specification

The Delegated Content Routing HTTP API uses the `application/json` content type by default.

As such, human-readable encodings of types are preferred. This spec may be updated in the future with a compact `application/cbor` encoding, in which case compact encodings of the various types would be used.

## Common Data Types

- CIDs are always string-encoded using a [multibase](https://github.com/multiformats/multibase)-encoded [CIDv1](https://github.com/multiformats/cid#cidv1).
- Multiaddrs are string-encoded according to the [human-readable multiaddr specification](https://github.com/multiformats/multiaddr#specification).
- Peer IDs are string-encoded according to the [PeerID string representation specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation).
- Multibase bytes are string-encoded according to [the Multibase spec](https://github.com/multiformats/multibase), and *should* use base64.
- Timestamps are Unix millisecond epoch timestamps.

Until required for business logic, servers should treat these types as opaque strings, and should preserve unknown JSON fields.

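As a rough illustration of two of the encodings listed above, the sketch below encodes raw bytes as a multibase base64 string and produces a millisecond timestamp. The helper names are hypothetical. It assumes the multibase `m` prefix, which denotes base64 without padding, and it deliberately handles no other prefix.

```python
import base64
import time
from typing import Optional


def encode_multibase_base64(data: bytes) -> str:
    """Encode bytes with the multibase 'm' prefix (base64, no padding)."""
    return "m" + base64.b64encode(data).decode("ascii").rstrip("=")


def decode_multibase_base64(text: str) -> bytes:
    """Decode an 'm'-prefixed multibase string; other prefixes are rejected here."""
    if not text.startswith("m"):
        raise ValueError("only the 'm' (base64) multibase prefix is handled in this sketch")
    body = text[1:]
    # Restore the padding that multibase base64 omits before decoding.
    body += "=" * (-len(body) % 4)
    return base64.b64decode(body)


def unix_millis(now: Optional[float] = None) -> int:
    """Return a Unix epoch timestamp in milliseconds."""
    seconds = time.time() if now is None else now
    return int(seconds * 1000)
```
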
### Versioning

This API uses a standard version prefix in the path, such as `/v1/...`. If a backwards-incompatible change must be made, then the version number should be increased.

### Provider Records

A provider record contains information about a content provider, including the transfer protocol and any protocol-specific information useful for fetching the content from the provider.

The information required to write a record to a router (*"write" provider records*) may be different than the information contained when reading provider records (*"read" provider records*).

For example, indexers may require a signature in `bitswap` write records for authentication of the peer contained in the record, but the read records may not include this authentication information.

Both read and write provider records have a minimal required schema as follows:

```json
{
    "Protocol": "<transfer_protocol_name>",
    "Schema": "<transfer_protocol_schema>",
    ...
}
```

Where:

- `Protocol` is the multicodec name of the transfer protocol or an opaque string (for experimenting with novel protocols without a multicodec)
- `Schema` denotes the schema to use for encoding/decoding the record
  - This is separate from the `Protocol` to allow this HTTP API to evolve independently of the transfer protocol
  - Implementations should switch on this when parsing records, not on `Protocol`
- `...` denotes opaque JSON, which may contain information specific to the transfer protocol

Specifications for some transfer protocols are provided in the "Transfer Protocols" section.

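One way a client might follow the "switch on `Schema`" guidance is sketched below. The function and table names are hypothetical, and the `ID` and `Addrs` fields are taken from the Bitswap read record described later in this document. Records with an unrecognized schema are kept untouched, so unknown fields are preserved.

```python
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]


def parse_bitswap(record: Record) -> Record:
    """Pull out the fields this document lists for the `bitswap` schema."""
    return {"peer_id": record["ID"], "addrs": list(record.get("Addrs", []))}


# Parsers are keyed by `Schema`, not by `Protocol`.
SCHEMA_PARSERS: Dict[str, Callable[[Record], Record]] = {"bitswap": parse_bitswap}


def parse_provider_record(record: Record) -> Record:
    """Validate the minimal schema, then dispatch on `Schema` if it is known."""
    for field in ("Protocol", "Schema"):
        if not isinstance(record.get(field), str):
            raise ValueError(f"provider record is missing string field {field!r}")
    parser: Optional[Callable[[Record], Record]] = SCHEMA_PARSERS.get(record["Schema"])
    return {
        "schema": record["Schema"],
        "parsed": parser(record) if parser else None,
        "raw": record,  # untouched, so unknown fields survive re-serialization
    }
```
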
## API

### `GET /routing/v1/providers/{CID}`

#### Response codes

- `200` (OK): the response body contains 0 or more records
- `404` (Not Found): must be returned if no matching records are found
- `422` (Unprocessable Entity): request does not conform to schema or semantic constraints

#### Response Body

```json
{
    "Providers": [
        {
            "Protocol": "<protocol_name>",
            "Schema": "<schema>",
            ...
        }
    ]
}
```

Response limit: 100 providers

Each object in the `Providers` list is a *read provider record*.

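A minimal client-side sketch of this endpoint follows, kept free of network calls so it stays easy to test. The helper names, the `https://router.example` base URL and the CID string used in the usage note are placeholders and not part of the spec. The status handling mirrors the response codes listed above.

```python
import json
from typing import Any, Dict, List
from urllib.parse import quote


def providers_url(base_url: str, cid: str) -> str:
    """Build the request URL; the CID goes straight into the path."""
    return base_url.rstrip("/") + "/routing/v1/providers/" + quote(cid, safe="")


def read_providers(status: int, body: bytes) -> List[Dict[str, Any]]:
    """Turn a status code and body into a list of read provider records."""
    if status == 404:
        return []  # no matching records
    if status == 422:
        raise ValueError("router rejected the request as malformed (422)")
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    return list(json.loads(body).get("Providers", []))
```

For instance, `providers_url("https://router.example", "bafyexample")` yields `https://router.example/routing/v1/providers/bafyexample`, which any HTTP client can then fetch with a plain `GET`.
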
## Pagination

This API does not support pagination, but optional pagination can be added in a backwards-compatible spec update.

## Streaming

This API does not currently support streaming; however, it can be added in the future through a backwards-compatible update by using a content type other than `application/json`.

## Error Codes

- `501` (Not Implemented): must be returned if a method/path is not supported
- `429` (Too Many Requests): may be returned along with an optional [Retry-After](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After) header to indicate to the caller that it is issuing requests too quickly
- `400` (Bad Request): must be returned if an unknown path is requested

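A client might honor `429` responses along the lines of the sketch below. The function name and the one-second fallback are assumptions made for illustration. `Retry-After` may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(
    status: int,
    retry_after: Optional[str],
    now: Optional[datetime] = None,
    default: float = 1.0,
) -> Optional[float]:
    """Return how long to wait before retrying, or None if no retry is needed."""
    if status != 429:
        return None
    if retry_after is None:
        return default  # header is optional, so fall back to an assumed delay
    value = retry_after.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```
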
## CORS and Web Browsers

Browser interoperability requires implementations to support
[CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS).

A JavaScript client running on a third-party Origin must be able to send HTTP
requests to the endpoints defined in this specification, and read the received
values. This means an HTTP server implementing this API must (1) support
[CORS preflight requests](https://developer.mozilla.org/en-US/docs/Glossary/Preflight_request)
sent as HTTP OPTIONS, and (2) always respond with headers that remove CORS
limits, allowing every site to query the API for results:

```plaintext
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, OPTIONS
```

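The two requirements above could be met by a handler shaped like the following sketch, written against Python's standard `http.server` module. The class name `RoutingHandler` is hypothetical, and the `GET` handler is a stub for a router that holds no records, so it always answers `404`.

```python
from http.server import BaseHTTPRequestHandler

# Headers listed in the spec text above.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, OPTIONS",
}


class RoutingHandler(BaseHTTPRequestHandler):
    def _send_cors_headers(self) -> None:
        for name, value in CORS_HEADERS.items():
            self.send_header(name, value)

    def do_OPTIONS(self) -> None:
        # (1) Answer CORS preflight requests.
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_GET(self) -> None:
        # (2) Every response carries the CORS headers, including errors.
        self.send_response(404)
        self._send_cors_headers()
        self.send_header("Content-Length", "0")
        self.end_headers()
```

To try it locally, pass the class to `http.server.HTTPServer(("127.0.0.1", 8080), RoutingHandler)` and call `serve_forever()`; the address and port are arbitrary.
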
## Known Transfer Protocols

This section contains a non-exhaustive list of known transfer protocols (by name) that may be supported by clients and servers.

### Bitswap

Multicodec name: `transport-bitswap`
Schema: `bitswap`
Specification: [ipfs/specs/BITSWAP.md](https://github.com/ipfs/specs/blob/main/BITSWAP.md)

#### Bitswap Read Provider Records

```json
{
    "Protocol": "transport-bitswap",
    "Schema": "bitswap",
    "ID": "12D3K...",
    "Addrs": ["/ip4/..."]
}
```

- `ID`: the [Peer ID](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md) to contact
- `Addrs`: a list of known multiaddrs for the peer
  - This list may be incomplete or incorrect and should only be treated as *hints* to improve performance by avoiding extra peer lookups

The server should respect a passed `transport` query parameter by filtering against the `Addrs` list.

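The text above does not define how `transport` values are matched against `Addrs`, so the following is only one possible reading: it assumes the parameter holds a multiaddr protocol name such as `tcp`, and keeps the addresses whose path components include that name. The function name and the sample addresses are hypothetical.

```python
from typing import List, Optional


def filter_addrs(addrs: List[str], transport: Optional[str]) -> List[str]:
    """Keep multiaddrs that contain the requested transport component."""
    if not transport:
        return list(addrs)  # no filter requested
    # Assumption: a match means one "/"-separated component equals the name.
    return [addr for addr in addrs if transport in addr.split("/")]
```

With `["/ip4/192.0.2.1/tcp/4001", "/ip4/192.0.2.1/udp/4001/quic"]` and `transport="tcp"`, only the first address is kept. A production implementation would parse multiaddrs properly, since a plain string split cannot tell a protocol name from a value that happens to look like one.
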
### Filecoin Graphsync

Multicodec name: `transport-graphsync-filecoinv1`
Schema: `graphsync-filecoinv1`
Specification: [ipfs/go-graphsync/blob/main/docs/architecture.md](https://github.com/ipfs/go-graphsync/blob/main/docs/architecture.md)

#### Filecoin Graphsync Read Provider Records

```json
{
    "Protocol": "transport-graphsync-filecoinv1",
    "Schema": "graphsync-filecoinv1",
    "ID": "12D3K...",
    "Addrs": ["/ip4/..."],
    "PieceCID": "<cid>",
    "VerifiedDeal": true,
    "FastRetrieval": true
}
```

- `ID`: the peer ID of the provider
- `Addrs`: a list of known multiaddrs for the provider
- `PieceCID`: the CID of the [piece](https://spec.filecoin.io/systems/filecoin_files/piece/#section-systems.filecoin_files.piece) within which the data is stored
- `VerifiedDeal`: whether the deal corresponding to the data is verified
- `FastRetrieval`: whether the provider claims there is an unsealed copy of the data available for fast retrieval
