From 1f8064cc39fd61c0a33b0514257858b62192dc2d Mon Sep 17 00:00:00 2001
From: James Graham
Date: Tue, 21 Sep 2021 11:16:22 +0100
Subject: [PATCH] RFC 98: Remote channels for cross-browsing-group communication

---
 rfcs/remote_channel.md | 616 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 616 insertions(+)
 create mode 100644 rfcs/remote_channel.md

diff --git a/rfcs/remote_channel.md b/rfcs/remote_channel.md
new file mode 100644
index 00000000..bfdb966d
--- /dev/null
+++ b/rfcs/remote_channel.md
@@ -0,0 +1,616 @@
+# RFC 98: Cross-Window Channels
+
+## Summary
+
+Provide a channel API that supports server-mediated
+messaging between browsing contexts, including those which are in
+different browsing context groups, and so cannot communicate purely
+from the client.
+
+## Details
+
+### Use Case
+
+The modern web platform has features which allow isolating different
+browsing contexts and scripting environments from each other such that
+they are unable to communicate. For example, windows opened with the
+`noopener` attribute will not have a handle to their parent, nor will
+the parent have a handle to the child. Similarly, cross-origin
+navigations with the `cross-origin-opener-policy` header appropriately
+set create a new browsing context group which is totally isolated from
+the prior context. Typically in implementations each browsing context
+group is assigned a unique OS-level process.
+
+This creates some problems for testing; because testharness tests are
+running in a single browsing context, any test that involves a context
+that is isolated from the context containing the test will be unable
+to communicate test results back to the harness using web platform
+APIs.
+
+Given the importance of isolation in the modern web platform, it's
+important to improve support for tests involving multiple browsing
+context groups in web-platform-tests.
+Otherwise test authors will
+either not write tests for these scenarios, or write them using
+browser-specific techniques.
+
+### Prior Art
+
+* Gecko tests are able to provide cross-context communication using the
+  parent (i.e. browser UI) process as an intermediary. The
+  [SpecialPowers.spawn](https://searchfox.org/mozilla-central/source/testing/specialpowers/content/SpecialPowersChild.jsm#1547-1584)
+  API available to Gecko tests allows running a function in another
+  context, using structured clone to pass the arguments
+  ([example](https://searchfox.org/mozilla-central/source/dom/tests/browser/browser_data_document_crossOriginIsolated.js#17)).
+
+* The
+  [cross-origin-opener-policy](https://github.com/web-platform-tests/wpt/tree/7b0ebaccc62b566a1965396e5be7bb2bc06f841f/html/cross-origin-opener-policy)
+  tests currently in web-platform-tests use a bespoke
+  [dispatcher.py](https://github.com/web-platform-tests/wpt/blob/7b0ebaccc62b566a1965396e5be7bb2bc06f841f/html/cross-origin-opener-policy/resources/dispatcher.py)
+  to allow cross-origin communication. This uses a queue (implemented
+  as a list) stored in the wpt server stash object as a message
+  channel. Each context is given a UUID in a URL parameter named
+  `uuid`, and uses HTTP GET requests to check for any message in the
+  queue stored in the stash entry with that UUID. Other contexts that
+  know the UUID can POST a message that's appended to the queue.
+
+* WebDriver also has the ability to communicate between different
+  contexts, and [RFC
+  89](https://github.com/web-platform-tests/rfcs/pull/89) proposed
+  adding an `execute_script` method to testdriver to enable running a
+  script in a different context. However, the design of WebDriver and
+  wptrunner means that only one context can communicate using
+  WebDriver at a time, and for testharness we must always return
+  control to the test window, since WebDriver is also used to
+  communicate test results back to the harness.
+  The single-threaded
+  nature of WebDriver, and the requirement that all testdriver actions go
+  via the main test window, make it hard to build an ergonomic
+  cross-context messaging API.
+
+### Proposal
+
+All the above examples of prior art have some attractive features, and
+it's possible to combine them in a way that should provide an
+ergonomic API for web-platform-tests:
+
+* The Gecko SpecialPowers API appears to offer the best ergonomics,
+  providing a relatively high-level API for executing script in a
+  remote context. However, the implementation depends on
+  Gecko specifics.
+
+* The dispatcher framework provides an addressing mechanism (the
+  `uuid` parameter in URLs) that is workable in the context of
+  web-platform-tests, and the use of the server stash as the backend
+  makes sense for wpt. However, the requirement to poll the server for
+  messages, and the relatively low-level API based on passing strings
+  around, seem like areas for improvement.
+
+The following subsections will set out a proposal for an API that
+combines some of these strengths.
+
+#### Addressing
+
+Contexts that want to participate in messaging must have a parameter
+called `uuid` in their URL, with a value that's a UUID. This will be
+used to identify the channel dedicated to messages sent to that
+context. If the context is navigated the new document may reuse the
+same UUID if it wants to share the same message queue. Otherwise it's
+an error to use the same UUID for multiple contexts.
+
+#### Backend
+
+The message passing backend is based on a Python
+[Queue](https://docs.python.org/3/library/queue.html) object per
+channel, stored in the server stash. The advantage of using queues
+here is that they are designed to allow blocking reads. This means we
+don't have to use a polling API but can, on the server side, wait for
+a message to be added to a channel, and immediately forward the
+message to the client.
+Because the stash itself runs in a single
+Python process and communicates with the server processes via IPC, it
+is sufficient to use a thread-based `Queue` rather than requiring a
+full process per queue.
+
+#### Low-Level API
+
+The low-level API for messaging is based on the concept of
+Multiple-Producer-Single-Consumer (MPSC) channels. Each channel is
+identified by a UUID. When a channel is created a pair of objects is
+returned: a `RecvChannel` to read from the channel and a `SendChannel`
+to write to the channel. The `SendChannel` may be cloned to create
+new writers, but only a single `RecvChannel` per channel is
+permitted. This limitation is designed to enforce good design in which
+there are no races between different readers to get a message. In
+particular, the intended use is not for broadcasting tasks to multiple
+consumers, but to message a specific browsing context. Browsing
+contexts will be able to create a `RecvChannel` with the UUID in their
+URL and use that to receive messages from other contexts.
+
+A `SendChannel` has a `send(obj)` method. This causes the object to be
+encoded, first using the remote object serialization (see below) and
+then as a JSON string, before being sent to the queue.
+
+A `RecvChannel` must call its async `connect()` method to start
+receiving messages. Messages are first deserialized from JSON and then
+undergo remote object deserialization. Once connected, the
+`RecvChannel` acts like an event target; consumers can call the
+`addEventListener(fn)` method to add a callback function that is
+invoked when a message is received. Alternatively the async `next()`
+method will return the next message to be received.
+
+The implementation of channels is based on websockets, with one
+websocket per `Channel` object (i.e. a different websocket is used for
+a `SendChannel` vs a `RecvChannel` even when they correspond to the
+same underlying queue).
+On the backend the websocket handler function
+for a `SendChannel` listens for incoming messages and writes them to
+the corresponding `Queue`, whilst the websocket handler for a
+`RecvChannel` essentially blocks on the `get()` method for the `Queue`
+and writes any message it receives to the socket. In detail some
+additional care is needed to ensure that the websocket handler
+functions shut down when the socket is closed.
+
+#### Remote Object Serialization
+
+In order to allow passing complex JavaScript objects between contexts,
+a serialization format based on the [WebDriver
+BiDi](https://w3c.github.io/webdriver-bidi/#type-common-RemoteValue)
+proposal is used. An object is represented using a JSON object as
+follows:
+```js
+{
+  type: Name of the object type,
+  value: A representation of the object in a JSON-compatible form
+         where possible,
+  objectId: A unique id assigned to the object (not for primitives)
+}
+```
+
+So, for example, the array `[1, "foo", {"bar": null}]` is represented as:
+
+```js
+{
+  "type": "array",
+  "objectId": <unique id>,
+  "value": [
+    {
+      "type": "number",
+      "value": 1
+    },
+    {
+      "type": "string",
+      "value": "foo"
+    },
+    {
+      "type": "object",
+      "objectId": <unique id>,
+      "value": {
+        "bar": {
+          "type": "null"
+        }
+      }
+    }
+  ]
+}
+```
+
+In addition to the types specified in the WebDriver-BiDi
+specification, `SendChannel` is given first class support with
+`"type": "sendchannel"` and `value` set to the UUID of the
+channel. This enables an important pattern: to receive messages from a
+remote context, you can send it a `SendChannel` object to use for
+responses.
+
+For deserialization, primitive values are converted back to
+primitives, but complex values are represented by a `RemoteObject`
+type. In cases like arrays where there is a `value` field holding a
+container object, `RemoteObject.toLocal()` recursively converts the
+content of the container into local objects (so e.g.
+a `type: array` is
+converted into a local `Array` instance, and any contained `type: object`
+objects are converted into local `Object` instances, and so on through
+the full tree).
+
+#### Higher Level API
+
+The low-level API provides required primitives, but it's difficult to
+use directly. To achieve the aim of making tests no harder to write
+than SpecialPowers-based Gecko tests, there is also a higher-level API
+initially providing two main capabilities: the ability to
+`postMessage` to a remote context, and the ability to `executeScript` so
+that the script runs in a remote context.
+
+This API is provided by a `RemoteWindow` object. The `RemoteWindow`
+object doesn't handle creating the browsing context, but given the
+UUID for the remote window, creates a `SendChannel` which is able to
+send messages to the remote. Alternatively the `RemoteWindow` may be
+created first and its `uuid` property used when constructing the URL.
+
+Inside the remote browsing context itself, the test author has to call
+`await start_window()` in order to set up a `RecvChannel` with UUID given
+by the `uuid` parameter in `location.href`. The returned object offers
+an `addMessageHandler(callback)` API to receive messages sent with the
+`postMessage` API on `RemoteWindow`, and `async nextMessage()` to wait
+for the next message.
+
+##### Script Execution
+
+`RemoteWindow.executeScript(fn, ...args)` serializes the function
+`fn`, using `Function.toString()`. Each argument is
+serialized using the remote value serialization algorithm. Along with
+the function string and the arguments, a `SendChannel` is sent for the
+command response (only one such channel is created per `RemoteWindow`
+for efficiency reasons), and a command id is sent to disambiguate
+responses from different commands.
+
+On the remote side the function is deserialized and executed. If
+execution results in a `Promise` value, the result of that promise is
+awaited.
+The final return value after the promise is resolved is sent
+back and forms the async return value of the `executeScript` call. If
+the script throws, the thrown value is provided as the result, and
+re-thrown in the originating context. In addition an `exceptionDetails`
+field provides the line/column numbers of the original exception,
+where available.
+
+In addition there is a `RemoteWindow.executeScriptNoResult(fn,
+...args)` method. This works the same way except no channel is passed,
+and so no result is returned. This can be useful in case the script
+does something like trigger a navigation, so there's no need to
+synchronize the navigation starting (which will close the socket) with
+writing the response.
+
+TODO: the naming here isn't great. In particular a `RemoteWindow`
+could actually be some other kind of global like a worker, and
+`start_window()` is a pretty nondescript method name.
+
+#### Navigation and bfcache
+
+For specific use cases around bfcache, it's important to be able to
+ensure that no network connections (including websockets) remain open
+at the time of navigation, otherwise the page will be excluded from
+bfcache. In the current prototype this is handled as follows:
+
+* A `pause` method on `SendChannel`. This causes a server-initiated
+  disconnect of the corresponding `RecvChannel` websocket. This is
+  pretty confusing! The idea is to allow a page to send a command that
+  will initiate a navigation, then without knowing when the navigation
+  is done, send further commands that will be processed when the
+  `RecvChannel` reconnects. If the commands are sent before the
+  navigation, but not processed, they can be buffered by the remote
+  and then lost during navigation. An alternative here would be a more
+  explicit protocol in which the remote has to send an explicit
+  message to the test page indicating that it's done navigating and
+  it's safe to send more commands.
+  But the way the messaging works,
+  it's hard for a random page that's loaded to initiate a connection
+  to the top-level test context. For example, consider a test page T
+  with a channel pair allowing communication with remote window A. If
+  A navigates to B, there's no simple mechanism for B to create a
+  channel to T. One could get around this by e.g. putting the uuid of
+  the existing `SendChannel` from A to T into the URL of B and
+  constructing it from there, but that's quite fiddly and doesn't work
+  in cases where the URL is immutable e.g. history navigation.
+
+* A `closeAllChannelSockets()` method. This just closes all the
+  open websockets associated with channels in the context in which
+  it's called. Any channel then has to be reconnected to be used
+  again. This doesn't feel especially elegant since it's violating
+  layering, but it does mean you can be relatively confident that
+  calling `closeAllChannelSockets()` right before navigating will
+  leave you in a state with no open websocket connections (unless
+  something happens to reopen one before the navigation starts).
+
+One possible alternative to this would be to build a poll-based
+`RecvChannel` specifically for bfcache use cases. This would only
+receive new messages when a `poll()` method is called, so would put
+the remote page fully in control of network connections. Similarly a
+`poll()`-based `SendChannel` could ensure that a network connection
+is only created when actually sending a message.
+
+### API Summary
+
+```ts
+/**
+ * Create a new channel pair
+ */
+function channel(): [RecvChannel, SendChannel] {}
+
+/**
+ * Channel used to send messages
+ */
+class SendChannel {
+  uuid: string
+
+  /**
+   * Create a SendChannel from a given UUID
+   */
+  constructor(uuid: string) {}
+
+  /**
+   * Connect to the channel.
+   * Automatically called when sending the
+   * first message
+   */
+  async connect() {}
+
+  /**
+   * Close the channel and underlying websocket connection
+   */
+  async close() {}
+
+  /**
+   * Send a message `msg`. The message object must be JSON-serializable.
+   */
+  async send(msg: Object) {}
+
+  /**
+   * Disconnect the RecvChannel, if any, on the server side
+   */
+  async pause() {}
+}
+
+/**
+ * Channel used to handle messages. Not directly constructable.
+ */
+class RecvChannel {
+  uuid: string
+
+  /**
+   * Connect to the channel and start handling messages.
+   */
+  async connect() {}
+
+  /**
+   * Close the channel and underlying websocket connection
+   */
+  async close() {}
+
+  /**
+   * Add a message handler function
+   */
+  addEventListener(fn: (msg: Object) => void) {}
+
+  /**
+   * Remove a message handler function
+   */
+  removeEventListener(fn: (msg: Object) => void) {}
+
+  /**
+   * Wait for the next message and return it (after passing it to
+   * existing handlers)
+   */
+  async next(): Promise<Object> {}
+}
+
+/**
+ * Start listening for RemoteWindow messages on a channel defined by a `uuid` in `location.href`
+ */
+async function start_window(): Promise<RemoteWindowCommandRecvChannel> {}
+
+/**
+ * Handler for RemoteWindow commands
+ */
+class RemoteWindowCommandRecvChannel {
+  /**
+   * Connect to the channel and start handling messages.
+   */
+  async connect() {}
+
+  /**
+   * Close the channel and underlying websocket connection
+   */
+  async close() {}
+
+  /**
+   * Add a handler for `postMessage` messages
+   */
+  addEventListener(fn: (msg: Object) => void) {}
+
+  /**
+   * Remove a handler for `postMessage` messages
+   */
+  removeEventListener(fn: (msg: Object) => void) {}
+
+  /**
+   * Wait for the next `postMessage` message and return it (after passing it to
+   * existing handlers)
+   */
+  async nextMessage(): Promise<Object> {}
+}
+
+class RemoteWindow {
+  /**
+   * Create a RemoteWindow. The dest parameter is either a
+   * `SendChannel` object or the UUID for the channel. If omitted a new
+   * UUID is generated.
+   */
+  constructor(dest?: SendChannel | string) {}
+
+  /**
+   * Connect to the channel. Automatically called when sending the
+   * first message
+   */
+  async connect() {}
+
+  /**
+   * Close the channel and underlying websocket connections
+   */
+  async close() {}
+
+  /**
+   * Disconnect the RecvChannel running in the remote context, if any,
+   * on the server side
+   */
+  async pause() {}
+
+  /**
+   * Post the object msg to the remote, using JSON serialization
+   */
+  postMessage(msg: Object) {}
+
+  /**
+   * Run the function `fn` in the remote context, passing arguments
+   * `args`, and return the result after awaiting any returned
+   * promise.
+   *
+   * Arguments and return values are serialized as RemoteObjects.
+   */
+  async executeScript(fn: (...args: any[]) => any, ...args: any[]): Promise<any> {}
+
+  /**
+   * Run the function `fn` in the remote context, passing arguments
+   * `args`, but without returning a result
+   *
+   * Arguments are serialized as RemoteObjects.
+   */
+  async executeScriptNoResult(fn: (...args: any[]) => any, ...args: any[]) {}
+}
+
+/**
+ * Representation of a non-primitive type passed through a channel
+ */
+class RemoteObject {
+  type: string;
+  value: any;
+  objectId: string | undefined;
+
+  /**
+   * Recursively convert the object to a local type (where possible)
+   * so e.g. a remote array is converted into a local array.
+   *
+   * Objects without a meaningful local representation are passed back unchanged.
+   */
+  toLocal(): any {}
+}
+
+/**
+ * Close all websockets in the current global that are being used for channels.
+ */
+function closeAllChannelSockets() {}
+
+///
+```
+
+### Resource Management
+
+With the stash-based approach, it's important to clean up the stash
+once no channels remain. Otherwise we end up leaking queues. The
+general approach is to explicitly store a refcount as part of the
+stash value. Each socket that connects increments the refcount, and
+decrements it once the socket is closed.
+Access to the stash is
+protected by a lock, so only one socket may touch the refcount at a
+time. In practice the fact that there can only be one `RecvChannel`
+per queue means that we can just count `SendChannel` sockets and
+enforce an at-most-one rule for `RecvChannel`. The underlying queue is
+deleted when no socket connections remain. This does mean that if we
+were to e.g. put a message on a queue, then delete all the channels
+before forwarding the message, and then use the UUID to create a new
+`RecvChannel`, the queue would be deleted and the message lost. One can
+imagine this scenario happening when navigating a remote context which
+has a `RecvChannel` tied to the UUID in the URL, if the `SendChannel`
+for the context isn't kept alive. Another possibility is to only
+delete the queue if it's empty, and register an explicit completion
+callback in the test that deletes any queue with a UUID known to the
+test context. Of course this doesn't help if a channel is created in a
+non-test context (e.g. for messaging between multiple remotes).
+
+## Example
+
+test.html
+
+```html
+
+executeScript example
+
+```
+
+child.html
+
+```html
+

FAIL

+

PASS

+
+```
+
+## Possible Future Additions
+
+The primitives here could be integrated more completely with
+testharness.js. For example we could use a `RemoteWindow` as a source
+of tests in `fetch_tests_from_window`. Alternatively, or in addition
+to that, we could integrate asserts with script execution, so that a
+remote context could include a minimal testharness.js and enable
+something like:
+
+```js
+promise_test(async t => {
+  let r = new RemoteWindow();
+  window.open(`file.html?uuid=${r.uuid}`, "_blank", "noopener");
+  await r.executeScript(() => {
+    assert_equals(window.opener, null)
+  });
+});
+```
+
+Currently this is possible if `file.html` is put in a `resources` or
+`support` directory so it can't be detected as a test. But it would
+perhaps be better to have a minimal subset of testharness available
+without providing the API surface that only makes sense in a test
+window.
+
+It may be possible in the future to replace the backend with a
+WebDriver BiDi based message passing system that would avoid the need
+for websockets. By sticking close to WebDriver BiDi proposed semantics
+the transition may even be seamless.
+
+testdriver integration is possible. For example we could add
+`RemoteContext.testdriver.click` to execute a click in the remote
+context (and similarly for the remainder of the testdriver
+API). testdriver in this case would identify the target window by
+looking for a window with the appropriate `uuid` parameter in its
+`location.href`.
+
+## Risks
+
+This has been largely covered elsewhere in the document, but several
+risks stand out:
+
+* This is a relatively large API addition which will require ongoing
+  maintenance.
+
+* The websocket based transport adds complexity that can be hard for
+  test authors to understand, particularly in tests that are
+  themselves affected by the presence of the websocket connection
+  (e.g. bfcache). The semantics around queue deletion may also be
+  tricky. Broken tests may be hard to debug.
+
+* Extensive use of promises in the API means using async/await is very
+  desirable for readable code in the implementation and in the
+  tests. This may cause issues with older JavaScript implementations.
+  However, async/await has been available in most browsers since around
+  2017, so it seems unlikely to be a real problem.
+
+* The queue based backend increases the impact of resource leaks vs a
+  backend based on lists, since additional Python threads are used in
+  the queue implementation.
+
+* Having to manually pass around UUIDs to identify channels makes it
+  easy to author broken tests (e.g. trying to reuse the same UUID
+  across multiple windows).
+
+## References
+
+[PR 29803](https://github.com/web-platform-tests/wpt/pull/29803)
+contains a prototype implementation of this.
+
+
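The remote object serialization described in the Details section can be illustrated with a small sketch. This is not the prototype's code: the function names `serialize` and `toLocal` and the `obj-N` id scheme are hypothetical, only primitives, arrays and plain objects are handled, and `toLocal` here is a free function standing in for `RemoteObject.toLocal()`. Numbers that JSON cannot represent (such as `NaN`) are ignored for brevity.

```javascript
// Hypothetical sketch of the serialization format; names and the id
// scheme are assumptions, not taken from the prototype implementation.
let nextObjectId = 0;

// Convert a local value into the { type, value, objectId } form.
function serialize(value) {
  if (value === null) return { type: "null" };
  if (value === undefined) return { type: "undefined" };
  const t = typeof value;
  if (t === "number" || t === "string" || t === "boolean") {
    return { type: t, value };
  }
  // Non-primitives get a unique id, as the format requires.
  const objectId = `obj-${nextObjectId++}`;
  if (Array.isArray(value)) {
    return { type: "array", objectId, value: value.map(serialize) };
  }
  if (t === "object") {
    const out = {};
    for (const [key, item] of Object.entries(value)) {
      out[key] = serialize(item);
    }
    return { type: "object", objectId, value: out };
  }
  throw new TypeError(`unsupported type: ${t}`);
}

// Recursively convert a serialized value back into local objects.
// Types without a local representation are returned unchanged.
function toLocal(remote) {
  switch (remote.type) {
    case "null":
      return null;
    case "undefined":
      return undefined;
    case "number":
    case "string":
    case "boolean":
      return remote.value;
    case "array":
      return remote.value.map(toLocal);
    case "object":
      return Object.fromEntries(
        Object.entries(remote.value).map(([key, item]) => [key, toLocal(item)])
      );
    default:
      return remote;
  }
}

// Round trip through JSON, as a message would travel over a channel.
const wire = JSON.stringify(serialize([1, "foo", { bar: null }]));
const received = toLocal(JSON.parse(wire));
console.log(received);
```

Keeping the `type` tag on every value is what lets the receiver distinguish, for example, `null` from `undefined` after the JSON step, which plain `JSON.stringify` on the original value would not preserve.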