Skip to content

Commit 686b62c

Browse files
committed
Async Rust ADR
1 parent ae30c40 commit 686b62c

8 files changed

+1830
-0
lines changed

Diff for: docs/adr/0009-async-rust.md

+191
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,191 @@
1+
# Using Async Rust
2+
3+
* Status: draft
4+
* Deciders:
5+
* Date: ???
6+
* Previous discussion: https://github.com/mozilla/application-services/pull/5910
7+
8+
## Context and Problem Statement
9+
10+
Our Rust components are currently written using synchronous Rust, however all current consumers wrap them in another class to present an async-style interface:
11+
12+
* Firefox Android uses `CoroutineScope.runBlocking` to adapt methods to be `suspend`
13+
* Firefox iOs mostly uses `DispatchQueue` and completion handlers, but the long-term plan is to transition to `async` functions.
14+
Some components already do this, using `withCheckedThrowingContinuation` on top of `DispatchQueue` to define async functions.
15+
* Firefox Desktop auto-generates C++ wrapper code to make them async using a [TOML config file](https://searchfox.org/mozilla-central/rev/cdfe21b20eacfaa6712dd9821d6383859ce386c6/toolkit/components/uniffi-bindgen-gecko-js/config.toml).
16+
This was chosen because JS is single-threaded and doesn't provide a simple way to run blocking functions in a task queue.
17+
One drawback from this choice is that Desktop has a very different async system compared to the mobile apps.
18+
19+
With the new UniFFI async capabilities, it's possible to move the async code into Rust and avoid this wrapper layer.
20+
This ADR discusses if we should do this, how we could implement it, and what our general approach to async should be.
21+
To make things concrete, the Suggest component is used as an example of what would change if we moved to async Rust.
22+
23+
### Scope
24+
25+
This ADR covers the question of using a wrapper layer to implant async functionality.
26+
It does not cover some related questions:
27+
28+
* **Scheduling this work.**
29+
If we decide to embrace async Rust, we do not need to commit to any particular deadline for switching to it.
30+
* **Wrappers in general.**
31+
[The previous PR](https://github.com/mozilla/application-services/pull/5910/) dedicated a separate ADR for this question, but we never came to a decision on this.
32+
It seems that there's no single answer to the question, the general consensus was that we should evaluate things on a case-by-case basis and this ADR will just focus on the case of async.
33+
* **Wrappers in consumer code.**
34+
If we change things so that the current async wrapping layer is no longer needed, consumer engineers will still have a choice on if they want to keep their current wrapper layer or not.
35+
This choice should be left to consumer engineers.
36+
This ADR will touch on this, but not recommend anything.
37+
38+
## Considered Options
39+
40+
* **Option A: Do nothing.**
41+
Keep the current system in place.
42+
* **Option B: Async Rust using the current UniFFI.**
43+
Make our component methods async by requiring the foreign code to pass in an interface that runs tasks in a worker queue and using
44+
`oneshot::Channel` to communicate the results back. See below for how this would work in practice.
45+
* **Option C: Async Rust with additional UniFFI support.**
46+
Like `B`, but make `WorkerQueue` and `RustTask` built-in UniFFI types.
47+
This would eliminate the need to define our own interfaces for these, instead UniFFI would allow foreign code to passing in `DispatchQueue`/`CoroutineScopes` to Rust and Rust could use those to run blocking tasks in a work queue.
48+
* **Option D: Extend the uniffi-bindgen-gecko-js config system to Mobile.**
49+
Extend the Gecko-JS system, where the config specifies that functions/methods should be wrapped as async, to also work for Kotlin/Swift.
50+
51+
### Example code
52+
53+
I (Ben) made some diffs to illustrate how the code would change for each option.
54+
When doing that, I realized the wrapper layer was actually implementing important functionality for the component:
55+
56+
* The Kotlin code extended our interrupt support to also interrupt pending `query()` calls.
57+
* The Kotlin code also catches all errors and coverts them into regular returns (`query()` returns an empty list and `ingest()` returns false).
58+
* The Swift code split async methods into 2 categories: low-priority calls like `ingest()` and high-priority calls like `query()`
59+
60+
As part of the example changes, I moved this functionality to our Rust components.
61+
This results in some extra code in the diffs, but I thought it was good to explore the messy details of this transition.
62+
A important factor for deciding this ADR is where we want this functionality to live.
63+
64+
#### Option B
65+
- [app-services changes](./files/0009/option-b-app-services.diff)
66+
- [android-components changes](./files/0009/option-b-android-components.diff)
67+
- [Firefox iOS changes](./files/0009/option-b-firefox-ios.diff)
68+
- [Firefox Desktop changes](./files/0009/option-b-firefox-desktop.diff)
69+
70+
This option is possible to implement today, so I was able to test that the app-services and android-components changes actually compiled
71+
I didn't do that for iOS and desktop, mostly because it's harder to perform a local build.
72+
I think we can be confident that the actual changes would look similar to this.
73+
74+
Summary of changes:
75+
* Added the `WorkerQueue` and `RustTask` interfaces
76+
* `RustTask` encapsulates a single task
77+
* `WorkerQueue` is implemented by the foreign side and runs a `RustTask` in a work queue where it's okay to block.
78+
* Use the above interfaces to wrap sync calls to be async, by running them in the `WorkerQueue` then sending the result via a `oneshot::Channel` that the
79+
original async function was `await`ing.
80+
* This is also a good place to do the error conversion/reporting.
81+
* SuggestStoreBuilder gets a `build_async` method, which creates an `SuggestStoreAsync`.
82+
It inputs 2 `WorkerQueue`s: one for ingest and one for everything else.
83+
* Added supporting code so that `query()` can create it's `SqlInterruptScope` ahead of time, outside of the scheduled task.
84+
That way `interrupt()` can also interrupt pending calls to `query()`.
85+
* Updated the `ingest()` method to catch errors and return `false` rather than propagating them.
86+
In general, I think this is a better API for consumers since there's nothing meaningful they can do with errors other than report them.
87+
`query()` should probably also be made infallible.
88+
* Removed the wrapper class from `android-components`, but kept the wrapper class in `firefox-ios`.
89+
The goal here was to show the different options for consumer code.
90+
91+
#### Option C
92+
- [app-services changes](./files/0009/option-c-app-services.diff)
93+
- [android-components changes](./files/0009/option-c-android-components.diff)
94+
- [Firefox iOS changes](./files/0009/option-c-firefox-ios.diff)
95+
- [Firefox Desktop changes](./files/0009/option-c-firefox-desktop.diff)
96+
97+
This option assumes that UniFFI will provide something similar to `WorkerQueue`, so we don't need to define/implement that in app-services or the consumer repos.
98+
This requires changes to UniFFI core, so none of this code works today.
99+
However, I think we can be fairly confident that these changes will work since we have a long-standing [UniFFI PR](https://github.com/mozilla/uniffi-rs/pull/1837) that implements a similar feature -- in fact it's a more complex version.
100+
101+
Summary of changes: essentially the same as `B`, but we don't need to define/implement the `WorkerQueue` and `RustTask` interfaces.
102+
103+
#### Option D
104+
105+
I'm not completely sure how this one would work in practice, but I assume that it would mean TOML configuration for Kotlin/Swift similar to the current Desktop configuration:
106+
107+
```
108+
[suggest.async_wrappers]
109+
# All functions/methods are wrapped to be async by default and must be `await`ed.
110+
enable = true
111+
# These are exceptions to the async wrapping. These functions must not be `await`ed.
112+
main_thread = [
113+
"raw_suggestion_url_matches",
114+
"SuggestStore.new",
115+
"SuggestStore.interrupt",
116+
"SuggestStoreBuilder.new",
117+
"SuggestStoreBuilder.data_path",
118+
"SuggestStoreBuilder.load_extension",
119+
"SuggestStoreBuilder.remote_settings_bucket_name",
120+
"SuggestStoreBuilder.remote_settings_server",
121+
"SuggestStoreBuilder.build",
122+
]
123+
```
124+
125+
This would auto-generate async wrapper code.
126+
For example, the Kotlin code would look something like this:
127+
128+
```
129+
class SuggestStore {
130+
131+
/**
132+
* Queries the database for suggestions.
133+
*/
134+
suspend fun query(query: SuggestionQuery): List<Suggestion> =
135+
withContext(Dispatchers.IO) {
136+
// ...current auto-generated code here
137+
}
138+
```
139+
140+
## Decision Outcome
141+
142+
## Pros and Cons of the Options
143+
144+
### (A) Do nothing
145+
146+
* Good, because it requires the least amount of work
147+
* Good, because it's proven to work
148+
* Bad, because async is handled very differently in JS vs Kotlin+Swift
149+
* Bad, because our component functionality implemented in wrapper code.
150+
It needs to be duplicated or will be only available on one platform.
151+
For example, the Kotlin functionality to interrupt pending `query()` calls would also be useful on Swift, but we never implemented that.
152+
* Bad, because our component functionality is implemented in other repos, which makes it likely to go stale.
153+
However, we could mitigate this moving that Kotlin/Swift wrapper code into app-services.
154+
* Bad, because it makes our documentation system worse. For example, our docs for [query()](https://mozilla.github.io/application-services/kotlin/kotlin-components-docs/mozilla.appservices.suggest/-suggest-store/query.html) say it's a sync function, but it practice it's actually async.
155+
156+
### (B) Async Rust using the current UniFFI
157+
158+
* Good, because we'll have a common system for Async that works similarly all platforms
159+
* Good, because our generated docs will match how the methods are used in practice.
160+
* Good, because it encourages us to move async complexity into the core code.
161+
This makes it available to all platforms and more likely to be maintained.
162+
* Good because it opens the door for more efficient thread usage.
163+
For example, we could make methods more fine-grained and only use the work queue for SQL operations, but not for network requests.
164+
* Bad, because we're taking on risk by introducing the async UniFFI code to app-services.
165+
* Bad, because our consumers need to define all a `WorkerQueue` implementations, which is a bit of a burden.
166+
This feels especially bad on JS, where the whole concept of a work queue feels alien.
167+
* Bad, because it makes it harder to provide bindings on new languages that don't support async, like C and C++.
168+
Maybe we could bridge the gap with some sort of callback-based async system, but it's not clear how that would work.
169+
170+
### (C) Async Rust with additional UniFFI support
171+
172+
* Good/Bad/Risky for mostly the same reasons as (B).
173+
* Good, because it removes the need for us and consumers to define `WorkerQueue` traits/impls.
174+
* Good, because it can simplify the `WorkerQueue` code.
175+
In particular, we can guarantee that the task is only executed once, which removes the need for `RustTaskContainer`.
176+
* Bad, because we'll have to maintain the UniFFI worker queue code.
177+
178+
### (D) Extend the uniffi-bindgen-gecko-js config system to Mobile
179+
180+
* Good, because we'll have a common system for Async that works similarly all platforms
181+
* Mostly good, because our generated docs will match how the methods are used in practice.
182+
However, it could be weird to write docstrings for sync functions that are wrapped to be async.
183+
* Good, because it's less Risky than B/C.
184+
The system would just be auto-generating the kind of wrappers we already used.
185+
* Bad, because it's hard for consumer engineers to configure.
186+
For example, how can firefox-ios devs pick which QOS they want to use with a `DispatchQueue`?
187+
They would probably have to specify it in the config file, which is less convent than passing it in from code.
188+
* Bad, because it's not clear how we could handle complex cases like using both a low-priority and high-priority queue.
189+
190+
## Decision Outcome
191+
?

0 commit comments

Comments
 (0)