|
| 1 | +# Using Async Rust |
| 2 | + |
| 3 | +* Status: draft |
| 4 | +* Deciders: |
| 5 | +* Date: ??? |
| 6 | +* Previous discussion: https://github.com/mozilla/application-services/pull/5910 |
| 7 | + |
| 8 | +## Context and Problem Statement |
| 9 | + |
| 10 | +Our Rust components are currently written using synchronous Rust, however all current consumers wrap them in another class to present an async-style interface: |
| 11 | + |
| 12 | +* Firefox Android uses `CoroutineScope.runBlocking` to adapt methods to be `suspend` |
| 13 | +* Firefox iOs mostly uses `DispatchQueue` and completion handlers, but the long-term plan is to transition to `async` functions. |
| 14 | + Some components already do this, using `withCheckedThrowingContinuation` on top of `DispatchQueue` to define async functions. |
| 15 | +* Firefox Desktop auto-generates C++ wrapper code to make them async using a [TOML config file](https://searchfox.org/mozilla-central/rev/cdfe21b20eacfaa6712dd9821d6383859ce386c6/toolkit/components/uniffi-bindgen-gecko-js/config.toml). |
| 16 | + This was chosen because JS is single-threaded and doesn't provide a simple way to run blocking functions in a task queue. |
| 17 | + One drawback from this choice is that Desktop has a very different async system compared to the mobile apps. |
| 18 | + |
| 19 | +With the new UniFFI async capabilities, it's possible to move the async code into Rust and avoid this wrapper layer. |
| 20 | +This ADR discusses if we should do this, how we could implement it, and what our general approach to async should be. |
| 21 | +To make things concrete, the Suggest component is used as an example of what would change if we moved to async Rust. |
| 22 | + |
| 23 | +### Scope |
| 24 | + |
| 25 | +This ADR covers the question of using a wrapper layer to implant async functionality. |
| 26 | +It does not cover some related questions: |
| 27 | + |
| 28 | +* **Scheduling this work.** |
| 29 | + If we decide to embrace async Rust, we do not need to commit to any particular deadline for switching to it. |
| 30 | +* **Wrappers in general.** |
| 31 | + [The previous PR](https://github.com/mozilla/application-services/pull/5910/) dedicated a separate ADR for this question, but we never came to a decision on this. |
| 32 | + It seems that there's no single answer to the question, the general consensus was that we should evaluate things on a case-by-case basis and this ADR will just focus on the case of async. |
| 33 | +* **Wrappers in consumer code.** |
| 34 | + If we change things so that the current async wrapping layer is no longer needed, consumer engineers will still have a choice on if they want to keep their current wrapper layer or not. |
| 35 | + This choice should be left to consumer engineers. |
| 36 | + This ADR will touch on this, but not recommend anything. |
| 37 | + |
| 38 | +## Considered Options |
| 39 | + |
| 40 | +* **Option A: Do nothing.** |
| 41 | + Keep the current system in place. |
| 42 | +* **Option B: Async Rust using the current UniFFI.** |
| 43 | + Make our component methods async by requiring the foreign code to pass in an interface that runs tasks in a worker queue and using |
| 44 | +`oneshot::Channel` to communicate the results back. See below for how this would work in practice. |
| 45 | +* **Option C: Async Rust with additional UniFFI support.** |
| 46 | + Like `B`, but make `WorkerQueue` and `RustTask` built-in UniFFI types. |
| 47 | + This would eliminate the need to define our own interfaces for these, instead UniFFI would allow foreign code to passing in `DispatchQueue`/`CoroutineScopes` to Rust and Rust could use those to run blocking tasks in a work queue. |
| 48 | +* **Option D: Extend the uniffi-bindgen-gecko-js config system to Mobile.** |
| 49 | + Extend the Gecko-JS system, where the config specifies that functions/methods should be wrapped as async, to also work for Kotlin/Swift. |
| 50 | + |
| 51 | +### Example code |
| 52 | + |
| 53 | +I (Ben) made some diffs to illustrate how the code would change for each option. |
| 54 | +When doing that, I realized the wrapper layer was actually implementing important functionality for the component: |
| 55 | + |
| 56 | +* The Kotlin code extended our interrupt support to also interrupt pending `query()` calls. |
| 57 | +* The Kotlin code also catches all errors and coverts them into regular returns (`query()` returns an empty list and `ingest()` returns false). |
| 58 | +* The Swift code split async methods into 2 categories: low-priority calls like `ingest()` and high-priority calls like `query()` |
| 59 | + |
| 60 | +As part of the example changes, I moved this functionality to our Rust components. |
| 61 | +This results in some extra code in the diffs, but I thought it was good to explore the messy details of this transition. |
| 62 | +A important factor for deciding this ADR is where we want this functionality to live. |
| 63 | + |
| 64 | +#### Option B |
| 65 | + - [app-services changes](./files/0009/option-b-app-services.diff) |
| 66 | + - [android-components changes](./files/0009/option-b-android-components.diff) |
| 67 | + - [Firefox iOS changes](./files/0009/option-b-firefox-ios.diff) |
| 68 | + - [Firefox Desktop changes](./files/0009/option-b-firefox-desktop.diff) |
| 69 | + |
| 70 | +This option is possible to implement today, so I was able to test that the app-services and android-components changes actually compiled |
| 71 | +I didn't do that for iOS and desktop, mostly because it's harder to perform a local build. |
| 72 | +I think we can be confident that the actual changes would look similar to this. |
| 73 | + |
| 74 | +Summary of changes: |
| 75 | +* Added the `WorkerQueue` and `RustTask` interfaces |
| 76 | + * `RustTask` encapsulates a single task |
| 77 | + * `WorkerQueue` is implemented by the foreign side and runs a `RustTask` in a work queue where it's okay to block. |
| 78 | +* Use the above interfaces to wrap sync calls to be async, by running them in the `WorkerQueue` then sending the result via a `oneshot::Channel` that the |
| 79 | + original async function was `await`ing. |
| 80 | + * This is also a good place to do the error conversion/reporting. |
| 81 | +* SuggestStoreBuilder gets a `build_async` method, which creates an `SuggestStoreAsync`. |
| 82 | + It inputs 2 `WorkerQueue`s: one for ingest and one for everything else. |
| 83 | +* Added supporting code so that `query()` can create it's `SqlInterruptScope` ahead of time, outside of the scheduled task. |
| 84 | + That way `interrupt()` can also interrupt pending calls to `query()`. |
| 85 | +* Updated the `ingest()` method to catch errors and return `false` rather than propagating them. |
| 86 | + In general, I think this is a better API for consumers since there's nothing meaningful they can do with errors other than report them. |
| 87 | + `query()` should probably also be made infallible. |
| 88 | +* Removed the wrapper class from `android-components`, but kept the wrapper class in `firefox-ios`. |
| 89 | + The goal here was to show the different options for consumer code. |
| 90 | + |
| 91 | +#### Option C |
| 92 | + - [app-services changes](./files/0009/option-c-app-services.diff) |
| 93 | + - [android-components changes](./files/0009/option-c-android-components.diff) |
| 94 | + - [Firefox iOS changes](./files/0009/option-c-firefox-ios.diff) |
| 95 | + - [Firefox Desktop changes](./files/0009/option-c-firefox-desktop.diff) |
| 96 | + |
| 97 | +This option assumes that UniFFI will provide something similar to `WorkerQueue`, so we don't need to define/implement that in app-services or the consumer repos. |
| 98 | +This requires changes to UniFFI core, so none of this code works today. |
| 99 | +However, I think we can be fairly confident that these changes will work since we have a long-standing [UniFFI PR](https://github.com/mozilla/uniffi-rs/pull/1837) that implements a similar feature -- in fact it's a more complex version. |
| 100 | + |
| 101 | +Summary of changes: essentially the same as `B`, but we don't need to define/implement the `WorkerQueue` and `RustTask` interfaces. |
| 102 | + |
| 103 | +#### Option D |
| 104 | + |
| 105 | +I'm not completely sure how this one would work in practice, but I assume that it would mean TOML configuration for Kotlin/Swift similar to the current Desktop configuration: |
| 106 | + |
| 107 | +``` |
| 108 | +[suggest.async_wrappers] |
| 109 | +# All functions/methods are wrapped to be async by default and must be `await`ed. |
| 110 | +enable = true |
| 111 | +# These are exceptions to the async wrapping. These functions must not be `await`ed. |
| 112 | +main_thread = [ |
| 113 | + "raw_suggestion_url_matches", |
| 114 | + "SuggestStore.new", |
| 115 | + "SuggestStore.interrupt", |
| 116 | + "SuggestStoreBuilder.new", |
| 117 | + "SuggestStoreBuilder.data_path", |
| 118 | + "SuggestStoreBuilder.load_extension", |
| 119 | + "SuggestStoreBuilder.remote_settings_bucket_name", |
| 120 | + "SuggestStoreBuilder.remote_settings_server", |
| 121 | + "SuggestStoreBuilder.build", |
| 122 | +] |
| 123 | +``` |
| 124 | + |
| 125 | +This would auto-generate async wrapper code. |
| 126 | +For example, the Kotlin code would look something like this: |
| 127 | + |
| 128 | +``` |
| 129 | +class SuggestStore { |
| 130 | +
|
| 131 | + /** |
| 132 | + * Queries the database for suggestions. |
| 133 | + */ |
| 134 | + suspend fun query(query: SuggestionQuery): List<Suggestion> = |
| 135 | + withContext(Dispatchers.IO) { |
| 136 | + // ...current auto-generated code here |
| 137 | + } |
| 138 | +``` |
| 139 | + |
| 140 | +## Decision Outcome |
| 141 | + |
| 142 | +## Pros and Cons of the Options |
| 143 | + |
| 144 | +### (A) Do nothing |
| 145 | + |
| 146 | +* Good, because it requires the least amount of work |
| 147 | +* Good, because it's proven to work |
| 148 | +* Bad, because async is handled very differently in JS vs Kotlin+Swift |
| 149 | +* Bad, because our component functionality implemented in wrapper code. |
| 150 | + It needs to be duplicated or will be only available on one platform. |
| 151 | + For example, the Kotlin functionality to interrupt pending `query()` calls would also be useful on Swift, but we never implemented that. |
| 152 | +* Bad, because our component functionality is implemented in other repos, which makes it likely to go stale. |
| 153 | + However, we could mitigate this moving that Kotlin/Swift wrapper code into app-services. |
| 154 | +* Bad, because it makes our documentation system worse. For example, our docs for [query()](https://mozilla.github.io/application-services/kotlin/kotlin-components-docs/mozilla.appservices.suggest/-suggest-store/query.html) say it's a sync function, but it practice it's actually async. |
| 155 | + |
| 156 | +### (B) Async Rust using the current UniFFI |
| 157 | + |
| 158 | +* Good, because we'll have a common system for Async that works similarly all platforms |
| 159 | +* Good, because our generated docs will match how the methods are used in practice. |
| 160 | +* Good, because it encourages us to move async complexity into the core code. |
| 161 | + This makes it available to all platforms and more likely to be maintained. |
| 162 | +* Good because it opens the door for more efficient thread usage. |
| 163 | + For example, we could make methods more fine-grained and only use the work queue for SQL operations, but not for network requests. |
| 164 | +* Bad, because we're taking on risk by introducing the async UniFFI code to app-services. |
| 165 | +* Bad, because our consumers need to define all a `WorkerQueue` implementations, which is a bit of a burden. |
| 166 | + This feels especially bad on JS, where the whole concept of a work queue feels alien. |
| 167 | +* Bad, because it makes it harder to provide bindings on new languages that don't support async, like C and C++. |
| 168 | + Maybe we could bridge the gap with some sort of callback-based async system, but it's not clear how that would work. |
| 169 | + |
| 170 | +### (C) Async Rust with additional UniFFI support |
| 171 | + |
| 172 | +* Good/Bad/Risky for mostly the same reasons as (B). |
| 173 | +* Good, because it removes the need for us and consumers to define `WorkerQueue` traits/impls. |
| 174 | +* Good, because it can simplify the `WorkerQueue` code. |
| 175 | + In particular, we can guarantee that the task is only executed once, which removes the need for `RustTaskContainer`. |
| 176 | +* Bad, because we'll have to maintain the UniFFI worker queue code. |
| 177 | + |
| 178 | +### (D) Extend the uniffi-bindgen-gecko-js config system to Mobile |
| 179 | + |
| 180 | +* Good, because we'll have a common system for Async that works similarly all platforms |
| 181 | +* Mostly good, because our generated docs will match how the methods are used in practice. |
| 182 | + However, it could be weird to write docstrings for sync functions that are wrapped to be async. |
| 183 | +* Good, because it's less Risky than B/C. |
| 184 | + The system would just be auto-generating the kind of wrappers we already used. |
| 185 | +* Bad, because it's hard for consumer engineers to configure. |
| 186 | + For example, how can firefox-ios devs pick which QOS they want to use with a `DispatchQueue`? |
| 187 | + They would probably have to specify it in the config file, which is less convent than passing it in from code. |
| 188 | +* Bad, because it's not clear how we could handle complex cases like using both a low-priority and high-priority queue. |
| 189 | + |
| 190 | +## Decision Outcome |
| 191 | +? |
0 commit comments