|
| 1 | +# Replace logging and error reporting infrastructure with tracing. |
| 2 | + |
| 3 | +* Status: proposed |
| 4 | +* Deciders: |
| 5 | + * AppServices component teams |
| 6 | + * sync and related components: (decider name, date) |
| 7 | + * suggest, remote-settings, etc (ie, DISCO): (decider name, date) |
| 8 | + * (please add your team and decider name here!) |
| 9 | + * moz-central releng/build teams: ? |
| 10 | +* Date: May 2025 |
| 11 | +* Feedback deadline: May 15 2025. |
| 12 | + |
| 13 | +## Context and Problem Statement |
| 14 | + |
| 15 | +Rust code in [application-services](https://github.com/mozilla/application-services/) needs support for diagnostics. |
| 16 | +Specifically, we need support for logging and for explicit error reporting, as described below. |
| 17 | + |
| 18 | +### Logging |
| 19 | + |
| 20 | +Our components must be able to generate logging. These logs must be useful both when a component is being developed |
| 21 | +(eg, sent to the console when running unit tests etc), but also useful when being run in a product |
| 22 | +(eg, sent to the browser and captured normally for that browser - eg, the "console" on desktop, logcat on Android, etc.) |
| 23 | + |
| 24 | +We currently use the `log` crate and wire this up best we can. |
| 25 | + |
| 26 | +### Error Reporting |
| 27 | + |
| 28 | +There's an [error support crate](https://github.com/mozilla/application-services/tree/main/components/support/error), |
| 29 | +designed explicitly for error reporting for the applications. Unlike logins, these error reports are intended to |
| 30 | +be reported to an external service - eg, Android and iOS both use Sentry; Desktop doesn't have this, but it should. |
| 31 | + |
| 32 | +### Problems with the current approaches. |
| 33 | + |
| 34 | +#### Problems using `log` |
| 35 | + |
| 36 | +The main problem with the `log` module is the concept of a [global "max level"](https://docs.rs/cli-log/latest/cli_log/fn.set_max_level.html) |
| 37 | +that's set for all crates. |
| 38 | +Gecko sets the global max level to `Info` - anything more verbose causes [performance regressions in other crates](https://bugzilla.mozilla.org/show_bug.cgi?id=1874215). |
| 39 | + |
| 40 | +In practice, this means there are cases where our browsers are unable to see debug logs from our Rust code. |
| 41 | + |
| 42 | +#### Problems with the error reporter |
| 43 | + |
| 44 | +None identified for mobile - but is not implemented at all for Desktop. |
| 45 | + |
| 46 | +The opportunity is to better align the logging and error-reporting requirements into a single facility while introducing this capability |
| 47 | +to desktop. In the first instance, we will arrange for Javascript code running in Desktop Firefox to be notified on such reports, |
| 48 | +but a final decision about what it does with them is TBD. |
| 49 | + |
| 50 | +## The Rust tracing crate. |
| 51 | + |
| 52 | +An alternative to the `log` crate is the [`tracing`](https://docs.rs/tracing/latest/tracing/) crate, |
| 53 | +which comes from the tokio project. |
| 54 | + |
| 55 | +`tracing` has a very similar API to the `log` crate - `log::trace!` becomes `tracing::trace!` etc. |
| 56 | +It has richer semantics than `log` (eg, async support) and largely acts a replacement - |
| 57 | +with appropriate configuration it supports the `RUST_LOG` environment variable and writes to `stdout`, |
| 58 | +so developers who are running tests and our CI etc should notice no difference. |
| 59 | + |
| 60 | +Importantly, it has a [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/) |
| 61 | +mechanism that allows efficient, more targettted filtering instead of a global `max_level`. |
| 62 | +Each subscriber is able to filter log events without impacting crates they aren't subscribed to. |
| 63 | +This does mean it's impossible to have a global subscriber to see "all" logs - someone must explicitly subscribe by trace name. |
| 64 | +However, it is precisely this characteristic which makes this option avoid causing performance issues in unrelated crates. |
| 65 | + |
| 66 | +This means that it should be viable to have some crates capture `trace!()` output without impacting any other crates |
| 67 | +or the overall performance of the application. |
| 68 | + |
| 69 | +The merino project performed a [similar evaluation](https://github.com/mozilla-services/merino/blob/main/docs/adrs/adr_0001_logging.md) |
| 70 | +and although with different constraints, also ended up choosing `tracing`. |
| 71 | + |
| 72 | +## Proposal: Move to tracing for all app-services crates. |
| 73 | + |
| 74 | +his document proposes that we move to a system based on `tracing`. We will do this by: |
| 75 | + |
| 76 | +* Leveraging that all crates already depend on `error_reporting`, so have it export logging functionality based on `tracing`, including testing facilities. |
| 77 | + |
| 78 | +* All app-services crates move to `error_reporting::debug!()` instead of `log::debug!()` etc. |
| 79 | + |
| 80 | +* All exiting handling of log events be replaced with a new mechanism based on |
| 81 | + [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/) |
| 82 | + to move logs and error-reports across the FFI. |
| 83 | + |
| 84 | +* We use the same mechanism for error reporting, leveraging the richer metadata offered by tracing. |
| 85 | + |
| 86 | +## Implementation: Move to tracing for all app-services crates. |
| 87 | + |
| 88 | +This section describes the changes necessary to implement the above. |
| 89 | + |
| 90 | +### Move all our crates to `tracing` |
| 91 | + |
| 92 | +Most crates use `log::debug!()`/`log::error!()`; these all change to `use error_support::debug`/`info!("hi")` calls. |
| 93 | + |
| 94 | +Testss starting with `env_logger::try_init().unwrap()` or similar will have an equivalent test helper. |
| 95 | + |
| 96 | +### Implement a `tracing_subscriber::Layer`-based "subscriber" mechanism. |
| 97 | + |
| 98 | +This subscriber mechanism requires the application to know all tracing `targets` it cares about. |
| 99 | +Each crate is its own target and the app must explicitly "subscribe" to all targets individually. |
| 100 | +It will *not* be possible to subscribe to all targets and it's unlikely we'll allow |
| 101 | +any "matching" capabilities (eg, regular expressions or similar) - each target will use an exact string match. |
| 102 | + |
| 103 | +This requires our applications to configure their own subscriptions to each `target` with the level for that target, |
| 104 | +making it possible to avoid a single, global max-level. |
| 105 | + |
| 106 | +We'll implement this subscriber with a simple `HashMap` mapping the target name to a level. |
| 107 | +Once we have determined a subscriber matches an event, we can perform relatively expensive operations |
| 108 | +on the event because we assume action will be taken on the event. |
| 109 | +This assumption seems fine because we own all the subscribers. |
| 110 | + |
| 111 | +An example of these "relatively expensive" operations is fetching event "fields", such as the message or other meta-data, |
| 112 | +and using them to format a string, and dispatching the end result to the underlying logging |
| 113 | +system. |
| 114 | + |
| 115 | +This is generic enough to handle traditional "log" messages and our error reporting requirements. It's a general event reporting system. |
| 116 | + |
| 117 | +[There's a WIP for all the above here](https://github.com/mozilla/application-services/compare/main...mhammond:application-services:log-to-tracing) |
| 118 | + |
| 119 | +### Replace all existing "subscribers" |
| 120 | + |
| 121 | +There are 3 main places which would change in the first instance, broken down by platform |
| 122 | + |
| 123 | +#### Mobile |
| 124 | + |
| 125 | +[A WIP for this is also included here](https://github.com/mozilla/application-services/compare/main...mhammond:application-services:log-to-tracing) |
| 126 | + |
| 127 | +#### Desktop |
| 128 | + |
| 129 | +Desktop has a [hand-written xpcom-based log adaptor](https://searchfox.org/mozilla-central/source/services/sync/golden_gate/src/log.rs#119-120). |
| 130 | +This would be removed entirely and a uniffi-based callback mechanism used. Rust code calling back into Javascript has the same semantics as `golden_gate` - the log calls are "fire qnd forget", ending up in the main thread automatically. |
| 131 | + |
| 132 | +* The [`gecko-logger`](https://searchfox.org/mozilla-central/source/xpcom/rust/gecko_logger/src/lib.rs) crate has all app-services code removed. |
| 133 | +* New `gecko-tracing` crate next to `gwcko-logger` owns the single tracing-subscriber. app-services is the only `tracing_subscriber::Layer` provider, but more subscribers seems likely. |
| 134 | +* [app-services-logger](https://searchfox.org/mozilla-central/source/services/common/app_services_logger/src/lib.rs) is removed. |
| 135 | +* A JS "subscriber" will be implemented, a thin layer over the tracing crate and its `target` filtering. |
| 136 | + |
| 137 | +#### Application Services |
| 138 | + |
| 139 | +* All crates move to `tracing` instead of `log` |
| 140 | + |
| 141 | +* A new crate would be added which defines the application callback interfaces (via UniFFI) and |
| 142 | + also the new tracing-subscriber implementation. |
| 143 | + |
| 144 | +* The crates `rust-log-forwarder` and `error-reporter` crates would keep their external interface |
| 145 | + but would have their internal implementation replaced with the subscriber. This is for backwards |
| 146 | + compatibility with mobile - eventually we'd expose the new callback interfaces to mobile and delete |
| 147 | + these crates entirely. |
0 commit comments