|
| 1 | +# Replace logging and error reporting infrastructure with tracing. |
| 2 | + |
| 3 | +* Status: proposed |
| 4 | +* Deciders: |
| 5 | + * AppServices component teams |
| 6 | + * sync and related components: (decider name, date) |
| 7 | + * suggest, remote-settings, etc (ie, DISCO): (decider name, date) |
| 8 | + * (please add your team and decider name here!) |
| 9 | + * moz-central releng/build teams: ? |
| 10 | +* Date: May 2025 |
| 11 | +* Feedback deadline: May 15 2025. |
| 12 | + |
| 13 | +## Context and Problem Statement |
| 14 | + |
| 15 | +Rust code in [application-services](https://github.com/mozilla/application-services/) needs support for diagnostics. |
| 16 | +Specifically, we need support for logging and for explicit error reporting, as described below. |
| 17 | + |
| 18 | +### Logging |
| 19 | + |
| 20 | +Our components must be able to generate logging. These logs must be useful both when a component is being developed |
| 21 | +(eg, sent to the console when running unit tests etc), but also useful when being run in a product |
| 22 | +(eg, sent to the browser and captured normally for that browser - eg, the "console" on desktop, logcat on Android, etc.) |
| 23 | + |
| 24 | +We currently use the `log` crate and wire this up best we can. |
| 25 | + |
| 26 | +### Error Reporting |
| 27 | + |
| 28 | +There's an [error support crate](https://github.com/mozilla/application-services/tree/main/components/support/error), |
| 29 | +designed explicitly for error reporting for the applications. Unlike logins, these error reports are intended to |
| 30 | +be reported to an external service - eg, Android and iOS both use Sentry; Desktop doesn't have this, but it should. |
| 31 | + |
| 32 | +### Problems with the current approaches. |
| 33 | + |
| 34 | +#### Problems using `log` |
| 35 | + |
| 36 | +The main problem with the `log` module is the concept of a [global "max level"](https://docs.rs/cli-log/latest/cli_log/fn.set_max_level.html) |
| 37 | +that's set for all crates. |
| 38 | +Gecko sets the global max level to `Info` - anything more verbose causes [performance regressions in other crates](https://bugzilla.mozilla.org/show_bug.cgi?id=1874215). |
| 39 | + |
| 40 | +In practice, this means there are cases where our browsers are unable to see debug logs from our Rust code. |
| 41 | + |
| 42 | +#### Problems with the error reporter |
| 43 | + |
| 44 | +None identified for mobile - but is not implemented at all for Desktop. |
| 45 | + |
| 46 | +The opportunity is to better align the logging and error-reporting requirements into a single facility while introducing this capability |
| 47 | +to desktop. In the first instance, we will arrange for Javascript code running in Desktop Firefox to be notified on such reports, |
| 48 | +but a final decision about what it does with them is TBD. |
| 49 | + |
| 50 | +## The Rust tracing crate. |
| 51 | + |
| 52 | +An alternative to the `log` crate is the [`tracing`](https://docs.rs/tracing/latest/tracing/) crate, |
| 53 | +which comes from the tokio project. |
| 54 | + |
| 55 | +`tracing` has a very similar API to the `log` crate - `log::trace!` becomes `tracing::trace!` etc. |
| 56 | +It has richer semantics than `log` (eg, async support) and largely acts a replacement - |
| 57 | +with appropriate configuration it supports the `RUST_LOG` environment variable and writes to `stdout`, |
| 58 | +so developers who are running tests and our CI etc should notice no difference. |
| 59 | + |
| 60 | +Importantly, it has a [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/) |
| 61 | +mechanism that allows efficient, more targettted filtering instead of a global `max_level`. |
| 62 | +Each subscriber is able to filter log events without impacting crates they aren't subscribed to. |
| 63 | +This does mean it's impossible to have a global subscriber to see "all" logs - someone must explicitly subscribe by trace name. |
| 64 | +However, it is precisely this characteristic which makes this option avoid causing performance issues in unrelated crates. |
| 65 | + |
| 66 | +This means that it should be viable to have some crates capture `trace!()` output without impacting any other crates |
| 67 | +or the overall performance of the application. |
| 68 | + |
| 69 | +The merino project performed a [similar evaluation](https://github.com/mozilla-services/merino/blob/main/docs/adrs/adr_0001_logging.md) |
| 70 | +and although with different constraints, also ended up choosing `tracing`. |
| 71 | + |
| 72 | +## Proposal: Move to tracing for all app-services crates. |
| 73 | + |
| 74 | +This document proposes that: |
| 75 | + |
| 76 | +* All app-services crates move to using `tracing` instead of `log`. |
| 77 | + |
| 78 | +* All exiting handling of log events be replaced with a new mechanism based on |
| 79 | + [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/) |
| 80 | + to move logs and error-reports across the FFI. |
| 81 | + |
| 82 | +* We use the same mechanism for error reporting, leveraging the richer metadata offered by tracing. |
| 83 | + |
| 84 | +There are ways to make both `log` and `tracing` work, but we should avoid that if we can and just lean into tracing. |
| 85 | + |
| 86 | +## Implementation: Move to tracing for all app-services crates. |
| 87 | + |
| 88 | +This section describes the changes necessary to implement the above. |
| 89 | + |
| 90 | +### Move all our crates to `tracing` |
| 91 | + |
| 92 | +Most crates use `log::debug!()`/`log::error!()`, which changes to `tracing::...` (do we change them to just `debug!()`?) |
| 93 | + |
| 94 | +Many tests start with `env_logger::try_init().unwrap()` or similar - we'll have an equivalent test helper. |
| 95 | + |
| 96 | +### Implement a `tracing_subscriber::Layer`-based "subscriber" mechanism. |
| 97 | + |
| 98 | +This subscriber mechanism requires the application to know all tracing `targets` it cares about. |
| 99 | +Each crate is its own target and the app must explicitly "subscribe" to all targets individually. |
| 100 | +It will *not* be possible to subscribe to all targets and it's unlikely we'll allow |
| 101 | +any "matching" capabilities (eg, regular expressions or similar) - each target will use an exact string match. |
| 102 | + |
| 103 | +This requires our applications to configure their own subscriptions to each `target` with the level for that target, |
| 104 | +making it possible to avoid a single, global max-level. |
| 105 | + |
| 106 | +We'll implement this subscriber with a simple `HashMap` mapping the target name to a level. |
| 107 | +Once we have determined a subscriber matches an event, we can perform relatively expensive operations |
| 108 | +on the event because we assume action will be taken on the event. |
| 109 | +This assumption seems fine because we own all the subscribers. |
| 110 | + |
| 111 | +An example of these "relatively expensive" operations is fetching event "fields", such as the message or other meta-data, |
| 112 | +and using them to format a string, and dispatching the end result to the underlying logging |
| 113 | +system. |
| 114 | + |
| 115 | +This is generic enough to handle traditional "log" messages and our error reporting requirements. It's a general event reporting system. |
| 116 | + |
| 117 | +[There's a WIP for all the above here](https://github.com/mozilla/application-services/compare/main...mhammond:application-services:log-to-tracing) |
| 118 | + |
| 119 | +### Replace all existing "subscribers" |
| 120 | + |
| 121 | +There are 3 main places which would change in the first instance, broken down by platform |
| 122 | + |
| 123 | +#### Mobile |
| 124 | + |
| 125 | +[A WIP for this is also included here](https://github.com/mozilla/application-services/compare/main...mhammond:application-services:log-to-tracing) |
| 126 | + |
| 127 | +#### Desktop |
| 128 | + |
| 129 | +Desktop has a [hand-written xpcom-based log adaptor](https://searchfox.org/mozilla-central/source/services/sync/golden_gate/src/log.rs#119-120). |
| 130 | +This would be removed entirely and a uniffi-based callback mechanism used. Rust code calling back into Javascript has the same semantics as `golden_gate` - the log calls are "fire and forget", ending up in the main thread automatically. |
| 131 | + |
| 132 | +The [`gecko-logger`](https://searchfox.org/mozilla-central/source/xpcom/rust/gecko_logger/src/lib.rs) crate would change: |
| 133 | +* All application-services log-related code would be removed entirely (eg, [here](https://searchfox.org/mozilla-central/source/services/interfaces/mozIAppServicesLogger.idl) and [here](https://searchfox.org/mozilla-central/source/services/common/app_services_logger)) - |
| 134 | +app-services would not rely on `log` at all in this world. |
| 135 | +* `gecko-logger` (or more likely a new crate `gecko-tracing` next to it) would grow support for owning the single tracing-subscriber. It would be responsible for adding a single app-services owned `tracing_subscriber::Layer` instance to the single subscriber. |
| 136 | + |
| 137 | +The [app-services-logger](https://searchfox.org/mozilla-central/source/services/common/app_services_logger/src/lib.rs) would lose all xpcom-related code and instead lean on uniffi and tracing-subscriber. |
| 138 | + |
| 139 | +A JS "subscriber" will be implemented which allows for fine-grained control over individual targets. |
| 140 | +In the first instance this would support logging to both the existing sync logging infrastructure and |
| 141 | +simple logging via the `console` facilities. Error reports would initially be logged to the console, with a followup being to decide how to report them externally. |
| 142 | + |
| 143 | +#### Application Services |
| 144 | + |
| 145 | +* All crates move to `tracing` instead of `log` |
| 146 | + |
| 147 | +* A new crate would be added which defines the application callback interfaces (via UniFFI) and |
| 148 | + also the new tracing-subscriber implementation. |
| 149 | + |
| 150 | +* The crates `rust-log-forwarder` and `error-reporter` crates would keep their external interface |
| 151 | + but would have their internal implementation replaced with the subscriber. This is for backwards |
| 152 | + compatibility with mobile - eventually we'd expose the new callback interfaces to mobile and delete |
| 153 | + these crates entirely. |
0 commit comments