Skip to content

Commit ff9cbbd

Browse files
committed
Add adr-9: Replace logging and error reporting infrastructure with tracing.
1 parent 6375ad5 commit ff9cbbd

File tree

2 files changed

+155
-0
lines changed

2 files changed

+155
-0
lines changed

docs/adr/0009-use-tracing.md

Lines changed: 153 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,153 @@
1+
# Replace logging and error reporting infrastructure with tracing.
2+
3+
* Status: proposed
4+
* Deciders:
5+
* AppServices component teams
6+
* sync and related components: (decider name, date)
7+
* suggest, remote-settings, etc (ie, DISCO): (decider name, date)
8+
* (please add your team and decider name here!)
9+
* moz-central releng/build teams: ?
10+
* Date: May 2025
11+
* Feedback deadline: May 15 2025.
12+
13+
## Context and Problem Statement
14+
15+
Rust code in [application-services](https://github.com/mozilla/application-services/) needs support for diagnostics.
16+
Specifically, we need support for logging and for explicit error reporting, as described below.
17+
18+
### Logging
19+
20+
Our components must be able to generate logging. These logs must be useful both when a component is being developed
21+
(eg, sent to the console when running unit tests etc), but also useful when being run in a product
22+
(eg, sent to the browser and captured normally for that browser - eg, the "console" on desktop, logcat on Android, etc.)
23+
24+
We currently use the `log` crate and wire this up best we can.
25+
26+
### Error Reporting
27+
28+
There's an [error support crate](https://github.com/mozilla/application-services/tree/main/components/support/error),
29+
designed explicitly for error reporting for the applications. Unlike logins, these error reports are intended to
30+
be reported to an external service - eg, Android and iOS both use Sentry; Desktop doesn't have this, but it should.
31+
32+
### Problems with the current approaches.
33+
34+
#### Problems using `log`
35+
36+
The main problem with the `log` module is the concept of a [global "max level"](https://docs.rs/cli-log/latest/cli_log/fn.set_max_level.html)
37+
that's set for all crates.
38+
Gecko sets the global max level to `Info` - anything more verbose causes [performance regressions in other crates](https://bugzilla.mozilla.org/show_bug.cgi?id=1874215).
39+
40+
In practice, this means there are cases where our browsers are unable to see debug logs from our Rust code.
41+
42+
#### Problems with the error reporter
43+
44+
None identified for mobile - but is not implemented at all for Desktop.
45+
46+
The opportunity is to better align the logging and error-reporting requirements into a single facility while introducing this capability
47+
to desktop. In the first instance, we will arrange for Javascript code running in Desktop Firefox to be notified on such reports,
48+
but a final decision about what it does with them is TBD.
49+
50+
## The Rust tracing crate.
51+
52+
An alternative to the `log` crate is the [`tracing`](https://docs.rs/tracing/latest/tracing/) crate,
53+
which comes from the tokio project.
54+
55+
`tracing` has a very similar API to the `log` crate - `log::trace!` becomes `tracing::trace!` etc.
56+
It has richer semantics than `log` (eg, async support) and largely acts a replacement -
57+
with appropriate configuration it supports the `RUST_LOG` environment variable and writes to `stdout`,
58+
so developers who are running tests and our CI etc should notice no difference.
59+
60+
Importantly, it has a [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/)
61+
mechanism that allows efficient, more targettted filtering instead of a global `max_level`.
62+
Each subscriber is able to filter log events without impacting crates they aren't subscribed to.
63+
This does mean it's impossible to have a global subscriber to see "all" logs - someone must explicitly subscribe by trace name.
64+
However, it is precisely this characteristic which makes this option avoid causing performance issues in unrelated crates.
65+
66+
This means that it should be viable to have some crates capture `trace!()` output without impacting any other crates
67+
or the overall performance of the application.
68+
69+
The merino project performed a [similar evaluation](https://github.com/mozilla-services/merino/blob/main/docs/adrs/adr_0001_logging.md)
70+
and although with different constraints, also ended up choosing `tracing`.
71+
72+
## Proposal: Move to tracing for all app-services crates.
73+
74+
This document proposes that:
75+
76+
* All app-services crates move to using `tracing` instead of `log`.
77+
78+
* All exiting handling of log events be replaced with a new mechanism based on
79+
[`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/)
80+
to move logs and error-reports across the FFI.
81+
82+
* We use the same mechanism for error reporting, leveraging the richer metadata offered by tracing.
83+
84+
There are ways to make both `log` and `tracing` work, but we should avoid that if we can and just lean into tracing.
85+
86+
## Implementation: Move to tracing for all app-services crates.
87+
88+
This section describes the changes necessary to implement the above.
89+
90+
### Move all our crates to `tracing`
91+
92+
Most crates use `log::debug!()`/`log::error!()`, which changes to `tracing::...` (do we change them to just `debug!()`?)
93+
94+
Many tests start with `env_logger::try_init().unwrap()` or similar - we'll have an equivalent test helper.
95+
96+
### Implement a `tracing_subscriber::Layer`-based "subscriber" mechanism.
97+
98+
This subscriber mechanism requires the application to know all tracing `targets` it cares about.
99+
Each crate is its own target and the app must explicitly "subscribe" to all targets individually.
100+
It will *not* be possible to subscribe to all targets and it's unlikely we'll allow
101+
any "matching" capabilities (eg, regular expressions or similar) - each target will use an exact string match.
102+
103+
This requires our applications to configure their own subscriptions to each `target` with the level for that target,
104+
making it possible to avoid a single, global max-level.
105+
106+
We'll implement this subscriber with a simple `HashMap` mapping the target name to a level.
107+
Once we have determined a subscriber matches an event, we can perform relatively expensive operations
108+
on the event because we assume action will be taken on the event.
109+
This assumption seems fine because we own all the subscribers.
110+
111+
An example of these "relatively expensive" operations is fetching event "fields", such as the message or other meta-data,
112+
and using them to format a string, and dispatching the end result to the underlying logging
113+
system.
114+
115+
This is generic enough to handle traditional "log" messages and our error reporting requirements. It's a general event reporting system.
116+
117+
[There's a WIP for all the above here](https://github.com/mozilla/application-services/compare/main...mhammond:application-services:log-to-tracing)
118+
119+
### Replace all existing "subscribers"
120+
121+
There are 3 main places which would change in the first instance, broken down by platform
122+
123+
#### Mobile
124+
125+
[A WIP for this is also included here](https://github.com/mozilla/application-services/compare/main...mhammond:application-services:log-to-tracing)
126+
127+
#### Desktop
128+
129+
Desktop has a [hand-written xpcom-based log adaptor](https://searchfox.org/mozilla-central/source/services/sync/golden_gate/src/log.rs#119-120).
130+
This would be removed entirely and a uniffi-based callback mechanism used. Rust code calling back into Javascript has the same semantics as `golden_gate` - the log calls are "fire and forget", ending up in the main thread automatically.
131+
132+
The [`gecko-logger`](https://searchfox.org/mozilla-central/source/xpcom/rust/gecko_logger/src/lib.rs) crate would change:
133+
* All application-services log-related code would be removed entirely (eg, [here](https://searchfox.org/mozilla-central/source/services/interfaces/mozIAppServicesLogger.idl) and [here](https://searchfox.org/mozilla-central/source/services/common/app_services_logger)) -
134+
app-services would not rely on `log` at all in this world.
135+
* `gecko-logger` (or more likely a new crate `gecko-tracing` next to it) would grow support for owning the single tracing-subscriber. It would be responsible for adding a single app-services owned `tracing_subscriber::Layer` instance to the single subscriber.
136+
137+
The [app-services-logger](https://searchfox.org/mozilla-central/source/services/common/app_services_logger/src/lib.rs) would lose all xpcom-related code and instead lean on uniffi and tracing-subscriber.
138+
139+
A JS "subscriber" will be implemented which allows for fine-grained control over individual targets.
140+
In the first instance this would support logging to both the existing sync logging infrastructure and
141+
simple logging via the `console` facilities. Error reports would initially be logged to the console, with a followup being to decide how to report them externally.
142+
143+
#### Application Services
144+
145+
* All crates move to `tracing` instead of `log`
146+
147+
* A new crate would be added which defines the application callback interfaces (via UniFFI) and
148+
also the new tracing-subscriber implementation.
149+
150+
* The crates `rust-log-forwarder` and `error-reporter` crates would keep their external interface
151+
but would have their internal implementation replaced with the subscriber. This is for backwards
152+
compatibility with mobile - eventually we'd expose the new callback interfaces to mobile and delete
153+
these crates entirely.

docs/adr/README.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,8 @@ This log lists the architectural decisions for MADR.
1111
* [ADR-0004](0004-early-startup-experiments.md) - Running experiments on first run early startup
1212
* [ADR-0005](0005-remote-settings-client.md) - A remote-settings client for our mobile browsers.
1313
* [ADR-0007](0007-limit-visits-migration-to-10000.md) - Limit Visits Migrated to Places History in Firefox iOS
14+
* [ADR-0008](0008-use-tracing.md) - Replace logging and error reporting infrastructure with tracing.
15+
1416

1517
<!-- adrlogstop -->
1618

0 commit comments

Comments
 (0)