Skip to content

Commit 0d287a5

Browse files
committed
Add adr-8: Replace logging and error reporting infrastructure with tracing.
1 parent 815d73c commit 0d287a5

File tree

2 files changed

+138
-0
lines changed

2 files changed

+138
-0
lines changed

docs/adr/0008-use-tracing.md

Lines changed: 136 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,136 @@
1+
# Replace logging and error reporting infrastructure with tracing.
2+
3+
* Status: proposed
4+
* Deciders:
5+
* AppServices team: ..
6+
* moz-central releng/build teams: ?
7+
* Date: Mar 2025
8+
* Feedback deadline: ?
9+
10+
## Context and Problem Statement
11+
12+
Rust code in [application-services](https://github.com/mozilla/application-services/) needs support for diagnostics.
13+
Specifically, we need support for logging and for error reporting.
14+
15+
### Logging
16+
17+
Our components must be able to send logging to the browser - all platforms capture some logging.
18+
19+
We currently use the `log` crate and wire this up best we can.
20+
21+
### Error Reporting
22+
There's an [error support crate](https://github.com/mozilla/application-services/tree/main/components/support/error),
23+
designed explicitly for error reporting for the applications.
24+
Android and iOS both hook into this to report errors to Sentry; Desktop doesn't have this, but it should.
25+
26+
### Problems with the current approaches.
27+
28+
#### Problems using `log`
29+
30+
The main problem with the `log` module is the concept of a [global "max level"](https://docs.rs/cli-log/latest/cli_log/fn.set_max_level.html)
31+
that's set for all crates.
32+
Gecko sets the global max level to `Info` - any more verbose causes [performance regressions in other crates](https://bugzilla.mozilla.org/show_bug.cgi?id=1874215).
33+
34+
In practice, this means our browsers are unable to see debug logs from our Rust code.
35+
36+
#### Problems with the error reporter
37+
38+
None for mobile - but is not implemented at all for Desktop.
39+
40+
The opportunity is to better align the logging and error-reporting requirements into a single facility while introducing this capability
41+
to desktop.
42+
43+
## The Rust tracing crate.
44+
45+
An alternative to the `log` crate is the [`tracing`](https://docs.rs/tracing/latest/tracing/) crate,
46+
which comes from the tokio project.
47+
48+
`tracing` has a very similar API to the `log` crate - `log::trace!` becomes `tracing::trace!` etc.
49+
It has richer semantics than `log` (eg, async support) and largely acts a replacement -
50+
it supports the `RUST_LOG` environment variable and writes to `stdout`,
51+
so developers who are running tests and our CI etc should notice no difference.
52+
53+
Importantly, it has a [`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/)
54+
mechanism that allows efficient, more targettted filtering instead of a global `max_level`.
55+
Each subscriber is able to filter log events without impacting crates they aren't subscribed to.
56+
57+
This means that it should be viable to have some crates capture `trace!()` output without impacting any other crates
58+
or the overall performance of the application.
59+
60+
## Proposal: Move to tracing for all app-services crates.
61+
62+
This document proposes that:
63+
64+
* All app-services crates move to using `tracing` instead of `log`.
65+
66+
* All exiting handling of log events be replaced with a new mechanism based on
67+
[`tracing-subscriber`](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/)
68+
to move logs and error-reports across the FFI.
69+
70+
* We use the same mechanism for error reporting, leveraging the richer metadata offered by tracing.
71+
72+
There are ways to make both `log` and `tracing` work, but we should avoid that if we can, just lean into tracing.
73+
74+
## Implementation: Move to tracing for all app-services crates.
75+
76+
This section describes the changes necessary to implement the above.
77+
78+
### Move all our crates to `tracing`
79+
80+
Most crates use `log::debug!()`/`log::error!()`, which changes to `tracing::...` (do we change them to just `debug!()`?)
81+
82+
Many tests start with `env_logger::try_init().unwrap()` or similar - we'll have a test helper.
83+
84+
### Implement a `tracing_subscriber::Layer`-based "subscriber" mechanism.
85+
86+
This subscriber mechanism requires the application to know all tracing `targets` it cares about.
87+
Each crate is its own target and the app must explicitly "subscribe" to all targets individually.
88+
It will *not* be possible to subscribe to all targets and it's unlikely we'll allow
89+
any "matching" capabilities (eg, regular expressions or similar) - each target will use an exact string match.
90+
91+
This requires our applications to configure their own subscriptions to each `target` with the level for that target,
92+
making it possible to avoid a single, global max-level.
93+
94+
We'll implement this subscriber with a simple `HashMap` mapping the target name to a level.
95+
Once we have determined a subscriber matches an event, we can perform relatively expensive operations
96+
on the event because we assume action will be taken on the event.
97+
This assumption seems fine because we own all the subscribers.
98+
99+
An example of these "relatively expensive" operations is fetching event "fields", such as the message or other meta-data,
100+
and using them to format a string, and dispatching the end result to the underlying logging
101+
system.
102+
103+
Note that this is generic enough to handle traditional "log" messages and our error reporting requirements. It's a general event reporting system.
104+
105+
[There's a WIP for all the above here](https://github.com/mozilla/application-services/compare/main...mhammond:application-services:log-to-tracing)
106+
107+
### Replace all existing "subscribers"
108+
109+
There are 3 main places which would change in the first instance, broken down by platform
110+
111+
#### Mobile
112+
113+
[A WIP for this is also included here](https://github.com/mozilla/application-services/compare/main...mhammond:application-services:log-to-tracing)
114+
115+
#### Desktop
116+
117+
Desktop has a [hand-written xpcom-based log adaptor](https://searchfox.org/mozilla-central/source/services/sync/golden_gate/src/log.rs#119-120). This would be removed entirely and a uniffi-based callback mechanism is used. Rust code calling back into Javascript has the same semantics as `golden_gate` - the log calls are "fire and forget", ending up in the main thread automatically.
118+
119+
The [`gecko-logger`](https://searchfox.org/mozilla-central/source/xpcom/rust/gecko_logger/src/lib.rs) crate would change:
120+
* All application-services log-related code would be removed entirely (eg, [here](https://searchfox.org/mozilla-central/source/services/interfaces/mozIAppServicesLogger.idl) and [here](https://searchfox.org/mozilla-central/source/services/common/app_services_logger)) -
121+
app-services would not rely on `log` at all in this world.
122+
* `gecko-logger` (or a similar crate next to it) would grow support for owning the single tracing-subscriber. It would be responsible for adding a single app-services owned `tracing_subscriber::Layer` instance to the single subscriber.
123+
124+
The [app-services-logger](https://searchfox.org/mozilla-central/source/services/common/app_services_logger/src/lib.rs) would lose all xpcom-related code and instead lean on uniffi and tracing-subscriber.
125+
126+
#### Application Services
127+
128+
* All crates move to `tracing` instead of `log`
129+
130+
* A new crate would be added which defines the application callback interfaces (via UniFFI) and
131+
also the new tracing-subscriber implementation.
132+
133+
* The crates `rust-log-forwarder` and `error-reporter` crates would keep their external interface
134+
but would have their internal implementation replaced with the subscriber. This is for backwards
135+
compatibility with mobile - eventually we'd expose the new callback interfaces to mobile and delete
136+
these crates entirely.

docs/adr/README.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,8 @@ This log lists the architectural decisions for MADR.
1111
* [ADR-0004](0004-early-startup-experiments.md) - Running experiments on first run early startup
1212
* [ADR-0005](0005-remote-settings-client.md) - A remote-settings client for our mobile browsers.
1313
* [ADR-0007](0007-limit-visits-migration-to-10000.md) - Limit Visits Migrated to Places History in Firefox iOS
14+
* [ADR-0008](0008-use-tracing.md) - Replace logging and error reporting infrastructure with tracing.
15+
1416

1517
<!-- adrlogstop -->
1618

0 commit comments

Comments
 (0)