Provide high-level architecture overview for rust-otel #2570
Description
In addition to the opentelemetry library guidelines and otel-rust's own contributing guide it would be helpful to have a high-level design for otel-rust itself. This will make it easier for contributors to quickly grok the project and provide consistent PRs, and will provide a point of reference to help resolve PR discussions. This could be as simple as a couple of paragraphs highlighting the high-level concepts that persist between signals, and perhaps a bit of mermaidjs for clarity.
I propose that this be drafted by one of the maintainers, effectively committing their mental model to markdown.
Structural
There are elements of the design that are particular to otel-rust and that are expected to be consistent across signals, but that the type system cannot enforce because the signals are independent of one another. By way of example, consider the relationship between a .*Provider, an .*Exporter, and a .*Client, which is fairly consistent between signals. Logs and traces follow a similar pattern: the .*Provider (e.g., LoggerProvider, TracerProvider) uses a .*Processor to manage batching and interacts with an .*Exporter, which delegates transport logic to a .*Client (e.g., OtlpHttpClient). Metrics, however, omit the processor entirely. Is this difference significant, and if so, why? A high-level design would clearly establish the role and shape of these components in general, and highlight any exceptions.
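To make the shape described above concrete, here is a minimal sketch of the provider, processor, exporter and client chain as it is described for logs and traces. Every name in it (`Client`, `Exporter`, `Processor`, `InMemoryClient`, `SketchExporter`, `BatchProcessor`, `SketchProvider`, `demo`) is hypothetical and invented for illustration; none of this is the actual otel-rust API, and records are plain strings rather than real telemetry types.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical transport layer: moves payloads, knows nothing about signals.
trait Client {
    fn send(&self, payload: String) -> Result<(), String>;
}

/// Hypothetical exporter: encodes a batch and delegates transport to a Client.
trait Exporter {
    fn export(&self, batch: Vec<String>) -> Result<(), String>;
}

/// Hypothetical processor: buffers records and decides when to export them.
trait Processor {
    fn on_emit(&mut self, record: String);
    fn force_flush(&mut self) -> Result<(), String>;
}

/// Stand-in client that records payloads in memory instead of doing I/O.
#[derive(Clone, Default)]
struct InMemoryClient {
    sent: Rc<RefCell<Vec<String>>>,
}

impl Client for InMemoryClient {
    fn send(&self, payload: String) -> Result<(), String> {
        self.sent.borrow_mut().push(payload);
        Ok(())
    }
}

struct SketchExporter<C: Client> {
    client: C,
}

impl<C: Client> Exporter for SketchExporter<C> {
    fn export(&self, batch: Vec<String>) -> Result<(), String> {
        // A real exporter would encode to a wire format; joining stands in for that.
        self.client.send(batch.join(";"))
    }
}

struct BatchProcessor<E: Exporter> {
    exporter: E,
    buffer: Vec<String>,
    max_batch: usize,
}

impl<E: Exporter> Processor for BatchProcessor<E> {
    fn on_emit(&mut self, record: String) {
        self.buffer.push(record);
        if self.buffer.len() >= self.max_batch {
            // Export errors are dropped here purely to keep the sketch short.
            let _ = self.force_flush();
        }
    }

    fn force_flush(&mut self) -> Result<(), String> {
        if self.buffer.is_empty() {
            return Ok(());
        }
        let batch = std::mem::take(&mut self.buffer);
        self.exporter.export(batch)
    }
}

/// Hypothetical provider: the entry point that owns the processor.
struct SketchProvider<P: Processor> {
    processor: P,
}

impl<P: Processor> SketchProvider<P> {
    fn emit(&mut self, record: &str) {
        self.processor.on_emit(record.to_string());
    }

    fn shutdown(&mut self) -> Result<(), String> {
        self.processor.force_flush()
    }
}

/// Wires the chain together and returns what reached the client.
fn demo() -> Vec<String> {
    let client = InMemoryClient::default();
    let exporter = SketchExporter { client: client.clone() };
    let processor = BatchProcessor { exporter, buffer: Vec::new(), max_batch: 2 };
    let mut provider = SketchProvider { processor };

    provider.emit("a");
    provider.emit("b"); // batch is full: "a;b" is exported
    provider.emit("c");
    provider.shutdown().unwrap(); // remaining "c" is flushed

    let sent = client.sent.borrow().clone();
    sent
}
```

Calling `demo()` returns `["a;b", "c"]`: one payload for the full batch and one for the flush at shutdown. A design document could state whether each signal is expected to follow this layering, and where a signal such as metrics deliberately departs from it.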
Aesthetic
Likewise, there are aesthetic differences that seem to have crept in by accident; a brief design document would establish the expected convention and allow these to be resolved. Here's one example:
opentelemetry::logs::logger::LoggerProvider
opentelemetry::metrics::meter::MeterProvider
opentelemetry::trace::tracer_provider::TracerProvider
The plurality of the namespaces changes (logs/trace), the namespace depth changes (trace->tracer_provider), and the domain name changes (metric/meter). This is merely aesthetic, but there's no way for us to know which of these is the target pattern.
Future
This would also be a great place to document architectural decisions and the justification for them going forward, in the style of architectural decision records. For instance, we've spent a bunch of time discussing which and how many HTTP client libraries to support for export, and different patterns for pushing telemetry through - both sync and async. Much like the above, the current state of these discussions can to some extent be inferred by digging through issues, but I think here too it would be helpful for contributors to "put a line under it" and update a central document describing the design.