Skip to content

OpenTelemetry Tracing API vs Tokio-Tracing API for Distributed Tracing #1571

Open
@cijothomas

Description

@cijothomas

Background

The Rust ecosystem has two prominent tracing APIs: the OpenTelemetry Tracing API (Otel for short), delivered through the opentelemetry crate, and the Tokio tracing API, provided by the
tracing crate. The OTel Tracing API adheres to the OpenTelemetry specification, ensuring alignment with OpenTelemetry Tracing implementations in other languages like C++, Java etc. Conversely, the Tokio tracing ecosystem, which predates
OpenTelemetry, boasts widespread adoption, with many popular libraries already instrumented. The tracing-opentelemetry crate, maintained outside of OpenTelemetry repositories, act as a "bridge", enabling applications instrumented with tracing to work with OpenTelemetry.

The issue

The coexistence of the OTel Tracing API and Tokio-Tracing poses a dilemma, forcing end users to choose between two competing APIs. This situation complicates the decision-making process due to the absence of comprehensive
documentation comparing the two options. A significant concern is the lack of tested interoperability between the APIs, which can result in issues, especially in applications where different layers use different tracing APIs, potentially
leading to incomplete traces. This also impacts the log correlation scenarios as well.

A Comparison with OTel .NET

The OpenTelemetry .NET community encountered a similar challenge when the OTel Tracing API was introduced, as the .NET runtime library (shipped as the
DiagnosticSource package) already had a similar API in place. This issue was resolved through collaboration between OTel .NET maintainers and the .NET
runtime team, leading to the alignment of the .NET runtime's tracing API with the OTel specifications. This approach was later applied to the Metrics API as well. While the decision by OTel .NET to prioritize the .NET Runtime library's
API over its own for tracing/metrics has generally been successful, it has not been without its challenges. Despite declaring stability years ago, OTel .NET has yet to implement certain aspects of the OTel specification fully.

Although the outcomes in the .NET ecosystem might not directly forecast the success of similar efforts in Rust, they provide a valuable reference point.

Options for Consideration

  1. Deprecate Tokio-Tracing: This approach would align Rust with the OpenTelemetry strategies adopted by other languages. However, considering the popularity and active maintenance of the tracing crate in the Rust ecosystem, this path has highest friction and is highly improbable.

  2. Deprecate OTel Tracing: Promoting Tokio-Tracing as the standard could be a feasible option, albeit requiring comprehensive evaluation. This strategy would cause OTel Rust to deviate from its counterparts in other languages.
    Potential alignment of Tokio-Tracing with OTel Tracing specifications could mitigate this concern but necessitates groundwork to identify gaps and propose solutions. Tokio-Tracing maintainers have shown willingness to accommodate
    reasonable changes, pending a clear set of requirements. This option does not eliminate the OTel Tracing API completely, but it'll still remain to compensate for things missing from Tokio-Tracing - only those APIs which are overlapping/competing with Tokio-Tracing needs to be deprecated/removed.

  3. Maintain Both APIs: This alternative emphasizes the importance of ensuring seamless interoperability between the two APIs, allowing users to choose based on preference or specific needs without compromising trace completeness. Achieving this goal requires significant effort to identify and bridge any existing gaps in the interoperability story. Users should be able freely chose between, without worrying about any broken traces.

  4. Do nothing.: OTel Rust has some special accommodations done to help tracing crate (and vice-versa). We can just remove them, and let each crate follow their own destiny. (Highly undesirable state, just listed for completion)

Are there more options? Please let us know in the comments!

Current State

The Rust tracing ecosystem is at a critical juncture. Active discussions between the OTel Rust team and the Tracing Rust team are taking place, with updates and deliberations shared on Cloud Native
Slack
. Interested individuals are encouraged to join the discussion on Slack (or right in this Github issue). All decisions and considerations will be posted on GitHub as well for wider visibility and to gather feedbacks.

Timeline

Resolving this issue is a prerequisite (though not the only one) for declaring the Tracing signal as GA (General Availability) for OTel Rust. Given the goal to achieve Tracing GA (alongside other milestones) soon, it's crucial that this issue is resolved promptly. A tentative deadline to reach a decision on the chosen path forward is set for April 30th, 2024, approximately 2 months from today.

Related issues

#1378 Tracing Propagation.
#1394 (comment)
Broken Trace example : #1690

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-logArea: Issues related to logsA-traceArea: issues related to tracingrelease:required-for-stableMust be resolved before GA release, or nice to have before GA.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions