This repository was archived by the owner on Nov 1, 2023. It is now read-only.

Revisit logging to enhance observability #312

Open

Description

@ranweiler (Member):

Revisit our logging and move to a model that supports:

  • Distributed tracing concepts, such as spans with inclusion (nesting) and correlation
    • Ideally these should map to spans at the service level, which would need to be implemented
  • Structured logging
  • Fan-out to multiple backends, each at its own verbosity level
    • Telemetry ingested by App Insights, ideally via OpenTelemetry
    • stderr, with the level controlled by an environment variable
    • A circular (size-bounded) log file local to the VM
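To make the fan-out requirement concrete, here is a minimal std-only Rust sketch of structured events delivered to several backends, each with its own minimum level. It is a toy model, not the tracing or tracing-subscriber API: the ONEFUZZ_LOG_LEVEL variable name, the Sink/FanOut/RingSink/StderrSink types, and the Info default are all hypothetical, and the circular log file is modeled as an in-memory ring of lines rather than a real file.

```rust
use std::collections::VecDeque;

/// Severity levels, ordered from most to least verbose.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level { Trace, Debug, Info, Warn, Error }

impl Level {
    /// Parse a level name such as "info" or " WARN " (case-insensitive).
    pub fn parse(s: &str) -> Option<Level> {
        match s.trim().to_ascii_lowercase().as_str() {
            "trace" => Some(Level::Trace),
            "debug" => Some(Level::Debug),
            "info" => Some(Level::Info),
            "warn" => Some(Level::Warn),
            "error" => Some(Level::Error),
            _ => None,
        }
    }
}

/// A structured event: level, message, and key/value fields.
pub struct Event {
    pub level: Level,
    pub message: String,
    pub fields: Vec<(String, String)>,
}

/// One output backend with its own minimum level.
pub trait Sink {
    fn min_level(&self) -> Level;
    fn write(&mut self, event: &Event);
}

/// stderr backend whose level comes from an environment variable.
pub struct StderrSink {
    pub min: Level,
}

impl StderrSink {
    /// Hypothetical variable name; unset or unparsable values fall back to Info.
    pub fn from_env() -> Self {
        Self::from_value(std::env::var("ONEFUZZ_LOG_LEVEL").ok().as_deref())
    }
    pub fn from_value(value: Option<&str>) -> Self {
        StderrSink { min: value.and_then(Level::parse).unwrap_or(Level::Info) }
    }
}

impl Sink for StderrSink {
    fn min_level(&self) -> Level { self.min }
    fn write(&mut self, event: &Event) {
        eprintln!("{:?} {}", event.level, event.message);
    }
}

/// In-memory stand-in for the circular VM-local log: keeps the newest `capacity` lines.
pub struct RingSink {
    pub min: Level,
    pub capacity: usize,
    pub lines: VecDeque<String>,
}

impl Sink for RingSink {
    fn min_level(&self) -> Level { self.min }
    fn write(&mut self, event: &Event) {
        if self.capacity == 0 {
            return;
        }
        if self.lines.len() == self.capacity {
            self.lines.pop_front(); // overwrite the oldest entry
        }
        let fields: Vec<String> = event.fields.iter().map(|(k, v)| format!("{k}={v}")).collect();
        self.lines.push_back(format!("{:?} {} {}", event.level, event.message, fields.join(" ")));
    }
}

/// Delivers each event to every sink whose minimum level it meets.
pub struct FanOut {
    pub sinks: Vec<Box<dyn Sink>>,
}

impl FanOut {
    /// Returns how many sinks received the event.
    pub fn emit(&mut self, event: &Event) -> usize {
        let mut delivered = 0;
        for sink in self.sinks.iter_mut() {
            if event.level >= sink.min_level() {
                sink.write(event);
                delivered += 1;
            }
        }
        delivered
    }
}
```

With the tracing crates, roughly the same shape would come from composing several subscriber layers, each with its own filter (for example an environment-driven filter on the stderr layer), rather than hand-rolled sinks.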

Since we use tokio, the tracing library (from the tokio project, and designed to follow spans across async tasks) with an OpenTelemetry backend would achieve all of the above.
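The span model that tracing and OpenTelemetry provide can be illustrated with a small std-only Rust sketch. It is a toy, not either library's API: inclusion means each child span records its parent, correlation means every span and event in one operation shares a trace ID, and service-level propagation uses the W3C traceparent header layout (version-traceid-parentid-flags). IDs come from a counter for readability, whereas real OpenTelemetry IDs are random 16-byte trace IDs and 8-byte span IDs; all type and function names are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Toy ID source; real OpenTelemetry IDs are randomly generated.
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

fn next_id() -> u64 {
    NEXT_ID.fetch_add(1, Ordering::Relaxed)
}

#[derive(Clone, Debug, PartialEq)]
pub struct SpanContext {
    pub trace_id: u128,         // shared by every span in one distributed operation
    pub span_id: u64,           // unique per span
    pub parent_id: Option<u64>, // inclusion: the enclosing span, if any
    pub name: String,
}

impl SpanContext {
    /// Start a new trace, e.g. when a request first enters the service.
    pub fn root(name: &str) -> Self {
        SpanContext { trace_id: next_id() as u128, span_id: next_id(), parent_id: None, name: name.to_string() }
    }

    /// Start a nested span in the same trace.
    pub fn child(&self, name: &str) -> Self {
        SpanContext { trace_id: self.trace_id, span_id: next_id(), parent_id: Some(self.span_id), name: name.to_string() }
    }

    /// Serialize as a W3C traceparent value for propagation to another service.
    pub fn traceparent(&self) -> String {
        format!("00-{:032x}-{:016x}-01", self.trace_id, self.span_id)
    }

    /// Continue a remote trace: the new span's parent is the caller's span.
    pub fn continue_from(header: &str, name: &str) -> Option<Self> {
        let parts: Vec<&str> = header.split('-').collect();
        if parts.len() != 4 || parts[0] != "00" || parts[1].len() != 32 || parts[2].len() != 16 {
            return None;
        }
        let trace_id = u128::from_str_radix(parts[1], 16).ok()?;
        let parent = u64::from_str_radix(parts[2], 16).ok()?;
        if trace_id == 0 || parent == 0 {
            return None; // all-zero IDs are invalid
        }
        Some(SpanContext { trace_id, span_id: next_id(), parent_id: Some(parent), name: name.to_string() })
    }
}

/// A log-style event tagged with its enclosing span, so backends can correlate it.
pub struct Event {
    pub trace_id: u128,
    pub span_id: u64,
    pub message: String,
}

pub fn event_in(span: &SpanContext, message: &str) -> Event {
    Event { trace_id: span.trace_id, span_id: span.span_id, message: message.to_string() }
}
```

In the real stack, tracing tracks the current span for you (including across .await points when futures are instrumented), and tracing-opentelemetry assigns the OpenTelemetry IDs.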

AB#36002

Activity

Self-assigned this on Nov 8, 2021.

@ranweiler (Member, Author) commented on Nov 13, 2021:

Current ecosystem support for OpenTelemetry + Application Insights:

Rust

There is no first-party OpenTelemetry/App Insights support for Rust, not even at the preview level. There is, however, a third-party Application Insights exporter for the opentelemetry SDK crate.

Altogether, we can use tracing, opentelemetry, tracing-opentelemetry, and opentelemetry-application-insights to generate and export async-compatible span data. We can even export log-style events as App Insights Trace Telemetry, correctly associated with their spans.
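The essence of "correctly associated with spans" is that each exported Trace Telemetry item carries the IDs of the span it was emitted in. Below is a hypothetical, simplified mapping in std-only Rust, not the exporter's actual code. It assumes the trace ID becomes the App Insights operation ID (operation_Id) and the enclosing span ID becomes the operation parent ID (operation_ParentId), and it assumes one particular level-to-severity mapping; the SeverityLevel values (Verbose 0 through Critical 4) follow the Application Insights severity scale.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Level { Trace, Debug, Info, Warn, Error }

/// Application Insights severity levels for trace telemetry.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SeverityLevel { Verbose = 0, Information = 1, Warning = 2, Error = 3, Critical = 4 }

/// Assumed mapping; the real exporter may choose differently.
pub fn severity(level: Level) -> SeverityLevel {
    match level {
        Level::Trace | Level::Debug => SeverityLevel::Verbose,
        Level::Info => SeverityLevel::Information,
        Level::Warn => SeverityLevel::Warning,
        Level::Error => SeverityLevel::Error,
    }
}

/// A tracing event recorded inside a span, with the span's OpenTelemetry IDs in hex.
pub struct SpanEvent {
    pub trace_id: String,
    pub span_id: String,
    pub level: Level,
    pub message: String,
    pub fields: BTreeMap<String, String>,
}

/// Simplified shape of an App Insights trace item.
#[derive(Debug)]
pub struct TraceTelemetry {
    pub message: String,
    pub severity: SeverityLevel,
    pub operation_id: String,        // groups all telemetry for one end-to-end operation
    pub operation_parent_id: String, // the span that emitted this item
    pub properties: BTreeMap<String, String>,
}

/// Structured fields become custom properties; span IDs become correlation IDs.
pub fn to_trace_telemetry(ev: &SpanEvent) -> TraceTelemetry {
    TraceTelemetry {
        message: ev.message.clone(),
        severity: severity(ev.level),
        operation_id: ev.trace_id.clone(),
        operation_parent_id: ev.span_id.clone(),
        properties: ev.fields.clone(),
    }
}
```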


I don't yet see an off-the-shelf mechanism for Custom Events, but it seems like it'd be easy to add. Alternatively, we could have a separate telemetry channel that uses the appinsights crate only for specialized telemetry such as Custom Events. This may be especially preferable for the optional, non-identifying global telemetry.
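One way such a separate channel could look, sketched with std::sync::mpsc: global telemetry travels on its own channel, independent of the tracing pipeline, and any property not on an allowlist is dropped before it leaves the process. The allowlist contents and the GlobalTelemetry/CustomEvent names are hypothetical, and the background worker that drains the receiver and forwards events to Application Insights (for example via the appinsights crate) is not shown.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// A Custom Event bound for the optional global telemetry channel.
#[derive(Clone, Debug, PartialEq)]
pub struct CustomEvent {
    pub name: String,
    pub properties: BTreeMap<String, String>,
}

/// Hypothetical allowlist of property names considered non-identifying.
const ALLOWED_PROPERTIES: &[&str] = &["os", "onefuzz_version", "task_type"];

/// Sending half of a dedicated channel, separate from the tracing pipeline.
pub struct GlobalTelemetry {
    tx: Sender<CustomEvent>,
}

impl GlobalTelemetry {
    /// Returns the handle plus the receiving end, which a background worker would drain.
    pub fn create() -> (GlobalTelemetry, Receiver<CustomEvent>) {
        let (tx, rx) = channel();
        (GlobalTelemetry { tx }, rx)
    }

    /// Queue an event, dropping any property not on the allowlist.
    /// Returns false if the receiving side has shut down.
    pub fn track(&self, name: &str, props: &[(&str, &str)]) -> bool {
        let properties = props
            .iter()
            .filter(|(k, _)| ALLOWED_PROPERTIES.contains(k))
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        self.tx.send(CustomEvent { name: name.to_string(), properties }).is_ok()
    }
}
```

Filtering at the send side, rather than in the exporter, keeps the non-identifying guarantee in one small, auditable place.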

Python

Python has first-party support, but only in preview. We may get some wins if we focus on spans (without events), or if we use libraries that are getting early attention for pervasive OpenTelemetry instrumentation (FastAPI?).

We can use OpenTelemetry with Python via opentelemetry-sdk/opentelemetry-api, and export spans to Application Insights via azuremonitor-opentelemetry-exporter. The latter is in preview, and it currently appears to drop all span-associated events (#21747). I haven't yet checked whether there's a way to auto-instrument logging to be span-aware, but it seems unlikely, especially since the OpenTelemetry logging spec is not yet stabilized.

@ranweiler (Member, Author) commented on Nov 13, 2021:

Re: "I don't yet see an off-the-shelf mechanism for Custom Events, but it seems like it'd be easy to add."

Confirmed: this was very easy to add to the exporter backend. The design question then becomes: how do we decide when a span-parented Event from tracing should be exported as Application Insights Trace Telemetry vs. as a Custom Event? The presence or absence of a level field is not a viable cue, because all normally created tracing events currently have a Level.
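Since the level cannot serve as the cue, one option is an explicit opt-in marker field. The std-only sketch below is hypothetical: the custom_event.name field name is invented for illustration, and detecting exceptions via an exception.type field is an assumption borrowed from OpenTelemetry's semantic conventions for exceptions, not necessarily how the Rust exporter special-cases them.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Level { Trace, Debug, Info, Warn, Error }

/// A span-parented event as the exporter sees it.
#[allow(dead_code)]
pub struct SpanEvent {
    pub level: Level, // always present, so it cannot signal "this is a Custom Event"
    pub fields: Vec<(String, String)>,
}

#[derive(Debug, PartialEq)]
pub enum TelemetryKind {
    Trace,
    CustomEvent(String),
    Exception,
}

/// Hypothetical opt-in marker field for Custom Events.
pub const CUSTOM_EVENT_FIELD: &str = "custom_event.name";

pub fn classify(ev: &SpanEvent) -> TelemetryKind {
    // Exceptions keep their dedicated handling (assumed `exception.*` attributes).
    if ev.fields.iter().any(|(k, _)| k == "exception.type") {
        return TelemetryKind::Exception;
    }
    // Only events that explicitly carry the marker become Custom Events...
    if let Some((_, name)) = ev.fields.iter().find(|(k, _)| k == CUSTOM_EVENT_FIELD) {
        return TelemetryKind::CustomEvent(name.clone());
    }
    // ...and everything else, at any level, is exported as Trace Telemetry.
    TelemetryKind::Trace
}
```

The level is deliberately ignored: an explicit field makes the choice visible at the call site instead of inferring it from a property every event already has.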

In the long run, OpenTelemetry Logging will make the "event" vs. "log message" distinction explicit, in a way that tracing libraries can propagate more systematically. In the short term, we wouldn't be any worse off than we already are: most telemetry would become Trace Telemetry items, and exceptions are special-cased in the Rust backend. Also, Custom Events are not displayed any better than Trace Telemetry in the Transaction Timeline view of Application Insights, nor are they any more queryable.

For our "Custom Events" that are more properly treated as metric data, there is a (now-frozen) OpenTelemetry Metrics API that has feature-flagged support in the Rust Application Insights exporter.
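The practical difference between emitting a Custom Event per occurrence and recording a metric is that a metric is aggregated in-process and exported as one data point per time series per interval. Below is a minimal std-only sketch of a counter with delta temporality; it is not the OpenTelemetry Metrics API, and the metric and attribute names are hypothetical.

```rust
use std::collections::BTreeMap;

/// A time series: metric name plus its attribute set (sorted, so attribute order doesn't matter).
pub type SeriesKey = (String, Vec<(String, String)>);

/// Sums counter increments in-process instead of sending one Custom Event per occurrence.
#[derive(Default)]
pub struct CounterAggregator {
    sums: BTreeMap<SeriesKey, u64>,
}

impl CounterAggregator {
    /// Record an increment, e.g. one new crash for a given task type.
    pub fn add(&mut self, name: &str, value: u64, attrs: &[(&str, &str)]) {
        let mut attrs: Vec<(String, String)> =
            attrs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect();
        attrs.sort();
        *self.sums.entry((name.to_string(), attrs)).or_insert(0) += value;
    }

    /// Export with delta temporality: the sum per series since the last call, then reset.
    pub fn collect(&mut self) -> Vec<(SeriesKey, u64)> {
        std::mem::take(&mut self.sums).into_iter().collect()
    }
}
```

With this shape, a burst of a thousand crashes costs one exported point per task type per interval rather than a thousand telemetry items.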

Removed their assignment on Feb 17, 2023.

Metadata

  • Assignees: No one assigned
  • Labels: None
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
  • Participants: @ranweiler, @mgreisen, @bmc-msft


          Revisit logging to enhance observability · Issue #312 · microsoft/onefuzz