OTEP: Recording exceptions as log based events #4333
I think this is a related issue: |
A small doubt:
Although (I think) it's not called out, my understanding is that exceptions should now be explicitly reported as both 1) Span.Event and 2) Log/Event? I.e., coding-wise you should do this:
currentSpan.recordException(e);
logger.logRecordBuilder
    .addException(e);
Is this the case? |
Overall I'm very supportive. Just some nits and one mitigation I'd like to see called out/addressed.
When I receive a span today I know that I've also received all of the exceptions associated with that span: they come in the same payload, at the same time. If the exceptions are sent as logs (or child spans, anything except the same span) they may arrive at any time, e.g. an hour before or after the span was received in the case of a networking issue. Or they may never arrive at all. |
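To make the delivery concern concrete, here is a minimal sketch of what a backend has to do once exceptions travel separately from spans. Everything in it (the `Correlator` class, its method names, the dict shapes) is hypothetical and only models a join on `span_id`; it is not an OpenTelemetry API.

```python
from collections import defaultdict


class Correlator:
    """Hypothetical backend-side join of exception logs to spans by span_id."""

    def __init__(self):
        self.spans = {}                        # span_id -> span dict
        self.pending_logs = defaultdict(list)  # span_id -> logs received so far

    def on_log(self, log):
        # Logs may arrive before, after, or without the span they belong to.
        self.pending_logs[log["span_id"]].append(log)

    def on_span(self, span):
        self.spans[span["span_id"]] = span

    def exceptions_for(self, span_id):
        # Only the logs that have arrived so far; the backend cannot tell
        # whether more are still in flight or were dropped.
        return [log["exception.type"] for log in self.pending_logs[span_id]]


backend = Correlator()
backend.on_span({"span_id": "abc", "name": "foo"})
early_view = backend.exceptions_for("abc")  # span is here, exception log is not
backend.on_log({"span_id": "abc", "exception.type": "ConnectionError"})
late_view = backend.exceptions_for("abc")
print(early_view, late_view)
```

With span events the two views would be identical, because the exception is part of the span payload; with separate logs the answer depends on when the backend is asked.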
Sorry, I don't follow... Here's an example showing my thinking:
^ This will have the same effect as doing the commented-out part (span.recordException). |
Now convince all of your users to put that into their code. Configuring OTEL correctly is already ~50 LOC, now it's 100 LOC. Users won't do it. And as explained above it's not really possible to do this reliably in a collector / backend. |
If a user wants to do it (see the exception inside the Span), they can. I just showed it is possible. Whether users do it or not - I can't comment. If the number of lines is a concern, Otel SDKs can provide an option to do it automatically. #4333 (comment) has a link to a (Log->SpanEvent) option in one of the languages. |
In my opinion what matters is not what's possible, what matters is what the real world experience is going to be for users and backend implementers. And the reality is that this change may have a serious negative impact on both. |
I don't think Otel ever recommended that exceptions MUST/SHOULD be reported via SpanEvents. It had conventions for reporting exceptions via SpanEvents and via Logs (the logs convention came later than the span one), but never recommended one over the other. This OTEP would be the first time Otel officially makes a recommendation on the preferred way of reporting exceptions. (That is my read. Happy to be corrected!) |
I agree with your interpretation. My point is that recording events via logs does have downsides. Right now you have to opt into it so it's okay to have to do something like the |
I think this depends on the backend/vendor. Recording exceptions via logs has downsides. Recording exceptions via SpanEvents has downsides too. Having no recommendation from Otel is also not good. It is definitely possible to have something in the Otel Spec for SDKs: a feature flag to control this based on user preference. (The feature flag can do the conversion of exceptions from LogRecord to SpanEvents or vice versa, with or without duplicating.) It is also possible to have a feature flag for instrumentations, but it may be easier to have an SDK-level setting that ensures consistent behavior, irrespective of the instrumentation used. |
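A rough sketch of what such an SDK-level flag could look like, assuming three modes (logs only, span events only, or both). The `ExceptionMode`, `Span`, and `Pipeline` names are made up for illustration and are stand-ins, not part of any OpenTelemetry SDK.

```python
from dataclasses import dataclass, field
from enum import Enum


class ExceptionMode(Enum):  # hypothetical flag values
    LOGS_ONLY = "logs"
    SPAN_EVENTS_ONLY = "span_events"
    BOTH = "both"


@dataclass
class Span:  # stand-in for an SDK span
    name: str
    events: list = field(default_factory=list)


@dataclass
class Pipeline:  # stand-in for the SDK component that owns the flag
    mode: ExceptionMode
    logs: list = field(default_factory=list)

    def record_exception(self, span, exc):
        attrs = {
            "exception.type": type(exc).__qualname__,
            "exception.message": str(exc),
        }
        # The flag decides which signal(s) receive the exception.
        if self.mode in (ExceptionMode.SPAN_EVENTS_ONLY, ExceptionMode.BOTH):
            span.events.append({"name": "exception", "attributes": attrs})
        if self.mode in (ExceptionMode.LOGS_ONLY, ExceptionMode.BOTH):
            self.logs.append({"span": span.name, "attributes": attrs})


span = Span("foo")
pipeline = Pipeline(ExceptionMode.BOTH)
pipeline.record_exception(span, ValueError("bar"))
print(len(span.events), len(pipeline.logs))
```

Because the decision sits in one place, instrumentations would call a single recording method and behave the same way regardless of which mode the user picked.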
to @alexmojaki
You could bring them on the SDK level with an events-to-span-events processor. It's a great question what the span-events -> logs migration would look like and whether such a processor should be provided by a contrib component, individual distros, or by default.
Correlated logs with errors and other details without parent span are pretty valuable regardless of tracing or sampling.
This is part of the previous OTEP on Events - https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0265-event-vision.md#relationship-to-span-events. In the long term, we hope to replace span events with log-based events, with a migration story for consumers.
We report
Does it include the stack trace? The rest is (or can be) stored for a single 'terminal' exception that the span ends with. Or is your goal to store the exception chain and/or handled exceptions that happened during the span lifetime? The counter-argument I have is that there are a lot of exceptions that happen during span execution and most of them are already recorded as logs by runtimes, client libs, frameworks, etc. I.e., you're already in a world where exceptions are not exported along with spans. It's a fair point though that a user app may want to associate arbitrary exceptions with a span today and we're taking that away when moving to logs. |
Co-authored-by: Trask Stalnaker <[email protected]>
@adriangb @alexmojaki @samuelcolvin would you mind summarizing where we are with each of your concerns above as separate review comments (on the Files changed tab at the top, pick a relevant-ish line and add the summary of where we're at there) so we can have "threaded" discussions on each one of them? I really want to follow along and see what needs to be addressed, but am having trouble following with everything as mainline comments 😅 thank you ❤️ |
2. Recording exceptions as logs will result in UX degradation for users
   leveraging trace-only backends such as Jaeger.

3. Having exceptions exported and stored along with span is beneficial for some backends.
Commenting here to follow the request of commenting somewhere vaguely relevant for a threaded discussion.
Let me check if I understand things correctly. Currently this code:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from requests import ConnectionError

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = provider.get_tracer(__name__)

with tracer.start_as_current_span("foo"):
    raise ConnectionError("bar")
prints a span containing:
{
  "status": {
    "status_code": "ERROR",
    "description": "ConnectionError: bar"
  },
  "attributes": {},
  "events": [
    {
      "name": "exception",
      "timestamp": "2025-02-07T10:35:52.726398Z",
      "attributes": {
        "exception.type": "requests.exceptions.ConnectionError",
        "exception.message": "bar",
        "exception.stacktrace": "Traceback...",
        "exception.escaped": "False"
      }
    }
  ]
}
Am I correct that the goal is to instead emit the following?
{
  "status": {
    "status_code": "ERROR",
    "description": "bar"
  },
  "attributes": {
    "error.type": "requests.exceptions.ConnectionError"
  }
}
The differences being:
- Span events are gone, and the stacktrace will only be found in a child event-log.
- The span event attribute exception.type, containing the fully qualified exception type name, is now in the span attribute error.type.
- The status description no longer contains the (unqualified) exception name.
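The three differences can be written down as a transformation from the first payload to the second. This is only a sketch of my reading of the proposal, not a confirmed behavior: the function name and the shape of the log record (including the `event_name` key) are assumptions made for illustration.

```python
import copy


def move_exception_events_to_logs(span):
    """Return (new_span, log_records) following the reading described above."""
    new_span = copy.deepcopy(span)
    log_records = []
    kept_events = []
    for event in new_span.pop("events", []):
        if event.get("name") != "exception":
            kept_events.append(event)  # other span events are left untouched
            continue
        attrs = event["attributes"]
        # The stack trace and the other exception.* attributes move to a log record.
        log_records.append({
            "event_name": "exception",  # hypothetical field name
            "timestamp": event["timestamp"],
            "attributes": attrs,
        })
        # The fully qualified type becomes a span attribute.
        new_span["attributes"]["error.type"] = attrs["exception.type"]
        # The status description keeps only the message.
        new_span["status"]["description"] = attrs["exception.message"]
    if kept_events:
        new_span["events"] = kept_events
    return new_span, log_records


current = {
    "status": {"status_code": "ERROR", "description": "ConnectionError: bar"},
    "attributes": {},
    "events": [{
        "name": "exception",
        "timestamp": "2025-02-07T10:35:52.726398Z",
        "attributes": {
            "exception.type": "requests.exceptions.ConnectionError",
            "exception.message": "bar",
            "exception.stacktrace": "Traceback...",
            "exception.escaped": "False",
        },
    }],
}
proposed, logs = move_exception_events_to_logs(current)
print(proposed)
```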
Hi, I have a question regarding the proposed
In a more traditional logging-based stack, when logging an exception, the developer should add a human-readable error message explaining the underlying exception's effect on the current process. Without these messages, an observer has a very hard time figuring out what broke from a business perspective.
{
  "level": "ERROR",
  "message": "Could not generate invoice for order XYZ.",
  "exception": "java.lang.NullPointerException",
  "stack_trace": "..."
}
If I understand this OTEP correctly, this log message would not change much and the corresponding trace would look like this:
{
  "status": {
    "status_code": "ERROR",
    "description": "Could not generate invoice for order XYZ."
  },
  "attributes": {
    "error.type": "java.lang.NullPointerException"
  }
}
Is this correct? |
Related to open-telemetry/semantic-conventions#1536
Changes
Recording exceptions as span events is problematic since it
This OTEP provides guidance on how to record exceptions using OpenTelemetry logs focusing on minimizing duplication and providing context to reduce the noise.
If accepted, the follow-up spec changes are expected to replace existing (stable) documents:
- Related OTEP(s) #
- CHANGELOG.md file updated for non-trivial changes
- spec-compliance-matrix.md updated if necessary