Describe the bug
When running Quarkus with OpenTelemetry on AWS Lambda, telemetry can be lost when the function completes very quickly.
With quarkus.otel.simple=true, batching is disabled for traces and logs, but telemetry is still not reliably flushed before the Lambda execution environment is frozen. In AWS Lambda, the environment may freeze immediately after the response is returned. Metrics are affected as well, and some are never exported.
This is especially visible for short-lived invocations. In some cases, telemetry from a previous invocation is only exported later when the same Lambda execution environment is reused.
Expected behavior
AWS Lambda supports internal and external extensions. Registering the application as an internal extension gives the runtime up to 500 ms of additional time before the environment is terminated. https://docs.aws.amazon.com/lambda/latest/dg/runtimes-extensions-api.html
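To make the extension idea more concrete, here is a minimal sketch of the two HTTP requests an in-process (internal) extension would send to the Lambda Extensions API. It only builds the requests and does not send them. The class name, method names, and the extension name are hypothetical; the host and port are assumed to come from the AWS_LAMBDA_RUNTIME_API environment variable, and registering for INVOKE only is my reading of the linked docs, so please double-check it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of Quarkus: builds the Extensions API requests.
final class LambdaExtensionRequests {
    // API version segment used by the Lambda Extensions API paths.
    static final String API_VERSION = "2020-01-01";

    // Registration request. runtimeApi is the "host:port" value that Lambda
    // exposes in AWS_LAMBDA_RUNTIME_API; extensionName is an arbitrary label.
    static HttpRequest register(String runtimeApi, String extensionName) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + runtimeApi + "/" + API_VERSION + "/extension/register"))
                .header("Lambda-Extension-Name", extensionName)
                .header("Content-Type", "application/json")
                // Assumption: an internal extension subscribes to INVOKE only.
                .POST(HttpRequest.BodyPublishers.ofString("{\"events\":[\"INVOKE\"]}"))
                .build();
    }

    // Long-poll request for the next event. extensionId is the value of the
    // Lambda-Extension-Identifier header returned by the register call.
    static HttpRequest nextEvent(String runtimeApi, String extensionId) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + runtimeApi + "/" + API_VERSION + "/extension/event/next"))
                .header("Lambda-Extension-Identifier", extensionId)
                .GET()
                .build();
    }
}
```

The requests could be sent with java.net.http.HttpClient. Calling the next-event endpoint again is what would signal that the extension has finished its work for the current invocation, so the flush would have to happen before that call.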
Quarkus should use that extra time to flush all pending OpenTelemetry data after the response has been sent, including metrics, so that telemetry is exported reliably even for fast Lambda invocations.
As an alternative, Quarkus could flush synchronously before returning the response, but that would increase request latency and is therefore less desirable.
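A rough sketch of what "flush with a bounded wait" could look like, in plain JDK terms. This is not the Quarkus or OpenTelemetry implementation: BoundedFlush and flushAll are hypothetical names, and CompletableFuture stands in for whatever the SDK returns (in the OpenTelemetry Java SDK, forceFlush() on the tracer, meter, and logger providers returns a CompletableResultCode that can be joined with a timeout). The timeout value is also an assumption and would need a sensible default.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: triggers all flushes and waits for them with one shared deadline.
final class BoundedFlush {

    // Starts every flush (e.g. traces, logs, metrics) and waits at most
    // `timeout` for all of them. Returns true only if all completed in time.
    static boolean flushAll(List<Supplier<CompletableFuture<Void>>> flushers, Duration timeout) {
        // Start all flushes first so they run concurrently, not one after another.
        CompletableFuture<?>[] pending = flushers.stream()
                .map(Supplier::get)
                .toArray(CompletableFuture[]::new);
        try {
            CompletableFuture.allOf(pending).get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            // Deadline exceeded or an exporter failed: give up instead of blocking further.
            return false;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With the extension approach this would run after the response has been handed back, so the wait does not add to request latency. With the synchronous alternative the same call would run before returning, and the timeout would be the upper bound on the added latency.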
Actual behavior
OpenTelemetry data is not always flushed before the Lambda environment is frozen.
As a result:
metrics are sometimes missing for fast invocations
telemetry may be delayed and only appear on a later invocation when the Lambda environment is reused
This is how it sometimes looks in our tracing tool when the execution environment is reused
How to Reproduce?
Create an AWS Lambda function using Quarkus.
Add the OpenTelemetry extension and enable telemetry export.
Set quarkus.otel.simple=true.
Record traces, logs, and metrics during the invocation.
Invoke the function with a short-running request.
Check the exported telemetry.
Output of uname -a or ver
No response
Output of java -version
No response
Quarkus version or git rev
No response
Build tool (ie. output of mvnw --version or gradlew --version)
No response
Additional information
@brunobat We talked about this a while ago. Sorry it took me so long to open the issue. I also have a PR, #53465, which still needs the OTel changes, and I am unsure how to add them. Please feel free to give advice or make changes. I was also not sure about default values for timeouts and such.
It is currently a draft. Thanks!