
deadlock with opentelemetry and tracing #250

@aome510

Version

tracing-opentelemetry v0.32.1 (/Users/aome510/Projects/tracing-opentelemetry)
│   └── tracing v0.1.44
│       ├── tracing-attributes v0.1.31 (proc-macro)
│       └── tracing-core v0.1.36
├── tracing v0.1.44 (*)
├── tracing-core v0.1.36 (*)
├── tracing-log v0.2.0
│   └── tracing-core v0.1.36 (*)
└── tracing-subscriber v0.3.22
    └── tracing-core v0.1.36 (*)
│   └── tracing v0.1.44 (*)
├── tracing v0.1.44 (*)
├── tracing-error v0.2.1
│   ├── tracing v0.1.44 (*)
│   └── tracing-subscriber v0.3.22 (*)
└── tracing-subscriber v0.3.22 (*)

Description

I recently encountered a deadlock caused by an interaction between tracing, opentelemetry, and this crate. opentelemetry uses tracing for its internal logging (enabled by default). If the logging API is called inside the block below (opentelemetry does that in a few places),

self.with_started_cx(otel_data, &|cx| {
    let guard = cx.clone().attach();
    GUARD_STACK.with(|stack| stack.borrow_mut().push(id.clone(), guard));
});

it results in a deadlock. Stack trace when the deadlock happens:

0x0000723f78fcc23f in clock_nanosleep () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#0  0x0000723f7900795d in syscall () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x000059c6f9599bd9 in std::sys::pal::unix::futex::futex_wait () at library/std/src/sys/pal/unix/futex.rs:72
#2  std::sys::sync::rwlock::futex::RwLock::write_contended () at library/std/src/sys/sync/rwlock/futex.rs:257
#3  0x000059c6fdda9033 in <tracing_subscriber::registry::sharded::Data as tracing_subscriber::registry::SpanData>::extensions_mut ()
#4  0x000059c6f9e77ba5 in <tracing_opentelemetry::layer::OpenTelemetryLayer<S,T> as tracing_subscriber::layer::Layer<S>>::on_event ()
#5  0x000059c6f9e8c5b4 in <tracing_subscriber::layer::layered::Layered<L,S> as tracing_core::subscriber::Subscriber>::event ()
#6  0x000059c6fdf42601 in tracing_core::event::Event::dispatch ()
#7  0x000059c6fd2b904a in std::thread::local::LocalKey<T>::with ()
#8  0x000059c6fd2c16f9 in opentelemetry::context::Context::attach ()
#9  0x000059c6f9e7765c in <tracing_opentelemetry::layer::OpenTelemetryLayer<S,T> as tracing_subscriber::layer::Layer<S>>::on_enter ()
#10 0x000059c6f9e8c2a3 in <tracing_subscriber::layer::layered::Layered<L,S> as tracing_core::subscriber::Subscriber>::enter ()
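
Reading the trace bottom-up: on_enter (#9) is still holding the write lock on the span's extensions when Context::attach (#8) emits opentelemetry's internal event, which is dispatched (#6) back into OpenTelemetryLayer::on_event (#4) and then blocks trying to take the same lock in extensions_mut (#3). As a minimal, standalone sketch of that lock re-entrancy (an illustration only, not the crate's actual code):

use std::sync::RwLock;

fn main() {
    // Stand-in for the registry's per-span extensions storage.
    let extensions: RwLock<Vec<String>> = RwLock::new(Vec::new());

    // Analogous to the `extensions_mut()` write guard that `on_enter`
    // holds across the `with_started_cx` call.
    let _held = extensions.write().unwrap();

    // Analogous to `Context::attach` emitting an internal opentelemetry
    // event that is dispatched back into `OpenTelemetryLayer::on_event`,
    // which calls `extensions_mut()` on the same span. std's RwLock is
    // not re-entrant, so this second `write()` blocks forever here (per
    // std's docs it may also panic on some platforms).
    let _reentrant = extensions.write().unwrap();

    println!("never reached");
}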

To reproduce the deadlock, add a tracing::info! call to the block above and run any example. For example, apply this patch:

diff --git a/src/layer.rs b/src/layer.rs
index acf423b..43a7cb9 100644
--- a/src/layer.rs
+++ b/src/layer.rs
@@ -1205,6 +1205,7 @@ where
         if self.context_activation {
             if let Some(otel_data) = extensions.get_mut::<OtelData>() {
                 self.with_started_cx(otel_data, &|cx| {
+                    tracing::info!("Entering span {:?}", id);
                     let guard = cx.clone().attach();
                     GUARD_STACK.with(|stack| stack.borrow_mut().push(id.clone(), guard));
                 });

and run cargo run --example basic.
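
Not a fix, but a possible mitigation while this is open (a sketch only: the "opentelemetry" target prefix for the internal events is an assumption, tracing_opentelemetry::layer() stands in for however the layer is built in the application, and disabling the opentelemetry crates' default internal-logs feature should behave similarly): drop those internal events with a global filter so they are never dispatched while the layer holds a span's extensions lock.

use tracing_subscriber::{filter, prelude::*};

fn main() {
    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer())
        .with(tracing_subscriber::fmt::layer())
        // Global filter: discard events whose target comes from the
        // opentelemetry crates, so their internal logging cannot re-enter
        // the OpenTelemetry layer while it holds a span's extensions lock.
        .with(filter::filter_fn(|meta| {
            !meta.target().starts_with("opentelemetry")
        }))
        .init();

    tracing::info!("application events still flow normally");
}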
