Description
(edit(2024-01-24): Updated title and description to clarfiy that it is only self wakes coming from tokio::task::yield_now()
that are affected)
What is the issue?
When observing an instrumented program built with Tokio 1.23 or later, Tokio Console isn't able to detect self wakes coming from tokio::task::yield_now()
.
This is due to tokio-rs/tokio#5223, where the self wake was changed to be deferred so that self waking tasks wouldn't starve the resource drivers.
How can the bug be reproduced?
The following code should show a task with a single self wake:
_ = tokio::spawn(async {
tokio::task::yield_now().await;
})
.await;
If build with Tokio 1.22.0 then we see the wake counted as a self wake in Tokio Console. We can confirm this using ari-subscriber
where we get the output (truncated):
2024-01-23T16:24:05.167259Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} new
2024-01-23T16:24:05.167786Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} enter
2024-01-23T16:24:05.167911Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} tokio::task::waker: op="waker.wake_by_ref", task.id=1
2024-01-23T16:24:05.168007Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} exit
Specifically there is a wake operation (op="waker.wake_by_ref"
) inside the runtime.spawn
span.
When updating to Tokio 1.23.0, Tokio Console no longer counts the single wake as a self wake. Again, we can confirm using ari-subscriber
(truncated output):
2024-01-23T16:33:26.834206Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} new
2024-01-23T16:33:26.834805Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} enter
2024-01-23T16:33:26.834966Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} tokio::task::waker: op="waker.clone", task.id=1
2024-01-23T16:33:26.835048Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} exit
2024-01-23T16:33:26.835145Z TRACE tokio::task::waker: op="waker.wake", task.id=1
Here we see that the waker is cloned (op="waker.clone"
) within the runtime.spanw
span for the task, but the wake itself happens by value after the span has exited (op="waker.wake"
).
Note: The
task.id
on thetokio::task::waker
traces corresponds to the tracing span Id, which is shown for traces in brackets after the name, e.g.runtime.spawn[1]
.
Logs, error output, etc
No response
Versions
Tokio 1.23
Possible solution
We may need to modify the instrumentation in Tokio itself to be able to detect a self wake. Since Tokio knows that it is performing a self wake (the deferred waking code is only used from tokio::task::yield_now()
), it may be possible to explicitly include in the instrumentation when a wake is a self wake.
It's not necessarily straight forward though, this code passes through the std library Waker
object.
Additional context
No response
Would you like to work on fixing this bug?
yes