Description
Version
tokio v1.48.0
tokio-util v0.7.17
Platform
Linux machine 6.12.57+deb13-rt-amd64 #1 SMP PREEMPT_RT Debian 6.12.57-1 (2025-11-05) x86_64 GNU/Linux
The machine is tuned for running RT applications.
Description
I have an async method that spawns a regular thread. The async method has a CancellationToken that is used in a loop like this:
    while !token.is_cancelled() && !tx_thread.is_finished() {
        tick.tick().await;
        // print statistics
    }

And the sync thread is doing almost the same:
    adjust_prio(51, SCHED_RR);
    while !token.is_cancelled() {
        do_work();
        thread::sleep(Duration::from_micros(100));
    }

The sync thread runs on an isolated CPU (isol_cpu), and the IRQs have been moved away from it. Nothing else runs on that core. To figure out what was going on, I ran ftrace, and the results looked something like this:
MyProg-442147 [013] ..... 242755.807682: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147 [013] ..... 242755.807793: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147 [013] ..... 242755.807916: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147 [013] ..... 242755.808037: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147 [013] ..... 242755.808150: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147 [013] ..... 242755.808270: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147 [013] ..... 242755.808381: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147 [013] ...1. 242755.808492: sys_futex(uaddr: 7f9f3011b810, op: 89, val: 2, utime: 0, uaddr2: 0, val3: 7f9effffffff)
MyProg-442147 [013] ...1. 242755.810545: sys_futex -> 0x0
MyProg-442147 [013] ...1. 242755.810547: sys_futex(uaddr: 7f9f3011b810, op: 81, val: 1, utime: 0, uaddr2: 7f9effffffff, val3: 0)
MyProg-442147 [013] ...1. 242755.810558: sys_futex -> 0x1
MyProg-442147 [013] ..... 242755.810620: tracing_mark_write: DELAY=2.123ms LOOP_START=6.4µs
We wrote do_work ourselves, so I was fairly certain it does not contain any futex-related objects or activity. Yet roughly once an hour, for a particular workload, the DELAY between our do_work() calls was far beyond what we consider reasonable.
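For anyone trying to reproduce this, the kind of over-budget gap reported as DELAY in the trace can also be detected in-process. The following is only a minimal sketch, not the instrumentation we actually used: `GapTracker`, its fields, and the 500 µs budget in the usage below are hypothetical names and values chosen for illustration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: tracks the gap between consecutive loop iterations.
pub struct GapTracker {
    last: Option<Instant>,
    budget: Duration,
    /// Largest gap seen so far.
    pub worst: Duration,
    /// Number of gaps that exceeded the budget.
    pub overruns: u32,
}

impl GapTracker {
    pub fn new(budget: Duration) -> Self {
        Self { last: None, budget, worst: Duration::ZERO, overruns: 0 }
    }

    /// Record the start of an iteration and return the gap since the
    /// previous one (None for the very first iteration).
    pub fn record(&mut self, now: Instant) -> Option<Duration> {
        let gap = self.last.map(|prev| now.duration_since(prev));
        self.last = Some(now);
        if let Some(g) = gap {
            self.worst = self.worst.max(g);
            if g > self.budget {
                self.overruns += 1;
            }
        }
        gap
    }
}
```

In the worker loop this would be called as `tracker.record(Instant::now())` right before `do_work()`, with a budget such as `Duration::from_micros(500)` (an assumed value). Passing the timestamp in explicitly keeps the helper easy to test with synthetic instants.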
Ultimately it turned out that I had never checked what is_cancelled does: it actually acquires a mutex! That was very unexpected; I had assumed it was a simple AtomicBool check or something similar.
tl;dr: The is_cancelled method of CancellationToken acquires a mutex, and the documentation does not mention the potentially blocking behavior. It would also be nice to have a non-blocking version (I don't mind if it returns a stale cached value), since I have already spread the CancellationToken throughout my codebase and would like to reuse it, even in timing-critical loops.
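As a possible workaround in the meantime, the timing-critical thread could poll a plain atomic flag instead of the token. The sketch below uses only the standard library; `StopFlag` and `run_worker` are hypothetical names, and the loop body is a stand-in for the real `do_work()`. The bridge from the token is not shown: I assume a small async task could await `token.cancelled()` and then call `stop()`, which keeps the mutex on the non-critical side.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical lock-free stop flag shared with the timing-critical thread.
#[derive(Clone, Default)]
pub struct StopFlag(Arc<AtomicBool>);

impl StopFlag {
    pub fn new() -> Self {
        Self::default()
    }

    /// Request a stop; intended to be called from the non-critical side.
    pub fn stop(&self) {
        self.0.store(true, Ordering::Release);
    }

    /// A single atomic load: no lock and no syscall.
    pub fn is_stopped(&self) -> bool {
        self.0.load(Ordering::Acquire)
    }
}

/// Stand-in for the sync loop above; returns the number of iterations run.
pub fn run_worker(flag: StopFlag) -> u64 {
    let mut iterations = 0;
    while !flag.is_stopped() {
        iterations += 1; // do_work() would go here
        thread::sleep(Duration::from_micros(100));
    }
    iterations
}
```

The worker would be started with something like `thread::spawn(move || run_worker(flag.clone()))`. The trade-off is the one mentioned above: the flag may lag slightly behind the token, which is acceptable for my use case.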