CancellationToken::is_cancelled locks a mutex, causing occasional >2 ms delays #7775

@KarstenB

Version
tokio v1.48.0
tokio-util v0.7.17

Platform
Linux machine 6.12.57+deb13-rt-amd64 #1 SMP PREEMPT_RT Debian 6.12.57-1 (2025-11-05) x86_64 GNU/Linux

The machine is tuned for running RT applications.

Description
I have an async method that spawns a regular thread. The async method has a CancellationToken that is used in a loop like this:

while !token.is_cancelled() && !tx_thread.is_finished() {
    tick.tick().await;
    //print statistics
}

The sync thread does almost the same:

adjust_prio(51, SCHED_RR);
while !token.is_cancelled() {
    do_work();
    thread::sleep(Duration::from_micros(100));
}

The sync thread runs on an isolated CPU (isolcpus), and the IRQs have been moved away from it, so nothing else runs on that core. To find out where the occasional delays came from, I ran ftrace; the results looked something like this:

MyProg-442147  [013] ..... 242755.807682: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147  [013] ..... 242755.807793: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147  [013] ..... 242755.807916: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147  [013] ..... 242755.808037: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147  [013] ..... 242755.808150: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147  [013] ..... 242755.808270: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147  [013] ..... 242755.808381: common_nsleep <-__x64_sys_clock_nanosleep
MyProg-442147  [013] ...1. 242755.808492: sys_futex(uaddr: 7f9f3011b810, op: 89, val: 2, utime: 0, uaddr2: 0, val3: 7f9effffffff)
MyProg-442147  [013] ...1. 242755.810545: sys_futex -> 0x0
MyProg-442147  [013] ...1. 242755.810547: sys_futex(uaddr: 7f9f3011b810, op: 81, val: 1, utime: 0, uaddr2: 7f9effffffff, val3: 0)
MyProg-442147  [013] ...1. 242755.810558: sys_futex -> 0x1
MyProg-442147  [013] ..... 242755.810620: tracing_mark_write: DELAY=2.123ms LOOP_START=6.4µs
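A note on reading the trace: ftrace's syscall tracer prints arguments in hexadecimal without a 0x prefix, so op: 89 means 0x89 and op: 81 means 0x81. The minimal sketch below decodes those two values with the constants from the Linux UAPI header <linux/futex.h> and computes how long the first sys_futex call blocked. The helper names decode_futex_op and timestamp_us are illustrative only. Decoded, the first call is a FUTEX_WAIT_BITSET on a private futex with val 2 and the second a FUTEX_WAKE of one waiter. That is consistent with (though not proof of) a thread sleeping on a contended lock and then waking another waiter on release, which, as far as I know, is the pattern Rust's futex-based std::sync::Mutex uses on Linux.

```rust
// Futex commands and flags from the Linux UAPI header <linux/futex.h>.
const FUTEX_WAIT: u32 = 0;
const FUTEX_WAKE: u32 = 1;
const FUTEX_LOCK_PI: u32 = 6;
const FUTEX_UNLOCK_PI: u32 = 7;
const FUTEX_WAIT_BITSET: u32 = 9;
const FUTEX_WAKE_BITSET: u32 = 10;
const FUTEX_PRIVATE_FLAG: u32 = 128;
const FUTEX_CLOCK_REALTIME: u32 = 256;

/// Decode a futex `op` argument as printed by ftrace (hex digits, no "0x").
fn decode_futex_op(op_hex: &str) -> Option<String> {
    let op = u32::from_str_radix(op_hex, 16).ok()?;
    // Strip the flag bits to get the command number.
    let cmd = op & !(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME);
    let name = match cmd {
        FUTEX_WAIT => "FUTEX_WAIT",
        FUTEX_WAKE => "FUTEX_WAKE",
        FUTEX_LOCK_PI => "FUTEX_LOCK_PI",
        FUTEX_UNLOCK_PI => "FUTEX_UNLOCK_PI",
        FUTEX_WAIT_BITSET => "FUTEX_WAIT_BITSET",
        FUTEX_WAKE_BITSET => "FUTEX_WAKE_BITSET",
        _ => return None, // other commands are not needed for this trace
    };
    let mut parts = vec![name];
    if op & FUTEX_PRIVATE_FLAG != 0 {
        parts.push("FUTEX_PRIVATE_FLAG");
    }
    if op & FUTEX_CLOCK_REALTIME != 0 {
        parts.push("FUTEX_CLOCK_REALTIME");
    }
    Some(parts.join("|"))
}

/// Convert an ftrace timestamp "seconds.micros" (e.g. "242755.808492") to microseconds.
fn timestamp_us(ts: &str) -> Option<u64> {
    let (secs, micros) = ts.split_once('.')?;
    if micros.len() != 6 {
        return None; // expect the default microsecond resolution
    }
    Some(secs.parse::<u64>().ok()? * 1_000_000 + micros.parse::<u64>().ok()?)
}

fn main() {
    // Values copied from the trace above.
    println!("{:?}", decode_futex_op("89"));
    println!("{:?}", decode_futex_op("81"));
    let waited = timestamp_us("242755.810545").unwrap() - timestamp_us("242755.808492").unwrap();
    println!("sys_futex wait took {waited} us");
}
```

This prints Some("FUTEX_WAIT_BITSET|FUTEX_PRIVATE_FLAG"), Some("FUTEX_WAKE|FUTEX_PRIVATE_FLAG") and a wait of 2053 µs, i.e. roughly 2.05 ms of the 2.123 ms DELAY was spent blocked in that single futex wait.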

We wrote do_work ourselves, so I was fairly certain it does not use any futex-based primitives. Yet roughly once an hour, for one particular workload, the DELAY between two consecutive do_work() calls was far beyond reasonable timing (2.123 ms in the trace above, for a loop that sleeps 100 µs).
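The DELAY value in the trace appears to be the gap between consecutive do_work() calls; tracing_mark_write entries are what ftrace records for writes to its trace_marker file. A minimal, stdlib-only sketch of measuring that gap in the same kind of loop could look like this. do_work, worst_gap and the 10x threshold are illustrative assumptions, and the write to trace_marker is left out.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical placeholder for the real do_work(), which the issue does not show.
fn do_work() {}

/// Run `iterations` passes of a do_work()/sleep loop and return the longest
/// gap observed between the starts of two consecutive passes.
fn worst_gap(iterations: u32, period: Duration) -> Duration {
    let mut worst = Duration::ZERO;
    let mut last_start: Option<Instant> = None;
    for _ in 0..iterations {
        let start = Instant::now();
        if let Some(prev) = last_start {
            worst = worst.max(start - prev);
        }
        last_start = Some(start);
        do_work();
        thread::sleep(period);
    }
    worst
}

fn main() {
    let period = Duration::from_micros(100);
    // Arbitrary example threshold: 10x the nominal sleep period.
    let threshold = period * 10;
    let worst = worst_gap(1_000, period);
    if worst > threshold {
        eprintln!("DELAY={worst:?} exceeded {threshold:?}");
    }
    println!("worst gap: {worst:?}");
}
```

Because thread::sleep never returns early, every gap is at least the sleep period; anything well above it points at time lost outside the sleep, such as waiting on a lock.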

Ultimately it turned out that I had never checked what is_cancelled does: it actually acquires a mutex! That was very unexpected; I had assumed it was a simple AtomicBool check or something similar.
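For comparison, the kind of check the author expected can be sketched with a plain AtomicBool: is_cancelled becomes a single atomic load that never enters the kernel. AtomicCancelFlag is a hypothetical type, not part of tokio-util, and unlike CancellationToken it offers no child tokens and no cancelled() future to await.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical lock-free cancellation flag: checking it is a single atomic load.
#[derive(Clone, Default)]
struct AtomicCancelFlag {
    cancelled: Arc<AtomicBool>,
}

impl AtomicCancelFlag {
    fn cancel(&self) {
        // Release pairs with the Acquire load in is_cancelled, so writes made
        // before cancel() are visible to a thread that observes `true`.
        self.cancelled.store(true, Ordering::Release);
    }

    fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::Acquire)
    }
}

fn main() {
    let flag = AtomicCancelFlag::default();
    let worker_flag = flag.clone();
    let worker = thread::spawn(move || {
        let mut iterations = 0u64;
        // Same shape as the sync loop in the issue, minus the priority setup.
        while !worker_flag.is_cancelled() {
            iterations += 1; // stand-in for do_work()
            thread::sleep(Duration::from_micros(100));
        }
        iterations
    });
    thread::sleep(Duration::from_millis(5));
    flag.cancel();
    let iterations = worker.join().unwrap();
    println!("worker stopped after {iterations} iterations");
}
```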

tl;dr: CancellationToken::is_cancelled acquires a mutex, and the documentation does not mention this potentially blocking behavior! It would also be nice to have a non-blocking variant (I don't mind if it returns a stale cached value): CancellationToken is already passed around throughout my code base, and I would like to reuse it even in timing-critical loops.
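A non-blocking variant of a mutex-based check could be built on std::sync::Mutex::try_lock, which returns immediately instead of waiting when the lock is held. In the minimal sketch below, MutexToken and try_is_cancelled are hypothetical names (this is not an existing tokio-util method); the caller treats a busy lock as "not cancelled yet" and re-checks on its next iteration, which is one way to get the "possibly stale value" trade-off requested here.

```rust
use std::sync::{Arc, Mutex, TryLockError};
use std::thread;
use std::time::Duration;

/// Hypothetical token whose state is guarded by a Mutex, like the
/// is_cancelled behaviour described in this issue.
#[derive(Clone, Default)]
struct MutexToken {
    cancelled: Arc<Mutex<bool>>,
}

impl MutexToken {
    fn cancel(&self) {
        *self.cancelled.lock().unwrap() = true;
    }

    /// Blocking check: sleeps on the lock if another thread holds it.
    fn is_cancelled(&self) -> bool {
        *self.cancelled.lock().unwrap()
    }

    /// Non-blocking check: returns None instead of waiting when the lock is busy.
    fn try_is_cancelled(&self) -> Option<bool> {
        match self.cancelled.try_lock() {
            Ok(guard) => Some(*guard),
            Err(TryLockError::WouldBlock) => None,
            // A poisoned lock still holds a valid bool, so read it anyway.
            Err(TryLockError::Poisoned(poisoned)) => Some(*poisoned.into_inner()),
        }
    }
}

fn main() {
    let token = MutexToken::default();
    let worker_token = token.clone();
    let worker = thread::spawn(move || {
        let mut iterations = 0u64;
        // A busy lock is treated as "not cancelled yet"; the next pass re-checks.
        while !worker_token.try_is_cancelled().unwrap_or(false) {
            iterations += 1; // stand-in for do_work()
            thread::sleep(Duration::from_micros(100));
        }
        iterations
    });
    thread::sleep(Duration::from_millis(5));
    token.cancel();
    let iterations = worker.join().unwrap();
    let cancelled = token.is_cancelled();
    println!("stopped after {iterations} iterations, cancelled = {cancelled}");
}
```

The cost of this design is that cancellation may be noticed one or more iterations late while another thread holds the lock; in exchange, the timing-critical loop never sleeps on the futex.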

Metadata

Assignees: No one assigned
Labels: A-tokio (Area: The main tokio crate), M-sync (Module: tokio/sync)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests