Description
The coroutines library uses an atomic reference implementation backed by Atomic*FieldUpdater, which is 2x slower for compareAndSet and set than AtomicReference on Android devices.
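For readers unfamiliar with the two backings, here is a minimal Java sketch of the difference. The class names UpdaterBackedState and ReferenceBackedState are hypothetical and exist only for illustration; this is not the code atomicfu generates, only the general shape of a field-updater-backed holder versus an AtomicReference-backed one.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical holder: the value lives in a volatile field and every write
// goes through one shared static updater that receives the target object.
final class UpdaterBackedState {
    private static final AtomicReferenceFieldUpdater<UpdaterBackedState, String> STATE =
            AtomicReferenceFieldUpdater.newUpdater(UpdaterBackedState.class, String.class, "state");

    private volatile String state;

    UpdaterBackedState(String initial) { this.state = initial; }

    String get() { return state; }
    boolean compareAndSet(String expect, String update) { return STATE.compareAndSet(this, expect, update); }
    String getAndSet(String update) { return STATE.getAndSet(this, update); }
    void lazySet(String update) { STATE.lazySet(this, update); }
}

// Hypothetical holder: the value lives in a separate AtomicReference object,
// which costs one extra allocation per field but needs no updater.
final class ReferenceBackedState {
    private final AtomicReference<String> state;

    ReferenceBackedState(String initial) { this.state = new AtomicReference<>(initial); }

    String get() { return state.get(); }
    boolean compareAndSet(String expect, String update) { return state.compareAndSet(expect, update); }
    String getAndSet(String update) { return state.getAndSet(update); }
    void lazySet(String update) { state.lazySet(update); }
}

final class AtomicBackingSketch {
    public static void main(String[] args) {
        UpdaterBackedState viaUpdater = new UpdaterBackedState("idle");
        ReferenceBackedState viaReference = new ReferenceBackedState("idle");

        // Both holders expose the same operations; only the backing differs.
        System.out.println(viaUpdater.compareAndSet("idle", "running"));
        System.out.println(viaReference.compareAndSet("idle", "running"));
        System.out.println(viaUpdater.get() + " " + viaReference.get());
    }
}
```

The updater variant avoids allocating a wrapper object per field, which is presumably why it is attractive for a library; the trade-off described in this issue is the per-call cost of going through the updater.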
Running the benchmark on a Pixel 4a, I see the following results:
29.0 ns atomicReference_getAndSwap
71.3 ns atomicRef_getAndSwap
50.7 ns atomicReference_compareAndSet
135 ns atomicRef_compareAndSet
4.3 ns atomicReference_get
4.2 ns atomicRef_get
18.0 ns atomicReference_lazySet
79.4 ns atomicRef_lazySet
In the benchmark above, AtomicReference is 2x to 4x faster on writes than atomicfu. Looking at method traces from Compose benchmarks, I see that certain Job and CoroutineContext operations run multiple atomic operations, depending on Job graph complexity.
You can see the trace from one of the abovementioned benchmarks here (it should be focused on coroutine work). It does not represent exact timing due to the overhead of tracing every method, but it does represent the amount of work being done. Most of it is dominated by Atomic*FieldUpdater operations and the class instance checks that Atomic*FieldUpdater performs.
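To illustrate what those instance checks are, the sketch below shows that a field updater validates both the receiver and the new value at run time, something AtomicReference does not need to do because it owns its value. The Node class and the UpdaterChecksSketch helper are hypothetical names; the sketch assumes an OpenJDK-derived implementation, where a failed check surfaces as ClassCastException. It demonstrates only that the checks exist, not how much they cost on Android.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical class with a single updater-managed field.
final class Node {
    static final AtomicReferenceFieldUpdater<Node, String> STATE =
            AtomicReferenceFieldUpdater.newUpdater(Node.class, String.class, "state");

    private volatile String state = "initial";

    String state() { return state; }
}

final class UpdaterChecksSketch {
    // Going through the raw type bypasses the compile-time generic checks,
    // so only the updater's own run-time checks remain.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean rejectsWrongValueType() {
        AtomicReferenceFieldUpdater raw = Node.STATE;
        try {
            raw.set(new Node(), Integer.valueOf(42)); // field is a String
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean rejectsWrongReceiver() {
        AtomicReferenceFieldUpdater raw = Node.STATE;
        try {
            raw.compareAndSet("not a Node", null, "x"); // receiver is not a Node
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("value type checked: " + rejectsWrongValueType());
        System.out.println("receiver checked: " + rejectsWrongReceiver());
    }
}
```

Because the updater is shared across all instances and receives the target object as an argument, it has to establish on each call that the object really has the field; that is the work the trace attributes to Atomic*FieldUpdater.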
As an experiment, I forked atomicfu to be backed by AtomicReference and got a ~5% improvement in the same Compose benchmark, which is pretty significant given the amount of work Compose executes outside of coroutine context.
I filed a separate issue for Android internally to potentially fix this on the runtime side, but any fix will affect only the relatively small subset of modern devices that support runtime updates.