Description
Problem statement, the short version
We receive a lot of questions about using the test module whose answer boils down to "avoid using `advanceUntilIdle`/`advanceTimeBy`."

`advanceTimeBy`, `advanceUntilIdle`, and `runCurrent` can be replaced by `delay`, `yield`, or cross-coroutine communication in most tests, with better results and much clearer semantics. Because `runTest` uses virtual time, even `delay(1.seconds)` is instantaneous. Manual time control is error-prone, discouraged, has surprising behavior, and is increasingly incompatible with the rest of the framework.
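As a reminder of why long delays are cheap in tests (a minimal sketch using the current `runTest` API; the test name is made up), delays inside `runTest` advance virtual time rather than waiting in real time:

```kotlin
@Test
fun delaysAreVirtual() = runTest {
    delay(10.hours) // returns immediately in real time; only the virtual clock advances
    assertEquals(10.hours.inWholeMilliseconds, currentTime)
}
```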
We should think of ways to make these functions less prominent so that people don't use them unless they absolutely need to, providing alternatives that more directly and correctly express the intent behind their typical usage.
Request for comments: if you use `advanceTimeBy`, `runCurrent`, or `advanceUntilIdle` in your code and don't think they could be reasonably replaced with the suggested alternatives once we implement them, please provide examples of your use cases.
Alternatives to each manual execution control function
advanceTimeBy
This function is used when some specific event that's known to be scheduled into the future needs to happen before its results are validated.
Examples:
```kotlin
fun CoroutineScope.runAfter(n: Duration, blockAction: () -> Unit) {
    launch {
        delay(n)
        blockAction()
    }
}
```
```kotlin
// The code before: the simple scenario
@Test
fun testRunAfter1() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    advanceTimeBy(11.seconds)
    assertTrue(entered)
}

// The code before: the complex scenario
@Test
fun testRunAfter2() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    advanceTimeBy(10.seconds)
    assertFalse(entered)
    runCurrent()
    assertTrue(entered)
}
```
When you have `advanceTimeBy(n)` directly in your test (which is almost all of its usages), it can be replaced with `delay(n - 1.milliseconds)`.

- The `- 1` is needed because `advanceTimeBy` does not execute the tasks scheduled for the moment `currentTime() + n`, whereas `delay(n)` will also execute the tasks that were already scheduled for `currentTime() + n` before `delay(n)` was called.
- However, the virtual time will be one millisecond off compared to what it was before, which may need to be accounted for downstream.
So, these tests become:
```kotlin
// The code after: the simple scenario
@Test
fun testRunAfter1() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    delay(11.seconds) // here, there's no difference whether to subtract one millisecond
    assertTrue(entered)
}

// The code after: the complex scenario
@Test
fun testRunAfter2() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    delay(10.seconds - 1.milliseconds)
    assertFalse(entered)
    delay(1.milliseconds) // can't run `runCurrent`, as it was not yet time for the block to execute
    assertTrue(entered)
}
```
advanceUntilIdle
This function runs the tasks present in the test scheduler (and only those) for as long as there are tasks outside of `backgroundScope`. Almost always, the upper bound on the virtual time it takes to execute all the code in the test dispatcher is known. Typically, it's just `1.milliseconds`. So, functionally, `advanceUntilIdle` can almost always be replaced by `delay(20.hours)`.
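As a concrete illustration (a sketch reusing the `runAfter` helper defined above), the mechanical replacement looks like this:

```kotlin
@Test
fun testRunAfterEventuallyRuns() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    // Before: advanceUntilIdle(); after: an arbitrary "long enough" amount of virtual time.
    delay(20.hours)
    assertTrue(entered)
}
```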
However, in many cases, there are problems with this replacement:
- The most important one: it doesn't demonstrate the intent. The programmer intends to wait until the system is idle, and `20.hours` is a completely arbitrary constant.
- Then again, `advanceUntilIdle` doesn't always do what it says on the box either, as work on other dispatchers is not taken into account, so idleness is a fuzzy concept anyway. Though it's inarguable that when `advanceUntilIdle` does work, it also shows the intent.
- If there is work in `backgroundScope` that happens, say, every `100.milliseconds`, `delay(20.hours)` will waste a lot of processing power on that work.
We must first present a viable alternative to confidently claim that `advanceUntilIdle` is no longer needed.
In most cases we've seen, what people intend is actually for the spawned coroutines to finish, regardless of whether these coroutines execute on the test dispatcher. So, we could introduce a function like
```kotlin
/**
 * Suspends until all the coroutines spawned in this [CoroutineScope] are completed.
 */
suspend fun CoroutineScope.awaitAllChildren() {
    while (true) {
        val children = this.coroutineContext[Job]!!.children.toList()
        // Check each child's state, not the scope's own `isActive`.
        if (children.all { !it.isActive }) break
        children.forEach { it.join() }
    }
}
```
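To show the intended usage (a sketch, assuming the `awaitAllChildren` draft above and the `runAfter` helper from the beginning of this text):

```kotlin
@Test
fun testRunAfterViaAwaitAllChildren() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    // Before: advanceUntilIdle(); after: wait for the coroutine launched by runAfter to complete.
    awaitAllChildren()
    assertTrue(entered)
}
```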
Replacing `advanceUntilIdle` with `awaitAllChildren()` wouldn't always work. For example, here, it would lead to a deadlock:
```kotlin
fun testFoo() = runTest {
    val channel = Channel<Unit>()
    launch { channel.receive() }
    advanceUntilIdle()
    channel.send(Unit)
}
```
However, in all the cases we've seen so far where this happens, `advanceUntilIdle` is not needed at all and was probably put there just in case. We'll need to investigate further.
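For illustration (a sketch; whether this holds for a given real-world test has to be checked case by case), the example above keeps working if the call is simply removed, because `channel.send(Unit)` suspends until the launched coroutine receives, and `runTest` waits for that coroutine to complete:

```kotlin
@Test
fun testFoo() = runTest {
    val channel = Channel<Unit>()
    launch { channel.receive() }
    // No advanceUntilIdle(): send() rendezvouses with the receiver once the launched coroutine runs.
    channel.send(Unit)
}
```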
runCurrent
As shown in the previous example, `delay(1.milliseconds)` usually does the trick.
Still, this replacement is problematic:

- It doesn't demonstrate the intent.
- It is not a direct equivalent to `runCurrent`, as `runCurrent` ensures no more tasks are scheduled for the current moment by running all of them.
A proper replacement would probably be a function like
```kotlin
/**
 * Waits until all the tasks left are the ones scheduled for a later point of virtual time.
 */
suspend fun TestCoroutineScheduler.awaitIdle(): Unit
```
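For illustration only (a sketch; `awaitIdle` is the function proposed above and does not exist in the library today), the intended usage inside `runTest` would be:

```kotlin
@Test
fun testImmediateTaskRuns() = runTest {
    var done = false
    launch { done = true } // scheduled for the current virtual moment
    testScheduler.awaitIdle() // proposed replacement for runCurrent()
    assertTrue(done)
}
```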
This is also needed to properly support using Espresso with the test framework (#242).
Problem statement, the long version
The current understanding of how people (need to) test coroutines
Before the large test framework refactoring, there were effectively two APIs: a low-level one (`TestCoroutineScope`, `runCurrent`, `advanceUntilIdle`, `advanceTimeBy`) and a higher-level one, `runBlockingTest`, built on top of it.
There were three patterns for testing coroutines.
Library vs framework
- The first pattern looked like this:
```kotlin
// OLD API, no longer available
fun testFooBar() {
    val scope = TestCoroutineScope()
    try {
        scope.launch { foo() }
        scope.advanceUntilIdle()
        scope.launch { bar() }
        scope.advanceUntilIdle()
    } finally {
        scope.cleanupTestCoroutines()
    }
}
```
It looks at coroutines from the outside: the testing facilities were treated as a library of functions to be mixed and matched, with behavior being requested from them explicitly. One would explicitly schedule work on the test facilities, explicitly ask for it to be executed, and explicitly verify the resulting behavior.
- Another approach was like this:
```kotlin
// OLD API, no longer available
fun testFooBar() {
    runBlockingTest {
        foo()
        bar()
    }
}
```
Here, testing is conducted from inside a coroutine. `runBlockingTest` itself called `advanceUntilIdle` and `cleanupTestCoroutines`, ensuring the correct behavior in the common case, but giving up some of the flexibility. `runBlockingTest` was a framework for coroutines to be tested in, itself implemented on top of the primitives provided by the library-like interface.
- There was also a mixed approach:
```kotlin
// OLD API, no longer available
fun testFooBar() {
    runBlockingTest {
        launch { foo() }
        advanceUntilIdle()
        launch { bar() }
        advanceUntilIdle()
    }
}
```
After the big refactoring, where every coroutine test must be wrapped in `runTest`, tests that used to be in the library form took this mixed approach instead of translating to the pure framework-like one. What follows is an explanation of why both the pure library approach and the mixed approach are suboptimal for the common use cases, and why execution control should be relegated to `runTest`.
The issues with manual execution control
At a glance, it may look like the library approach is clearly the better one: everything is explicit, with minimal magical behavior that needs to be understood, and one can engage with the scheduled coroutines as closely as needed, whereas `runBlockingTest` just does its thing somehow, on its own, in mysterious ways.

Unfortunately, careful study of hundreds of tests has shown that what people need is a framework for testing, not a library.
Tests not testing anything
People were misusing the testing library all the time. For example, consider this "test":
```kotlin
// OLD API, no longer available
@Test
fun testFoo() {
    val testScope = TestCoroutineScope()
    testScope.launch {
        val a = foo()
        assertEquals(x, a)
    }
}
```
This test runs `foo` until the first suspension and then stops doing anything at all. `assertEquals` will never get called.

The root of the issue is that it's simply incorrect to create a `TestCoroutineScope` and not call `cleanupTestCoroutines` at some point. This test is worse than useless: it gives a false sense of security, leaving the impression that `foo` is properly tested.
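For contrast, the framework-style equivalent using the current API doesn't even leave room for this mistake (a minimal sketch with the same placeholder `foo` and `x`): `runTest` does not return until the test body and the coroutines it spawned have completed, so the assertion is guaranteed to run:

```kotlin
@Test
fun testFoo() = runTest {
    val a = foo()
    assertEquals(x, a) // reached before the test is allowed to finish
}
```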
No interoperation with asynchronous resumptions
Problem statement
Manual time control only knows about tasks that the test dispatcher needs to execute. It doesn't know anything about tasks that happen on other dispatchers.
```kotlin
suspend fun foo() = withContext(Dispatchers.IO) {
    // a network call
}
```

```kotlin
// OLD API, no longer available
// this test would most likely crash
@Test
fun testFoo() {
    runBlockingTest {
        launch {
            val a = foo()
            assertEquals(x, a)
        }
        advanceUntilIdle()
        launch {
            val b = foo()
            assertEquals(x, b)
        }
    }
}
```
`advanceUntilIdle` here will progress the first launched coroutine until the point when `foo` is entered. After that, the test dispatcher has no control over what happens; `Dispatchers.IO` has to take it from there.
So, by the time the second `launch` happens, the network call may or may not have been executed. Most likely, it wasn't. There are race conditions in this test, making it flaky and unpredictable.
Luckily, most tests were not written this way: most likely, the network call wouldn't have enough time to finish by the end of the test, and `runBlockingTest` would crash with an exception notifying that some coroutines spawned inside it didn't complete. This behavior was mysterious to many, as understanding it requires a solid model of the internals of coroutines, and so people were just generally discouraged from using non-test dispatchers inside tests.
Add to this the previous problem of people not even using `runBlockingTest` and forgetting to clean up coroutines, and you get a recipe for disaster: even if you didn't forget `advanceUntilIdle` at the end of the test, it still meant little. Some adventurous souls went out of their way to make tests pass, rewriting them not to use `runBlockingTest` or `cleanupTestCoroutines` because they "caused the test to crash."
A partial fix
The library approach can be made to properly work with external dispatchers like this:
```kotlin
// OLD API, no longer available
@Test
fun testFoo() {
    val scope = TestCoroutineScope()
    val job = scope.launch {
        // ____________ A ___________
        // this is the main test body
        val a = foo()
        assertEquals(x, a)
        val b = foo()
        assertEquals(x, b)
        // ____________ B ____________
    }
    try {
        // ____________ C ____________
        scope.advanceUntilIdle()
        while (job.isActive) {
            Thread.sleep(100) // wait 100 milliseconds of real time; `delay` is unavailable outside a coroutine
            scope.advanceUntilIdle()
        }
        // _____________ D ____________
    } finally {
        scope.cleanupTestCoroutines()
    }
}
```
This way, until the last line of the test body has finished executing (after which `job.isActive` becomes `false`), we run all the tasks scheduled in the test dispatcher, and if we're not done after that, we also wait for 100 milliseconds for some other dispatchers to asynchronously pass more tasks to the test dispatcher.
However, note that the only part of the test relevant to us is the one between lines `A` and `B`. Everything else is boilerplate that virtually every test needs but that is very difficult to come up with.
Manual execution control inherently can't support asynchronous work in general
All of the above is a good indication that manual execution control is error-prone in general in the case of asynchronous resumptions: every `advanceUntilIdle` must be replaced with code like that between lines `C` and `D` in order to properly support them.
The problem is, the role of `job` is not always obvious. For example, consider this test:
```kotlin
fun CoroutineScope.bar() {
    launch(Dispatchers.IO) {
        // something
    }
}
```

```kotlin
// OLD API, no longer available
// this test would most likely crash
@Test
fun testFoo() {
    runBlockingTest {
        bar()
        advanceUntilIdle()
    }
}
```
If one attempted to replace `advanceUntilIdle` here with the `C`-`D` construct, there would be nothing to fulfill the role of `job`. `bar` does call `launch`, but does so as an implementation detail, without exposing the resulting `Job`. Calling `advanceUntilIdle` here is simply useless, even though it doesn't seem that way at a glance: from the point of view of the test framework, the system is idle while non-test dispatchers are busy.
Replacing `advanceUntilIdle` with `delay(2.seconds)` would not fix the behavior in this case, but it would still clear up the conceptual model. Since delays are instantaneous inside the test dispatcher, from its point of view, waiting for `Dispatchers.IO` to finish doing its thing requires infinite time. The problem of "I don't know when asynchronous work will be completed" needs to be solved programmatically somehow: you can't test asynchronous work if you don't know when it finishes, and the false concept of system idleness significantly muddies the mental model behind the testing framework.
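One programmatic way out (a sketch; modifying `bar` to return the `Job` it launches is an assumption about the code under test, not something the framework requires) is to expose a handle to the asynchronous work and wait for it directly:

```kotlin
fun CoroutineScope.bar(): Job = launch(Dispatchers.IO) {
    // something
}

@Test
fun testBar() = runTest {
    val job = bar()
    job.join() // suspends until the IO work actually finishes, regardless of the dispatcher it runs on
}
```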
Interactions between manual execution control and real time require thread blocking
Another issue stems from the fact that manual execution control functions are blocking and can't suspend.
For an example of where this causes problems, see #3179. We have a reasonable request to disable virtual time in parts of tests. However, the current design of manual execution control prevents us from reaching a clear solution.
```kotlin
fun testFoo() = runTest {
    // NoDelaySkipping is the hypothetical marker from the request in #3179 to disable delay skipping.
    val job1 = launch(NoDelaySkipping) {
        delay(1.seconds)
        println(1)
    }
    launch {
        withContext(Dispatchers.IO) {
            Thread.sleep(500) // 0.5 seconds
        }
        println(0)
    }
    val job2 = launch {
        delay(2.seconds)
        println(2)
    }
    // _______ A _________
    job1.join()
    println(currentTime)
}
```
What virtual time should be printed? In what order should the numbers be printed?
There are seemingly two viable answers:

- It doesn't matter, as disabling virtual time is only needed in fairly specific circumstances, or
- The current time should be `1.seconds`, and the order should be `0` (after half a second), `1` (one second after the start of the test), and then `2` (also one second after the start of the test).
It's unlikely that someone would expect `job1` to complete after `job2` just because `job1` is required to use real-world time: the abstraction behind virtual time is that it is just like the actual time, only passing infinitely faster for tests.
The next question is what should happen if line `A` is replaced with `advanceUntilIdle` or `advanceTimeBy(2.seconds)`. Neither of them can simply wait for one real-life second, as they are blocking functions. They would have to block the thread while waiting, which is not even expressible in the JS implementation of the test framework and would be strange behavior in any case. Moreover, after half a second of waiting, the blocking would also need to be interrupted, as a new task arrives in the test scheduler, ready for execution.
All of this could be easily mitigated if `advanceUntilIdle` and `advanceTimeBy` were `suspend` functions: they would just suspend, freeing the thread to do other things until the real-time wait is over or a new task arrives. This is exactly the behavior that writing `delay(2.seconds)` instead of `advanceTimeBy` has out of the box!
There is no way to make the time control functions play nicely in these scenarios. It seems like the only sensible way to implement them would be to throw an `UnsupportedOperationException` if they encounter a task that requires waiting for a given amount of real time.
Most tests don't need this kind of complex API
Out of the hundreds of tests we've seen during this research, only a single-digit number used manual execution control in a way that the framework style wasn't better suited for.
When implementing low-level libraries on top of coroutines (for example, when writing your own schedulers), one may need to validate the behavior under interleavings that are difficult to emulate with just `runBlockingTest`, mocking calls, and calling `delay`. The majority of tests, however, were more brittle for not using `runBlockingTest` than they would have been otherwise.
Why did we keep manual execution control after the refactoring if it's so problematic?
There are a few reasons:
- A lot of code was already written using manual execution control. We took backward compatibility seriously, despite the library being marked as experimental, and attempted to provide a streamlined way to migrate from the old API to the new one: https://github.com/Kotlin/kotlinx.coroutines/blob/master/kotlinx-coroutines-test/MIGRATION.md Barring a few exceptions, we mostly succeeded. This wouldn't have been possible if we had just removed manual execution control.
- At the time, we didn't fully understand that manual execution control is more trouble than it's worth. Many other entities in the test framework needed our attention, and manual execution control was comparatively benign.
- Some tests do benefit from manual execution control. These are mostly low-level tests of tricky things that need to do something uncommon, like calling `runCurrent` or `advanceTimeBy` from outside the test body, where they can't be replaced with a `delay` (see the sketch below).
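For completeness, here is a sketch of the kind of setup where the call site is outside the test body (`ComponentUnderTest` and its properties are hypothetical; the point is only that there is no suspending context at the call site, so `delay` is not an option):

```kotlin
@Test
fun testComponentDrivenFromOutside() {
    val scheduler = TestCoroutineScheduler()
    val dispatcher = StandardTestDispatcher(scheduler)
    val component = ComponentUnderTest(dispatcher) // hypothetical class that launches its own coroutines
    component.start()
    scheduler.advanceTimeBy(1_000) // no suspending context here, so `delay` can't be used
    scheduler.runCurrent()
    assertTrue(component.isReady) // hypothetical assertion
}
```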