Description
Problem statement, the short version
We receive a lot of questions about using the test module whose answer boils down to "avoid using `advanceUntilIdle`/`advanceTimeBy`."

`advanceTimeBy`, `advanceUntilIdle`, and `runCurrent` can be replaced by `delay`, `yield`, or cross-coroutine communication in most tests, with better results and much clearer semantics. Because `runTest` uses virtual time, even `delay(1.seconds)` is instantaneous. Manual time control is error-prone, discouraged, has surprising behavior, and is increasingly incompatible with the rest of the framework.
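As a reminder of why long delays are cheap in tests (a minimal sketch using the current `runTest` API; the test name is made up), delays inside `runTest` advance virtual time rather than waiting in real time:

```kotlin
@Test
fun delaysAreVirtual() = runTest {
    delay(10.hours) // returns immediately in real time; only the virtual clock advances
    assertEquals(10.hours.inWholeMilliseconds, currentTime)
}
```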
We should think of ways to make these functions less prominent so that people don't use them unless they absolutely need to, providing alternatives that more directly and correctly express the intent behind their typical usage.
Request for comments: if you use `advanceTimeBy`, `runCurrent`, or `advanceUntilIdle` in your code and don't think they could be reasonably replaced with the suggested alternatives once we implement them, please provide examples of your use cases.
Alternatives to each manual execution control function
advanceTimeBy
This function is used when some specific event that's known to be scheduled into the future needs to happen before its results are validated.
Examples:
```kotlin
fun CoroutineScope.runAfter(n: Duration, blockAction: () -> Unit) {
    launch {
        delay(n)
        blockAction()
    }
}
```
```kotlin
// The code before: the simple scenario
@Test
fun testRunAfter1() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    advanceTimeBy(11.seconds)
    assertTrue(entered)
}

// The code before: the complex scenario
@Test
fun testRunAfter2() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    advanceTimeBy(10.seconds)
    assertFalse(entered)
    runCurrent()
    assertTrue(entered)
}
```
When you have `advanceTimeBy(n)` directly in your test (which is almost all of its usages), it can be replaced with `delay(n - 1.milliseconds)`.

- The `- 1` is needed because `advanceTimeBy` does not execute the tasks scheduled for the moment `currentTime() + n`, whereas `delay(n)` will also execute the tasks that were already scheduled for `currentTime() + n` before `delay(n)` was called.
- However, the virtual time will be one millisecond off compared to what it was before, which may need to be accounted for downstream.
So, these tests become:
```kotlin
// The code after: the simple scenario
@Test
fun testRunAfter1() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    delay(11.seconds) // here, there's no difference whether to subtract one millisecond
    assertTrue(entered)
}

// The code after: the complex scenario
@Test
fun testRunAfter2() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    delay(10.seconds - 1.milliseconds)
    assertFalse(entered)
    delay(1.milliseconds) // can't run `runCurrent`, as it was not yet time for the block to execute
    assertTrue(entered)
}
```
advanceUntilIdle
This function runs the tasks present in the test scheduler (and only those) for as long as there are tasks outside of `backgroundScope`. Almost always, the upper bound on the virtual time it takes to execute all the code in the test dispatcher is known. Typically, it's just `1.milliseconds`. So, functionally, `advanceUntilIdle` can almost always be replaced by `delay(20.hours)`.
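As a concrete illustration (a sketch reusing the `runAfter` helper defined above), the mechanical replacement looks like this:

```kotlin
@Test
fun testRunAfterEventuallyRuns() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    // Before: advanceUntilIdle(); after: an arbitrary "long enough" amount of virtual time.
    delay(20.hours)
    assertTrue(entered)
}
```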
However, in many cases, there are problems with this replacement:
- The most important one: it doesn't demonstrate the intent. The programmer intends to wait until the system is idle, and `20.hours` is a completely arbitrary constant.
- Then again, `advanceUntilIdle` doesn't always do what it says on the box either, as work on other dispatchers is not taken into account, so idleness is a fuzzy concept anyway. Though it's inarguable that when `advanceUntilIdle` does work, it also shows the intent.
- If there is work in `backgroundScope` that happens, say, every `100.milliseconds`, `delay(20.hours)` will waste a lot of processing power on that work.
We must first present a viable alternative to confidently claim that `advanceUntilIdle` is no longer needed.
In most cases we've seen, what people intend is actually for the spawned coroutines to finish, regardless of whether these coroutines execute on the test dispatcher. So, we could introduce a function like
```kotlin
/**
 * Suspends until all the coroutines spawned in this [CoroutineScope] are completed.
 */
suspend fun CoroutineScope.awaitAllChildren() {
    while (true) {
        val children = this.coroutineContext[Job]!!.children.toList()
        // Check each child's state, not the scope's own `isActive`.
        if (children.all { !it.isActive }) break
        children.forEach { it.join() }
    }
}
```
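To show the intended usage (a sketch, assuming the `awaitAllChildren` draft above and the `runAfter` helper from the beginning of this text):

```kotlin
@Test
fun testRunAfterViaAwaitAllChildren() = runTest {
    var entered = false
    runAfter(10.seconds) { entered = true }
    // Before: advanceUntilIdle(); after: wait for the coroutine launched by runAfter to complete.
    awaitAllChildren()
    assertTrue(entered)
}
```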
Replacing `advanceUntilIdle` with `awaitAllChildren()` wouldn't always work. For example, here, it would lead to a deadlock:
```kotlin
fun testFoo() = runTest {
    val channel = Channel<Unit>()
    launch { channel.receive() }
    advanceUntilIdle()
    channel.send(Unit)
}
```
However, in all the cases we've seen so far where this happens, `advanceUntilIdle` is not needed at all and was probably put there just in case. We'll need to investigate further.
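For illustration (a sketch; whether this holds for a given real-world test has to be checked case by case), the example above keeps working if the call is simply removed, because `channel.send(Unit)` suspends until the launched coroutine receives, and `runTest` waits for that coroutine to complete:

```kotlin
@Test
fun testFoo() = runTest {
    val channel = Channel<Unit>()
    launch { channel.receive() }
    // No advanceUntilIdle(): send() rendezvouses with the receiver once the launched coroutine runs.
    channel.send(Unit)
}
```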
runCurrent
As shown in the previous example, `delay(1.milliseconds)` usually does the trick.
Still, this replacement is problematic:

- It doesn't demonstrate the intent.
- It is not a direct equivalent to `runCurrent`, as `runCurrent` ensures no more tasks are scheduled for the current moment by running all of them.
A proper replacement would probably be a function like
```kotlin
/**
 * Waits until all the tasks left are the ones scheduled for a later point of virtual time.
 */
suspend fun TestCoroutineScheduler.awaitIdle(): Unit
```
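For illustration only (a sketch; `awaitIdle` is the function proposed above and does not exist in the library today), the intended usage inside `runTest` would be:

```kotlin
@Test
fun testImmediateTaskRuns() = runTest {
    var done = false
    launch { done = true } // scheduled for the current virtual moment
    testScheduler.awaitIdle() // proposed replacement for runCurrent()
    assertTrue(done)
}
```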
This is also needed to properly support using Espresso with the test framework (#242).
Problem statement, the long version
The current understanding of how people (need to) test coroutines
Before the large test framework refactoring, there were effectively two APIs: a low-level one (`TestCoroutineScope`, `runCurrent`, `advanceUntilIdle`, `advanceTimeBy`) and a higher-level one, `runBlockingTest`, built on top of it.
There were three patterns for testing coroutines.
Library vs framework
- The first pattern looked like this:
```kotlin
// OLD API, no longer available
fun testFooBar() {
    val scope = TestCoroutineScope()
    try {
        scope.launch { foo() }
        scope.advanceUntilIdle()
        scope.launch { bar() }
        scope.advanceUntilIdle()
    } finally {
        scope.cleanupTestCoroutines()
    }
}
```
It looks at coroutines from the outside: the testing facilities were treated as a library of functions to be mixed and matched, with behavior being requested from them explicitly. One would explicitly schedule work on the test facilities, explicitly ask for it to be executed, and explicitly verify the resulting behavior.
- Another approach was like this:
```kotlin
// OLD API, no longer available
fun testFooBar() {
    runBlockingTest {
        foo()
        bar()
    }
}
```
Here, testing is conducted from inside a coroutine. `runBlockingTest` itself called `advanceUntilIdle` and `cleanupTestCoroutines`, ensuring the correct behavior in the common case, but giving up some of the flexibility. `runBlockingTest` was a framework for coroutines to be tested in, itself implemented on top of the primitives provided by the library-like interface.
- There was also a mixed approach:
```kotlin
// OLD API, no longer available
fun testFooBar() {
    runBlockingTest {
        launch { foo() }
        advanceUntilIdle()
        launch { bar() }
        advanceUntilIdle()
    }
}
```
After the big refactoring, where every coroutine test must be wrapped in `runTest`, tests that used to be in the library form took this mixed approach instead of translating to the pure framework-like one. What follows is an explanation of why both the pure library approach and the mixed approach are suboptimal for the common use cases, and why execution control should be relegated to `runTest`.
The issues with manual execution control
At a glance, it may look like the library approach is clearly the better one: everything is explicit, with minimal magical behavior that needs to be understood, and one can engage with the scheduled coroutines as closely as needed, whereas `runBlockingTest` just does its thing somehow, on its own, in mysterious ways.

Unfortunately, careful study of hundreds of tests has shown that what people need is a framework for testing, not a library.
Tests not testing anything
People were misusing the testing library all the time. For example, consider this "test":
```kotlin
// OLD API, no longer available
@Test
fun testFoo() {
    val testScope = TestCoroutineScope()
    testScope.launch {
        val a = foo()
        assertEquals(x, a)
    }
}
```
This test runs `foo` until the first suspension and then stops doing anything at all. `assertEquals` will never get called.

The root of the issue is that it's simply incorrect to create a `TestCoroutineScope` and not call `cleanupTestCoroutines` at some point. This test is worse than useless: it gives a false sense of security, leaving the impression that `foo` is properly tested.
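For contrast, the framework-style equivalent using the current API doesn't even leave room for this mistake (a minimal sketch with the same placeholder `foo` and `x`): `runTest` does not return until the test body and the coroutines it spawned have completed, so the assertion is guaranteed to run:

```kotlin
@Test
fun testFoo() = runTest {
    val a = foo()
    assertEquals(x, a) // reached before the test is allowed to finish
}
```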
No interoperation with asynchronous resumptions
Problem statement
Manual time control only knows about tasks that the test dispatcher needs to execute. It doesn't know anything about tasks that happen on other dispatchers.
```kotlin
suspend fun foo() = withContext(Dispatchers.IO) {
    // a network call
}
```

```kotlin
// OLD API, no longer available
// this test would most likely crash
@Test
fun testFoo() {
    runBlockingTest {
        launch {
            val a = foo()
            assertEquals(x, a)
        }
        advanceUntilIdle()
        launch {
            val b = foo()
            assertEquals(x, b)
        }
    }
}
```
`advanceUntilIdle` here will progress the first launched coroutine until the point when `foo` is entered. After that, the test dispatcher has no control over what happens; `Dispatchers.IO` has to take it from there.
So, by the time the second `launch` happens, the network call may or may not have been executed. Most likely, it wasn't. There are race conditions in this test, making it flaky and unpredictable.
Luckily, most tests were not written this way: most likely, the network call wouldn't have enough time to finish by the end of the test, and `runBlockingTest` would crash with an exception notifying that some coroutines spawned inside it didn't complete. This behavior was mysterious to many, as understanding it requires a solid model of the internals of coroutines, and so people were just generally discouraged from using non-test dispatchers inside tests.
Add to this the previous problem of people not even using `runBlockingTest` and forgetting to clean up coroutines, and you get a recipe for disaster: even if you didn't forget `advanceUntilIdle` at the end of the test, it still meant little. Some adventurous souls went out of their way to make tests pass, rewriting them not to use `runBlockingTest` or `cleanupTestCoroutines` because they "caused the test to crash."
A partial fix
The library approach can be made to properly work with external dispatchers like this:
```kotlin
// OLD API, no longer available
@Test
fun testFoo() {
    val scope = TestCoroutineScope()
    val job = scope.launch {
        // ____________ A ___________
        // this is the main test body
        val a = foo()
        assertEquals(x, a)
        val b = foo()
        assertEquals(x, b)
        // ____________ B ____________
    }
    try {
        // ____________ C ____________
        scope.advanceUntilIdle()
        while (job.isActive) {
            Thread.sleep(100) // wait 100 milliseconds of real time; `delay` is unavailable outside a coroutine
            scope.advanceUntilIdle()
        }
        // _____________ D ____________
    } finally {
        scope.cleanupTestCoroutines()
    }
}
```
This way, until the last line of the test body has finished executing (after which `job.isActive` becomes `false`), we run all the tasks scheduled in the test dispatcher, and if we're not done after that, we also wait for 100 milliseconds for some other dispatchers to asynchronously pass more tasks to the test dispatcher.
However, note that the only part of the test relevant to us is the one between lines `A` and `B`. Everything else is boilerplate that virtually every test needs but that is very difficult to come up with.
Manual execution control inherently can't support asynchronous work in general
All of the above is a good indication that manual execution control is error-prone in general in the case of asynchronous resumptions: every `advanceUntilIdle` must be replaced with code like that between lines `C` and `D` in order to properly support them.
The problem is, the role of `job` is not always obvious. For example, consider this test:
```kotlin
fun CoroutineScope.bar() {
    launch(Dispatchers.IO) {
        // something
    }
}
```

```kotlin
// OLD API, no longer available
// this test would most likely crash
@Test
fun testFoo() {
    runBlockingTest {
        bar()
        advanceUntilIdle()
    }
}
```
If one attempted to replace `advanceUntilIdle` here with the `C`-`D` construct, there would be nothing to fulfill the role of `job`. `bar` does call `launch`, but does so as an implementation detail, without exposing the resulting `Job`. Calling `advanceUntilIdle` here is simply useless, even though it doesn't seem that way at a glance: from the point of view of the test framework, the system is idle while non-test dispatchers are busy.
Replacing `advanceUntilIdle` with `delay(2.seconds)` would not fix the behavior in this case, but it would still clear up the conceptual model. Since delays are instantaneous inside the test dispatcher, from its point of view, waiting for `Dispatchers.IO` to finish doing its thing requires infinite time. The problem of "I don't know when asynchronous work will be completed" needs to be solved programmatically somehow: you can't test asynchronous work if you don't know when it finishes, and the false concept of system idleness significantly muddies the mental model behind the testing framework.
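One programmatic way out (a sketch; modifying `bar` to return the `Job` it launches is an assumption about the code under test, not something the framework requires) is to expose a handle to the asynchronous work and wait for it directly:

```kotlin
fun CoroutineScope.bar(): Job = launch(Dispatchers.IO) {
    // something
}

@Test
fun testBar() = runTest {
    val job = bar()
    job.join() // suspends until the IO work actually finishes, regardless of the dispatcher it runs on
}
```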
Interactions between manual execution control and real time require thread blocking
Another issue stems from the fact that manual execution control functions are blocking and can't suspend.
For an example of where this causes problems, see #3179. We have a reasonable request to disable virtual time in parts of tests. However, the current design of manual execution control prevents us from reaching a clear solution.
```kotlin
fun testFoo() = runTest {
    // NoDelaySkipping is the hypothetical marker from the request in #3179 to disable delay skipping.
    val job1 = launch(NoDelaySkipping) {
        delay(1.seconds)
        println(1)
    }
    launch {
        withContext(Dispatchers.IO) {
            Thread.sleep(500) // 0.5 seconds
        }
        println(0)
    }
    val job2 = launch {
        delay(2.seconds)
        println(2)
    }
    // _______ A _________
    job1.join()
    println(currentTime)
}
```
What virtual time should be printed? In what order should the numbers be printed?
There are seemingly two viable answers:

- It doesn't matter, as disabling virtual time is only needed in fairly specific circumstances, or
- The current time should be `1.seconds`, and the order should be `0` (after half a second), `1` (one second after the start of the test), and then `2` (also one second after the start of the test).
It's unlikely that someone would expect `job1` to complete after `job2` just because `job1` is required to use real-world time: the abstraction behind virtual time is that it is just like the actual time, only passing infinitely faster for tests.
The next question is what should happen if line `A` is replaced with `advanceUntilIdle` or `advanceTimeBy(2.seconds)`. Neither of them can simply wait for one real-life second, as they are blocking functions. They would have to block the thread while waiting, which is not even expressible in the JS implementation of the test framework and would be strange behavior in any case. Moreover, after half a second of waiting, the blocking would also need to be interrupted, as a new task arrives in the test scheduler, ready for execution.
All of this could be easily mitigated if `advanceUntilIdle` and `advanceTimeBy` were `suspend` functions: they would just suspend, freeing the thread to do other things until the real-time wait is over or a new task arrives. This is exactly the behavior that writing `delay(2.seconds)` instead of `advanceTimeBy` has out of the box!
There is no way to make the time control functions play nicely in these scenarios. It seems like the only sensible way to implement them would be to throw an `UnsupportedOperationException` if they encounter a task that requires waiting for a given amount of real time.
Most tests don't need this kind of complex API
Out of the hundreds of tests we've seen during this research, only a single-digit number used manual execution control in a way that the framework style wasn't better suited for.
When implementing low-level libraries on top of coroutines (for example, when writing your own schedulers), one may need to validate the behavior under interleavings that are difficult to emulate with just `runBlockingTest`, mocking calls, and calling `delay`. The majority of tests, however, were more brittle for not using `runBlockingTest` than they would have been otherwise.
Why did we keep manual execution control after the refactoring if it's so problematic?
There are a few reasons:
- A lot of code was already written using manual execution control. We took backward compatibility seriously, despite the library being marked as experimental, and attempted to provide a streamlined way to migrate from the old API to the new one: https://github.com/Kotlin/kotlinx.coroutines/blob/master/kotlinx-coroutines-test/MIGRATION.md Barring a few exceptions, we mostly succeeded. This wouldn't have been possible if we had just removed manual execution control.
- At the time, we didn't fully understand that manual execution control is more trouble than it's worth. Many other entities in the test framework needed our attention, and manual execution control was comparatively benign.
- Some tests do benefit from manual execution control. These are mostly low-level tests of tricky things that need to do something uncommon, like calling `runCurrent` or `advanceTimeBy` from outside the test body, where they can't be replaced with a `delay` (see the sketch below).
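For completeness, here is a sketch of the kind of setup where the call site is outside the test body (`ComponentUnderTest` and its properties are hypothetical; the point is only that there is no suspending context at the call site, so `delay` is not an option):

```kotlin
@Test
fun testComponentDrivenFromOutside() {
    val scheduler = TestCoroutineScheduler()
    val dispatcher = StandardTestDispatcher(scheduler)
    val component = ComponentUnderTest(dispatcher) // hypothetical class that launches its own coroutines
    component.start()
    scheduler.advanceTimeBy(1_000) // no suspending context here, so `delay` can't be used
    scheduler.runCurrent()
    assertTrue(component.isReady) // hypothetical assertion
}
```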