Description
Currently, all panics on tasks are caught and exposed to the user via the
JoinHandle. However, it is somewhat uncommon to use the JoinHandle.
Background tasks are spawned and may silently fail, causing the rest of the
application to hang. Also, in tests, a background task that panics can result in
the test hanging indefinitely, which makes debugging difficult.
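As a rough analogy only, the same "captured but easy to miss" behavior can be seen with standard-library threads, where a panic surfaces only through the value returned by join. This sketch uses std::thread rather than the runtime discussed here, and the function name is hypothetical:

```rust
use std::thread;

/// Spawns a background thread that panics and reports whether the panic
/// was captured by the join handle instead of taking down the process.
/// (std::thread analogy; not the runtime's own task API.)
fn background_panic_is_captured() -> bool {
    let handle = thread::spawn(|| {
        panic!("background task failed");
    });
    // The panic is only observable here. If the handle were dropped
    // instead, nothing else in the program would learn about the failure.
    handle.join().is_err()
}
```

If the handle is never joined, the failure goes unnoticed, which is the situation described above.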
That said, the current behavior is the correct default. Even if it weren't,
changing it now would be too late. A task boundary is a logical boundary to
separate failure. When implementing a server, it is not desirable for an
uncommon bug in one request handler to take down the entire process.
So, because different scenarios merit different behaviors, a runtime
configuration option could provide the user with the ability to pick the
behavior best suited for their case.
There are a few ways panics could be handled:
- Forward to the JoinHandle and ignore otherwise (what happens today).
- Forward to the JoinHandle, but if the JoinHandle is dropped (the result is
ignored), then shut down the runtime.
- Always shut down the runtime on panic.
- Pass the panic to a user-provided callback to pick which of the above
strategies to take.
So, to expose the different options to the user:
#[non_exhaustive]
// TODO: naming?
enum UnhandledPanic {
    Ignore,
    ShutdownRuntime,
    ShutdownRuntimeIfIgnored,
}
type PanicError = Box<dyn Any + Send + 'static>;
impl runtime::Builder {
    fn unhandled_panic_behavior(&mut self, behavior: UnhandledPanic) { ... }
    fn on_unhandled_panic(&mut self, f: impl Fn(PanicError) -> UnhandledPanic) { ... }
}
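To make the strategies concrete, here is a minimal std-only sketch of how a policy value and a user callback might be evaluated. The helpers should_shutdown, panic_message, and example_callback are hypothetical names introduced for illustration, and the "fatal" substring rule is an arbitrary example policy, not part of the proposal:

```rust
use std::any::Any;

type PanicError = Box<dyn Any + Send + 'static>;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnhandledPanic {
    Ignore,
    ShutdownRuntime,
    ShutdownRuntimeIfIgnored,
}

/// Hypothetical helper: decides whether a panic should shut the runtime
/// down, given the configured behavior and whether the JoinHandle was dropped.
fn should_shutdown(behavior: UnhandledPanic, join_handle_dropped: bool) -> bool {
    match behavior {
        UnhandledPanic::Ignore => false,
        UnhandledPanic::ShutdownRuntime => true,
        UnhandledPanic::ShutdownRuntimeIfIgnored => join_handle_dropped,
    }
}

/// Extracts the message from a panic payload when it is a &str or a String.
fn panic_message(err: &PanicError) -> Option<&str> {
    if let Some(s) = err.downcast_ref::<&'static str>() {
        Some(*s)
    } else if let Some(s) = err.downcast_ref::<String>() {
        Some(s.as_str())
    } else {
        None
    }
}

/// Example of a user-provided callback (assumed policy): shut down only
/// when the panic message contains "fatal".
fn example_callback(err: PanicError) -> UnhandledPanic {
    match panic_message(&err) {
        Some(msg) if msg.contains("fatal") => UnhandledPanic::ShutdownRuntime,
        _ => UnhandledPanic::Ignore,
    }
}
```

A callback of this shape lets one process treat most handler panics as isolated failures while still stopping on the few it considers unrecoverable.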
Runtime shutdown
What does it mean to "shut down the runtime" on an unhandled panic? First, the
current shutdown behavior is executed. All in-flight tasks are forcibly aborted
and runtime resources are disabled. The next question is how to expose the
unhandled panic.
If the user enables "shutdown runtime on unhandled panic" and a panic does get
through, it seems likely that this is a bug. The Runtime methods in question
are:
- spawn
- block_on
spawn could maintain the current behavior when called after a runtime has
shut down: immediately drop the task and complete the JoinHandle with an error.
The block_on method does not return a Result. The only option I see is for it to
panic when the runtime has seen an unhandled panic.
To compensate, we could add methods on Runtime to query the runtime state,
e.g. Runtime::status() -> Running | Shutdown | UnhandledPanic | ...
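The behavior described above can be modeled with a toy type. ToyRuntime, RuntimeStatus, SpawnError, and record_unhandled_panic are hypothetical names for this sketch; tasks run synchronously here, so it only illustrates the status checks, not real scheduling:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuntimeStatus {
    Running,
    Shutdown,
    UnhandledPanic,
}

/// Error returned when a task is spawned on a runtime that is no longer running.
#[derive(Debug, PartialEq, Eq)]
struct SpawnError;

/// Toy stand-in for a runtime; it only tracks the status.
struct ToyRuntime {
    status: RuntimeStatus,
}

impl ToyRuntime {
    fn new() -> Self {
        ToyRuntime { status: RuntimeStatus::Running }
    }

    /// Query method in the spirit of the suggested Runtime::status().
    fn status(&self) -> RuntimeStatus {
        self.status
    }

    fn shutdown(&mut self) {
        self.status = RuntimeStatus::Shutdown;
    }

    fn record_unhandled_panic(&mut self) {
        self.status = RuntimeStatus::UnhandledPanic;
    }

    /// After shutdown, the task is dropped immediately and an error is returned.
    fn spawn<F, T>(&self, task: F) -> Result<T, SpawnError>
    where
        F: FnOnce() -> T,
    {
        if self.status != RuntimeStatus::Running {
            drop(task);
            return Err(SpawnError);
        }
        Ok(task())
    }

    /// Has no Result in its signature, so it panics once an unhandled panic was seen.
    fn block_on<F, T>(&self, f: F) -> T
    where
        F: FnOnce() -> T,
    {
        if self.status == RuntimeStatus::UnhandledPanic {
            panic!("runtime has seen an unhandled panic");
        }
        f()
    }
}
```

With a status query available, a caller can check the state before calling block_on instead of relying on the panic.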
Initial implementation
As an initial step to get the feature going, I suggest implementing an MVP
version of the feature as an unstable API and only for the current_thread
runtime. This would let us explore the space more and try things out. The
initial implementation could also start by only letting the user pick between
the current behavior and ShutdownRuntime. So:
#[non_exhaustive]
enum UnhandledPanic {
    Ignore,
    ShutdownRuntime,
}
type PanicError = Box<dyn Any + Send + 'static>;
impl runtime::Builder {
    fn unhandled_panic_behavior(&mut self, behavior: UnhandledPanic) { ... }
}
When the multi-threaded runtime is selected, this option would have no effect.
Implementing it for the multi-threaded runtime would be required before stabilizing
the API, but because that implementation is much harder, we should first gather data.
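One way to picture the single-threaded MVP is a loop that runs queued tasks one at a time, catches each panic, and stops early under ShutdownRuntime. This is a simplified std-only model with plain closures standing in for tasks; run_tasks and the Task alias are hypothetical and do not reflect how the actual scheduler is written:

```rust
use std::panic::{self, AssertUnwindSafe};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnhandledPanic {
    Ignore,
    ShutdownRuntime,
}

/// Stand-in for a spawned task in this model.
type Task = Box<dyn FnOnce()>;

/// Runs tasks in order on the current thread and returns
/// (completed, panicked, aborted) counts.
fn run_tasks(tasks: Vec<Task>, behavior: UnhandledPanic) -> (usize, usize, usize) {
    let total = tasks.len();
    let mut completed = 0;
    let mut panicked = 0;
    for task in tasks {
        match panic::catch_unwind(AssertUnwindSafe(task)) {
            Ok(()) => completed += 1,
            Err(_) => {
                panicked += 1;
                if behavior == UnhandledPanic::ShutdownRuntime {
                    // Leaving the loop drops every remaining task unrun,
                    // modeling "in-flight tasks are forcibly aborted".
                    break;
                }
            }
        }
    }
    (completed, panicked, total - completed - panicked)
}
```

Under Ignore, the tasks after the panicking one still run; under ShutdownRuntime, they are dropped without running.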
Open questions
- How should unhandled panics be propagated? Should they be sent to block_on
or the JoinHandle (ref: rt: provide options to configure unhandled panic behavior #4516)?
- How should LocalSet and JoinSet work? Should they track their own settings or
inherit from the runtime? Should there be a LocalSet::builder()?
Known issues
- Switching the "current" scheduler context and then panicking (Add LocalSet::enter #4765 (comment)). In this case, "current" does not reference the runtime that should intercept the panic.