We need an easier way to opt out of the behavior introduced in #1186. Most of the output I'm seeing from threads ends up confusing or even lost, in cases where the previous behavior of sending all thread output to the current cell would be more appropriate. We need a kernel-level opt-out, a per-thread opt-out, or both, and we should perhaps reconsider the default behavior.
Here are two examples that I don't think are unusual, where the current behavior is incorrect and currently impossible to avoid with public APIs:
ipyparallel
For example, IPython Parallel runs its IO in a background thread, which sometimes produces output (e.g. when streaming output from engines). That output is produced in response to a direct, blocking action in the main thread, but it is written by a long-running background thread. Sending it to the cell that created the Client is not desirable. Producing the output in the main thread instead is not really feasible either, because this is used to stream output around user code:
```python
with async_result.stream_output():  # tells a background thread to start streaming to sys.stdout, etc.
    do_blocking_things()  # blocks the main thread; not controlled by ipyparallel
```
In all such situations, output produced by this thread should go to the current cell.
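A stdlib-only stand-in for this pattern may make the timing clearer: a long-lived daemon thread, started once (as a client's IO thread would be), echoes queued messages while the main thread is blocked inside a later cell. The names `_io_loop`, `stream_output` and `do_blocking_things` and the queue-based design are hypothetical illustrations, not ipyparallel's implementation.

```python
import contextlib
import queue
import threading
import time

# Hypothetical stand-in for a client's long-lived IO thread (not ipyparallel's code).
_messages = queue.Queue()
_streaming = threading.Event()
echoed = []  # (thread name, message) pairs, recorded for inspection


def _io_loop():
    """Runs until it receives a None sentinel, like a background IO thread."""
    while True:
        msg = _messages.get()
        try:
            if msg is None:
                return  # sentinel: stop the loop
            if _streaming.is_set():
                echoed.append((threading.current_thread().name, msg))
                print(msg, flush=True)  # output originates in the background thread
        finally:
            _messages.task_done()


@contextlib.contextmanager
def stream_output():
    """Hypothetical: ask the background thread to echo messages while the block runs."""
    _streaming.set()
    try:
        yield
    finally:
        _messages.join()  # wait until everything queued so far has been handled
        _streaming.clear()


def do_blocking_things():
    """Main-thread work that indirectly triggers output (simulated engine output)."""
    for i in range(3):
        _messages.put(f"engine output {i}")
        time.sleep(0.01)  # the main thread stays blocked in this cell


# Started once, e.g. during the cell that created the client.
io_thread = threading.Thread(target=_io_loop, name="io-thread", daemon=True)
io_thread.start()

# Later cell: the output is triggered here, but written by io_thread.
with stream_output():
    do_blocking_things()

_messages.put(None)  # shut the IO thread down
io_thread.join()
```

Every line is written by `io-thread`, so routing by the thread's creation point would attribute all of it to the cell that started the thread rather than the cell running the `with` block.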
The same is true for some log output, e.g. when stopping clusters, which again is produced in a background thread as a direct result of a synchronous action taken on the main thread. Placing that output next to the action that triggered it is less surprising than placing it next to the distant cell where the thread was originally launched.
ThreadPoolExecutor
Another situation, entirely within the standard library, is ThreadPoolExecutor, where output from all tasks goes to the cell that was executing when the worker threads were spawned. That is not necessarily the cell that created the pool, since thread creation may be deferred until the first task submission that needs a thread, which can produce a very surprising output order. This may be related to the joblib issue in #402, which is closed but appears to be unresolved.
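The deferred spawning can be observed with the standard library alone. The sketch below uses a hypothetical global `current_cell` and records, via the executor's `initializer`, which "cell" was executing when each worker started; that is a simplified model of creation-time routing, not ipykernel's code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model: a global "current cell" and creation-time routing,
# i.e. each worker remembers which cell was executing when it started.
current_cell = "cell 1"
origin_cell = {}  # worker thread name -> cell executing when the thread started


def _record_origin():
    # ThreadPoolExecutor runs the initializer once, in each new worker thread.
    origin_cell[threading.current_thread().name] = current_cell


def task(label):
    me = threading.current_thread().name
    return label, me, origin_cell[me]


def pool_threads():
    return [t.name for t in threading.enumerate() if t.name.startswith("demo-worker")]


# cell 1: creating the pool does not start any worker thread yet
pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="demo-worker",
                          initializer=_record_origin)
threads_after_create = pool_threads()

# cell 2: the first submission spawns the worker
current_cell = "cell 2"
first = pool.submit(task, "task A").result()

# cell 3: the same worker is reused, so creation-time routing points at cell 2
current_cell = "cell 3"
second = pool.submit(task, "task B").result()

pool.shutdown(wait=True)
for label, thread_name, origin in (first, second):
    print(f"{label} ran in {thread_name}, which was spawned during {origin}")
```

Under creation-time routing, task B's output would land in cell 2, even though the pool was created in cell 1 and the task was submitted from cell 3.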
I think #1186 assumes that threads, once spawned, are long-running and do not interact with the main thread. That holds for some "fire and forget" threads, but it is definitely not true in general, and I'm not even sure it's true more often than not.
At the very least, we need a way for packages/libraries to indicate that output from a given thread shouldn't be routed to its originating cell.
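One possible shape for such an indication, sketched as a toy router: a kernel-level default plus a per-thread override that a library could set from inside its own thread (e.g. an IO thread). `OutputRouter`, `route_to_current_cell` and `route_to_current_by_default` are hypothetical names, not an existing ipykernel API.

```python
import threading


class OutputRouter:
    """Toy model of thread-output routing with opt-outs (hypothetical, not ipykernel's API)."""

    def __init__(self, route_to_current_by_default=False):
        # Kernel-level switch: when True, every thread's output follows the current cell.
        self.route_to_current_by_default = route_to_current_by_default
        self.current_cell = None
        # Keyed by Thread objects; a real implementation would want weak references.
        self._origin = {}     # Thread -> cell executing when the thread was registered
        self._overrides = {}  # Thread -> per-thread choice made by the thread itself
        self._lock = threading.Lock()

    def register_thread(self, thread):
        """Record the originating cell; call this from the thread that starts `thread`."""
        with self._lock:
            self._origin[thread] = self.current_cell

    def route_to_current_cell(self, enabled=True):
        """Per-thread opt-out of origin routing, called from inside the thread."""
        with self._lock:
            self._overrides[threading.current_thread()] = enabled

    def destination(self, thread):
        """Return the cell that output written by `thread` should go to."""
        with self._lock:
            follow = self._overrides.get(thread, self.route_to_current_by_default)
            if follow or thread not in self._origin:
                return self.current_cell
            return self._origin[thread]


router = OutputRouter()
router.current_cell = "cell 1"

# A library's IO thread opts out; a plain worker does not. Both start during cell 1.
io_thread = threading.Thread(target=router.route_to_current_cell, name="io")
worker = threading.Thread(target=lambda: None, name="fire-and-forget")
for t in (io_thread, worker):
    router.register_thread(t)
    t.start()
    t.join()

router.current_cell = "cell 5"
io_dest = router.destination(io_thread)    # "cell 5": follows the current cell
worker_dest = router.destination(worker)   # "cell 1": default origin routing

router.route_to_current_by_default = True  # kernel-level opt-out
worker_dest_global = router.destination(worker)  # "cell 5"
```

Keeping the per-thread override separate from the kernel-level default lets a library opt its own threads out without changing behavior for user threads.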
If we want to be really fiddly and try to guess the right thing to do (as we do now with threads, where the guess is clear and predictable but often incorrect), we could assume that thread output should go to the current cell whenever that cell is blocking while the output is produced. This would sometimes do the wrong thing, in the cases where a background thread really should route to its originating cell, but those seem to be the rare exception.
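The heuristic fits in a single decision function. This is a hypothetical sketch: `route_thread_output` and its `kernel_busy` flag (standing in for "a cell is currently executing") are illustrative names, not part of ipykernel.

```python
from typing import Optional


def route_thread_output(origin_cell: str, current_cell: Optional[str], kernel_busy: bool) -> str:
    """Hypothetical heuristic for where output from a non-main thread should go.

    While a cell is executing (the main thread is blocked in it), assume the
    thread's output is a consequence of that cell and send it there. When the
    kernel is idle, fall back to the cell that started the thread.
    """
    if kernel_busy and current_cell is not None:
        return current_cell
    return origin_cell


# ipyparallel-style streaming during a blocking call: goes to the running cell.
print(route_thread_output("cell 1", "cell 4", kernel_busy=True))   # prints "cell 4"
# Background output while nothing is running: goes back to the originating cell.
print(route_thread_output("cell 1", "cell 4", kernel_busy=False))  # prints "cell 1"
```

The known failure mode is visible in the first call: a genuinely fire-and-forget thread started in cell 1 that happens to print while cell 4 is running is also sent to cell 4.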
While the async routing is nice and, I suspect, more robust, I think the guess for threads is more often incorrect than correct, so perhaps it should be opt-in instead of opt-out. At the very least, we should make sure the current thread output routing does not apply to threads created by ThreadPoolExecutor or similar.
Looking at jupyter-widgets/ipywidgets#2358, which uses OutputWidget: allowing OutputWidget to set the thread-local parent header in a sticky way would still solve the motivating issue, even if default print statements were routed to the latest cell, which I think is probably the better default behavior. I think #1186 gave us what we need to make `with OutputWidget():` work in a way that persists for the current context (via `stream._parent_header.set`).
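A minimal sketch of that "sticky" idea using only `contextvars`: a widget thread sets its routing once, and a concurrent execution in the main thread, which sets a new value in its own context, does not override it. `current_msg_id` is a hypothetical stand-in for the stream's parent header and `write` for the stream's write method; this is not ipykernel's implementation.

```python
import contextvars
import threading

# Hypothetical stand-in for the stream's per-context parent header.
current_msg_id = contextvars.ContextVar("current_msg_id", default="cell 1")

log = []
log_lock = threading.Lock()


def write(text):
    # Stand-in for a stream write: tag text with the destination of *this* context.
    with log_lock:
        log.append((current_msg_id.get(), text))


widget_ready = threading.Event()
main_moved_on = threading.Event()


def widget_thread():
    current_msg_id.set("widget-output")  # sticky for the rest of this thread's context
    write("first")
    widget_ready.set()
    main_moved_on.wait()  # meanwhile the main thread starts another execution
    write("second")       # still routed to the widget


t = threading.Thread(target=widget_thread)
t.start()
widget_ready.wait()
current_msg_id.set("cell 7")  # a new execution in the main thread
write("main")
main_moved_on.set()
t.join()
```

The two events only force a deterministic interleaving; the routing itself comes from each thread reading the value in its own context.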
I don't actually understand how `with output_widget:` captures output anymore, since apparently setting `self.msg_id` is all it does, but presumably `with output_widget:` should set the `parent_header` in a way that persists for the context within the current thread and is not overridden by concurrent executions in the main thread.
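As a sketch of what `with output_widget:` could do under that model: `__enter__` sets a context variable and keeps the token, and `__exit__` resets it, so routing is scoped to the block and nests correctly. `OutputCapture` and `current_msg_id` are hypothetical stand-ins, not the ipywidgets or ipykernel implementation.

```python
import contextvars

# Hypothetical stand-in for the stream's per-context parent header.
current_msg_id = contextvars.ContextVar("current_msg_id", default="cell 1")


class OutputCapture:
    """Hypothetical stand-in for an Output widget used as a context manager."""

    def __init__(self, msg_id):
        self.msg_id = msg_id
        self._tokens = []  # a stack, so nested re-entry of the same object works

    def __enter__(self):
        self._tokens.append(current_msg_id.set(self.msg_id))
        return self

    def __exit__(self, *exc_info):
        current_msg_id.reset(self._tokens.pop())  # restore the previous routing
        return False  # don't swallow exceptions


log = []


def write(text):
    # Stand-in for a stream write: tag text with the destination of this context.
    log.append((current_msg_id.get(), text))


outer = OutputCapture("widget-A")
inner = OutputCapture("widget-B")
write("before")
with outer:
    write("inside A")
    with inner:
        write("inside B")
    write("back in A")
write("after")
```

Using `reset(token)` rather than setting a fixed value on exit means leaving the block restores whatever routing was active before, including an enclosing widget.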
What do you think is the right path forward, @krassowski?