Description
I have an instance of Enterprise Gateway deployed that launches Yarn-based kernels, which generally take between 3 and 5 minutes to fully start up and connect back to a JupyterLab client. Often, while kernels are in the middle of starting, we see the following uncaught exception in the GET `api/kernels` endpoint:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/tornado/web.py", line 1713, in _execute
result = await result
File "/opt/conda/lib/python3.10/site-packages/enterprise_gateway/services/kernels/handlers.py", line 122, in get
await super().get()
File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/kernels/handlers.py", line 47, in get
kernels = await ensure_async(km.list_kernels())
File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 490, in list_kernels
model = self.kernel_model(kernel_id)
File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 476, in kernel_model
"last_activity": isoformat(kernel.last_activity),
AttributeError: 'RemoteKernelManager' object has no attribute 'last_activity'

I found an issue, jupyter/notebook#5345, that led me to my working understanding of the problem:
1. `kernel_manager` instances don't initialize a `last_activity` field in their constructor; rather, it is added later in the kernel start method.
2. Since our kernels start slowly, there is a longer window during which 1. is true, and if any concurrent `api/kernels` requests come through, EG does not handle these starting kernels gracefully.
Can anyone with deeper understanding of the class hierarchies confirm this understanding?
Naively, I would think last_activity could be set to utcnow() in the constructor to get around this, but perhaps there is a valid reason for not doing so?
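To make my understanding concrete, here is a minimal sketch of the suspected race. `SketchKernelManager` and this `kernel_model` are hypothetical stand-ins written for illustration, not the real `RemoteKernelManager` or jupyter_server code; the `init_last_activity` flag is likewise an assumption that models the naive constructor fix.

```python
from datetime import datetime, timezone


class SketchKernelManager:
    """Hypothetical stand-in for a kernel manager (not the real class)."""

    def __init__(self, init_last_activity=False):
        self.execution_state = "starting"
        # Assumed fix: give last_activity a value as soon as the object exists.
        if init_last_activity:
            self.last_activity = datetime.now(timezone.utc)

    def finish_start(self):
        # Models the point, late in startup, where last_activity first appears.
        self.last_activity = datetime.now(timezone.utc)
        self.execution_state = "idle"


def kernel_model(kernel_id, kernel):
    """Build a kernel model the way the traceback suggests: direct attribute access."""
    return {
        "id": kernel_id,
        "execution_state": kernel.execution_state,
        "last_activity": kernel.last_activity.isoformat(),
    }
```

Calling `kernel_model` on a manager that has not reached `finish_start()` raises the same kind of `AttributeError` as in the traceback, while a manager built with `init_last_activity=True` can be listed at any time.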
Reproduce
- Connect JupyterLab (I've tested using 3.6.x) to an instance of Enterprise Gateway
- Start one or more kernels. These should be slow-starting kernels; adding a `time.sleep()` to the launch script would suffice, I think
- Hammer the GET `/api/kernels` endpoint separately while the kernels are starting
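For the last step, something like the following could be used to poll the endpoint. This is only a sketch: the URL in the usage comment is a placeholder for wherever your gateway listens, the helper names are made up, and it assumes no auth token is required (add one if your deployment needs it).

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses (e.g. 500 from the uncaught exception) land here.
        return err.code


def hammer(url, attempts=100, delay=0.1, fetch=fetch_status):
    """Poll `url` repeatedly and count how often each status code comes back."""
    counts = {}
    for _ in range(attempts):
        status = fetch(url)
        counts[status] = counts.get(status, 0) + 1
        time.sleep(delay)
    return counts


# Example usage (placeholder address, run while kernels are starting):
# print(hammer("http://localhost:8888/api/kernels"))
```

Any non-200 counts in the result during kernel startup would point at the window described above.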
Expected behavior
I would expect EG to either return a list of kernels with starting kernels in some designated starting state, or just keep them out of the list of kernels in that api response.
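Both options could look roughly like this. It is a sketch under assumptions, not a patch against EG or jupyter_server: `safe_kernel_model` and `list_kernels` here are hypothetical helpers, and the `"starting"` fallback state and `include_starting` flag are names I made up for illustration.

```python
def safe_kernel_model(kernel_id, kernel):
    """Build a kernel model without assuming last_activity exists yet."""
    last_activity = getattr(kernel, "last_activity", None)
    return {
        "id": kernel_id,
        # Fall back to a designated state for kernels that are still starting.
        "execution_state": getattr(kernel, "execution_state", "starting"),
        "last_activity": last_activity.isoformat() if last_activity else None,
    }


def list_kernels(kernels, include_starting=True):
    """List kernel models from a dict of id -> manager, optionally skipping starting kernels."""
    models = []
    for kernel_id, kernel in kernels.items():
        still_starting = not hasattr(kernel, "last_activity")
        if still_starting and not include_starting:
            continue  # Option 2: leave starting kernels out of the response.
        models.append(safe_kernel_model(kernel_id, kernel))
    return models
```

With `include_starting=True` a starting kernel shows up with `last_activity` set to `None`; with `include_starting=False` it is simply omitted until startup completes.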
Context
- Operating System and version: linux (containerized) for both EG and JupyterLab
- Browser and version:
- Jupyter Server version: Lab and EG both using jupyter_server 1.24.0
Troubleshoot Output
```
$PATH:
/opt/conda/bin
/opt/conda/condabin
/opt/conda/bin
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin
sys.path:
/opt/conda/bin
/opt/conda/lib/python310.zip
/opt/conda/lib/python3.10
/opt/conda/lib/python3.10/lib-dynload
/opt/conda/lib/python3.10/site-packages
sys.executable:
/opt/conda/bin/python
sys.version:
3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
platform.platform():
Linux-6.1.82-talos-x86_64-with-glibc2.35
which -a jupyter:
/opt/conda/bin/jupyter
/opt/conda/bin/jupyter
```