Description
Describe the bug
When running `.predict` on a model in a Jupyter notebook cell, the intervals between progress bar updates can become so long that Jupyter treats the cell as inactive and, after enough inactivity, puts the kernel into an idle (effectively dead) state. This is quite likely for long-running predictions (say, running overnight), where the user steps away and does not touch Jupyter for many hours. When the Neo4j server finally finishes the prediction, the results of many hours of processing (say, from `.predict.stream`) are lost, since the notebook is dead. I suspect the same problem can occur with other long-running GDS operations.
To Reproduce
I don't know whether the eventual slow-down in progress bar output happens for all long-running use cases or server configurations. In my case it usually happens, but not always.
GDS version: 2.5.3
Neo4j version: 5.11.0
Operating system: Amazon Linux
My specific Jupyter environment: JupyterLab 4.0.8, Python 3 (ipykernel) kernel, on AWS EC2 with Amazon Linux
Steps to reproduce the behavior:
- Start a long-running (say, 10-hour) `.predict.stream` operation in a Jupyter cell, and do not touch Jupyter the whole time (a sketch of the cell I use is included below).
- After a good amount of progress, the progress bar output will freeze at some percentage of completion and the kernel will go to the `idle` state.
- If you `CALL gds.listProgress` in the Browser, you will see that the prediction is still running (if it has not yet completed).
- After the prediction completes on the server, the (dead) notebook does not display any new content, and no following cells are executed.
Expected behavior
The Jupyter notebook should never go `idle` while a long-running GDS operation is still in progress.
Probably the client just needs to ensure that some output is produced regularly (say, every x minutes).
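To illustrate the kind of keep-alive behavior I mean, here is a rough client-side workaround sketch (not a tested fix; the thread approach and the 5-minute interval are my own choices, not anything from the library): run the blocking predict call in a worker thread and print a heartbeat from the main thread, so the cell keeps producing output while the server is still working.

```python
import threading

result_holder = {}

def run_prediction():
    # model and G as in the reproduction sketch above (placeholders)
    result_holder["df"] = model.predict_stream(G)

worker = threading.Thread(target=run_prediction, daemon=True)
worker.start()

# Emit a line of output every few minutes so the cell never looks inactive
while worker.is_alive():
    print("prediction still running...", flush=True)
    worker.join(timeout=300)  # wake up every 5 minutes

predictions = result_holder["df"]
```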