Cancelled error with langgraph runs #1601
-
I am intermittently getting a CancelledError().
and
The error trace is below:
-
I get the same error when execution time reaches about 60s.
-
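Hi @kedarsp-informa, you shouldn't use asyncio.run in add_node; we have native support for async functions in nodes. Using asyncio.run would create a new event loop for each node, which is very inefficient and can cause issues such as cancellation not propagating from graph to node.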
-
I get the same error when attempting to run a subprocess to install a linter on startup. I am using LangGraph Cloud. I'm unsure what "shouldn't use asyncio.run in add_node" means. This is a very simple graph, which also successfully runs my subprocess to install a linter, but fails afterwards. It deploys successfully without this line:

subprocess.run(["bash", "build_linter.sh"], check=True)

Registering graph with id 'agent'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
-
@nfcampos I have this issue too. I wrote to support and shared traces. In my case there are no asyncio.run calls, only asyncio.gather (which is needed for parallelisation). The rest is async/await.
-
Could this issue be related to having no listeners on the stream (as the run is purely a background one)? It seems to occur only under those circumstances on LangGraph Cloud.
-
We are also facing this and are not calling asyncio.run in any node. It's getting a timeout from Redis, which is caused by the cancelled error.
-
I'm running into a similar issue with the prebuilt ReAct agent:
My environment:
Unfortunately I've not been able to consistently reproduce it.
-
I have a similar issue:

CancelledError('Cancelled by cancel scope 3e9aadaf5ed0', <Task cancelled name='Task-393' coro=<AsyncExitStack.__aexit__() done, defined at /usr/local/lib/python3.11/contextlib.py:698>>)

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/__init__.py", line 2080, in astream
  File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/runner.py", line 495, in atick
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 428, in wait
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 535, in _wait
asyncio.exceptions.CancelledError: Cancelled by cancel scope 3e9aadaf5ed0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/__init__.py", line 2033, in astream
  File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/loop.py", line 1103, in __aexit__
  File "/usr/local/lib/python3.11/contextlib.py", line 698, in __aexit__
asyncio.exceptions.CancelledError: ('Cancelled by cancel scope 3e9aadaf5ed0', <Task cancelled name='Task-393' coro=<AsyncExitStack.__aexit__() done, defined at /usr/local/lib/python3.11/contextlib.py:698>>)
-
We're facing a similar issue.
As reported above, it seems to sometimes trigger when a node takes more than 1 minute to execute, though not reliably. We don't use asyncio.run(). @nfcampos this discussion is marked as resolved but it is not. Should we create a new one?
-
I had been getting a similar error with the prebuilt ReAct agent for a while. Adding a step timeout to the compiled graph solved this for me.
-
Here is how I increased the timeout for my LangGraph code.

async def get_message_stream(self, message: Optional[str], base64_image: Optional[str], thread_id: str):
-
You may also want to try setting the env var

Some cancellations may be caused because your instance is being restarted due to failed health checks. A common cause of failed health checks is that you have some synchronous processes blocking the main event loop and slowing the server down. We have made a lot of improvements to address this but you may want to keep this in mind if you are seeing inexplicable cancellations in your code today. We are working to add even more isolation in the future.
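To illustrate the point about synchronous work blocking the event loop, here is a minimal standard-library sketch (not LangGraph or LangGraph Cloud code). The names blocking_work, count_heartbeats, blocked and offloaded are hypothetical, and the heartbeat task only stands in for whatever the server needs to keep running, such as a health check. It compares calling a blocking function directly inside a coroutine with offloading it via asyncio.to_thread.

```python
import asyncio
import time


def blocking_work(seconds: float) -> None:
    # Stand-in for synchronous work such as subprocess.run(...) or a blocking SDK call.
    time.sleep(seconds)


async def count_heartbeats(run_work) -> int:
    """Run `run_work` while a heartbeat task ticks; return how many ticks happened."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await run_work()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: we cancelled the heartbeat ourselves
    return ticks


async def blocked() -> None:
    # Blocks the event loop: no other task can run until this returns.
    blocking_work(0.2)


async def offloaded() -> None:
    # Runs the blocking call in a worker thread, so the loop stays responsive.
    await asyncio.to_thread(blocking_work, 0.2)


async def main() -> tuple:
    return await count_heartbeats(blocked), await count_heartbeats(offloaded)


blocked_ticks, offloaded_ticks = asyncio.run(main())
print(f"heartbeats while blocked: {blocked_ticks}, while offloaded: {offloaded_ticks}")
```

In the blocked case the heartbeat never gets a chance to run, which is the kind of starvation that could make a health check miss its deadline. Whether this is what causes the cancellations in any particular deployment is something only its logs can confirm.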
-
I am facing a similar issue.
This is my call code:
Can anyone help me fix this?
-
I am facing the same issue when the execution time exceeds 60 seconds. All the code is asynchronous using
Anyone got any ideas on what I could try next?
-
I'm facing the same issue. After weeks of development work and finally getting it deployed into an AKS cluster, I'm now seeing this error. I've never experienced this locally and neither has any of the team after rigorous testing, and it shows up just as it's about to get released :-(

The graph runs all the way through and then at the end I hit this error about 90% of the time:

"('Cancelled by cancel scope 7fdc90710c80', <Task cancelled name='Task-74196' coro=<AsyncExitStack.__aexit__() done, defined at /usr/local/lib/python3.12/contextlib.py:707>>)"

A bit more information:

Exception in callback functools.partial(<bound method FuturesDict.on_done of {<Task finished name='create' coro=<arun_with_retry() done, defined at /app/venv/lib/python3.12/site-packages/langgraph/pregel/retry.py:105> result={'messages': ...challenging.'}>: PregelExecutableTask(name='create',

This is in a sub-graph which also has a sub-graph. It doesn't always happen, but it happens often enough that we can't use it in production.

The more I look at this, the more it points to a client timeout from the calling code, i.e. our BFF. One for next week I think; if I get to the bottom of it I'll update this.
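If the client-timeout theory above is right, the mechanism would look roughly like the following standard-library sketch. It is not LangGraph code: long_node and caller are hypothetical names, and asyncio.wait_for stands in for whatever deadline the calling service enforces. The point is only that a timeout in the caller surfaces inside the awaited coroutine as a CancelledError.

```python
import asyncio

events = []


async def long_node() -> str:
    # Stand-in for a graph run that takes longer than the caller is willing to wait.
    try:
        await asyncio.sleep(1.0)
        return "done"
    except asyncio.CancelledError:
        events.append("node cancelled")
        raise  # always re-raise so cancellation keeps propagating


async def caller():
    # Stand-in for the calling code enforcing its own deadline.
    try:
        return await asyncio.wait_for(long_node(), timeout=0.05)
    except asyncio.TimeoutError:
        events.append("caller timed out")
        return None


result = asyncio.run(caller())
print(events, result)
```

The caller sees a timeout while the inner coroutine sees a cancellation, so the CancelledError in the server logs may be a symptom of a deadline set somewhere upstream rather than a fault in the node itself.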
-
We got the same error in our project; we are using FastAPI, not the LangGraph platform.
Hi @kedarsp-informa, you shouldn't use asyncio.run in add_node; we have native support for async functions in nodes. Using asyncio.run would create a new event loop for each node, which is very inefficient and can cause issues such as cancellation not propagating from graph to node.
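The difference described above can be seen with a minimal standard-library sketch. The functions sync_node, async_node and which_loop are hypothetical stand-ins rather than LangGraph APIs, and asyncio.to_thread is used here only as an assumption about how a synchronous node might end up running off the main loop. The sketch records which event loop each style of node runs its coroutine on.

```python
import asyncio


async def which_loop() -> int:
    # Report the identity of the event loop this coroutine is running on.
    await asyncio.sleep(0)
    return id(asyncio.get_running_loop())


def sync_node(state: dict) -> dict:
    # Discouraged pattern: asyncio.run creates (and tears down) a new event loop
    # on every call, detached from the loop that drives the rest of the graph.
    return {"loop_id": asyncio.run(which_loop())}


async def async_node(state: dict) -> dict:
    # Preferred pattern: an async node is awaited on the caller's own loop,
    # so cancelling the caller also cancels the work inside the node.
    return {"loop_id": await which_loop()}


async def main() -> tuple:
    outer_id = id(asyncio.get_running_loop())
    # Run the sync node in a worker thread; calling asyncio.run directly inside
    # a running loop would raise RuntimeError.
    sync_result = await asyncio.to_thread(sync_node, {})
    async_result = await async_node({})
    return outer_id, sync_result["loop_id"], async_result["loop_id"]


outer_id, sync_id, async_id = asyncio.run(main())
print("async node shares the outer loop:", async_id == outer_id)
print("sync node used a separate loop:", sync_id != outer_id)
```

In practice this means declaring the node function with async def and awaiting inside it, then passing that function to add_node as usual.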