Fix async durability checkpoint backlog #7112

Open
Alexxigang wants to merge 3 commits into langchain-ai:main from Alexxigang:fix/async-durability-checkpoint-backlog

@Alexxigang

Summary

Fixes unbounded checkpoint task buildup when running graphs with durability="async".

The root cause is that a new checkpoint write can be chained before the previous async checkpoint future has finished. When checkpoint persistence is slower than graph execution, this lets _checkpointer_put_after_previous tasks accumulate across supersteps.

This change keeps the intended one-superstep overlap for async durability, but waits for the in-flight checkpoint future before chaining the next one.
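
The effect can be illustrated outside of LangGraph with a minimal asyncio sketch. This is not the code in pregel/main.py: the names slow_put, put_after_previous, and run are hypothetical stand-ins, and the slow checkpointer is simulated with asyncio.sleep. It only shows why awaiting the in-flight future before chaining the next write keeps at most one checkpoint task pending, while chaining without waiting lets the backlog grow with the number of supersteps.

```python
import asyncio


async def slow_put(step: int, delay: float) -> int:
    # Hypothetical stand-in for a checkpoint write that is slower
    # than a graph superstep.
    await asyncio.sleep(delay)
    return step


async def put_after_previous(prev, step: int, delay: float) -> int:
    # Hypothetical stand-in for chaining a write behind the previous one:
    # the new write starts only once the previous write has finished.
    if prev is not None:
        await prev
    return await slow_put(step, delay)


async def run(steps: int, wait_for_inflight: bool, delay: float = 0.01) -> int:
    """Simulate `steps` supersteps and return the peak number of
    unfinished checkpoint tasks observed right after each chaining."""
    pending = None
    tasks = []
    max_backlog = 0
    for step in range(steps):
        # The superstep itself is fast; the previous write may overlap it.
        await asyncio.sleep(0)
        if wait_for_inflight and pending is not None:
            # The fix described above: wait for the in-flight write
            # before chaining the next one.
            await pending
        pending = asyncio.ensure_future(put_after_previous(pending, step, delay))
        tasks.append(pending)
        unfinished = sum(1 for t in tasks if not t.done())
        max_backlog = max(max_backlog, unfinished)
    if pending is not None:
        # Flush the final write before returning.
        await pending
    return max_backlog


if __name__ == "__main__":
    print("peak backlog, waiting:", asyncio.run(run(20, wait_for_inflight=True)))
    print("peak backlog, not waiting:", asyncio.run(run(20, wait_for_inflight=False)))
```

In this sketch the write for one superstep still overlaps the execution of the next one, which mirrors the one-superstep overlap the PR intends to keep. Only the second and later writes are held back.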

Changes

  • wait for the pending checkpoint future before calling loop.after_tick() in the sync Pregel stream path
  • wait for the pending checkpoint future before calling loop.after_tick() in the async Pregel stream path
  • add regression coverage for:
    • async durability backlog stays bounded
    • async durability still preserves single-superstep overlap
    • sync durability still waits for checkpoint completion
    • exit durability still defers checkpoint work until completion
    • final async state is flushed before returning

Testing

python -m py_compile libs/langgraph/langgraph/pregel/main.py libs/langgraph/tests/test_async_durability.py
python -m pytest --noconftest libs/langgraph/tests/test_async_durability.py -q

Passed locally:

  • 5 passed in 11.80s

Issue

Closes #7094

Development

Successfully merging this pull request may close these issues.

Memory leak: _checkpointer_put_after_previous coroutine chains accumulate with default durability="async"
