Merged
257 changes: 121 additions & 136 deletions src/oss/langgraph/persistence.mdx
@@ -4,7 +4,7 @@ title: Persistence



LangGraph has a built-in persistence layer that saves graph state as checkpoints. When you compile a graph with a checkpointer, a snapshot of the graph state is saved at every [super-step](#super-steps) of execution, and these snapshots are organized into threads. This enables human-in-the-loop workflows, conversational memory, time-travel debugging, and fault-tolerant execution.

![Checkpoints](/oss/images/checkpoints.jpg)

@@ -13,7 +13,19 @@ LangGraph has a built-in persistence layer, implemented through checkpointers. W
When using the [Agent Server](/langsmith/agent-server), you don't need to implement or configure checkpointers manually. The server handles all persistence infrastructure for you behind the scenes.
</Info>

## Why use persistence

Persistence is required for the following features:

- **Human-in-the-loop**: Checkpointers facilitate [human-in-the-loop workflows](/oss/langgraph/interrupts) by allowing humans to inspect, interrupt, and approve graph steps. These workflows need a checkpointer because a person must be able to view the state of a graph at any point in time, and the graph must be able to resume execution after the person has updated the state. See [Interrupts](/oss/langgraph/interrupts) for examples.
- **Memory**: Checkpointers allow for ["memory"](/oss/concepts/memory) between interactions. For repeated human interactions (like conversations), follow-up messages can be sent to the same thread, which retains its memory of previous ones. See [Add memory](/oss/langgraph/add-memory) for information on how to add and manage conversation memory using checkpointers.
- **Time travel**: Checkpointers allow for ["time travel"](/oss/langgraph/use-time-travel), letting users replay prior graph executions to review and/or debug specific graph steps. Checkpointers also make it possible to fork the graph state at arbitrary checkpoints to explore alternative trajectories.
- **Fault-tolerance**: Checkpointing provides fault-tolerance and error recovery: if one or more nodes fail at a given [super-step](#super-steps), you can restart your graph from the last successful step.
- **Pending writes**: When a graph node fails mid-execution at a given [super-step](#super-steps), LangGraph stores pending checkpoint writes from any other nodes that completed successfully at that super-step. When you resume graph execution from that super-step you don't re-run the successful nodes.

## Core concepts

### Threads

A thread is a unique ID (thread identifier) assigned to each checkpoint saved by a checkpointer. It contains the accumulated state of a sequence of [runs](/langsmith/assistants#execution). When a run is executed, the [state](/oss/langgraph/graph-api#state) of the assistant's underlying graph is persisted to the thread.

@@ -39,15 +51,13 @@ A thread's current and historical state can be retrieved. To persist state, a th

The checkpointer uses `thread_id` as the primary key for storing and retrieving checkpoints. Without it, the checkpointer cannot save state or resume execution after an [interrupt](/oss/langgraph/interrupts), since the checkpointer uses `thread_id` to load the saved state.

### Checkpoints

The state of a thread at a particular point in time is called a checkpoint. A checkpoint is a snapshot of the graph state saved at each [super-step](#super-steps) and is represented by a `StateSnapshot` object (see [StateSnapshot fields](#statesnapshot-fields) for the full field reference).

#### Super-steps

LangGraph creates a checkpoint at each **super-step** boundary. A super-step is a single "tick" of the graph in which all nodes scheduled for that step execute (potentially in parallel). For a sequential graph like `START -> A -> B -> END`, there are separate super-steps for the input, node A, and node B, with a checkpoint produced at each boundary. Understanding super-step boundaries is important for [time travel](/oss/langgraph/use-time-travel), because you can only resume execution from a checkpoint (that is, a super-step boundary).

Checkpoints are persisted and can be used to restore the state of a thread at a later time.

@@ -145,6 +155,40 @@ After we run the graph, we expect to see exactly 4 checkpoints:
Note that the `bar` channel values contain outputs from both nodes as we have a reducer for the `bar` channel.
:::

#### Checkpoint namespace

Each checkpoint has a `checkpoint_ns` (checkpoint namespace) field that identifies which graph or subgraph it belongs to:

- **`""`** (empty string): The checkpoint belongs to the parent (root) graph.
- **`"node_name:uuid"`**: The checkpoint belongs to a subgraph invoked as the given node. For nested subgraphs, namespaces are joined with `|` separators (e.g., `"outer_node:uuid|inner_node:uuid"`).

You can access the checkpoint namespace from within a node via the config:

:::python
```python
from langchain_core.runnables import RunnableConfig

def my_node(state: State, config: RunnableConfig):
    checkpoint_ns = config["configurable"]["checkpoint_ns"]
    # "" for the parent graph, "node_name:uuid" for a subgraph
```
:::

:::js
```typescript
import { RunnableConfig } from "@langchain/core/runnables";

function myNode(state: typeof State.Type, config: RunnableConfig) {
  const checkpointNs = config.configurable?.checkpoint_ns;
  // "" for the parent graph, "node_name:uuid" for a subgraph
}
```
:::

See [Subgraphs](/oss/langgraph/use-subgraphs) for more details on working with subgraph state and checkpoints.

## Get and update state

### Get state

:::python
@@ -227,6 +271,36 @@ StateSnapshot {
```
:::

#### StateSnapshot fields

:::python

| Field | Type | Description |
|-------|------|-------------|
| `values` | `dict` | State channel values at this checkpoint. |
| `next` | `tuple[str, ...]` | Node names to execute next. Empty `()` means the graph is complete. |
| `config` | `dict` | Contains `thread_id`, `checkpoint_ns`, and `checkpoint_id`. |
| `metadata` | `dict` | Execution metadata. Contains `source` (`"input"`, `"loop"`, or `"update"`), `writes` (node outputs), and `step` (super-step counter). |
| `created_at` | `str` | ISO 8601 timestamp of when this checkpoint was created. |
| `parent_config` | `dict \| None` | Config of the previous checkpoint. `None` for the first checkpoint. |
| `tasks` | `tuple[PregelTask, ...]` | Tasks to execute at this step. Each task has `id`, `name`, `error`, `interrupts`, and optionally `state` (subgraph snapshot, when using `subgraphs=True`). |

:::

:::js

| Field | Type | Description |
|-------|------|-------------|
| `values` | `object` | State channel values at this checkpoint. |
| `next` | `string[]` | Node names to execute next. Empty `[]` means the graph is complete. |
| `config` | `object` | Contains `thread_id`, `checkpoint_ns`, and `checkpoint_id`. |
| `metadata` | `object` | Execution metadata. Contains `source` (`"input"`, `"loop"`, or `"update"`), `writes` (node outputs), and `step` (super-step counter). |
| `createdAt` | `string` | ISO 8601 timestamp of when this checkpoint was created. |
| `parentConfig` | `object \| null` | Config of the previous checkpoint. `null` for the first checkpoint. |
| `tasks` | `PregelTask[]` | Tasks to execute at this step. Each task has `id`, `name`, `error`, `interrupts`, and optionally `state` (subgraph snapshot, when using `subgraphs: true`). |

:::

### Get state history

:::python
@@ -420,142 +494,74 @@ In our example, the output of `getStateHistory` will look like this:

![State](/oss/images/get_state.jpg)

#### Find a specific checkpoint

You can filter the state history to find checkpoints that match specific criteria:

:::python
```python
history = list(graph.get_state_history(config))

# Find the checkpoint saved just before a specific node executed
before_node_b = next((s for s in history if s.next == ("node_b",)), None)

# Find a checkpoint by step number
step_2 = next((s for s in history if s.metadata["step"] == 2), None)

# Find checkpoints created by update_state
forks = [s for s in history if s.metadata["source"] == "update"]

# Find the checkpoint where an interrupt occurred
interrupted = next(
    (s for s in history if s.tasks and any(t.interrupts for t in s.tasks)),
    None,
)
```
:::

:::js
```typescript
import type { StateSnapshot } from "@langchain/langgraph";

const history: StateSnapshot[] = [];
for await (const state of graph.getStateHistory(config)) {
  history.push(state);
}

// Find the checkpoint saved just before a specific node executed
const beforeNodeB = history.find((s) => s.next.includes("nodeB"));

// Find a checkpoint by step number
const step2 = history.find((s) => s.metadata?.step === 2);

// Find checkpoints created by updateState
const forks = history.filter((s) => s.metadata?.source === "update");

// Find the checkpoint where an interrupt occurred
const interrupted = history.find(
  (s) => s.tasks.length > 0 && s.tasks.some((t) => t.interrupts.length > 0)
);
```
:::

### Replay

Replay re-executes steps from a prior checkpoint. To replay, invoke the graph with the `thread_id` and a prior `checkpoint_id` in the `configurable` portion of the config. Nodes before that checkpoint are skipped because their results are already saved. Nodes after the checkpoint re-execute, including any LLM calls, API requests, or [interrupts](/oss/langgraph/interrupts), which are always re-triggered during replay.

See [Time travel](/oss/langgraph/use-time-travel) for full details and code examples on replaying past executions.

![Replay](/oss/images/re_play.png)

### Update state

:::python
You can edit the graph state using @[`update_state`]. This creates a new checkpoint with the updated values; it does not modify the original checkpoint. Pass a config with a `thread_id` to update the thread's current state, or also include a `checkpoint_id` to fork from that specific checkpoint. The update is treated the same as a node update: values are passed through [reducer](/oss/langgraph/graph-api#reducers) functions when defined, so channels with reducers _accumulate_ values, while channels without reducers are overwritten.

You can optionally specify `as_node` to control which node the update is treated as coming from, which affects which node executes next. If `as_node` is not provided, it is set to the last node that updated the state, if that is not ambiguous. See [Time travel: `as_node`](/oss/langgraph/use-time-travel#control-which-node-runs-next-with-as_node) for details.
:::

:::js
You can edit the graph state using `graph.updateState()`. This creates a new checkpoint with the updated values; it does not modify the original checkpoint. Pass a config with a `thread_id` to update the thread's current state, or also include a `checkpoint_id` to fork from that specific checkpoint. The update is treated the same as a node update: values are passed through [reducer](/oss/langgraph/graph-api#reducers) functions when defined, so channels with reducers _accumulate_ values, while channels without reducers are overwritten.

You can optionally specify `asNode` to control which node the update is treated as coming from, which affects which node executes next. If `asNode` is not provided, it is set to the last node that updated the state, if that is not ambiguous. See [Time travel: `asNode`](/oss/langgraph/use-time-travel#control-which-node-runs-next-with-as_node) for details.
:::

![Update](/oss/images/checkpoints_full_story.jpg)
@@ -1159,24 +1165,3 @@ checkpointer.setup()
When running on LangSmith, encryption is automatically enabled whenever `LANGGRAPH_AES_KEY` is present, so you only need to provide the environment variable. Other encryption schemes can be used by implementing @[`CipherProtocol`] and supplying it to @[`EncryptedSerializer`].

:::