Commit 7a45b8a

docs: add encyclopedia page
1 parent 62903ab commit 7a45b8a

6 files changed

Lines changed: 146 additions & 56 deletions

File tree

docs/develop/python/data-handling/index.mdx

Lines changed: 14 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -3,30 +3,31 @@ id: data-handling
33
title: Data handling - Python SDK
44
sidebar_label: Data handling
55
slug: /develop/python/data-handling
6-
description: Learn how Temporal handles data through the Data Converter, including payload conversion, encryption, and large payload storage.
7-
toc_max_heading_level: 2
6+
description:
7+
Learn how Temporal handles data through the Data Converter, including payload conversion, encryption, and large
8+
payload storage.
9+
toc_max_heading_level: 3
810
tags:
911
- Python SDK
1012
- Temporal SDKs
1113
- Data Converters
1214
---
1315

14-
All data sent to and from the Temporal Service passes through the **Data Converter**.
15-
The Data Converter has three layers that handle different concerns:
16+
All data sent to and from the Temporal Service passes through the **Data Converter**. The Data Converter has three
17+
layers that handle different concerns:
1618

1719
```
1820
Application data → PayloadConverter → PayloadCodec → ExternalStorage → Temporal Service
1921
```
2022

21-
Of these three layers, only the PayloadConverter is required. Temporal uses a default PayloadConverter that handles JSON serialization. The PayloadCodec and ExternalStorage layers are optional.
23+
Of these three layers, only the PayloadConverter is required. Temporal uses a default PayloadConverter that handles JSON
24+
serialization. The PayloadCodec and ExternalStorage layers are optional. You only need to customize these layers when
25+
your application requires non-JSON types, encryption, or payload offloading.
2226

23-
| | [PayloadConverter](/develop/python/data-handling/data-conversion) | [PayloadCodec](/develop/python/data-handling/data-encryption) | [ExternalStorage](/develop/python/data-handling/large-payload-storage) |
24-
| --- | --- | --- | --- |
25-
| **Purpose** | Serialize types to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store |
26-
| **Must be deterministic** | Yes | No | No |
27-
| **Default** | JSON serialization | None (passthrough) | None (passthrough) |
28-
29-
By default, Temporal uses JSON serialization with no codec and no external storage.
30-
You only need to customize these layers when your application requires non-JSON types, encryption, or payload offloading.
27+
| | [PayloadConverter](/develop/python/data-handling/data-conversion) | [PayloadCodec](/develop/python/data-handling/data-encryption) | [ExternalStorage](/develop/python/data-handling/large-payload-storage) |
28+
| ------------------------- | ----------------------------------------------------------------- | ------------------------------------------------------------- | ---------------------------------------------------------------------- |
29+
| **Purpose** | Serialize application data to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store |
30+
| **Must be deterministic** | Yes | No | No |
31+
| **Default** | JSON serialization | None (passthrough) | None (passthrough) |
3132

3233
For a deeper conceptual explanation, see the [Data Conversion encyclopedia](/dataconversion).
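The three-layer pipeline above can be sketched with plain Python (standard library only — this illustrates the flow of data through the layers, not the actual `temporalio` API):

```python
import json
import zlib

# Layer 1: PayloadConverter — serialize application data to bytes (default: JSON).
def payload_convert(value) -> bytes:
    return json.dumps(value).encode("utf-8")

def payload_unconvert(data: bytes):
    return json.loads(data.decode("utf-8"))

# Layer 2: PayloadCodec — optional byte-to-byte transform (e.g., compression or encryption).
def codec_encode(data: bytes) -> bytes:
    return zlib.compress(data)

def codec_decode(data: bytes) -> bytes:
    return zlib.decompress(data)

# Outbound: application data -> converter -> codec -> (external storage) -> service.
outbound = codec_encode(payload_convert({"order_id": 42}))

# Inbound: the same layers run in reverse.
inbound = payload_unconvert(codec_decode(outbound))
assert inbound == {"order_id": 42}
```

Only the first layer is required; if you configure no codec and no external storage, bytes pass through the other layers unchanged.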

docs/develop/python/data-handling/large-payload-storage.mdx

Lines changed: 22 additions & 15 deletions
@@ -3,34 +3,39 @@ id: large-payload-storage
33
title: Large payload storage - Python SDK
44
sidebar_label: Large payload storage
55
slug: /develop/python/data-handling/large-payload-storage
6-
toc_max_heading_level: 2
6+
toc_max_heading_level: 3
77
tags:
88
- Python SDK
99
- Temporal SDKs
1010
- Data Converters
1111
description: Offload large payloads to external storage using the claim check pattern in the Python SDK.
1212
---
1313

14-
The Temporal Service enforces a ~2 MB per payload limit.
15-
When your Workflows or Activities handle data larger than this, you can offload payloads to external storage (such as S3) and pass a small reference token through the event history instead.
16-
This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern).
14+
The Temporal Service enforces a ~2 MB per payload limit. When your Workflows or Activities handle data larger than the
15+
limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the event
16+
history instead. This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern).
1717

1818
External storage sits at the end of the data pipeline, after both the Payload Converter and the Payload Codec:
1919

2020
```
2121
User code → PayloadConverter → PayloadCodec → External Storage → Temporal Service
2222
```
2323

24-
When a payload exceeds a configurable size threshold (default 256 KiB), the storage driver uploads it to your external store and replaces it with a lightweight reference.
25-
Payloads below the threshold stay inline in the event history.
26-
On the way back, reference payloads are retrieved from external storage before the codec decodes them.
24+
When a payload exceeds a configurable size threshold (default 256 KiB), the storage driver uploads it to your external
25+
store and replaces it with a lightweight reference. Payloads below the threshold stay inline in the event history. On
26+
the way back, reference payloads are retrieved from external storage before the codec decodes them.
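The threshold-and-claim flow can be sketched in plain Python (illustrative only — the in-memory `store` dict stands in for an external store such as S3, and these names are not part of the Temporal SDK):

```python
import uuid

THRESHOLD = 256 * 1024  # offload payloads larger than 256 KiB

store: dict[str, bytes] = {}  # stand-in for an external store such as S3

def offload(payload: bytes) -> dict:
    """Replace a large payload with a small claim; keep small ones inline."""
    if len(payload) <= THRESHOLD:
        return {"inline": payload}
    key = str(uuid.uuid4())
    store[key] = payload            # upload to the "external store"
    return {"claim": {"key": key}}  # lightweight reference kept in event history

def retrieve(record: dict) -> bytes:
    if "inline" in record:
        return record["inline"]
    return store[record["claim"]["key"]]

big = b"x" * (300 * 1024)
rec = offload(big)
assert "claim" in rec and retrieve(rec) == big

small = b"y" * 10
assert offload(small) == {"inline": small}
```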
2727

28-
Because external storage runs after the codec, payloads are already encrypted (if you use an encryption codec) before they're uploaded to your store.
28+
Because external storage runs after the codec, payloads are already encrypted (if you use an encryption codec) before
29+
they're uploaded to your store.
2930

3031
## Store and retrieve large payloads using external storage
3132

32-
To offload large payloads, implement a `StorageDriver` and configure it on your `DataConverter`.
33-
The driver needs a `store()` method to upload payloads and a `retrieve()` method to fetch them back.
33+
To offload large payloads, implement a `StorageDriver` and configure it on your `DataConverter`. The driver needs a
34+
`store()` method to upload payloads and a `retrieve()` method to fetch them back.
35+
36+
Once you implement a storage driver, configure it on your `DataConverter` and use it when creating your Client and
37+
Worker. All Workflows and Activities running on the Worker will use the storage driver automatically without changes to
38+
your business logic. You can also configure the size threshold and use multiple storage drivers.
3439

3540
### Implement a storage driver
3641

@@ -66,8 +71,9 @@ class S3StorageDriver(StorageDriver):
6671

6772
### Store payloads
6873

69-
The `store()` method receives a sequence of payloads and must return exactly one `StorageDriverClaim` per payload.
70-
A claim is a set of string key-value pairs that the driver uses to locate the payload later — typically a storage key or URL.
74+
The `store()` method receives a sequence of payloads and must return exactly one `StorageDriverClaim` per payload. A
75+
claim is a set of string key-value pairs that the driver uses to locate the payload later — typically a storage key or
76+
URL.
7177

7278
```python
7379
Sample implementation:
@@ -109,8 +115,8 @@ converter = DataConverter(
109115

110116
### Adjust the size threshold
111117

112-
The `payload_size_threshold` controls which payloads get offloaded.
113-
Payloads smaller than this value stay inline in the event history.
118+
The `payload_size_threshold` controls which payloads get offloaded. Payloads smaller than this value stay inline in the
119+
event history.
114120

115121
```python
116122
ExternalStorage(
@@ -123,7 +129,8 @@ Set it to `None` to externalize all payloads regardless of size.
123129

124130
### Use multiple storage drivers
125131

126-
When you have multiple drivers (for example, hot and cold storage tiers), provide a `driver_selector` function that chooses which driver handles each payload:
132+
When you have multiple drivers (for example, hot and cold storage tiers), provide a `driver_selector` function that
133+
chooses which driver handles each payload:
127134

128135
```python
129136
hot_driver = S3StorageDriver("hot-bucket")

docs/encyclopedia/data-conversion/dataconversion.mdx

Lines changed: 33 additions & 19 deletions
@@ -2,7 +2,9 @@
22
id: dataconversion
33
title: How does Temporal handle application data?
44
sidebar_label: Data conversion
5-
description: This guide explores Data Converters in the Temporal Platform, detailing how they handle serialization and encoding for Workflow inputs and outputs, ensuring data stays secure and manageable.
5+
description:
6+
This guide explores Data Converters in the Temporal Platform, detailing how they handle serialization and encoding for
7+
Workflow inputs and outputs, ensuring data stays secure and manageable.
68
slug: /dataconversion
79
toc_max_heading_level: 4
810
keywords:
@@ -23,25 +25,31 @@ import { CaptionedImage } from '@site/src/components';
2325

2426
This guide provides an overview of data handling using a Data Converter on the Temporal Platform.
2527

26-
Data Converters in Temporal are SDK components that handle the serialization and encoding of data entering and exiting a Temporal Service.
27-
Workflow inputs and outputs need to be serialized and deserialized so they can be sent as JSON to a Temporal Service.
28+
Data Converters in Temporal are SDK components that handle the serialization and encoding of data entering and exiting a
29+
Temporal Service. Workflow inputs and outputs need to be serialized and deserialized so they can be sent as JSON to a
30+
Temporal Service.
2831

29-
<CaptionedImage
30-
src="/diagrams/default-data-converter.svg"
31-
title="Data Converter encodes and decodes data"
32-
/>
32+
<CaptionedImage src="/diagrams/default-data-converter.svg" title="Data Converter encodes and decodes data" />
3333

34-
The Data Converter encodes data from your application to a [Payload](/dataconversion#payload) before it is sent to the Temporal Service in the Client call.
35-
When the Temporal Server sends the encoded data back to the Worker, the Data Converter decodes it for processing within your application.
36-
This ensures that all your sensitive data exists in its original format only on hosts that you control.
34+
The Data Converter encodes data from your application to a [Payload](/dataconversion#payload) before it is sent to the
35+
Temporal Service in the Client call. When the Temporal Server sends the encoded data back to the Worker, the Data
36+
Converter decodes it for processing within your application. This ensures that all your sensitive data exists in its
37+
original format only on hosts that you control.
3738

38-
Data Converter steps are followed when data is sent to a Temporal Service (as input to a Workflow) and when it is returned from a Workflow (as output).
39-
Due to how Temporal provides access to Workflow output, this implementation is asymmetric:
39+
Data Converter steps are followed when data is sent to a Temporal Service (as input to a Workflow) and when it is
40+
returned from a Workflow (as output). Due to how Temporal provides access to Workflow output, this implementation is
41+
asymmetric:
4042

41-
- Data encoding is performed automatically using the default converter provided by Temporal or your custom Data Converter when passing input to a Temporal Service. For example, plain text input is usually serialized into a JSON object.
42-
- Data decoding may be performed by your application logic during your Workflows or Activities as necessary, but decoded Workflow results are never persisted back to the Temporal Service. Instead, they are stored encoded on the Temporal Service, and you need to provide an additional parameter when using [`temporal workflow show`](/cli/workflow#show) or when browsing the Web UI to view output.
43+
- Data encoding is performed automatically using the default converter provided by Temporal or your custom Data
44+
Converter when passing input to a Temporal Service. For example, plain text input is usually serialized into a JSON
45+
object.
46+
- Data decoding may be performed by your application logic during your Workflows or Activities as necessary, but decoded
47+
Workflow results are never persisted back to the Temporal Service. Instead, they are stored encoded on the Temporal
48+
Service, and you need to provide an additional parameter when using [`temporal workflow show`](/cli/workflow#show) or
49+
when browsing the Web UI to view output.
4350

44-
Each piece of data (like a single argument or return value) is encoded as a [Payload](/dataconversion#payload), which consists of binary data and key-value metadata.
51+
Each piece of data (like a single argument or return value) is encoded as a [Payload](/dataconversion#payload), which
52+
consists of binary data and key-value metadata.
4553

4654
For details, see the API references:
4755

@@ -52,10 +60,16 @@ For details, see the API references:
5260

5361
### What is a Payload? {#payload}
5462

55-
A [Payload](https://api-docs.temporal.io/#temporal.api.common.v1.Payload) represents binary data such as input and output from Activities and Workflows.
56-
Payloads also contain metadata that describe their data type or other parameters for use by custom encoders/converters.
63+
A [Payload](https://api-docs.temporal.io/#temporal.api.common.v1.Payload) represents binary data such as input and
64+
output from Activities and Workflows. Payloads also contain metadata that describe their data type or other parameters
65+
for use by custom encoders/converters.
5766

58-
When processed through the SDK, the [default Data Converter](/default-custom-data-converters#default-data-converter) serializes your data/value to a Payload before sending it to the Temporal Server.
59-
The default Data Converter processes supported type values to Payloads. You can create a custom [Payload Converter](/payload-converter) to apply different conversion steps.
67+
When processed through the SDK, the [default Data Converter](/default-custom-data-converters#default-data-converter)
68+
serializes your data/value to a Payload before sending it to the Temporal Server. The default Data Converter processes
69+
supported type values to Payloads. You can create a custom [Payload Converter](/payload-converter) to apply different
70+
conversion steps.
6071

6172
You can additionally apply [custom codecs](/payload-codec), such as for encryption or compression, on your Payloads.
73+
74+
When Payloads are too large for the Temporal Service's ~2 MB limit, you can use [External Storage](/external-storage) to
75+
offload them to an external store like S3 and keep only a reference in the Event History.
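A Payload's shape — binary data plus key-value metadata — can be sketched as a plain dictionary (illustrative only, not the actual protobuf `Payload` type):

```python
import json

def to_payload(value) -> dict:
    # The default converter JSON-serializes the value; metadata records the encoding.
    return {
        "metadata": {"encoding": b"json/plain"},
        "data": json.dumps(value).encode("utf-8"),
    }

def from_payload(payload: dict):
    assert payload["metadata"]["encoding"] == b"json/plain"
    return json.loads(payload["data"].decode("utf-8"))

p = to_payload({"greeting": "hello"})
assert from_payload(p) == {"greeting": "hello"}
```

A custom Payload Converter follows the same contract — produce bytes plus metadata on the way out, and use the metadata to pick the right decoding on the way back.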
docs/encyclopedia/data-conversion/external-storage.mdx

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
1+
---
2+
id: external-storage
3+
title: External Storage
4+
sidebar_label: External Storage
5+
description:
6+
External Storage offloads large payloads to an external store like S3, keeping only a small reference in the event
7+
history.
8+
slug: /external-storage
9+
toc_max_heading_level: 4
10+
keywords:
11+
- external-storage
12+
- storage-driver
13+
- large-payloads
14+
- claim-check
15+
- data-converters
16+
- payloads
17+
tags:
18+
- Concepts
19+
- Data Converters
20+
---
21+
22+
The Temporal Service enforces a ~2 MB per-payload limit. When your Workflows or Activities handle data larger than this
23+
limit, you can use External Storage to offload payloads to an external store (such as S3) and pass a small reference
24+
token through the Event History instead. This is sometimes called the
25+
[claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern).
26+
27+
## How External Storage fits in the data pipeline {#data-pipeline}
28+
29+
External Storage sits at the end of the data pipeline, after both the [Payload Converter](/payload-converter) and the
30+
[Payload Codec](/payload-codec):
31+
32+
```
33+
User code → Payload Converter → Payload Codec → External Storage → Temporal Service
34+
```
35+
36+
When a payload exceeds a configurable size threshold, the storage driver uploads it to your external store and replaces
37+
it with a lightweight reference. Payloads below the threshold stay inline in the Event History. On the way back, the
38+
codec receives reference payloads from external storage before decoding them.
39+
40+
Because External Storage runs after the Payload Codec, if you use an encryption codec, payloads are already encrypted
41+
before they're uploaded to your store.
42+
43+
## Storage drivers
44+
45+
A Storage Driver is the part you implement to connect External Storage to your backing store. Each driver provides two
46+
operations:
47+
48+
- **Store**. Upload payloads and return a claim, which is a set of key-value pairs the driver uses to locate the payload
49+
later.
50+
- **Retrieve**. Download payloads using the claims that `store` produced.
51+
52+
You can configure multiple storage drivers and use a selector function to route payloads to different drivers based on
53+
size, type, or other criteria such as hot and cold storage tiers.
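The routing idea can be sketched in plain Python (an illustrative stand-in — the "hot"/"cold" tiers and the selector signature are hypothetical, not the SDK's actual `driver_selector` API):

```python
# Two in-memory dicts stand in for hot and cold storage tiers.
hot_tier: dict[str, bytes] = {}
cold_tier: dict[str, bytes] = {}

def driver_selector(payload: bytes) -> dict:
    # Route very large payloads to the cold tier, the rest to the hot tier.
    return cold_tier if len(payload) > 1024 * 1024 else hot_tier

def store(key: str, payload: bytes) -> None:
    driver_selector(payload)[key] = payload

store("a", b"small")
store("b", b"z" * (2 * 1024 * 1024))
assert "a" in hot_tier and "b" in cold_tier
```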
54+
55+
## Configuration
56+
57+
Configure External Storage on the Data Converter. The key settings are:
58+
59+
- **Drivers**. One or more storage driver implementations.
60+
- **Size threshold**. The driver offloads payloads larger than this value, which typically defaults to 256 KiB. Turn off
61+
the threshold to externalize all payloads regardless of size.
62+
- **Driver selector**. When using multiple drivers, a function that chooses which driver handles each payload.
63+
64+
For SDK-specific implementation details, see:
65+
66+
- [Python SDK: Large payload storage](/develop/python/data-handling/large-payload-storage)
67+
- [TypeScript SDK: Large payload storage](/develop/typescript/data-handling/large-payload-storage)

docs/troubleshooting/blob-size-limit-error.mdx

Lines changed: 9 additions & 9 deletions
@@ -29,22 +29,22 @@ To resolve this error, reduce the size of the blob so that it is within the 4 MB
2929

3030
There are multiple strategies you can use to avoid this error:
3131

32-
1. Use compression with a [custom payload codec](/payload-codec) for large payloads.
32+
1. Use [External Storage](/external-storage) to offload large payloads to an object store like S3. The Temporal SDKs support this natively through the claim check pattern: when a payload exceeds a size threshold, a storage driver uploads it to your external store and replaces it with a small reference token in the Event History. Your Workflow and Activity code doesn't need to change. Even if your payloads are within the limit today, consider implementing External Storage if their size could grow over time.
3333

34-
- This addresses the immediate issue of the blob size limit; however, if blob sizes continue to grow this problem can arise again.
34+
For SDK-specific guides, see:
35+
- [Python: Large payload storage](/develop/python/data-handling/large-payload-storage)
36+
- [TypeScript: Large payload storage](/develop/typescript/data-handling/large-payload-storage)
3537

36-
2. Break larger batches of commands into smaller batch sizes:
38+
2. Use compression with a [custom Payload Codec](/payload-codec) for large payloads. This addresses the immediate issue, but if payload sizes continue to grow, the problem can arise again.
39+
40+
3. Break larger batches of commands into smaller batch sizes:
3741

3842
- Workflow-level batching:
39-
1. Modify the Workflow to process Activities or Child Workflows into smaller batches.
43+
1. Change the Workflow to process Activities or Child Workflows in smaller batches.
4044
2. Iterate through each batch, waiting for completion before moving to the next.
4145
- Workflow Task-level batching:
4246
1. Execute Activities in smaller batches within a single Workflow Task.
43-
2. Introduce brief pauses or sleeps (for example, 1ms) between batches.
44-
45-
3. Consider offloading large payloads to an object store to reduce the risk of exceeding blob size limits:
46-
1. Pass references to the stored payloads within the Workflow instead of the actual data.
47-
2. Retrieve the payloads from the object store when needed during execution.
47+
2. Introduce brief pauses or sleeps between batches.
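Strategy 2 (compression) can be illustrated with Python's standard library; a real Temporal setup would wrap this logic in a custom Payload Codec rather than call `zlib` directly:

```python
import zlib

# A repetitive JSON-like payload compresses well.
payload = b'{"status": "ok"}' * 10_000

compressed = zlib.compress(payload, level=9)
assert len(compressed) < len(payload)          # much smaller than the original
assert zlib.decompress(compressed) == payload  # lossless round trip
```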
4848

4949
## Workflow termination due to oversized response
5050

sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -906,6 +906,7 @@ module.exports = {
906906
'encyclopedia/data-conversion/failure-converter',
907907
'encyclopedia/data-conversion/remote-data-encoding',
908908
'encyclopedia/data-conversion/codec-server',
909+
'encyclopedia/data-conversion/external-storage',
909910
'encyclopedia/data-conversion/key-management',
910911
],
911912
},
