Commit de4f036

[doc] Document the Python memory value contract and its divergence from Java (#840)
1 parent d7f8911 commit de4f036

1 file changed

Lines changed: 23 additions & 6 deletions

docs/content/docs/development/memory/sensory_and_short_term_memory.md

@@ -65,13 +65,29 @@ The root of the sensory memory and short-term memory is `MemoryObject`. User can
6565

6666
### Supported Value Types
6767

68-
The key of the pairs store in `MemoryObject` must be string, and the value can be follow types
68+
The key of the pairs stored in `MemoryObject` must be a string. The supported value types differ between Java and Python.
69+
70+
**Java** supports a broad set of types:
6971

7072
- **Primitive Types**: integer, float, boolean, string
7173
- **Collections**: list, map
7274
- **Java POJOs**: See [Flink POJOs](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/serialization/types_serialization/#pojos) for details.
73-
- **General Class Types**: Any objects can be serialized by kryo. See [General Class Types](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/serialization/types_serialization/#general-class-types) for details.
74-
- **Memory Object**: The value can also be a `MemoryObject`, which means user can store nested objects.
75+
- **General Class Types**: Any object that can be serialized by Kryo. See [General Class Types](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/serialization/types_serialization/#general-class-types) for details.
76+
- **Memory Object**: The value can also be a `MemoryObject`, which means users can store nested objects.
77+
78+
**Python** is restricted to recursively *checkpoint-stable* values:
79+
80+
- **Primitive Types**: `None`, `bool`, `int`, `float`, `str`
81+
- **Collections**: `list`, and `dict` with `str` keys (values are recursively validated)
82+
- **Memory Object**: A nested `MemoryObject` created via `new_object()`.
83+
84+
Anything else, including Pydantic models, `uuid.UUID`, `Enum`, custom classes, `tuple`, `set`, or a `dict` with non-`str` keys, is **rejected by `set()` with a `TypeError`**. `bytes` is not supported yet.
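
The rules above can be approximated with a small recursive check. This is only a sketch of the documented contract, not the library's actual validation code: the name `check_checkpoint_stable` is hypothetical, the exact-type comparison (which rejects subclasses such as `IntEnum` members) is an assumption, and nested `MemoryObject` values are left out because they are created via `new_object()`.

```python
from typing import Any

# Primitive types listed in the Python contract (None is handled separately).
_PRIMITIVES = (bool, int, float, str)


def check_checkpoint_stable(value: Any, path: str = "value") -> None:
    """Raise TypeError if `value` falls outside the documented contract.

    Hypothetical helper: it mirrors the documented rules, not the real
    implementation behind `MemoryObject.set()`.
    """
    # Exact-type comparison is an assumption of this sketch; it rejects
    # subclasses such as IntEnum members or str-based enums.
    if value is None or type(value) in _PRIMITIVES:
        return
    if type(value) is list:
        for index, item in enumerate(value):
            check_checkpoint_stable(item, f"{path}[{index}]")
        return
    if type(value) is dict:
        for key, item in value.items():
            if type(key) is not str:
                raise TypeError(f"{path}: dict key {key!r} is not a str")
            check_checkpoint_stable(item, f"{path}[{key!r}]")
        return
    # tuple, set, bytes, custom classes, and everything else end up here.
    raise TypeError(f"{path}: unsupported type {type(value).__name__}")
```

Running such a check before calling `set()` can surface the offending nested path early, for example `value['items'][2]`, instead of only learning that the top-level value was rejected.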
85+
86+
This is because Python values are converted across the Pemja boundary into Flink state, and only the types above materialize into native, checkpoint-stable JVM values; other objects would be stored as wrappers that fail on state restore. To store a richer object, materialize it to a primitive form first (e.g. `model.model_dump(mode="json")` for a Pydantic model, or `str(value)` for a UUID) and reconstruct it on read.
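
As an illustration of the materialize-then-reconstruct pattern for non-Pydantic types, the sketch below round-trips a `uuid.UUID` and an `Enum` through a plain `dict` of strings. The `Color` enum and the `to_stable` / `from_stable` helpers are hypothetical names used only for this example; the `dict` stands in for the value that would be passed to `memory.set(...)` and later returned by `memory.get(...)`.

```python
import uuid
from enum import Enum
from typing import Dict, Tuple


class Color(Enum):  # hypothetical enum, for illustration only
    RED = "red"
    GREEN = "green"


def to_stable(request_id: uuid.UUID, color: Color) -> Dict[str, str]:
    """Materialize rich values into a dict of plain strings."""
    return {"request_id": str(request_id), "color": color.value}


def from_stable(data: Dict[str, str]) -> Tuple[uuid.UUID, Color]:
    """Reconstruct the rich values from the stored primitive form."""
    return uuid.UUID(data["request_id"]), Color(data["color"])


original_id = uuid.UUID("12345678-1234-5678-1234-567812345678")

# `stored` contains only str keys and str values, so it fits the contract.
stored = to_stable(original_id, Color.GREEN)

# On read, rebuild the original objects from the primitive form.
restored_id, restored_color = from_stable(stored)
```

Keeping the two conversion helpers next to each other makes it easier to keep the stored shape and the reconstruction logic in sync when a field is added.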
87+
88+
{{< hint warning >}}
89+
Python memory values must be checkpoint-stable primitives, unlike the Java contract which also supports POJOs and Kryo-serializable objects. Python values materialize across the Pemja boundary before reaching Flink state, so models and other objects must be materialized first with `model_dump(mode="json")` (or `str(...)`) and reconstructed on read.
90+
{{< /hint >}}
7591

7692
### Read & Write
7793

@@ -86,16 +102,17 @@ def process_event(event: Event, ctx: RunnerContext) -> None:
86102
memory.set("primitive", 123)
87103
# store collection
88104
memory.set("collection", [1, 2, 3])
89-
# store general class types
90-
memory.set("object", Prompt.from_text("the test {content}"))
105+
# store a Pydantic model by materializing it to a checkpoint-stable dict first
106+
memory.set("model", my_model.model_dump(mode="json"))
91107
# store memory object
92108
obj1: MemoryObject = memory.new_object("obj1")
93109
obj1.set("field1", "foo")
94110

95111
# read values from memory
96112
value1: int = memory.get("primitive")
97113
value2: List[int] = memory.get("collection")
98-
value3: Prompt = memory.get("object")
114+
# reconstruct the Pydantic model on read
115+
model: MyModel = MyModel.model_validate(memory.get("model"))
99116
value4: MemoryObject = memory.get("obj1")
100117
value5: str = value4.get("field1")
101118
```
