
Commit bc61653

Support custom imported module serialization with cloudpickle (#8286)
* Support custom imported module serialization with cloudpickle
* Fix styling and modules_to_serialize's prevention of None values
* Update the tutorial for saving programs with the new module serialization functionality
* Fix code in new saving tutorial and small typos
1 parent 99d0e70 commit bc61653


2 files changed (+49, -4 lines)

docs/docs/tutorials/saving/index.md

Lines changed: 35 additions & 3 deletions
@@ -7,7 +7,7 @@ This guide demonstrates how to save and load your DSPy program. At a high level,

 ## State-only Saving

-State represents the DSPy program's internal state, including the signature, demos (few-shot examples), and other informaiton like
+State represents the DSPy program's internal state, including the signature, demos (few-shot examples), and other information like
 the `lm` to use for each `dspy.Predict` in the program. It also includes configurable attributes of other DSPy modules like
 `k` for `dspy.retrievers.Retriever`. To save the state of a program, use the `save` method and set `save_program=False`. You can
 choose to save the state to a JSON file or a pickle file. We recommend saving the state to a JSON file because it is safer and readable.
@@ -94,7 +94,39 @@ assert str(compiled_dspy_program.signature) == str(loaded_dspy_program.signature
 ```

 With whole program saving, you don't need to recreate the program, but can directly load the architecture along with the state.
-You can pick the suitable saviing approach based on your needs.
+You can pick the suitable saving approach based on your needs.
+
+### Serializing Imported Modules
+
+When saving a program with `save_program=True`, you might need to include custom modules that your program depends on.
+
+You can specify which custom modules should be serialized with your program by passing them to the `modules_to_serialize`
+parameter when calling `save`. This ensures that any dependencies your program relies on are included during serialization and
+available when loading the program later.
+
+This uses cloudpickle's `cloudpickle.register_pickle_by_value` function in order to register a module as picklable by value. When
+a module is registered this way, cloudpickle will serialize the module by value rather than by reference, ensuring that the
+module contents are preserved with the saved program.
+
+For example, if your program uses custom modules:
+
+```python
+import dspy
+import my_custom_module
+
+compiled_dspy_program = dspy.ChainOfThought(my_custom_module.custom_signature)
+
+# Save the program with the custom module
+compiled_dspy_program.save(
+    "./dspy_program/",
+    save_program=True,
+    modules_to_serialize=[my_custom_module]
+)
+```
+
+This ensures that the required modules are properly serialized and available when loading the program later. Any number of
+modules can be passed to `modules_to_serialize`. If you don't specify `modules_to_serialize`, no additional modules will be
+registered for serialization.

 ## Backward Compatibility

@@ -104,4 +136,4 @@ are that loading a saved file in a different version of DSPy will not raise an e
 the program was saved.

 Starting from `dspy>=2.7`, we will guarantee the backward compatibility of the saved program in major releases, i.e., programs saved in `dspy==2.7.0`
-should be loadeable in `dspy==2.7.10`.
+should be loadable in `dspy==2.7.10`.
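The tutorial text in this file states that `modules_to_serialize` relies on `cloudpickle.register_pickle_by_value` to store a module by value instead of by reference. The standalone sketch below (not part of the commit, and independent of DSPy) illustrates that behavior: it writes a throwaway module to a temporary directory, registers it, pickles an instance of a class defined there, then makes the module unimportable before loading. The module name `my_custom_module` and the `Greeter` class are hypothetical stand-ins for the tutorial's custom module, and the sketch assumes cloudpickle 2.0 or later, where `register_pickle_by_value` and `unregister_pickle_by_value` are available.

```python
import importlib
import pickle
import sys
import tempfile
import textwrap
from pathlib import Path

import cloudpickle

# Write a throwaway module to disk. It stands in for a user's own module
# (hypothetical name, mirroring `my_custom_module` in the tutorial).
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "my_custom_module.py"
module_file.write_text(textwrap.dedent("""
    GREETING = "hello"

    class Greeter:
        def greet(self, name):
            return f"{GREETING}, {name}"
"""))

sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()
import my_custom_module  # register_pickle_by_value requires an imported module

greeter = my_custom_module.Greeter()

# A registered module is pickled by value: the class, its methods and the
# globals they reference (such as GREETING) are embedded in the payload.
cloudpickle.register_pickle_by_value(my_custom_module)
try:
    payload = cloudpickle.dumps(greeter)
finally:
    # Registration is process-wide, so this sketch undoes it after dumping.
    cloudpickle.unregister_pickle_by_value(my_custom_module)

# Simulate loading in an environment where the module cannot be imported.
sys.path.remove(str(tmp_dir))
del sys.modules["my_custom_module"]
module_file.unlink()

restored = pickle.loads(payload)
print(restored.greet("DSPy"))  # hello, DSPy
```

Without the `register_pickle_by_value` call, the payload would only reference `my_custom_module.Greeter`, and `pickle.loads` would fail once the module is gone. The hunk shown for `module.py` does not call `unregister_pickle_by_value`; the `finally` block here is a choice made for this sketch only.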

dspy/primitives/module.py

Lines changed: 14 additions & 1 deletion
@@ -163,7 +163,7 @@ def load_state(self, state):
 for name, param in self.named_parameters():
 param.load_state(state[name])

-def save(self, path, save_program=False):
+def save(self, path, save_program=False, modules_to_serialize=None):
 """Save the module.

 Save the module to a directory or a file. There are two modes:
@@ -172,6 +172,12 @@ def save(self, path, save_program=False):
 - `save_program=True`: Save the whole module to a directory via cloudpickle, which contains both the state and
 architecture of the model.

+If `save_program=True` and `modules_to_serialize` are provided, it will register those modules for serialization
+with cloudpickle's `register_pickle_by_value`. This causes cloudpickle to serialize the module by value rather
+than by reference, ensuring the module is fully preserved along with the saved program. This is useful
+when you have custom modules that need to be serialized alongside your program. If None, then no modules
+will be registered for serialization.
+
 We also save the dependency versions, so that the loaded model can check if there is a version mismatch on
 critical dependencies or DSPy version.

@@ -180,6 +186,9 @@ def save(self, path, save_program=False):
 and a directory when `save_program=True`.
 save_program (bool): If True, save the whole module to a directory via cloudpickle, otherwise only save
 the state.
+modules_to_serialize (list): A list of modules to serialize with cloudpickle's `register_pickle_by_value`.
+If None, then no modules will be registered for serialization.
+
 """
 metadata = {}
 metadata["dependency_versions"] = get_dependency_versions()
@@ -198,6 +207,10 @@ def save(self, path, save_program=False):
 path.mkdir(parents=True)

 try:
+modules_to_serialize = modules_to_serialize or []
+for module in modules_to_serialize:
+cloudpickle.register_pickle_by_value(module)
+
 with open(path / "program.pkl", "wb") as f:
 cloudpickle.dump(self, f)
 except Exception as e:
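Stripped of DSPy specifics, the `save_program=True` path in this file comes down to normalizing `modules_to_serialize`, registering each module, and dumping with cloudpickle. The sketch below is a minimal standalone rendering of that path, assuming only `cloudpickle`. `save_whole_program`, `load_whole_program` and `TinyProgram` are hypothetical names, not DSPy APIs, and the sketch omits the dependency-version metadata and the exception handling visible in the diff.

```python
import pickle
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Iterable, Optional

import cloudpickle


def save_whole_program(program, path, modules_to_serialize: Optional[Iterable[ModuleType]] = None) -> Path:
    """Hypothetical helper mirroring the `save_program=True` branch of `save`."""
    path = Path(path)
    # exist_ok=True is a simplification for this sketch.
    path.mkdir(parents=True, exist_ok=True)
    # As in the diff, None is normalized to an empty list, so the default
    # registers nothing.
    modules_to_serialize = modules_to_serialize or []
    for module in modules_to_serialize:
        # Registration must happen before dumping to affect this pickle.
        cloudpickle.register_pickle_by_value(module)
    program_file = path / "program.pkl"
    with open(program_file, "wb") as f:
        cloudpickle.dump(program, f)
    return program_file


def load_whole_program(path):
    """Hypothetical counterpart: cloudpickle output loads with the standard pickle module."""
    with open(Path(path) / "program.pkl", "rb") as f:
        return pickle.load(f)


class TinyProgram:
    """Stand-in for a DSPy program; any picklable object works here."""

    def __init__(self, instructions):
        self.instructions = instructions


out_dir = Path(tempfile.mkdtemp()) / "tiny_program"
save_whole_program(TinyProgram("Answer briefly."), out_dir)
loaded = load_whole_program(out_dir)
print(loaded.instructions)  # Answer briefly.
```

Normalizing with `or []` lets callers omit the argument entirely: `None` and an empty list behave the same, and only explicitly listed modules are switched to by-value pickling.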
