Support custom imported module serialization with cloudpickle #8286

Merged: 4 commits merged on May 28, 2025
38 changes: 35 additions & 3 deletions docs/docs/tutorials/saving/index.md
@@ -7,7 +7,7 @@ This guide demonstrates how to save and load your DSPy program. At a high level,

## State-only Saving

- State represents the DSPy program's internal state, including the signature, demos (few-shot examples), and other informaiton like
+ State represents the DSPy program's internal state, including the signature, demos (few-shot examples), and other information like
the `lm` to use for each `dspy.Predict` in the program. It also includes configurable attributes of other DSPy modules like
`k` for `dspy.retrievers.Retriever`. To save the state of a program, use the `save` method and set `save_program=False`. You can
choose to save the state to a JSON file or a pickle file. We recommend saving the state to a JSON file because it is safer and readable.
@@ -94,7 +94,39 @@ assert str(compiled_dspy_program.signature) == str(loaded_dspy_program.signature
```

With whole program saving, you don't need to recreate the program, but can directly load the architecture along with the state.
- You can pick the suitable saviing approach based on your needs.
+ You can pick the suitable saving approach based on your needs.

### Serializing Imported Modules

When saving a program with `save_program=True`, you might need to include custom modules that your program depends on.

You can specify which custom modules should be serialized with your program by passing them to the `modules_to_serialize`
parameter when calling `save`. This ensures that any dependencies your program relies on are included during serialization and
available when loading the program later.

This uses cloudpickle's `cloudpickle.register_pickle_by_value` function in order to register a module as picklable by value. When
a module is registered this way, cloudpickle will serialize the module by value rather than by reference, ensuring that the
module contents are preserved with the saved program.

For example, if your program uses custom modules:

```python
import dspy
import my_custom_module

compiled_dspy_program = dspy.ChainOfThought(my_custom_module.custom_signature)

# Save the program with the custom module
compiled_dspy_program.save(
    "./dspy_program/",
    save_program=True,
    modules_to_serialize=[my_custom_module]
)
```

This ensures that the required modules are properly serialized and available when loading the program later. Any number of
modules can be passed to `modules_to_serialize`. If you don't specify `modules_to_serialize`, no additional modules will be
registered for serialization.
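
The difference between pickling by reference and by value can be illustrated with a small standalone sketch. It is not part of this PR: the in-memory module and its `shout` function are hypothetical stand-ins for a custom module, and the cloudpickle half only runs if cloudpickle is installed. The first half shows why a by-reference pickle fails to load where the module is missing; the second half shows `register_pickle_by_value` avoiding that failure.

```python
import pickle
import sys
import types

# Build a throwaway module in memory. `my_custom_module` and `shout` are
# hypothetical names used only for this illustration.
source = "def shout(text):\n    return text.upper() + '!'\n"
module = types.ModuleType("my_custom_module")
exec(source, module.__dict__)
sys.modules["my_custom_module"] = module

# The standard pickler stores only the import path of the function.
by_reference = pickle.dumps(module.shout)

# Simulate loading in an environment where the module is not installed.
del sys.modules["my_custom_module"]
try:
    pickle.loads(by_reference)
    load_error = None
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    load_error = exc
print("by-reference load error:", load_error)

# Optional second half: the same round trip with cloudpickle, by value.
try:
    import cloudpickle
except ImportError:
    cloudpickle = None

if cloudpickle is not None:
    # Registration requires the module to be present in sys.modules.
    sys.modules["my_custom_module"] = module
    cloudpickle.register_pickle_by_value(module)
    by_value = cloudpickle.dumps(module.shout)

    # Undo the registration and remove the module again before loading.
    cloudpickle.unregister_pickle_by_value(module)
    del sys.modules["my_custom_module"]

    # The function body travels inside the payload, so no import is needed.
    restored = pickle.loads(by_value)
    print("by-value result:", restored("hi"))
```

The by-value payload is larger because it carries the function's code, which is the trade-off accepted when a module is listed in `modules_to_serialize`.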

## Backward Compatibility

@@ -104,4 +136,4 @@ are that loading a saved file in a different version of DSPy will not raise an e
the program was saved.

Starting from `dspy>=2.7`, we will guarantee the backward compatibility of the saved program in major releases, i.e., programs saved in `dspy==2.7.0`
- should be loadeable in `dspy==2.7.10`.
+ should be loadable in `dspy==2.7.10`.
15 changes: 14 additions & 1 deletion dspy/primitives/module.py
@@ -163,7 +163,7 @@ def load_state(self, state):
for name, param in self.named_parameters():
    param.load_state(state[name])

- def save(self, path, save_program=False):
+ def save(self, path, save_program=False, modules_to_serialize=None):
"""Save the module.

Save the module to a directory or a file. There are two modes:
@@ -172,6 +172,12 @@ def save(self, path, save_program=False):
- `save_program=True`: Save the whole module to a directory via cloudpickle, which contains both the state and
architecture of the model.

If `save_program=True` and `modules_to_serialize` are provided, it will register those modules for serialization
with cloudpickle's `register_pickle_by_value`. This causes cloudpickle to serialize the module by value rather
than by reference, ensuring the module is fully preserved along with the saved program. This is useful
when you have custom modules that need to be serialized alongside your program. If None, then no modules
will be registered for serialization.

We also save the dependency versions, so that the loaded model can check if there is a version mismatch on
critical dependencies or DSPy version.

@@ -180,6 +186,9 @@ def save(self, path, save_program=False):
and a directory when `save_program=True`.
save_program (bool): If True, save the whole module to a directory via cloudpickle, otherwise only save
the state.
modules_to_serialize (list): A list of modules to serialize with cloudpickle's `register_pickle_by_value`.
If None, then no modules will be registered for serialization.

"""
metadata = {}
metadata["dependency_versions"] = get_dependency_versions()
@@ -198,6 +207,10 @@ def save(self, path, save_program=False):
path.mkdir(parents=True)

try:
    modules_to_serialize = modules_to_serialize or []
    for module in modules_to_serialize:
        cloudpickle.register_pickle_by_value(module)

    with open(path / "program.pkl", "wb") as f:
        cloudpickle.dump(self, f)
except Exception as e: