Support custom imported module serialization with cloudpickle #8286

Merged: 4 commits merged on May 28, 2025

erandeutsch
Contributor

Description

This pull request adds the ability to pickle imported modules by value when saving whole programs (`save_program=True`).

This aims to resolve issue #8285: a saved DSPy program cannot be reliably loaded from a different directory if it uses a signature imported from another file.
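The failure in #8285 comes from how pickling treats imported code. The stdlib-only sketch below uses a hypothetical module `my_signatures` and class `ExtractInfoSig` as stand-ins (this is not DSPy code). It shows that plain `pickle` stores a class from an importable module as a reference to its module and name. As a result, loading fails with `ModuleNotFoundError` once that module is no longer on the import path, which is what happens when a program is loaded from another directory. cloudpickle follows the same by-reference default for classes and functions defined in importable modules.

```python
import importlib
import pathlib
import pickle
import sys
import tempfile

# Write a throwaway module (hypothetical name "my_signatures") to a temporary directory.
tmp_dir = tempfile.mkdtemp()
pathlib.Path(tmp_dir, "my_signatures.py").write_text(
    "class ExtractInfoSig:\n"
    "    description = 'extract structured info'\n"
)

# Make it importable, as it would be from the original project directory.
sys.path.insert(0, tmp_dir)
my_signatures = importlib.import_module("my_signatures")

# Plain pickle stores the class by reference: only its module name and class name.
payload = pickle.dumps(my_signatures.ExtractInfoSig)
print(b"my_signatures" in payload)  # True: the module name is embedded in the pickle

# Simulate loading from a different directory: the module can no longer be imported.
sys.path.remove(tmp_dir)
del sys.modules["my_signatures"]

try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc
print(type(load_error).__name__)  # ModuleNotFoundError
```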

Changes

A new optional `modules_to_serialize` parameter is added to the `save` method, allowing users to specify custom imported modules to serialize with cloudpickle. `modules_to_serialize` is a list of modules, each of which is registered for serialization by value using `cloudpickle.register_pickle_by_value`.
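The registration step described above can be sketched as a small helper. `dumps_with_modules` is hypothetical and is not DSPy's actual `save` code. It checks that every entry is a module object, because cloudpickle's `register_pickle_by_value` rejects classes and functions. It then registers each module, pickles the object with `cloudpickle.dumps`, and finally unregisters only the modules it registered itself. Undoing the registration is a design choice of this sketch, so that later pickling calls keep cloudpickle's default by-reference behavior. The PR description does not say whether DSPy does the same.

```python
import types

import cloudpickle


def dumps_with_modules(obj, modules_to_serialize=None):
    """Hypothetical sketch: cloudpickle `obj`, embedding the listed modules by value."""
    modules = list(modules_to_serialize or [])
    for module in modules:
        # Fail fast before registering anything: only module objects are accepted.
        if not isinstance(module, types.ModuleType):
            raise TypeError(f"expected a module, got {type(module).__name__}")

    newly_registered = []
    try:
        for module in modules:
            # Skip modules already registered (by the caller or a duplicate entry).
            if module.__name__ not in cloudpickle.list_registry_pickle_by_value():
                cloudpickle.register_pickle_by_value(module)
                newly_registered.append(module)
        return cloudpickle.dumps(obj)
    finally:
        # Undo only the registrations made by this call.
        for module in newly_registered:
            cloudpickle.unregister_pickle_by_value(module)
```

For example, `dumps_with_modules(program, modules_to_serialize=[my_custom_module])` (hypothetical names) returns bytes that can be written to a file opened with `open(path, "wb")`.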

For example, to save a module that uses an imported module:

```python
module.save("testModule", save_program=True, modules_to_serialize=[ExtractInfo])
```

With this, the saved program can be loaded with `dspy.load` from a separate directory without import errors, which makes it easier to share DSPy programs across machines and repositories.

Collaborator

chenmoneygithub left a comment

Thanks for the PR! Left 2 minor comments, otherwise LGTM!

Contributor Author

erandeutsch commented May 28, 2025

I added a section to the "Saving and Loading your DSPy program" tutorial explaining how to use the new `modules_to_serialize` parameter of the `save` method to serialize imported modules.

### Serializing Imported Modules
When saving a program with `save_program=True`, you might need to include custom modules that your program depends on.
You can specify which custom modules should be serialized with your program by passing them to the `modules_to_serialize`
parameter when calling `save`. This ensures that the modules you list are serialized together with your program and are
available when you load it later.
This uses `cloudpickle.register_pickle_by_value` to register each module as picklable by value. For a registered module,
cloudpickle serializes the functions and classes it defines by value (their definitions are embedded in the pickle) rather
than by reference (a module path that must be importable at load time), so the module's contents are preserved with the saved program.
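To see what registration changes, the sketch below calls cloudpickle directly rather than DSPy's `save`. It uses a hypothetical module `my_custom_module` containing a hypothetical `CustomSignature` class. After `register_pickle_by_value`, the class definition travels inside the payload, so it can be restored even after the module has been removed from `sys.path` and `sys.modules`. It assumes a cloudpickle version that provides `register_pickle_by_value`, and that cloudpickle itself is importable when loading.

```python
import importlib
import pathlib
import pickle
import sys
import tempfile

import cloudpickle

# Create a hypothetical local module, standing in for my_custom_module.py in your project.
project_dir = tempfile.mkdtemp()
pathlib.Path(project_dir, "my_custom_module.py").write_text(
    "class CustomSignature:\n"
    "    instructions = 'Extract the key facts.'\n"
)
sys.path.insert(0, project_dir)
my_custom_module = importlib.import_module("my_custom_module")

# Register the module so cloudpickle embeds the classes/functions it defines by value.
cloudpickle.register_pickle_by_value(my_custom_module)
try:
    payload = cloudpickle.dumps(my_custom_module.CustomSignature)
finally:
    cloudpickle.unregister_pickle_by_value(my_custom_module)

# Simulate loading elsewhere: the module is no longer importable.
sys.path.remove(project_dir)
del sys.modules["my_custom_module"]

# The class definition is inside the payload, so it can be rebuilt without the module.
restored = pickle.loads(payload)
print(restored.instructions)  # Extract the key facts.
```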
For example, if your program uses custom modules:
```python
import dspy
import my_custom_module  # your own module, e.g. my_custom_module.py in your project

compiled_dspy_program = dspy.ChainOfThought(my_custom_module.custom_signature)

# Save the whole program and serialize my_custom_module (a module object) by value with it
compiled_dspy_program.save(
    "./dspy_program/",
    save_program=True,
    modules_to_serialize=[my_custom_module],
)
```
Any number of modules can be passed to `modules_to_serialize`. If you don't specify `modules_to_serialize`, no additional
modules are registered for serialization by value.

Collaborator

chenmoneygithub left a comment

LGTM!

chenmoneygithub merged commit bc61653 into stanfordnlp:main on May 28, 2025
5 checks passed

2 participants