diff --git a/docs/docs/tutorials/saving/index.md b/docs/docs/tutorials/saving/index.md
index 66acb6470d..d30d880312 100644
--- a/docs/docs/tutorials/saving/index.md
+++ b/docs/docs/tutorials/saving/index.md
@@ -7,7 +7,7 @@ This guide demonstrates how to save and load your DSPy program. At a high level,
 
 ## State-only Saving
 
-State represents the DSPy program's internal state, including the signature, demos (few-shot examples), and other informaiton like
+State represents the DSPy program's internal state, including the signature, demos (few-shot examples), and other information like
 the `lm` to use for each `dspy.Predict` in the program. It also includes configurable attributes of other DSPy modules like
 `k` for `dspy.retrievers.Retriever`. To save the state of a program, use the `save` method and set `save_program=False`. You can
 choose to save the state to a JSON file or a pickle file. We recommend saving the state to a JSON file because it is safer and readable.
@@ -94,7 +94,39 @@ assert str(compiled_dspy_program.signature) == str(loaded_dspy_program.signature
 ```
 
 With whole program saving, you don't need to recreate the program, but can directly load the architecture along with the state.
-You can pick the suitable saviing approach based on your needs.
+You can pick the suitable saving approach based on your needs.
+
+### Serializing Imported Modules
+
+When saving a program with `save_program=True`, you might need to include custom modules that your program depends on.
+
+You can specify which custom modules should be serialized with your program by passing them to the `modules_to_serialize`
+parameter when calling `save`. This ensures that any dependencies your program relies on are included during serialization and
+available when loading the program later.
+
+This uses cloudpickle's `cloudpickle.register_pickle_by_value` function in order to register a module as picklable by value. When
+a module is registered this way, cloudpickle will serialize the module by value rather than by reference, ensuring that the
+module contents are preserved with the saved program.
+
+For example, if your program uses custom modules:
+
+```python
+import dspy
+import my_custom_module
+
+compiled_dspy_program = dspy.ChainOfThought(my_custom_module.custom_signature)
+
+# Save the program with the custom module
+compiled_dspy_program.save(
+    "./dspy_program/",
+    save_program=True,
+    modules_to_serialize=[my_custom_module]
+)
+```
+
+This ensures that the required modules are properly serialized and available when loading the program later. Any number of
+modules can be passed to `modules_to_serialize`. If you don't specify `modules_to_serialize`, no additional modules will be
+registered for serialization.
 
 ## Backward Compatibility
 
@@ -104,4 +136,4 @@ are that loading a saved file in a different version of DSPy will not raise an e
 the program was saved.
 
 Starting from `dspy>=2.7`, we will guarantee the backward compatibility of the saved program in major releases, i.e., programs saved in `dspy==2.7.0`
-should be loadeable in `dspy==2.7.10`.
+should be loadable in `dspy==2.7.10`.
diff --git a/dspy/primitives/module.py b/dspy/primitives/module.py
index 3ddaf74d66..1b9ebb6e54 100644
--- a/dspy/primitives/module.py
+++ b/dspy/primitives/module.py
@@ -163,7 +163,7 @@ def load_state(self, state):
         for name, param in self.named_parameters():
             param.load_state(state[name])
 
-    def save(self, path, save_program=False):
+    def save(self, path, save_program=False, modules_to_serialize=None):
         """Save the module.
 
         Save the module to a directory or a file. There are two modes:
@@ -172,6 +172,12 @@ def save(self, path, save_program=False):
         - `save_program=True`: Save the whole module to a directory via cloudpickle, which contains both the state and
             architecture of the model.
 
+        If `save_program=True` and `modules_to_serialize` are provided, it will register those modules for serialization
+        with cloudpickle's `register_pickle_by_value`. This causes cloudpickle to serialize the module by value rather
+        than by reference, ensuring the module is fully preserved along with the saved program. This is useful
+        when you have custom modules that need to be serialized alongside your program. If None, then no modules
+        will be registered for serialization.
+
         We also save the dependency versions, so that the loaded model can check if there is a version mismatch on
         critical dependencies or DSPy version.
 
@@ -180,6 +186,9 @@ def save(self, path, save_program=False):
                 and a directory when `save_program=True`.
             save_program (bool): If True, save the whole module to a directory via cloudpickle, otherwise only save
                 the state.
+            modules_to_serialize (list): A list of modules to serialize with cloudpickle's `register_pickle_by_value`.
+                If None, then no modules will be registered for serialization.
+
         """
         metadata = {}
         metadata["dependency_versions"] = get_dependency_versions()
@@ -198,6 +207,10 @@ def save(self, path, save_program=False):
                 path.mkdir(parents=True)
 
             try:
+                modules_to_serialize = modules_to_serialize or []
+                for module in modules_to_serialize:
+                    cloudpickle.register_pickle_by_value(module)
+
                 with open(path / "program.pkl", "wb") as f:
                     cloudpickle.dump(self, f)
             except Exception as e:
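
The motivation for `modules_to_serialize` can be sketched with the standard library alone. This is an illustration under stated assumptions, not DSPy or cloudpickle code: `my_custom_module` and `greet` are hypothetical names, and plain `pickle` stands in for the by-reference behavior that cloudpickle also applies by default to objects from importable modules. The payload records only where the function lives, so loading fails once the module is no longer importable. That is the situation `cloudpickle.register_pickle_by_value` is meant to avoid.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a user's custom module (the name is illustrative only).
module = types.ModuleType("my_custom_module")
exec("def greet(name):\n    return 'hello ' + name\n", module.__dict__)
sys.modules["my_custom_module"] = module

# By-reference pickling stores the module and attribute names, not the function body.
payload = pickle.dumps(module.greet)

# Simulate loading in an environment where the module cannot be imported.
del sys.modules["my_custom_module"]
try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:
    load_error = exc

print(type(load_error).__name__)
```

Registering the module for by-value pickling before dumping, as the new loop in `save` does, embeds the module's definitions in the payload instead of a bare reference.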