Bug Description
When serializing a pipeline containing regex patterns to YAML using pipe.dumps() and then deserializing it with Pipeline.loads(), the operation fails due to invalid escape sequences in the generated YAML.
Steps to Reproduce
- Create a pipeline using the Document Cleaner example from the documentation
- Serialize the pipeline to YAML using pipe.dumps()
- Deserialize the YAML back to a pipeline using Pipeline.loads()
- Run the pipeline
Deserialization fails with an error about an unacceptable character, #x0008 (backspace): the \b in the regex pattern, intended as a word boundary, is interpreted as a backspace escape sequence.
The error is caused by pipe.dumps() generating YAML with single backslashes in regex patterns (e.g., \b, \w), which are then interpreted as escape sequences. In YAML, a literal backslash inside a double-quoted scalar must be written as a double backslash (\\b, \\w).
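The #x0008 in the error is the code point of the backspace character. As a point of reference only, and not a confirmed diagnosis of this report, the sketch below shows how Python itself treats \b in a non-raw string literal versus a raw one. The helper control_chars and the sample pattern are hypothetical and exist only for illustration.

```python
def control_chars(text: str) -> list[str]:
    """Return the code points below U+0020 found in text, formatted like '#x0008'."""
    return [f"#x{ord(ch):04x}" for ch in text if ord(ch) < 0x20]


# Non-raw literal: Python converts each \b into a backspace character (0x08).
plain = "(?u)\bword\b"
# Raw literal: each \b stays as two characters, a backslash followed by 'b'.
raw = r"(?u)\bword\b"

print(control_chars(plain))  # ['#x0008', '#x0008']
print(control_chars(raw))    # []
```

If serialized YAML is pasted into Python source for testing, a raw string literal keeps the backslashes intact.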
Proposed Solution
The pipe.dumps() method should properly escape backslashes in regex strings with a double backslash when generating YAML output.
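A minimal sketch of the escaping described above, assuming the value is emitted as a YAML double-quoted scalar (the style in which YAML processes backslash escapes; single-quoted scalars keep backslashes literal). The helper to_double_quoted_yaml is hypothetical and is not how pipe.dumps() is implemented; the key name is taken from the error output in this report.

```python
def to_double_quoted_yaml(value: str) -> str:
    """Render value as a YAML double-quoted scalar (hypothetical helper).

    Only backslashes and double quotes are escaped; control characters
    and other special cases are out of scope for this sketch.
    """
    # Escape backslashes first so the backslash added for quotes is not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


pattern = r"(?u)\b\w+\b"
print(f"bm25_tokenization_regex: {to_double_quoted_yaml(pattern)}")
# bm25_tokenization_regex: "(?u)\\b\\w+\\b"
```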
Expected behavior
The pipeline should serialize to valid YAML and deserialize successfully without errors.
Error message
SyntaxWarning: "\w" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\w"? A raw string is also an option.
bm25_tokenization_regex: '(?u)\b\w+\b'
Traceback (most recent call last):
File "/haystack/core/pipeline/base.py", line 308, in loads
deserialized_data = marshaller.unmarshal(data)
File "/haystack/lib/python3.14/site-packages/haystack/marshal/yaml.py", line 40, in unmarshal
return yaml.load(data_, Loader=YamlLoader)
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/haystack/lib/python3.14/site-packages/yaml/__init__.py", line 79, in load
loader = Loader(stream)
File "/haystack/lib/python3.14/site-packages/yaml/loader.py", line 34, in __init__
Reader.__init__(self, stream)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
File "/haystack/lib/python3.14/site-packages/yaml/reader.py", line 74, in __init__
self.check_printable(stream)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^
File "/haystack/lib/python3.14/site-packages/yaml/reader.py", line 143, in check_printable
raise ReaderError(self.name, position, ord(character),
'unicode', "special characters are not allowed")
yaml.reader.ReaderError: unacceptable character #x0008: special characters are not allowed
in "<unicode string>", position 1121
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/haystack/lib/python3.14/site-packages/haystack/core/pipeline/base.py", line 310, in loads
raise DeserializationError(
...<2 lines>...
) from e
haystack.core.errors.DeserializationError: Error while unmarshalling serialized pipeline data. This is usually caused by malformed or invalid syntax in the serialized representation.