Commit 0a53800

Add release notes for Haystack v2.22.0 (#492)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
1 parent 0d1fadb commit 0a53800

1 file changed

Lines changed: 150 additions & 0 deletions

content/release-notes/v2.22.0.md

@@ -0,0 +1,150 @@
---
title: Haystack 2.22.0
description: Release notes for Haystack 2.22.0
toc: True
date: 2026-01-08
last_updated: 2026-01-08
tags: ["Release Notes"]
link: https://github.com/deepset-ai/haystack/releases/tag/v2.22.0
---

## ⭐️ Highlights

### ✂️ Smarter Document Chunking with Embedding-Based Splitting
Introducing the new [EmbeddingBasedDocumentSplitter](https://docs.haystack.deepset.ai/docs/embeddingbaseddocumentsplitter), a component that takes an embedder and splits documents based on semantic similarity rather than fixed sizes or rules.

```python
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import EmbeddingBasedDocumentSplitter

# Illustrative input document; replace it with your own
doc = Document(
    content="Mars is known as the Red Planet. Its surface is rich in iron oxide. "
    "Sourdough bread needs a mature starter. The dough ferments overnight."
)

# Initialize an embedder to calculate semantic similarities
embedder = SentenceTransformersDocumentEmbedder()

# Configure the splitter with parameters that control splitting behavior
splitter = EmbeddingBasedDocumentSplitter(
    document_embedder=embedder,
    sentences_per_group=2,  # Group 2 sentences before calculating embeddings
    percentile=0.95,  # Split when cosine distance exceeds 95th percentile
    min_length=50,  # Merge splits shorter than 50 characters
    max_length=1000,  # Further split chunks longer than 1000 characters
)
result = splitter.run(documents=[doc])
```
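
To make the `percentile` parameter more concrete, here is a minimal, dependency-free sketch of the general idea: compute the cosine distance between consecutive sentence-group embeddings and split wherever the distance exceeds a percentile threshold. This is not Haystack's implementation; the helper names (`cosine_distance`, `percentile_threshold`, `find_split_points`), the interpolation method, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile_threshold(values, percentile):
    """Linear-interpolation percentile of a non-empty list; percentile is in [0, 1]."""
    ordered = sorted(values)
    position = percentile * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def find_split_points(group_embeddings, percentile=0.95):
    """Return indices i where a split falls between group i and group i + 1."""
    distances = [
        cosine_distance(a, b)
        for a, b in zip(group_embeddings, group_embeddings[1:])
    ]
    if not distances:
        return []
    threshold = percentile_threshold(distances, percentile)
    return [i for i, distance in enumerate(distances) if distance > threshold]


# Toy 2-D "embeddings": three groups about one topic, then two about another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
split_points = find_split_points(toy_embeddings, percentile=0.95)
print(split_points)
```

In this toy input the only large jump in cosine distance is between the third and fourth groups, so that is where the sketch places its single split.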

### 🔥 `warm_up` Runs Automatically on First Use

Components that define a `warm_up` method now run it automatically on first execution, removing the need for manual calls and preventing errors in standalone usage.
```python
from haystack.components.embedders import SentenceTransformersTextEmbedder

text_embedder = SentenceTransformersTextEmbedder()
# text_embedder.warm_up()  # ❌ This step is no longer needed
print(text_embedder.run("I love pizza!"))

# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
```
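
The pattern behind this change can be sketched in plain Python: `run` triggers an idempotent `warm_up`, so the expensive setup happens exactly once and only when it is needed. The `LazyComponent` class below is hypothetical and is not a Haystack class; how Haystack actually wires the automatic call may differ.

```python
class LazyComponent:
    """Hypothetical component illustrating warm-up on first use."""

    def __init__(self):
        self._model = None
        self.warm_up_calls = 0

    def warm_up(self):
        # Idempotent: do nothing if the resource is already loaded.
        if self._model is not None:
            return
        self.warm_up_calls += 1
        self._model = str.upper  # stand-in for loading a real model

    def run(self, text: str) -> dict:
        self.warm_up()  # no manual call required by the user
        return {"output": self._model(text)}


component = LazyComponent()
first = component.run("hello")
second = component.run("again")
print(first, second, component.warm_up_calls)
```

Keeping `warm_up` idempotent means that code which still calls it explicitly continues to work unchanged.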

### 🛠️ Multiple Tool String Outputs with `outputs_to_string`
Tools can now expose multiple string outputs via the new `outputs_to_string` configuration, giving you fine-grained control over how tool results are surfaced to the LLM, without changing the underlying tool logic.

```python
from haystack.tools import Tool


def format_documents(documents):
    return "\n".join(f"{i+1}. Document: {doc.content}" for i, doc in enumerate(documents))


def format_summary(metadata):
    return f"Found {metadata['count']} results"


# `search_func` is assumed to be defined elsewhere
tool = Tool(
    name="search",
    description="Search for documents",
    parameters={...},
    function=search_func,  # Returns {"documents": [Document(...)], "metadata": {"count": 5}, "debug_info": {...}}
    outputs_to_string={
        "formatted_docs": {"source": "documents", "handler": format_documents},
        "summary": {"source": "metadata", "handler": format_summary},
        # Note: "debug_info" is not included, so it won't be converted to a string
    },
)

# After the tool invocation, the tool result includes:
# {
#     "formatted_docs": "1. Document: ...\n2. Document: ...",
#     "summary": "Found 5 results"
# }
```
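
As a rough mental model of what such a configuration does, the sketch below applies a mapping of the same shape to a raw result dictionary. The `stringify_outputs` helper is hypothetical and only illustrates the source-plus-handler idea; it is not how `ToolInvoker` is implemented, and the fallback to `str` when no handler is given is an assumption.

```python
def stringify_outputs(result: dict, outputs_to_string: dict) -> dict:
    """Build one string per configured output from a raw tool result."""
    formatted = {}
    for output_name, config in outputs_to_string.items():
        value = result[config["source"]]
        handler = config.get("handler", str)  # assumed default: plain str()
        formatted[output_name] = handler(value)
    return formatted


raw_result = {
    "documents": ["Mars is red.", "Venus is hot."],
    "metadata": {"count": 2},
    "debug_info": {"elapsed_ms": 12},  # not configured, so it is left out
}
config = {
    "formatted_docs": {
        "source": "documents",
        "handler": lambda docs: "\n".join(
            f"{i + 1}. Document: {doc}" for i, doc in enumerate(docs)
        ),
    },
    "summary": {
        "source": "metadata",
        "handler": lambda meta: f"Found {meta['count']} results",
    },
}
tool_result = stringify_outputs(raw_result, config)
print(tool_result)
```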

### 🐍 Python 3.10+ Only
Haystack now requires Python 3.10 or later, as Python 3.9 reached End of Life (EOL) in October 2025.

## ⬆️ Upgrade Notes
- `HuggingFaceLocalChatGenerator` now uses `Qwen/Qwen3-0.6B` as the default model, replacing the previous default.

## ⚡️ Enhancement Notes

- The parameters `query_suffix` and `document_suffix` have been added to `SentenceTransformersSimilarityRanker` to support the Qwen3 reranker model family.

  Here is an example of how to use these new parameters with Qwen3-Reranker-0.6B:

  ```python
  from haystack import Document
  from haystack.components.rankers.sentence_transformers_similarity import SentenceTransformersSimilarityRanker

  ranker = SentenceTransformersSimilarityRanker(
      model="tomaarsen/Qwen3-Reranker-0.6B-seq-cls",
      query_prefix='<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n<Instruct>: Given a web search query, retrieve relevant passages that answer the query\n<Query>: ',
      query_suffix="\n",
      document_prefix="<Document>: ",
      document_suffix="<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
  )

  result = ranker.run(
      query="Which planet is known as the Red Planet?",
      documents=[
          Document(content="Venus is often called Earth's twin because of its similar size and proximity."),
          Document(content="Mars, known for its reddish appearance, is often referred to as the Red Planet."),
          Document(content="Jupiter, the largest planet in our solar system, has a prominent red spot."),
          Document(content="Saturn, famous for its rings, is sometimes mistaken for the Red Planet."),
      ],
  )

  print(result)
  ```

  NOTE: This only works with Qwen3 reranker models that use the sequence classification architecture. Several such models are available on `tomaarsen`'s Hugging Face profile.

- Added reasoning content support to `HuggingFaceAPIChatGenerator`. The component now extracts reasoning content from models that support chain-of-thought reasoning (e.g., DeepSeek R1). Both streaming and non-streaming modes are supported. The reasoning is available via `reply.reasoning.reasoning_text`.

- When an Agent runs as part of a Pipeline, the agent's tracing span now uses the component span as its parent. This enables proper nested trace visualization in tracing tools like Datadog, Braintrust, or OpenTelemetry backends.

- The `_handle_async_stream_response()` method in `OpenAIChatGenerator` now handles `asyncio.CancelledError` exceptions. When a streaming task is cancelled mid-stream, the `async for` loop gracefully closes the stream using `asyncio.shield()` to ensure the cleanup operation completes even during cancellation.

- A new `enable_thinking` parameter has been added to enable thinking mode in chat templates for thinking-capable models, allowing them to generate intermediate reasoning steps before producing final responses.

- Added support for PEP 604 type syntax. When defining types in components, you can now use `X | Y` instead of `Union[X, Y]` and `X | None` instead of `Optional[X]`. The codebase has been migrated to the new syntax, but both syntaxes are fully supported.

- Support Multiple Tool String Outputs

  Added support for tools to define multiple string outputs using the `outputs_to_string` configuration. This allows users to specify how different parts of a tool's output should be converted to strings, enhancing flexibility in handling tool results.

  - Updated `ToolInvoker` to handle multiple output configurations.
  - Updated `Tool` to validate and store multiple output configurations.
  - Added tests to verify the functionality of multiple string outputs.

  This enables tools to provide rich, varied context to language models or downstream components without requiring multiple tool calls, while keeping full control over which outputs are stringified.

- Added validation for `inputs_from_state` and `outputs_to_state` parameters in the `Tool` class. Tools now validate at construction time that state mappings reference valid tool parameters and outputs, catching configuration errors early instead of at runtime. The validation uses function introspection and JSON schema to ensure parameter names exist, and subclasses like `ComponentTool` validate against component input/output sockets.
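
The construction-time validation of state mappings described in the last item can be pictured with a small introspection-based check. This is only a sketch: the `validate_inputs_from_state` helper is hypothetical, it assumes the mapping goes from state keys to function parameter names, and the real `Tool` class also consults the JSON schema, which is omitted here.

```python
import inspect


def validate_inputs_from_state(function, inputs_from_state: dict) -> None:
    """Raise ValueError if a mapping targets a parameter the function lacks."""
    valid_params = set(inspect.signature(function).parameters)
    for state_key, param_name in inputs_from_state.items():
        if param_name not in valid_params:
            raise ValueError(
                f"inputs_from_state maps '{state_key}' to unknown parameter "
                f"'{param_name}'. Valid parameters: {sorted(valid_params)}"
            )


def search(query: str, top_k: int = 5) -> dict:
    """Hypothetical tool function used only for this example."""
    return {"documents": [], "metadata": {"count": 0}}


# A correct mapping passes silently.
validate_inputs_from_state(search, {"user_query": "query"})

# A typo in the parameter name is caught before the tool is ever invoked.
try:
    validate_inputs_from_state(search, {"user_query": "qeury"})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```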

## 🐛 Bug Fixes

- Improved error messages in `ConditionalRouter` when non-string values are provided as route outputs. Users now receive clear guidance (e.g., "use '2' instead of 2") instead of the cryptic "Can't compile non template nodes" error.
- Fixed Jinja2 variable detection in `ConditionalRouter`, `ChatPromptBuilder`, `PromptBuilder` and `OutputAdapter` by properly skipping variables that are assigned within the template. Previously, in specific scenarios, variables assigned within a template were falsely picked up as input variables to the component. For more information, see the parent issue in the Jinja2 library: <https://github.com/pallets/jinja/issues/2069>
- Fixed deserialization of a `NamedEntityExtractor` instance when `pipeline_kwargs` is stored in the deserialization dict with the value `None`.
- When creating an HTTP client object from a dictionary, the `limits` parameter is now converted to an `httpx.Limits` object to avoid an `AttributeError`.
- A `ValueError` is now raised when an async function is passed to the `Tool` class. Async functions are not supported as tools. This change provides a clear error message instead of silent failures where coroutines are never awaited.
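
For the async-function check in the last item, a guard of roughly this shape is enough to fail fast. The `ensure_sync_callable` helper and its error message are hypothetical; the sketch only shows that the standard library's `inspect.iscoroutinefunction` can detect the problem at construction time rather than leaving a coroutine un-awaited.

```python
import inspect


def ensure_sync_callable(function):
    """Reject coroutine functions up front instead of failing silently later."""
    if inspect.iscoroutinefunction(function):
        raise ValueError(
            f"'{function.__name__}' is an async function; "
            "only synchronous functions are supported here."
        )
    return function


def sync_search(query: str) -> str:
    return f"results for {query}"


async def async_search(query: str) -> str:
    return f"results for {query}"


ensure_sync_callable(sync_search)  # accepted

try:
    ensure_sync_callable(async_search)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```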

## ⚠️ Deprecation Notes

- The `return_empty_on_no_match` parameter of the `RegexTextExtractor` component is deprecated. The component now always returns a dictionary with the key `captured_text`, whose value is the captured text, or an empty string if no match is found. The parameter is currently ignored; starting from Haystack 2.23.0, initializing the component with it will raise an error.
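
The resulting output contract can be illustrated with the standard `re` module. The `extract_captured_text` function below is a hypothetical stand-in, not the component's code; in particular, returning the first capture group (or the whole match when the pattern has no groups) is an assumption made for this sketch.

```python
import re


def extract_captured_text(pattern: str, text: str) -> dict:
    """Always return a dict with 'captured_text'; empty string when nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return {"captured_text": ""}
    # Assumption: prefer the first capture group, else the whole match.
    captured = match.group(1) if match.re.groups else match.group(0)
    return {"captured_text": captured or ""}


found = extract_captured_text(r"<issue>(.*?)</issue>", "See <issue>42</issue> for details")
missing = extract_captured_text(r"<issue>(.*?)</issue>", "No tags here")
print(found, missing)
```

Because the key is always present, downstream code can read `captured_text` without first checking whether the dictionary is empty.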

## 💙 Big thank you to everyone who contributed to this release!

@anakin87, @ArzelaAscoIi, @bilgeyucel, @Bobholamovic, @davidsbatista, @dfokina, @GunaPalanivel, @majiayu000, @OliverZhangA, @sjrl, @TaMaN2031A, @tommasocerruti, @tstadel, @vblagoje, @YassineGabsi
