Source Change
Upstream PR merged: MervinPraison/PraisonAI#1739 — "fix: make deduplicate_chunks and build_context handle SearchResultItem objects" (fixes #1732).
- Head SHA: 984808d3bd050f5e667694a1c70fdcfa49bd48ca
- Merged: 2026-05-30
- Source file: src/praisonai-agents/praisonaiagents/rag/context.py
- New tests: src/praisonai-agents/tests/unit/rag/test_context.py, test_context_normalization.py
What changed in the SDK
The RAG context utilities (deduplicate_chunks, build_context, and DefaultContextBuilder.build) previously assumed every search result was a plain dict. After PR #1739 they accept both Dict[str, Any] and SearchResultItem objects (from praisonaiagents.knowledge.models).
Key implementation details to reflect in docs:
- New helper (internal): _extract_value(item, key, default) transparently reads from item[key] (dict) or getattr(item, key, default) (object). Documenters don't need to expose this; just describe the behaviour.
- New type alias: ResultItem = Union[Dict[str, Any], SearchResultItem].
- Updated public signatures:
      def deduplicate_chunks(
          results: List[ResultItem],
          similarity_threshold: float = 0.9,
      ) -> List[ResultItem]: ...

      def build_context(
          results: List[ResultItem],
          max_tokens: int = 4000,
          deduplicate: bool = True,
          separator: str = "\n\n---\n\n",
          include_source: bool = True,
      ) -> Tuple[str, List[ResultItem]]: ...

      class DefaultContextBuilder:
          def build(
              self,
              results: List[ResultItem],
              max_tokens: int = 4000,
              deduplicate: bool = True,
          ) -> str: ...
- Metadata precedence (new, documented behaviour): for source / filename, the lookup order is:
  1. result.metadata[key] (if present and non-None)
  2. Top-level attribute / key on the item (e.g. SearchResultItem.source)
  3. The supplied default ("")
  This is implemented by _extract_metadata_value() and is verified by tests.
- Defensive normalization: if metadata is None or not a dict, it is coerced to {} before lookups. Worth a one-line note for users wiring custom stores.
- Backward compatible: dict-based callers continue to work unchanged.
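The lookup rules above can be sketched in a few lines. This is an illustrative stand-in, not the SDK's code: `FakeItem`, `extract_value`, and `extract_metadata_value` are hypothetical names that only imitate the behaviour described for the private helpers. Treating a top-level `None` as "missing" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class FakeItem:
    """Hypothetical stand-in for SearchResultItem (fields assumed for illustration)."""
    text: str = ""
    source: Optional[str] = None
    filename: Optional[str] = None
    metadata: Any = None


Item = Union[Dict[str, Any], FakeItem]


def extract_value(item: Item, key: str, default: Any = None) -> Any:
    """Read `key` from a dict by lookup, or from an object by attribute."""
    if isinstance(item, dict):
        return item.get(key, default)
    return getattr(item, key, default)


def extract_metadata_value(item: Item, key: str, default: Any = "") -> Any:
    """Resolve `key` as metadata[key] -> top-level value -> default."""
    metadata = extract_value(item, "metadata", None)
    if not isinstance(metadata, dict):
        metadata = {}  # None or a non-dict is treated as empty metadata
    value = metadata.get(key)
    if value is not None:
        return value
    # Assumption of this sketch: a top-level None also falls through to default.
    top_level = extract_value(item, key, None)
    return top_level if top_level is not None else default
```

With this sketch, a dict carrying both `metadata["source"]` and a top-level `"source"` resolves to the metadata value, while an object whose `metadata` is `None` resolves to its own `source` attribute.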
Why this matters to users
End-users mostly use RAG through Agent / Knowledge / RAG, but the public helpers build_context, deduplicate_chunks, and ContextBuilderProtocol are part of the documented surface in docs/rag/module.mdx. Today the docs only show List[Dict[str, Any]], which is now inaccurate and makes the SearchResultItem path look unsupported.
Required Documentation Updates
Per AGENTS.md: place new pages in docs/features/; do not create or modify anything in docs/concepts/. Updates to the existing docs/rag/ and docs/knowledge/ pages are in scope.
1. Update docs/rag/module.mdx (existing page)
Apply the following surgical edits — no full rewrite needed.
a) "Context Utilities" section (around L184–L203)
Replace the current block with content that:
- Shows that both dict and SearchResultItem are accepted.
- Adds a SearchResultItem example alongside the dict example (use <Tabs> with two tabs: "Dict results" and "SearchResultItem results").
- Mentions the metadata→top-level→default fallback for source / filename.
Suggested example (verified against tests):
    from praisonaiagents.rag import build_context, deduplicate_chunks
    from praisonaiagents.knowledge.models import SearchResultItem

    results = [
        SearchResultItem(text="First content", source="a.pdf", filename="a.pdf"),
        SearchResultItem(text="Second content", source="b.pdf", filename="b.pdf"),
    ]

    # Mixed dict + object input also works
    results.append({"text": "Third content", "metadata": {"filename": "c.pdf"}})

    unique = deduplicate_chunks(results)
    context, used = build_context(unique, max_tokens=2000, include_source=True)
Add a <Note> callout:
build_context and deduplicate_chunks accept a mix of dict results and SearchResultItem objects. When include_source=True, the label is taken from metadata["filename"] / metadata["source"] first, falling back to the top-level filename / source attribute on the item, and finally to Source N.
b) "Protocols → ContextBuilderProtocol" section (around L149–L165)
Update the signature to show the broader input type and add a one-liner explaining that custom builders should also be tolerant of both shapes. Suggested:
    from typing import Any, Dict, List, Union

    from praisonaiagents.knowledge.models import SearchResultItem
    from praisonaiagents.rag.protocols import ContextBuilderProtocol

    ResultItem = Union[Dict[str, Any], SearchResultItem]

    class MyContextBuilder:
        def build(
            self,
            results: List[ResultItem],
            max_tokens: int = 4000,
            deduplicate: bool = True,
        ) -> str:
            ...
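One way to make a custom builder tolerant of both shapes is to route every field read through a single accessor. The sketch below is hypothetical: `FakeItem` stands in for `SearchResultItem`, `read_field` and `TolerantContextBuilder` are illustrative names, and the 4-characters-per-token budget and exact-match deduplication are simplifying assumptions, not how the SDK counts tokens or compares chunks.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class FakeItem:
    """Hypothetical stand-in for SearchResultItem."""
    text: str = ""


ResultItem = Union[Dict[str, Any], FakeItem]


def read_field(item: ResultItem, key: str, default: Any = "") -> Any:
    """Single accessor so the builder never cares which shape it received."""
    if isinstance(item, dict):
        return item.get(key, default)
    return getattr(item, key, default)


class TolerantContextBuilder:
    """Illustrative builder with the same build() parameters as MyContextBuilder."""

    def build(
        self,
        results: List[ResultItem],
        max_tokens: int = 4000,
        deduplicate: bool = True,
    ) -> str:
        budget = max_tokens * 4  # assumption: roughly 4 characters per token
        seen = set()
        parts: List[str] = []
        used = 0
        for item in results:
            text = read_field(item, "text", "") or ""
            if not text:
                continue  # nothing to add for empty items
            if deduplicate:
                if text in seen:
                    continue  # exact-match dedup only, for brevity
                seen.add(text)
            if used + len(text) > budget:
                break  # stop once the rough budget is exhausted
            parts.append(text)
            used += len(text)
        return "\n\n---\n\n".join(parts)
```

Keeping the shape check in one function means a later third result type would need only one change.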
c) Add a tiny "Result item formats" subsection (just above "Integration with Knowledge", ~L205) that documents the two accepted shapes and the fallback rules in a small table:
| Lookup | 1st choice | 2nd choice | Fallback |
| --- | --- | --- | --- |
| source | metadata["source"] | item.source (object) / item["source"] (dict) | "" |
| filename | metadata["filename"] | item.filename / item["filename"] | "" |
| text | item.text / item["text"] | item.memory / item["memory"] | "" (item skipped in build_context) |
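The `text` row can be read as a small filter. This is a sketch under assumptions: `FakeItem` is a hypothetical stand-in for `SearchResultItem` (the `memory` field is taken from the table, not checked against the model), `resolve_text` and `usable_texts` are illustrative names, and treating an empty `text` the same as a missing one is a choice made here for brevity.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class FakeItem:
    """Hypothetical stand-in for SearchResultItem."""
    text: Optional[str] = None
    memory: Optional[str] = None


Item = Union[Dict[str, Any], FakeItem]


def _get(item: Item, key: str) -> Any:
    """Dict lookup or attribute read, returning None when absent."""
    if isinstance(item, dict):
        return item.get(key)
    return getattr(item, key, None)


def resolve_text(item: Item) -> str:
    """Prefer text, then memory, then the empty string."""
    return _get(item, "text") or _get(item, "memory") or ""


def usable_texts(results: List[Item]) -> List[str]:
    """Keep only items whose resolved text is non-empty."""
    return [t for t in (resolve_text(r) for r in results) if t]
```

Items that resolve to an empty string are dropped, which mirrors the "item skipped in build_context" note in the table.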
2. Light touch updates to related pages (no new pages)
docs/rag/retrieval.mdx — if it lists what Knowledge.search() returns, add one line: "Each item is a SearchResultItem (see praisonaiagents.knowledge.models); it can be passed directly into build_context / deduplicate_chunks."
docs/knowledge/overview.mdx — if it describes search outputs, add the same one-liner so users discover the model.
Do not touch docs/concepts/. Do not edit anything under docs/sdk/reference/ (auto-generated) or docs/js/ / docs/rust/ (auto-managed by the parity system).
3. Style requirements (from AGENTS.md)
- User-facing, not SDK-spec style. Lead with the agent / end-user benefit ("RAG now works whether your store returns dicts or typed objects — no conversion needed").
- Use Mintlify components: <Tabs> for the dict-vs-object examples, <Note> for the fallback rule, <AccordionGroup> for any extra notes.
- Keep section intros to a single sentence; avoid forbidden phrases ("As you can see…", "It's important to note…", etc.).
- Examples must be copy-paste runnable and use the friendly top-level imports (from praisonaiagents.rag import ..., from praisonaiagents.knowledge.models import SearchResultItem).
- No emojis unless explicitly requested.
4. Out of scope
- Do not document _extract_value / _extract_metadata_value; they are private helpers.
- Do not add a brand-new top-level page for this; it is an additive behaviour change to existing utilities.
- Do not modify docs/concepts/* (human-approved folder per AGENTS.md §1.8).
Acceptance Criteria
- docs/rag/module.mdx shows the updated signatures for build_context, deduplicate_chunks, and ContextBuilderProtocol.build accepting Union[Dict[str, Any], SearchResultItem].
- An example passes SearchResultItem directly.
- A <Note> (or equivalent callout) explains the metadata → top-level → default fallback for source / filename.
- Mixed input (dict + SearchResultItem) is shown to be supported.
- Related pages (docs/rag/retrieval.mdx, docs/knowledge/overview.mdx) carry a one-line cross-reference to SearchResultItem.
- No changes under docs/concepts/, docs/sdk/reference/, docs/js/, or docs/rust/.
- Examples import from praisonaiagents.rag / praisonaiagents.knowledge.models (no deep internal paths).
References
- praisonai-package/src/praisonai-agents/praisonaiagents/rag/context.py
- praisonai-package/src/praisonai-agents/praisonaiagents/knowledge/models.py (SearchResultItem)
- src/praisonai-agents/tests/unit/rag/test_context.py classes TestSearchResultItemSupport, TestExtractValue, TestExtractMetadataValue