docs(rag): document SearchResultItem support in build_context / deduplicate_chunks (PR #1739) #431

Description

@MervinPraison

Source Change

Upstream PR merged: MervinPraison/PraisonAI#1739 "fix: make deduplicate_chunks and build_context handle SearchResultItem objects" (fixes #1732).

  • Head SHA: 984808d3bd050f5e667694a1c70fdcfa49bd48ca
  • Merged: 2026-05-30
  • Source file: src/praisonai-agents/praisonaiagents/rag/context.py
  • New tests: src/praisonai-agents/tests/unit/rag/test_context.py, test_context_normalization.py

What changed in the SDK

The RAG context utilities (deduplicate_chunks, build_context, and DefaultContextBuilder.build) previously assumed every search result was a plain dict. After PR #1739 they accept both Dict[str, Any] and SearchResultItem objects (from praisonaiagents.knowledge.models).

Key implementation details to reflect in docs:

  1. New helper (internal): _extract_value(item, key, default) transparently reads from item[key] (dict) or getattr(item, key, default) (object). Documenters don't need to expose this — just describe the behaviour.

  2. New type alias: ResultItem = Union[Dict[str, Any], SearchResultItem].

  3. Updated public signatures:

    def deduplicate_chunks(
        results: List[ResultItem],
        similarity_threshold: float = 0.9,
    ) -> List[ResultItem]: ...
    
    def build_context(
        results: List[ResultItem],
        max_tokens: int = 4000,
        deduplicate: bool = True,
        separator: str = "\n\n---\n\n",
        include_source: bool = True,
    ) -> Tuple[str, List[ResultItem]]: ...
    
    class DefaultContextBuilder:
        def build(
            self,
            results: List[ResultItem],
            max_tokens: int = 4000,
            deduplicate: bool = True,
        ) -> str: ...

  4. Metadata precedence (new, documented behaviour): for source / filename, the lookup order is:

    1. result.metadata[key] (if present and non-None)
    2. Top-level attribute / key on the item (e.g. SearchResultItem.source)
    3. The supplied default ("")

    This is implemented by _extract_metadata_value() and is verified by tests.

  5. Defensive normalization: if metadata is None or not a dict, it is coerced to {} before lookups. Worth a one-line note for users wiring custom stores.

  6. Backward compatible: dict-based callers continue to work unchanged.
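
For doc writers who want to see the lookup rules in items 1, 4 and 5 in one place, here is a minimal sketch. It is not the SDK source: `StubResultItem`, `read_value` and `read_metadata_value` are hypothetical stand-ins written for this issue, and treating a top-level `None` as "missing" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class StubResultItem:
    """Hypothetical stand-in for SearchResultItem; not the SDK class."""
    text: str = ""
    source: Optional[str] = None
    filename: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None


def read_value(item: Any, key: str, default: Any = None) -> Any:
    """Read `key` from a dict item or from an attribute-style item."""
    if isinstance(item, dict):
        return item.get(key, default)
    return getattr(item, key, default)


def read_metadata_value(item: Any, key: str, default: str = "") -> Any:
    """Apply the metadata -> top-level -> default precedence."""
    metadata = read_value(item, "metadata")
    if not isinstance(metadata, dict):
        metadata = {}  # None or a non-dict is coerced to an empty dict
    value = metadata.get(key)
    if value is not None:
        return value
    value = read_value(item, key)  # assumption: a top-level None counts as missing
    if value is not None:
        return value
    return default


examples = [
    {"text": "a", "source": "top.pdf", "metadata": {"source": "meta.pdf"}},
    StubResultItem(text="b", source="top.pdf", metadata=None),
    StubResultItem(text="c", metadata="not-a-dict"),
]
resolved = [read_metadata_value(item, "source") for item in examples]
print(resolved)
```

The three inputs exercise each rung in turn: metadata wins over the top-level key, a `None` metadata falls through to the top-level attribute, and a malformed metadata with no top-level value yields the default.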

Why this matters to users

End-users mostly use RAG through Agent / Knowledge / RAG, but the public helpers build_context, deduplicate_chunks, and ContextBuilderProtocol are part of the documented surface in docs/rag/module.mdx. Today the docs only show List[Dict[str, Any]], which is now inaccurate and makes the SearchResultItem path look unsupported.


Required Documentation Updates

Per AGENTS.md: place new pages in docs/features/; do not create or modify anything in docs/concepts/. Updates to the existing docs/rag/ and docs/knowledge/ pages are in scope.

1. Update docs/rag/module.mdx (existing page)

Apply the following surgical edits — no full rewrite needed.

a) "Context Utilities" section (around L184–L203)

Replace the current block with content that:

  • Shows that both dict and SearchResultItem are accepted.
  • Adds a SearchResultItem example alongside the dict example (use <Tabs> with two tabs: "Dict results" and "SearchResultItem results").
  • Mentions the metadata→top-level→default fallback for source / filename.

Suggested example (verified against tests):

from praisonaiagents.rag import build_context, deduplicate_chunks
from praisonaiagents.knowledge.models import SearchResultItem

results = [
    SearchResultItem(text="First content",  source="a.pdf", filename="a.pdf"),
    SearchResultItem(text="Second content", source="b.pdf", filename="b.pdf"),
]

# Mixed dict + object input also works
results.append({"text": "Third content", "metadata": {"filename": "c.pdf"}})

unique = deduplicate_chunks(results)
context, used = build_context(unique, max_tokens=2000, include_source=True)

Add a <Note> callout:

build_context and deduplicate_chunks accept a mix of dict results and SearchResultItem objects. When include_source=True, the label is taken from metadata["filename"] / metadata["source"] first, falling back to the top-level filename / source attribute on the item, and finally to Source N.

b) "Protocols → ContextBuilderProtocol" section (around L149–L165)

Update the signature to show the broader input type and add a one-liner explaining that custom builders should also be tolerant of both shapes. Suggested:

from typing import Any, Dict, List, Union
from praisonaiagents.knowledge.models import SearchResultItem
from praisonaiagents.rag.protocols import ContextBuilderProtocol

ResultItem = Union[Dict[str, Any], SearchResultItem]

class MyContextBuilder:
    def build(
        self,
        results: List[ResultItem],
        max_tokens: int = 4000,
        deduplicate: bool = True,
    ) -> str:
        ...
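
To illustrate what "tolerant of both shapes" could look like in a custom builder, here is a rough sketch. `StubResultItem` is a hypothetical stand-in so the block runs without the SDK, the whitespace word count is only a crude token estimate, exact-match dedup is a simplification, and the separator is borrowed from the build_context default. None of this is claimed to match DefaultContextBuilder internals.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class StubResultItem:
    """Hypothetical stand-in for SearchResultItem so the sketch runs standalone."""
    text: str = ""


ResultItem = Union[Dict[str, Any], StubResultItem]


class TolerantContextBuilder:
    """Illustrative builder; budget and dedup rules are assumptions, not SDK behaviour."""

    def build(
        self,
        results: List[ResultItem],
        max_tokens: int = 4000,
        deduplicate: bool = True,
    ) -> str:
        parts: List[str] = []
        seen = set()
        budget = max_tokens
        for item in results:
            # Accept both shapes: dict key access or attribute access.
            if isinstance(item, dict):
                text = item.get("text") or ""
            else:
                text = getattr(item, "text", "") or ""
            if not text:
                continue  # nothing usable in this item
            if deduplicate and text in seen:
                continue  # exact-match dedup only, for brevity
            cost = len(text.split())  # crude token estimate: whitespace words
            if cost > budget:
                break  # stop once the next chunk would exceed the budget
            seen.add(text)
            parts.append(text)
            budget -= cost
        return "\n\n---\n\n".join(parts)


builder = TolerantContextBuilder()
context = builder.build(
    [{"text": "alpha beta"}, StubResultItem(text="gamma"), {"text": "alpha beta"}],
    max_tokens=10,
)
print(context)
```

The point of the sketch is the single `isinstance` branch: a builder that reads `text` this way handles dict results, typed results, and a mix of the two without any conversion step.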

c) Add a tiny "Result item formats" subsection (just above "Integration with Knowledge", ~L205) that documents the two accepted shapes and the fallback rules in a small table:

| Lookup | 1st choice | 2nd choice | Fallback |
| --- | --- | --- | --- |
| source | metadata["source"] | item.source (object) / item["source"] (dict) | "" |
| filename | metadata["filename"] | item.filename / item["filename"] | "" |
| text | item.text / item["text"] | item.memory / item["memory"] | "" (item skipped in build_context) |
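
The text row differs from the other two because it never consults metadata and an empty result drops the item. A small sketch of that rule follows, assuming an empty string is treated the same as a missing value; `resolve_text` and `usable_items` are hypothetical names for illustration, not SDK functions.

```python
from types import SimpleNamespace
from typing import Any, List


def resolve_text(item: Any) -> str:
    """Return the item's text, falling back to `memory`, then to an empty string."""
    for key in ("text", "memory"):
        value = item.get(key) if isinstance(item, dict) else getattr(item, key, None)
        if value:  # assumption: empty strings fall through like missing values
            return value
    return ""


def usable_items(results: List[Any]) -> List[Any]:
    """Drop items whose resolved text is empty, mirroring the 'item skipped' note."""
    return [item for item in results if resolve_text(item)]


results = [
    {"text": "from text key"},
    {"memory": "from memory key"},
    SimpleNamespace(text=None, memory="object memory"),
    {"metadata": {"source": "empty.pdf"}},  # no text or memory: skipped
]
texts = [resolve_text(item) for item in results]
kept = usable_items(results)
print(texts, len(kept))
```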

2. Light touch updates to related pages (no new pages)

  • docs/rag/retrieval.mdx — if it lists what Knowledge.search() returns, add one line: "Each item is a SearchResultItem (see praisonaiagents.knowledge.models); it can be passed directly into build_context / deduplicate_chunks."
  • docs/knowledge/overview.mdx — if it describes search outputs, add the same one-liner so users discover the model.

Do not touch docs/concepts/. Do not edit anything under docs/sdk/reference/ (auto-generated) or docs/js/ / docs/rust/ (auto-managed by the parity system).

3. Style requirements (from AGENTS.md)

  • User-facing, not SDK-spec style. Lead with the agent / end-user benefit ("RAG now works whether your store returns dicts or typed objects — no conversion needed").
  • Use Mintlify components: <Tabs> for the dict-vs-object examples, <Note> for the fallback rule, <AccordionGroup> for any extra notes.
  • Keep section intros to a single sentence; avoid forbidden phrases ("As you can see…", "It's important to note…", etc.).
  • Examples must be copy-paste runnable and use the friendly top-level imports (from praisonaiagents.rag import ..., from praisonaiagents.knowledge.models import SearchResultItem).
  • No emojis unless explicitly requested.

4. Out of scope

  • Do not document _extract_value / _extract_metadata_value — they are private helpers.
  • Do not add a brand-new top-level page for this; it is an additive behaviour change to existing utilities.
  • Do not modify docs/concepts/* (human-approved folder per AGENTS.md §1.8).

Acceptance Criteria

  • docs/rag/module.mdx shows the updated signatures for build_context, deduplicate_chunks, and ContextBuilderProtocol.build accepting Union[Dict[str, Any], SearchResultItem].
  • At least one runnable example uses SearchResultItem directly.
  • A <Note> (or equivalent callout) explains the metadata → top-level → default fallback for source / filename.
  • Mixed input (dict + SearchResultItem) is shown to be supported.
  • Related pages (docs/rag/retrieval.mdx, docs/knowledge/overview.mdx) carry a one-line cross-reference to SearchResultItem.
  • No edits under docs/concepts/, docs/sdk/reference/, docs/js/, or docs/rust/.
  • All code examples import from praisonaiagents.rag / praisonaiagents.knowledge.models (no deep internal paths).
  • Mintlify frontmatter on any edited page remains valid and the file renders.
