Configuration for web search enrichment using Exa AI-native search.

Stage Category: APPLY (Document Enrichment - enriches existing documents)

Transformation: N documents → N documents (each enriched with web search results)

Purpose: Enriches each document in the pipeline by adding AI-native web search results from Exa's neural ranking system to the document's metadata. The query can be templated per document using {{DOC.*}}, {{INPUT.*}}, etc., enabling dynamic searches based on document content. Results are automatically cached by query to minimize redundant API calls.

When to Use:

- Enrich documents with real-time web context
- Add related articles/research to each document
- Augment internal data with external web sources
- Find competitive intelligence per product/company in documents
- Add news/updates related to document entities
- Research and discovery based on document content
- Context augmentation for RAG pipelines

When NOT to Use:

- Searching within your own collections (use feature_search instead)
- Need full web page content (use web_scrape stage for that)
- Want to create NEW documents from web search (this enriches existing ones)
- No existing documents to enrich (pipeline must have documents first)

Operational Behavior:

- ENRICHES existing documents (N → N operation, preserves all docs)
- Each document gets web search results added to metadata.web_search
- Query is templated per document with {{DOC.*}} support
- Smart caching: identical queries share results (only 1 API call)
- Preserves all original document data (ID, collection, score, etc.)
- Makes external HTTP request to Exa API (cached per unique query)
- Fast operation: 100-500ms per unique query (not per document)

Common Pipeline Position:

- feature_search → web_search (enrich search results with web context)
- feature_search → web_search → llm_filter (search, enrich, then filter)
- feature_search → web_search → web_scrape (enrich with URLs, then scrape)

Cost & Performance:

- Moderate Cost: Exa API charges per unique query (caching reduces costs)
- Fast: 100-500ms per unique query, cached queries are instant
- Network dependent: requires external API call
- Static queries: 1 API call for all documents (highly efficient)
- Dynamic queries: 1 API call per unique rendered query

Output Schema:

Adds to each DocumentResult:

- metadata.web_search.query: Rendered query used for this document
- metadata.web_search.results: Array of web search results
- metadata.web_search.results[].url: Web page URL
- metadata.web_search.results[].title: Page title
- metadata.web_search.results[].text: Text snippet (if include_text=True)
- metadata.web_search.results[].published_date: Publication date
- metadata.web_search.results[].author: Author name
- metadata.web_search.results[].score: Exa relevance score
- metadata.web_search.results[].position: Result position (0-indexed)
- metadata.web_search.num_results: Count of results
- metadata.web_search.autoprompt_used: Whether autoprompt was enabled

Requirements:

- query: REQUIRED, search query text (supports templates like {{INPUT.query}}; the properties table lists a default of '{{INPUT.query}}')
- num_results: OPTIONAL, number of results (default 10, max 100)
- use_autoprompt: OPTIONAL, use Exa's query enhancement (default True)
- start_published_date: OPTIONAL, filter by publication date
- category: OPTIONAL, filter by content type
- include_text: OPTIONAL, include text snippets (default True)

Use Cases:

- RAG enhancement: Enrich documents with current web context before LLM
- Product research: Add competitor info to each product document
- News enrichment: Add latest news to company/entity documents
- Academic research: Add related papers to each research document
- Documentation augmentation: Add official docs/guides to each result
- Competitive intelligence: Enrich results with competitor mentions
- Fact verification: Add source citations from web to each claim

Examples:

Static query enrichment (all documents get same web results):

```json
{
  "query": "latest AI developments 2024",
  "num_results": 10,
  "include_text": true
}
```

Result: 1 API call total, all documents enriched with same 10 web results

Dynamic per-document enrichment (query varies by document):

```json
{
  "query": "{{DOC.metadata.product_name}} reviews and comparisons",
  "num_results": 5,
  "include_text": true
}
```

Result: 1 API call per unique product name (automatically cached)

Hybrid query (combines input + document fields):

```json
{
  "query": "{{INPUT.topic}} {{DOC.metadata.category}}",
  "num_results": 3,
  "start_published_date": "2024-01-01"
}
```

Result: Caching optimizes for documents with same topic+category combo

News enrichment with date filter:

```json
{
  "query": "{{DOC.metadata.company_name}} latest news",
  "num_results": 5,
  "category": "news",
  "start_published_date": "2024-11-01"
}
```

Result: Recent news added to each company document's metadata

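
The per-document templating and per-query caching described above can be illustrated with a small standalone sketch. It is not the stage's actual implementation: `render_query`, `enrich`, and `fake_search` are hypothetical names, the template grammar is assumed to be simple dotted paths under `DOC` and `INPUT`, and the search backend is a stub rather than a real Exa call.

```python
import re
from typing import Any, Callable, Dict, List

# Assumed template grammar: {{DOC.a.b}} or {{INPUT.a.b}} with dotted paths.
_TEMPLATE = re.compile(r"\{\{\s*(DOC|INPUT)\.([A-Za-z0-9_.]+)\s*\}\}")


def _lookup(root: Dict[str, Any], dotted: str) -> Any:
    """Walk a dotted path such as 'metadata.product_name' through nested dicts."""
    value: Any = root
    for key in dotted.split("."):
        value = value[key]
    return value


def render_query(template: str, doc: Dict[str, Any], inputs: Dict[str, Any]) -> str:
    """Replace every {{DOC.*}} / {{INPUT.*}} placeholder with its value."""
    def substitute(match):
        root = doc if match.group(1) == "DOC" else inputs
        return str(_lookup(root, match.group(2)))
    return _TEMPLATE.sub(substitute, template)


def enrich(docs: List[Dict[str, Any]], template: str, inputs: Dict[str, Any],
           search: Callable[[str], List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """N -> N enrichment: one search call per unique rendered query."""
    cache: Dict[str, List[Dict[str, Any]]] = {}
    enriched = []
    for doc in docs:
        query = render_query(template, doc, inputs)
        if query not in cache:  # only the first occurrence hits the backend
            cache[query] = search(query)
        results = cache[query]
        web_search = {"query": query, "results": results, "num_results": len(results)}
        metadata = {**doc.get("metadata", {}), "web_search": web_search}
        enriched.append({**doc, "metadata": metadata})  # original fields preserved
    return enriched


# Stub backend (hypothetical) that records how often it is called.
calls: List[str] = []

def fake_search(query: str) -> List[Dict[str, Any]]:
    calls.append(query)
    return [{"url": "https://example.com", "title": query, "position": 0}]


docs = [
    {"id": "a", "metadata": {"product_name": "Widget"}},
    {"id": "b", "metadata": {"product_name": "Widget"}},
    {"id": "c", "metadata": {"product_name": "Gadget"}},
]
enriched = enrich(docs, "{{DOC.metadata.product_name}} reviews and comparisons", {}, fake_search)
print(len(enriched), "documents,", len(calls), "search calls")
```

In this sketch, three documents with two distinct product names trigger two backend calls, which mirrors the "1 API call per unique rendered query" behavior stated above.
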
| Name | Type | Description | Notes |
|---|---|---|---|
| query | str | Search query text for Exa AI search. Supports template variables: {{INPUT.field}} for query inputs, {{DOC.field}} for document fields in enrichment context. Exa uses neural ranking for semantic search, so natural language queries work well. Examples: 'machine learning tutorials', 'latest AI developments', '{{INPUT.user_query}}', 'news about {{DOC.metadata.company_name}}' | [optional] [default to '{{INPUT.query}}'] |
| num_results | int | OPTIONAL. Number of search results to return. Must be between 1 and 100. Default is 10. More results = higher API costs. Consider using lower values for faster responses and cost control. | [optional] [default to 10] |
| use_autoprompt | bool | OPTIONAL. Enable Exa's autoprompt feature for query enhancement. When True, Exa optimizes the query for better search results. Default is True. Recommended for most use cases. Disable if you want exact query matching without enhancement. | [optional] [default to True] |
| start_published_date | str | OPTIONAL. Filter results to content published after this date. Format: YYYY-MM-DD (e.g., '2024-01-01'). When NOT specified, returns results from all dates. Useful for finding recent content, news, or time-sensitive information. | [optional] [default to 'null'] |
| category | str | OPTIONAL. Filter results by content category. When NOT specified, searches across all categories. Common categories: 'research paper', 'news', 'github', 'tweet', 'company', 'pdf', 'personal site', 'blog'. Case-insensitive. Use for focused domain search. | [optional] [default to 'null'] |
| include_text | bool | OPTIONAL. Include text snippets in search results. When True, each result includes a text preview (~200 words). Default is True. Disable to reduce API costs and response size. Text snippets are stored in the metadata.web_search.results[].text field of each DocumentResult. | [optional] [default to True] |
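
The constraints in the table above (range of num_results, date format, case-insensitive category) can be checked client-side before a pipeline is submitted. The following is a minimal sketch using a plain dataclass; `WebSearchParams` is a hypothetical stand-in, not the SDK's `StageParamsExternalWebSearch` model, and normalizing `category` to lowercase is an assumption made here for illustration.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional


@dataclass
class WebSearchParams:
    """Hypothetical mirror of the documented properties and their defaults."""
    query: str = "{{INPUT.query}}"
    num_results: int = 10
    use_autoprompt: bool = True
    start_published_date: Optional[str] = None
    category: Optional[str] = None
    include_text: bool = True

    def __post_init__(self) -> None:
        # Documented range: between 1 and 100.
        if not 1 <= self.num_results <= 100:
            raise ValueError("num_results must be between 1 and 100")
        # Documented format: YYYY-MM-DD; strptime raises ValueError otherwise.
        if self.start_published_date is not None:
            datetime.strptime(self.start_published_date, "%Y-%m-%d")
        # Category is documented as case-insensitive; lowercasing is an assumption.
        if self.category is not None:
            self.category = self.category.lower()


params = WebSearchParams(
    query="{{DOC.metadata.company_name}} latest news",
    num_results=5,
    category="News",
    start_published_date="2024-11-01",
)
print(asdict(params))
```

Validating early like this surfaces a bad date or an out-of-range count locally instead of after a round trip to the API.
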
```python
from mixpeek.models.stage_params_external_web_search import StageParamsExternalWebSearch

# TODO update the JSON string below
json_str = "{}"
# create an instance of StageParamsExternalWebSearch from a JSON string
stage_params_external_web_search_instance = StageParamsExternalWebSearch.from_json(json_str)
# print the JSON string representation of the object (to_json is called on the instance)
print(stage_params_external_web_search_instance.to_json())
# convert the object into a dict
stage_params_external_web_search_dict = stage_params_external_web_search_instance.to_dict()
# create an instance of StageParamsExternalWebSearch from a dict
stage_params_external_web_search_from_dict = StageParamsExternalWebSearch.from_dict(stage_params_external_web_search_dict)
```
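
Once the stage has run, each document carries the fields listed under Output Schema. As a rough sketch of consuming them, the helper below pulls the top-ranked URLs out of a document represented as a plain dict. The `top_urls` function and the `sample_document` values are illustrative assumptions; only the `metadata.web_search.*` field names come from the schema described in this page.

```python
from typing import Any, Dict, List


def top_urls(document: Dict[str, Any], limit: int = 3) -> List[str]:
    """Return up to `limit` result URLs, ordered by the 0-indexed position field."""
    web_search = document.get("metadata", {}).get("web_search", {})
    results = sorted(web_search.get("results", []), key=lambda r: r.get("position", 0))
    return [r["url"] for r in results[:limit] if "url" in r]


# Illustrative document shaped like the documented output (values are made up).
sample_document = {
    "metadata": {
        "web_search": {
            "query": "Acme latest news",
            "num_results": 2,
            "autoprompt_used": True,
            "results": [
                {"url": "https://example.com/b", "title": "Second", "position": 1},
                {"url": "https://example.com/a", "title": "First", "position": 0},
            ],
        }
    }
}

print(top_urls(sample_document, limit=2))
```

Documents that were never enriched simply yield an empty list, so the helper can be applied to mixed result sets without extra checks.
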