Configuration for RAG context preparation. Stage Category: APPLY Transformation: N documents → 1 context document (single_context mode) OR N documents → N formatted documents (formatted_list mode) Purpose: Prepare search results for LLM consumption by formatting documents, managing token budgets, and adding citations. This is a preparation stage that does NOT call an LLM - it prepares content for downstream LLM stages. When to Use: - Before passing search results to an LLM for RAG - When you need to fit multiple documents into a token budget - When you need citation tracking for source attribution - When you need consistent document formatting When NOT to Use: - When you want the LLM to generate a summary (use summarize stage) - When you don't need token management - For simple pass-through of documents Output Modes: - single_context: Combines all documents into one context string - formatted_list: Returns individually formatted documents Common Pipeline Position: feature_search → rerank → rag_prepare → (external LLM call) Examples: Basic context preparation: json { \"max_tokens\": 8000, \"output_mode\": \"single_context\" } Custom document formatting with citations: json { \"max_tokens\": 4000, \"document_template\": \"[{{CONTEXT.INDEX}}] {{DOC.metadata.title}}\\n{{DOC.content}}\\n\", \"citation\": {\"style\": \"numbered\", \"include_title\": true} } Formatted list for custom processing: json { \"output_mode\": \"formatted_list\", \"document_template\": \"Source: {{DOC.metadata.source}}\\n{{DOC.content}}\" }
| Name | Type | Description | Notes |
|---|---|---|---|
| max_tokens | int | OPTIONAL. Maximum tokens for the combined context output. Documents exceeding this limit are handled by truncation_strategy. Default: 8000 (safe for most models). | [optional] [default to 8000] |
| tokenizer | str | OPTIONAL. Tokenizer to use for token counting. Default: 'cl100k_base' (GPT-4/GPT-3.5 tokenizer). Options: 'cl100k_base', 'p50k_base', 'r50k_base', 'gpt2' | [optional] [default to 'cl100k_base'] |
| truncation_strategy | str | OPTIONAL. How to handle documents exceeding max_tokens: - 'priority_truncate': Include docs in score order, truncate last to fit - 'proportional': Give each doc proportional token budget based on count - 'drop_last': Include complete docs until limit, drop remaining | [optional] [default to 'priority_truncate'] |
| output_mode | str | OPTIONAL. Output format: - 'single_context': One document with combined 'context' string + 'citations' - 'formatted_list': N documents with 'formatted_content' field each | [optional] [default to 'single_context'] |
| document_template | str | OPTIONAL. Template for formatting each document. Available placeholders: - {{CONTEXT.INDEX}}: 1-based position in result set (1, 2, 3...) - {{CONTEXT.CITATION}}: Citation marker based on citation.style - {{DOC.*}}: Any document field (e.g., {{DOC.content}}, {{DOC.metadata.title}}) | [optional] [default to '''[{{CONTEXT.INDEX}}] {{DOC.content}} |
'''] content_field | str | Primary field to extract content from each document. | [optional] [default to 'content'] separator | str | Separator between documents in single_context mode. | [optional] [default to ''' '''] citation | StageDefsCitationConfig | Citation configuration for source tracking. | [optional] context_field | str | Field name for combined context (single_context mode). | [optional] [default to 'rag_context'] citations_field | str | Field name for citation metadata. | [optional] [default to 'citations'] formatted_content_field | str | Field name for formatted content (formatted_list mode). | [optional] [default to 'formatted_content']
from mixpeek.models.stage_params_rag_prepare import StageParamsRagPrepare
# TODO update the JSON string below
json = "{}"
# create an instance of StageParamsRagPrepare from a JSON string
stage_params_rag_prepare_instance = StageParamsRagPrepare.from_json(json)
# print the JSON string representation of the object
print(StageParamsRagPrepare.to_json())
# convert the object into a dict
stage_params_rag_prepare_dict = stage_params_rag_prepare_instance.to_dict()
# create an instance of StageParamsRagPrepare from a dict
stage_params_rag_prepare_from_dict = StageParamsRagPrepare.from_dict(stage_params_rag_prepare_dict)