# LLMLabelingOutput

Configuration for LLM-based cluster labeling. Supports multiple LLM providers with comprehensive model selection:

- OpenAI: GPT-4o, GPT-4o-mini, GPT-4.1, O3-mini (best for quality)
- Google: Gemini 2.5 Flash, Gemini 1.5 Flash (best for speed and cost)
- Anthropic: Claude 3.5 Sonnet, Claude 3.5 Haiku (best for reasoning)

All models are defined as enums and validated at the API level.
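As a quick orientation, here is a minimal sketch that enables labeling and pins a provider and model. It is illustrative only: the string values passed for `provider` and `model_name` are assumptions about how the `LLMProvider` and `ModelName` enums serialize, so check those enums in the SDK for the accepted values.

```python
from mixpeek.models.llm_labeling_output import LLMLabelingOutput

# Minimal labeling configuration: turn labeling on and pin a provider/model.
# The "openai" / "gpt-4o-mini" strings are illustrative; accepted values are
# defined by the LLMProvider and ModelName enums and validated by the API.
labeling = LLMLabelingOutput(
    enabled=True,
    provider="openai",
    model_name="gpt-4o-mini",
)
print(labeling.to_json())
```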

## Properties

| Name | Type | Description | Notes |
|------|------|-------------|-------|
| **enabled** | **bool** | Whether to generate labels for clusters using an LLM. When enabled, clusters will have semantic labels like 'High-Performance Laptops' instead of generic labels like 'Cluster 0'. | [optional] [default to False] |
| **labeling_inputs** | **LLMLabelingInputOutput** | Input configuration for LLM labeling. Supports flexible input mappings for multimodal inputs (text, images, videos, audio). Use input_mappings for advanced multimodal labeling with providers like Gemini. If not provided (null/undefined), the full document payload is serialized as JSON and sent to the LLM, providing complete context for semantic labeling. | [optional] |
| **provider** | **LLMProvider** | LLM provider to use for labeling. Supported providers: openai (GPT-4o, GPT-4o-mini, GPT-4.1, O3-mini), google (Gemini 2.5 Flash, Gemini 1.5 Flash), anthropic (Claude 3.5 Sonnet, Claude 3.5 Haiku). If not specified, automatically inferred from model_name. | [optional] |
| **model_name** | **ModelName** |  | [optional] |
| **include_summary** | **bool** | Whether to generate cluster summaries. | [optional] [default to True] |
| **include_keywords** | **bool** | Whether to extract keywords for clusters. | [optional] [default to True] |
| **max_samples_per_cluster** | **int** | Maximum number of representative documents to send to the LLM per cluster for semantic analysis. | [optional] [default to 10] |
| **sample_text_max_length** | **int** | Maximum characters per document sample text. | [optional] [default to 200] |
| **use_embedding_dedup** | **bool** | Enable embedding-based label deduplication to prevent near-duplicate labels (requires sentence-transformers). | [optional] [default to False] |
| **embedding_similarity_threshold** | **float** | Cosine similarity threshold for duplicate label detection (labels above this threshold are considered duplicates). | [optional] [default to 0.8] |
| **cache_ttl_seconds** | **int** | Time-to-live for cached labels in seconds. Labels for clusters with identical representative documents are reused within this TTL window, reducing LLM API costs. Default is 604800 seconds (7 days); set to 0 to disable caching. | [optional] [default to 604800] |
| **custom_prompt** | **str** | Optional custom prompt template for LLM labeling; if not provided, the default discriminative prompt is used. When provided, it completely replaces the default prompt: your custom prompt receives cluster information, but you must format it yourself. Use when you need domain-specific labeling (e.g., medical, legal, technical), a different label format (e.g., emoji labels, code names), a specific output structure, or custom business logic for categorization. The default prompt includes cluster document samples, forbidden labels for uniqueness, and a JSON response format; see engine/clusters/labeling/prompts.py for reference. Example: 'Analyze these product clusters and generate SHORT category names (2-3 words max) focusing on product type and price range. Return JSON: [{"cluster_id": "cl_0", "label": "..."}]' | [optional] |
| **response_shape** | **ResponseShape** |  | [optional] |
| **parameters** | **Dict[str, object]** | Provider-specific parameters forwarded to the LLM service. For OpenAI: temperature, max_tokens, top_p, json_output, etc. For Google: temperature, top_k, max_output_tokens, json_output, etc. | [optional] |
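The sketch below sets several of the fields documented above in a single configuration object. It is illustrative only: the `provider` and `model_name` strings are assumptions about how the enum values serialize, and the numeric values are examples rather than recommendations.

```python
from mixpeek.models.llm_labeling_output import LLMLabelingOutput

# Illustrative configuration exercising the fields documented above.
# Enum string values and numbers are examples, not recommendations.
labeling = LLMLabelingOutput(
    enabled=True,
    provider="google",                 # may be omitted and inferred from model_name
    model_name="gemini-2.5-flash",     # assumed serialization of a ModelName enum value
    include_summary=True,
    include_keywords=False,
    max_samples_per_cluster=5,         # fewer representative documents per cluster
    sample_text_max_length=300,
    use_embedding_dedup=True,          # requires sentence-transformers
    embedding_similarity_threshold=0.85,
    cache_ttl_seconds=0,               # disable label caching
    custom_prompt=(
        "Analyze these product clusters and generate SHORT category names "
        '(2-3 words max). Return JSON: [{"cluster_id": "cl_0", "label": "..."}]'
    ),
    parameters={"temperature": 0.2, "json_output": True},
)
labeling_dict = labeling.to_dict()  # e.g. to embed in a larger request payload
```

Omitting `provider` lets it be inferred from `model_name`, and leaving `cache_ttl_seconds` at its default (604800) keeps the 7-day label cache enabled.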

## Example

```python
from mixpeek.models.llm_labeling_output import LLMLabelingOutput

# TODO update the JSON string below
json = "{}"
# create an instance of LLMLabelingOutput from a JSON string
llm_labeling_output_instance = LLMLabelingOutput.from_json(json)
# print the JSON string representation of the object
print(llm_labeling_output_instance.to_json())

# convert the object into a dict
llm_labeling_output_dict = llm_labeling_output_instance.to_dict()
# create an instance of LLMLabelingOutput from a dict
llm_labeling_output_from_dict = LLMLabelingOutput.from_dict(llm_labeling_output_dict)
```

[Back to Model list] [Back to API list] [Back to README]