Complete results from a single clustering execution. Represents the outcome of running a clustering algorithm on a collection's documents. Each execution creates a snapshot of clustering results at a point in time, including the clusters found, quality metrics, and semantic labels. Use Cases: - Display clustering execution history in UI - Compare clustering quality across multiple runs - Track execution status for long-running jobs - Debug failed clustering attempts - View cluster summaries and labels for analysis Workflow: 1. Create cluster configuration → POST /clusters 2. Execute clustering → POST /clusters/{id}/execute 3. Poll execution status → GET /clusters/{id}/executions 4. View execution history → POST /clusters/{id}/executions/list Status Lifecycle: pending → processing → completed (or failed) Note: Execution results are immutable once completed. Re-running clustering creates a new execution result with a new run_id.
| Name | Type | Description | Notes |
|---|---|---|---|
| run_id | str | REQUIRED. Unique identifier for this specific clustering execution. Format: 'run_' prefix followed by random alphanumeric string. Used to retrieve specific execution artifacts and results. Each re-execution of the same cluster creates a new run_id. References execution artifacts in S3 and MongoDB. | |
| cluster_id | str | REQUIRED. Parent cluster configuration that was executed. Format: 'clust_' prefix followed by random alphanumeric string. Links this execution back to the cluster definition. Multiple executions can share the same cluster_id. | |
| status | str | REQUIRED. Current status of the clustering execution. Values: 'pending' = Job queued, waiting to start. 'processing' = Clustering algorithm running (may take minutes for large datasets). 'completed' = Clustering finished successfully, results available. 'failed' = Clustering failed, check error_message for details. Status changes: pending → processing → (completed OR failed). Poll this field to track job progress. | |
| num_clusters | int | REQUIRED. Number of clusters found by the clustering algorithm. Range: 1 to num_points (though typically much lower). Interpretation: Too few clusters = overgeneralization, may need lower n_clusters param. Too many clusters = overfitting, may need higher n_clusters param. Optimal value depends on dataset and use case. Available immediately upon completion, even if metrics fail. | |
| num_points | int | REQUIRED. Total number of documents/points that were clustered. Equals the count of documents in the collection at execution time. Note: This may differ across executions if documents were added/removed. Used to calculate metrics and validate clustering quality. Minimum 2 points required for clustering (1 cluster per point otherwise). | |
| metrics | ClusterExecutionMetrics | OPTIONAL. Quality metrics evaluating clustering performance. NOT REQUIRED - only present for successful executions. null if: - Execution is still pending/processing. - Execution failed. - Too few points to calculate metrics (need 2+ points). Contains silhouette_score, davies_bouldin_index, calinski_harabasz_score. Use to compare quality across multiple executions. | [optional] |
| centroids | List[ClusterExecutionCentroid] | OPTIONAL. List of cluster centroids with semantic labels. NOT REQUIRED - only present for completed executions with LLM labeling enabled. Length: equals num_clusters. Each centroid contains: - cluster_id: Identifier for the cluster (e.g., 'cl_0'). - num_members: Count of documents in this cluster. - label: Human-readable cluster name (e.g., 'Product Reviews'). - summary: Brief description of cluster content. - keywords: Array of representative terms. null if: - Execution pending/processing/failed. - LLM labeling not configured. Use for: Displaying cluster summaries in UI, filtering by cluster. | [optional] |
| created_at | datetime | REQUIRED. Timestamp when the clustering execution started. ISO 8601 format with timezone (UTC). Used to: - Sort executions chronologically. - Calculate execution duration (completed_at - created_at). - Filter execution history by date range. Always present, even for failed executions. | |
| completed_at | datetime | OPTIONAL. Timestamp when the clustering execution finished. ISO 8601 format with timezone (UTC). NOT REQUIRED - only present for completed or failed executions. null if: status is 'pending' or 'processing'. Use to: - Calculate execution duration (completed_at - created_at). - Show when results became available. Present for both successful and failed executions. | [optional] |
| error_message | str | OPTIONAL. Error message if the clustering execution failed. NOT REQUIRED - only present when status is 'failed'. null if: execution succeeded or is still in progress. Contains: - Human-readable error description. - Possible causes and suggested fixes. - Stack trace details (for debugging). Common errors: - 'Insufficient documents for clustering' (need 2+ docs). - 'Feature extractor not found' (invalid collection config). - 'Out of memory' (dataset too large for algorithm). Use for: Debugging failed executions and user error messages. | [optional] |
| llm_labeling_errors | List[str] | OPTIONAL. List of errors encountered during LLM labeling. NOT REQUIRED - only present when LLM labeling was attempted and encountered errors. null if: - LLM labeling was not enabled. - LLM labeling succeeded for all clusters. - Execution is still in progress. Each error is a JSON string containing: - 'error': Human-readable error message. - 'clusters': List of cluster IDs affected by this error. Common errors: - 'LLM API timeout for 2 clusters' (network/API issues). - 'OpenAI rate limit exceeded' (quota exhausted). - 'Invalid model name: gpt-3.5' (config error). - 'No representative documents for cluster cl_3' (empty cluster). Use for: - Debugging why some clusters have fallback labels. - Identifying LLM API issues without failing entire clustering. - Warning users about partial labeling success. | [optional] |
from mixpeek.models.cluster_execution_result import ClusterExecutionResult
# TODO update the JSON string below
json = "{}"
# create an instance of ClusterExecutionResult from a JSON string
cluster_execution_result_instance = ClusterExecutionResult.from_json(json)
# print the JSON string representation of the object
print(ClusterExecutionResult.to_json())
# convert the object into a dict
cluster_execution_result_dict = cluster_execution_result_instance.to_dict()
# create an instance of ClusterExecutionResult from a dict
cluster_execution_result_from_dict = ClusterExecutionResult.from_dict(cluster_execution_result_dict)