This release includes the primary inference artifacts for the ApiCortex failure prediction engine, optimized for high-precision downtime detection.
Artifacts
- `xgboost_failure_prediction_v1_clean.pkl`: The trained XGBoost model utilized by the `ml-service` to predict API transaction failures.
- `model_metadata_v1_clean.pkl`: Associated feature mapping and normalization metadata required for consistent inference performance.
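A minimal sketch of loading both artifacts, assuming they are ordinary pickle files placed together in one local directory. The `load_artifacts` helper and its `artifacts_dir` argument are hypothetical and not part of the ml-service. The internal layout of the metadata object is not described in this release, so the sketch returns it untouched.

```python
import pickle
from pathlib import Path

# File names as published in this release.
MODEL_FILE = "xgboost_failure_prediction_v1_clean.pkl"
METADATA_FILE = "model_metadata_v1_clean.pkl"


def load_pickle(path):
    """Deserialize a single pickle file and return the stored object."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def load_artifacts(artifacts_dir):
    """Return (model, metadata) from artifacts_dir (hypothetical layout)."""
    base = Path(artifacts_dir)
    return load_pickle(base / MODEL_FILE), load_pickle(base / METADATA_FILE)
```

Unpickling can execute arbitrary code, so load these files only from a source you trust. Unpickling the model also requires the `xgboost` package to be importable, and most likely a version compatible with the one used for training.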
Model Performance Metrics
The model was evaluated using a recall-prioritized threshold (0.73) to maximize early detection of downtime events while maintaining a precision floor.
| Metric | Score |
|---|---|
| ROC-AUC | 0.9692 |
| PR-AUC | 0.8866 |
| Precision | 0.7979 |
| Recall | 0.8528 |
| F1-Score | 0.8244 |
| F2-Score | 0.8412 |
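As a sanity check on the table, the sketch below applies the 0.73 threshold to a list of predicted probabilities and recomputes the F-scores from the reported precision and recall using the standard F-beta formula. The `classify` and `f_beta` helpers are illustrative names, and treating a probability exactly equal to the threshold as a failure is an assumption.

```python
THRESHOLD = 0.73  # decision threshold quoted in this release


def classify(probabilities, threshold=THRESHOLD):
    """Flag a failure (1) when the probability meets the threshold.

    Using >= rather than > at the boundary is an assumption.
    """
    return [int(p >= threshold) for p in probabilities]


def f_beta(precision, recall, beta):
    """Standard F-beta score; beta > 1 weights recall more heavily."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


precision, recall = 0.7979, 0.8528  # values from the table above
print(round(f_beta(precision, recall, 1), 4))  # 0.8244
print(round(f_beta(precision, recall, 2), 4))  # 0.8412
```

Both results match the reported F1 and F2 scores. F2 exceeds F1 here because recall is higher than precision and F2 weights recall more heavily.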
Feature Importance (SHAP Analysis)
The model's decisions are primarily driven by rolling latency windows and error rate variance. The five most influential features are:
1. `p95_latency_roll_max_15`: Maximum P95 latency over the last 15 intervals.
2. `error_rate_ewm`: Exponentially weighted moving average of the error rate.
3. `latency_p95_zscore`: Statistical deviation of P95 latency from the baseline.
4. `p95_latency_roll_mean_15`: Average P95 latency over the last 15 intervals.
5. `error_rate`: Raw error rate in the current window.
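To make the feature definitions concrete, here is a rough sketch of how values like these could be derived from per-interval history. It is not the RollingFeatureEngineer implementation: the `rolling_features` function, the smoothing factor `alpha`, and the choice of the 15-interval window as the z-score baseline are all assumptions made for illustration.

```python
from statistics import mean, pstdev

WINDOW = 15  # window length taken from the feature names


def rolling_features(p95_history, error_history, alpha=0.3):
    """Compute illustrative versions of the five features.

    Both lists are ordered oldest to newest, one value per interval.
    alpha (EWM smoothing) and the z-score baseline are assumptions.
    """
    window = p95_history[-WINDOW:]

    # Exponentially weighted mean, updated one interval at a time.
    ewm = error_history[0]
    for value in error_history[1:]:
        ewm = alpha * value + (1 - alpha) * ewm

    # Population standard deviation of the window; zero for flat latency.
    spread = pstdev(window)
    zscore = (window[-1] - mean(window)) / spread if spread else 0.0

    return {
        "p95_latency_roll_max_15": max(window),
        "p95_latency_roll_mean_15": mean(window),
        "latency_p95_zscore": zscore,
        "error_rate_ewm": ewm,
        "error_rate": error_history[-1],
    }
```

At inference time the real values should come from the ml-service pipeline, so that they match what the model saw during training.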
Important
This model utilizes the ml-service's RollingFeatureEngineer. Ensure that your data ingestion pipeline is active to provide the necessary 15-interval historical window for accurate predictions.
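One way a caller might guard against predicting on an incomplete window is sketched below. The `HistoryBuffer` class is hypothetical. How RollingFeatureEngineer itself handles a short history is not described in this release, so refusing to return a partial window is an assumption.

```python
from collections import deque

REQUIRED_INTERVALS = 15  # history length stated in this release


class HistoryBuffer:
    """Keeps the most recent intervals and reports when the window is full."""

    def __init__(self, size=REQUIRED_INTERVALS):
        # deque with maxlen drops the oldest value automatically.
        self._values = deque(maxlen=size)

    def push(self, value):
        """Record the measurement for the latest interval."""
        self._values.append(value)

    @property
    def ready(self):
        """True once a full window of intervals has been collected."""
        return len(self._values) == self._values.maxlen

    def window(self):
        """Return the full window, oldest first, or raise if incomplete."""
        if not self.ready:
            raise ValueError("not enough history for a full window")
        return list(self._values)
```

A caller would push one measurement per interval and skip prediction until `ready` is true.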