AI-powered Root Cause Analysis (RCA) system for production incidents, leveraging Embabel AI agents and DICE (Domain-Integrated Context Engineering) for intelligent memory and reasoning.
The test-report-server provides a web-based interface for running tests, viewing results, and tuning AI parameters to improve detection success across scenarios. See the Test Report Server & AI Parameter Tuning section below for details.
The system consists of two primary Kotlin/Spring Boot modules, plus a supporting test-report-server:
**embabel-dice-rca**: The analysis engine and AI agent. It collects telemetry from Datadog (logs, metrics, spans), performs pattern analysis, and identifies root cause candidates.
- Telemetry Collection: Interfaces with the Datadog REST API.
- Analysis Engine: Clusters logs, identifies metric anomalies, and correlates APM traces.
- AI Agent: Uses Embabel framework to orchestrate the investigation workflow.
- DICE Bridge: Pushes investigation results to the DICE server for persistent memory (see the sketch below).
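As a sketch of the bridge step, the snippet below posts an investigation result to a DICE ingestion endpoint. The endpoint path, payload shape, and port are assumptions for illustration, not the server's actual contract:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical shape of an investigation result pushed to DICE.
data class InvestigationResult(
    val incidentId: String,
    val rootCauseCandidate: String,
    val evidence: List<String>,
)

// Hand-rolled JSON to keep the sketch dependency-free (no escaping; sketch only).
fun InvestigationResult.toJson(): String {
    val evidenceJson = evidence.joinToString(",") { "\"$it\"" }
    return """{"incidentId":"$incidentId","rootCauseCandidate":"$rootCauseCandidate","evidence":[$evidenceJson]}"""
}

fun pushToDice(result: InvestigationResult, diceBaseUrl: String = "http://localhost:8080") {
    val request = HttpRequest.newBuilder()
        .uri(URI.create("$diceBaseUrl/api/incidents")) // endpoint path is an assumption
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(result.toJson()))
        .build()
    val response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
    check(response.statusCode() in 200..299) { "DICE ingestion failed: HTTP ${response.statusCode()}" }
}
```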
**dice-server**: The intelligent memory and reasoning engine. It decomposes incident data into atomic facts (propositions) and provides a reasoning API to answer complex questions about incidents.

- Ingestion API: Receives raw incident data and reports.
- Proposition Extraction: Uses LLMs to extract atomic, factual propositions from text (see the sketch after this list).
- Reasoning Engine: Provides semantic query capabilities over stored incident memory.
- Persistence: Managed factual memory of all incidents.
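To make the proposition model concrete, here is a hedged sketch of extraction; the `Proposition` type and the `LlmClient` interface are illustrative stand-ins, not the server's actual API:

```kotlin
// An atomic, self-contained factual statement extracted from incident text.
data class Proposition(
    val id: String,
    val text: String,          // e.g. "checkout-service error rate rose to 12% at 14:03 UTC"
    val sourceReportId: String,
    val confidence: Double,
)

// Minimal abstraction over whichever LLM provider is configured.
interface LlmClient {
    fun complete(prompt: String): String
}

// Ask the model to decompose a raw report into one fact per line,
// then wrap each non-blank line as a Proposition.
fun extractPropositions(llm: LlmClient, reportId: String, rawText: String): List<Proposition> {
    val prompt = """
        Decompose the following incident report into atomic, independently
        verifiable factual statements, one per line:

        $rawText
    """.trimIndent()
    return llm.complete(prompt)
        .lines()
        .filter { it.isNotBlank() }
        .mapIndexed { i, line ->
            Proposition(id = "$reportId-$i", text = line.trim(), sourceReportId = reportId, confidence = 1.0)
        }
}
```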
**test-report-server**: The test harness UI and persistent results store (detailed under Test Report Server & AI Parameter Tuning below).

- Test Execution UI: Web interface for running integration tests and viewing results.
- Persistent Test Storage: H2 database stores all test runs with AI parameters, coverage metrics, and outcomes.
- AI Parameter Tuning: Track which parameter combinations (model, temperature, keywords) improve detection success across scenarios.
- Analysis Dashboard: View test summaries, filter by scenario/status, and analyze trends over time.
Prerequisites:

- Java 21+
- Maven 3.8+
- OpenAI or Anthropic API Key
- Datadog API & App Keys
Set environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export DD_API_KEY="..."
export DD_APP_KEY="..."
export DD_SITE="datadoghq.com"
```
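Before starting the services, you can sanity-check the environment. A minimal Kotlin preflight sketch (the variable list mirrors the exports above; use your Anthropic key instead if that is the provider you configure):

```kotlin
// Fail fast with a clear message if a required key is missing.
fun main() {
    val required = listOf("OPENAI_API_KEY", "DD_API_KEY", "DD_APP_KEY", "DD_SITE")
    val missing = required.filter { System.getenv(it).isNullOrBlank() }
    require(missing.isEmpty()) { "Missing environment variables: $missing" }
    println("All required environment variables are set.")
}
```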
The project is configured to use Java 21 and Kotlin JVM target 21. If you see IDE errors about JVM target mismatches:

VS Code Settings (already configured in `.vscode/settings.json`):

- `kotlin.compiler.jvm.target`: Set to `"21"` to match the Maven configuration
- `kotlin.languageServer.enabled`: Enabled for Kotlin language support
If you see "Cannot inline bytecode built with JVM target 21 into bytecode that is being built with JVM target 1.8":
- Reload the VS Code window: `Ctrl+Shift+P` → "Reload Window"
- Ensure Java 21 is selected as the project SDK
- The Maven build uses JVM target 21 correctly; this is typically an IDE cache issue
Maven Configuration:
- Both modules (`dice-server` and `embabel-dice-rca`) have `jvmTarget=21` configured in their `pom.xml`
- The Kotlin Maven plugin version is explicitly set to match the Kotlin version
Run the system:

1. Start the DICE Server:

   ```bash
   cd dice-server && mvn spring-boot:run
   ```

2. Run the RCA Agent:

   ```bash
   cd embabel-dice-rca && mvn spring-boot:run
   ```
The project includes a comprehensive integration test harness that simulates a Datadog incident and verifies the full flow from analysis to DICE reasoning:
```bash
cd embabel-dice-rca && mvn test -Dtest=SystemIntegrationTest
```
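For orientation, here is a schematic of the analyze → ingest → reason flow the integration test exercises. `RcaAgent` and `DiceClient` are illustrative stand-ins, not the project's actual types:

```kotlin
// Schematic of the end-to-end flow SystemIntegrationTest verifies.
interface RcaAgent {
    fun analyze(simulatedTelemetry: String): String   // returns an investigation report
}

interface DiceClient {
    fun ingest(incidentId: String, report: String)         // store the report as propositions
    fun ask(incidentId: String, question: String): String  // semantic reasoning query
}

fun runEndToEnd(agent: RcaAgent, dice: DiceClient, incidentId: String, telemetry: String): String {
    val report = agent.analyze(telemetry)                   // 1. analyze the simulated incident
    dice.ingest(incidentId, report)                         // 2. push results into DICE memory
    return dice.ask(incidentId, "What was the root cause?") // 3. query the reasoning engine
}
```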
Each module contains unit tests for its core logic:

```bash
cd dice-server && mvn test
cd embabel-dice-rca && mvn test
```

## Test Report Server & AI Parameter Tuning

The test harness provides a web-based UI for iterative tuning of AI parameters to improve detection success across scenarios. Given the complexity of minimal-configuration scenarios, the system persists all test results to a local H2 database for analysis.
Start the Test Report Server:

```bash
cd test-report-server
mvn spring-boot:run
# Open http://localhost:8081
```

Key Features for AI Tuning:
- Run Tests from UI: Execute specific test patterns (e.g., `DiceRcaIntegration`, `AllScenarios`) directly from the web interface, with optional verbose logging.
- Persistent Test Storage: All test runs are saved to an H2 database (`embabel-dice-rca/test-reports/test-history`) with:
  - AI parameters (model, temperature)
  - Test outcomes (passed/failed, keyword coverage, component/cause identification)
  - Performance metrics (duration, API calls)
  - Full test execution reports (JSON)
- Coverage Metrics: Each test result shows a keyword coverage percentage (see the sketch after this list), allowing you to:
  - Identify which scenarios need keyword adjustments
  - Track improvements after parameter changes
  - Compare detection success across different AI model/temperature combinations
- Historical Analysis: View recent runs, filter by scenario or status, and analyze trends:
  - Which parameter combinations yield higher pass rates
  - How keyword coverage correlates with test success
  - Performance impact of different configurations
- Real-time Logs: The UI displays test execution logs, including:
  - LLM model used (e.g., `gpt-4.1-nano`)
  - Token usage (prompt/completion tokens)
  - AI reasoning answers
  - System lifecycle events
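To make the stored record and the coverage metric concrete, here is a minimal Kotlin sketch. The field names and the exact coverage formula are assumptions for illustration, not the server's actual schema or computation:

```kotlin
// Hypothetical shape of one persisted test run; the real schema lives in the
// H2 database under embabel-dice-rca/test-reports/test-history.
data class TestRunRecord(
    val scenario: String,
    val model: String,               // e.g. "gpt-4.1-nano"
    val temperature: Double,
    val expectedKeywords: List<String>,
    val aiAnswer: String,
    val passed: Boolean,
    val durationMs: Long,
)

// Keyword coverage as shown on the dashboard (assumed formula): the fraction
// of expected keywords that appear, case-insensitively, in the AI's answer.
fun keywordCoverage(record: TestRunRecord): Double {
    if (record.expectedKeywords.isEmpty()) return 1.0
    val answer = record.aiAnswer.lowercase()
    val hits = record.expectedKeywords.count { answer.contains(it.lowercase()) }
    return hits.toDouble() / record.expectedKeywords.size
}

fun main() {
    val run = TestRunRecord(
        scenario = "DiceRcaIntegration",
        model = "gpt-4.1-nano",
        temperature = 0.2,
        expectedKeywords = listOf("connection pool", "timeout", "checkout-service"),
        aiAnswer = "Root cause: connection pool exhaustion causing timeout errors.",
        passed = true,
        durationMs = 42_000,
    )
    println("coverage = ${"%.0f%%".format(keywordCoverage(run) * 100)}") // coverage = 67%
}
```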
Workflow for Tuning:
1. Run tests from the UI with current parameters
2. Review coverage metrics and pass/fail rates in the dashboard
3. Adjust AI parameters (temperature, model, expected keywords) based on results
4. Re-run tests and compare outcomes in the "Recent runs" table
5. Use the H2 database for deeper analysis (see `embabel-dice-rca/docs/TEST_REPORT_ANALYSIS.md` for SQL queries; a connection sketch follows)
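As a starting point for step 5, here is a minimal Kotlin sketch that opens the H2 database over JDBC. The JDBC URL, table, and column names are assumptions for illustration (the real schema and queries are in embabel-dice-rca/docs/TEST_REPORT_ANALYSIS.md), and the H2 driver must be on the classpath:

```kotlin
import java.sql.DriverManager

// Connect read-only to the test-history H2 database and rank model/temperature
// combinations by pass rate. Table and column names are hypothetical.
fun main() {
    val url = "jdbc:h2:file:./embabel-dice-rca/test-reports/test-history;ACCESS_MODE_DATA=r"
    DriverManager.getConnection(url, "sa", "").use { conn ->
        conn.createStatement().use { stmt ->
            val rs = stmt.executeQuery(
                """
                SELECT model, temperature,
                       AVG(CASE WHEN passed THEN 1.0 ELSE 0.0 END) AS pass_rate
                FROM test_runs
                GROUP BY model, temperature
                ORDER BY pass_rate DESC
                """
            )
            while (rs.next()) {
                println("${rs.getString("model")} @ ${rs.getDouble("temperature")}: " +
                        "pass rate ${rs.getDouble("pass_rate")}")
            }
        }
    }
}
```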
The persistent storage enables data-driven tuning: identify which parameter adjustments improve detection success for specific scenarios, track improvements over time, and optimize the AI configuration for production use.
See `test-report-server/README.md` for detailed API and configuration options.
