|
2 | 2 | <img src="https://sodafoundation.io/wp-content/uploads/2025/10/SODA_logo_outline_c.png" alt="SODA Foundation Logo" width="100"/> |
3 | 3 | <p align="left"> |
4 | 4 | <br/> |
5 | | - <b>SODA Contexture</b> is a new initiative by the SODA Foundation. <br/> |
6 | | - The <b>SODA TS AI Agent</b> serves as the first prototype of this project, demonstrating our vision for intelligent data management. |
7 | 5 | </p> |
8 | 6 | </div> |
9 | 7 |
|
10 | | -### Welcome to SODA Contexture |
11 | | -### 🚧 Status: Research & Development in Progress |
12 | | -We are currently in the active R&D phase, exploring new possibilities and building the foundations of Contexture. Things are moving fast, and we are excited about the road ahead! |
13 | | -### 🤝 Join Us! |
14 | | -We warmly welcome more contributors and partners to join us on this journey. Whether you are a developer, researcher, or organization interested in data management and AI, there is a place for you here. |
15 | | -- **Contribute**: Join the development of the SODA Contexture on [GitHub](https://github.com/sodafoundation/contexture) |
16 | | -- **Connect**: Have questions or want to get involved? Chat with us on Slack: [Join SODA Foundation Slack](https://sodafoundation.io/slack) |
17 | | ---- |
18 | | -<div align="center"> |
19 | | - <sub>Part of the <a href="https://sodafoundation.io">SODA Foundation</a> ecosystem.</sub> |
20 | | -</div> |
21 | | - |
22 | | - |
23 | | -# SODA TS AI Agent is part of SODA Contexture Project. It is the first prototype developement and research. |
24 | | ---- |
25 | | -# soda-ts-ai-agent |
26 | | - |
27 | | -soda-ts-ai-agent is an open source AI agent for time series data |
28 | | -An initial implementation of the TSDB copilot to test multiple frameworks and their effect on the quality of answers. Currently using direct HTTP requests to the Prometheus endpoints with LLM generated PromQL, along with a dynamically generated prompt for each user query. |
29 | | - |
30 | | -## 1. Prerequisites |
31 | | -### A running Time Series Database (TSDB) with accessible endpoints |
32 | | - - Currently, this project uses **Prometheus** as the TSDB backend |
33 | | - - Prometheus must be up and running, and its HTTP API endpoint should be available. |
34 | | - |
35 | | -**Setting up Prometheus** |
36 | | -Follow the official Prometheus installation guide to get started: |
37 | | -(https://prometheus.io/docs/prometheus/latest/getting_started/) |
38 | | - |
39 | | -### An available LLM served through Ollama |
40 | | -- This project relies on Ollama to run large language models locally. |
41 | | -- Install Ollama by following the instructions: |
42 | | - (https://docs.ollama.com/linux) |
43 | | -- Make sure the Ollama service is running (default endpoint: `http://127.0.0.1:11434`). |
44 | | -- Model should be downloaded and running in Ollama. |
45 | | - |
46 | | -## 2. Install Dependencies |
47 | | - |
48 | | -```bash |
49 | | -python -m venv .venv |
50 | | -source .venv/bin/activate |
51 | | -pip install -r requirements.txt |
52 | | -``` |
53 | | - |
54 | | -## 3. Prepare Your Config |
55 | | - |
56 | | -- **TSDB endpoint**: Set in `config/prometheus_config.yaml` |
57 | | -- **Ollama endpoint**: Set in `config/ollama_config.yaml` |
58 | | -- **Agent modes**: can be configured in `config/agent_modes.yaml` |
59 | | - |
60 | | -## 4. Agent Modes |
61 | | -Currently we have |
62 | | -- `DYNAMIC_PROMPT`: Advanced prompt building with context and examples -> generate PromQL -> HTTP request to Prometheus endpoint |
63 | | -Any other modes like MCP or any other can be added. |
64 | | - |
65 | | -## 5. Run CLI |
66 | | - |
67 | | -#### Pre-requisite |
68 | | -Please follow the steps to configure Agent mode before running the cli. |
69 | | -- `DYNAMIC_PROMPT`: Follow [Configure DYNAMIC_PROMPT](#8-dynamic-prompt-mode) |
70 | | - |
71 | | -#### Running the solution |
72 | | - |
73 | | -```bash |
74 | | -python pkg/cli.py --query-set test/query_sets/example1.yaml --copilot DYNAMIC_PROMPT --prometheus-config config/prometheus_config.yaml |
75 | | -``` |
76 | | - |
77 | | -## 6. Query Set Format |
78 | | - |
79 | | -```yaml |
80 | | -queries: |
81 | | - - "Which cluster has highest CPU utilisation?" |
82 | | - - "Which cluster has the highest memory allocation?" |
83 | | -``` |
84 | | -
|
85 | | -## 7. Output Format |
86 | | -
|
87 | | -Output is saved as YAML in the specified output directory. |
88 | | -
|
89 | | -```yaml |
90 | | -"Your question here": |
91 | | - final: "Final human-readable summary or conclusion" |
92 | | - ollama_response: "Detailed step-by-step reasoning or intermediate generation from LLM" |
93 | | - promql: "raw PromQL query" |
94 | | - result: "Output results of PromQL execution" |
95 | | - error: "Optional error message if something went wrong" |
96 | | -``` |
97 | | -
|
98 | | -Or on error: |
99 | | -```yaml |
100 | | -Which cluster has highest CPU utilisation in last month?: |
101 | | - error: timed out |
102 | | -``` |
103 | | -
|
104 | | -## 8. Dynamic Prompt Mode |
105 | | -
|
106 | | -To onboard domain knowledge for better prompts: |
107 | | -
|
108 | | -```bash |
109 | | -# To embed the mappings with current set of metrics available in prometheus. |
110 | | -curl <prometheus_url>/metrics > ./config/metrics.txt |
111 | | - |
112 | | -python pkg/copilot/DP_logic/DynamicPrompt/onboarding_cli.py |
113 | | -``` |
114 | | - |
115 | | -It creates vector embeddings for metric context. |
116 | | - |
117 | | -### **Important:** Update the following paths in your `.env` file: |
118 | | - |
119 | | -```env |
120 | | -EMBEDDING_PATH=/Path to your/pkg/copilot/DP_logic/DynamicPrompt/config/embeddings/embeddings.npz |
121 | | -TEMPLATE_PATH=/Path to your/pkg/copilot/DP_logic/DynamicPrompt/config/template_sections |
122 | | -OVERRIDE_PATH=/Path to your/pkg/copilot/DP_logic/DynamicPrompt/config/overrides.json |
123 | | -EXAMPLES_PATH=/Path to your/pkg/copilot/DP_logic/DynamicPrompt/config/golden_examples.json |
124 | | -INFO_PATH=/Path to your/pkg/copilot/DP_logic/DynamicPrompt/config/additional_context.json |
125 | | -``` |
126 | | -## Demo Videos |
127 | | -TS AGENT flow with kubernetes metrics: https://www.youtube.com/watch?v=al3kg0OENMo |
| 8 | +### SODA Contexture |
| 9 | +The Open Context Engine for AI |
| 10 | + |
| 11 | +### What is SODA Contexture? |
| 12 | +SODA Contexture is an open source project under the SODA Foundation (a sub-foundation of the Linux Foundation).
| 13 | +It is an open context building engine for AI.
| 14 | +SODA Contexture provides a platform for building enriched operational contexts for AI agents across various data sources at scale.
| 15 | +It aims to significantly improve the accuracy, efficiency, and speed of data inferences and insights.
| 16 | + |
| 17 | +The project defines the Open Context Specification (OCS) to describe data in a structured way. The specification also provides guidelines for implementing context.
| 18 | +SODA Contexture builds contexts using internal context agents based on OCS, as well as third-party context sources.
| 19 | + |
| 20 | +### The key problems it solves |
| 21 | +There is no standard way of communicating with AI to get things done!
| 22 | +Hence, data inferences and insights suffer from:
| 23 | +- Low Accuracy
| 24 | + - The accuracy of results varies drastically based on the nature of the data and inputs
| 25 | + - Mixing guesses and different sources of knowledge confuses AI
| 26 | +- Inconsistency
| 27 | + - Hallucination is a key known issue with AI
| 28 | +- High Latency
| 29 | + - Depending on the type of query and volume of data, it fails to give timely results
| 30 | +- Huge Cost
| 31 | + - Iterating to get a close result and verifying it add cost
| 32 | +- Lack of Scale
| 33 | + - It works for a small amount of data or a single agent, but fails at scale
| 34 | +- Low Reliability
| 35 | + - Due to uncertain results, AI is not fully dependable
| 36 | + |
| 37 | +One solution to these problems is to give the AI the right context, so that it can better understand which pieces of data to fetch and derive the right inference.
| 38 | +However, this is not easy, because data relationships and types can vary. That is why SODA Contexture is trying to solve the issue of “Missing Context”
| 39 | +through OCS and by building various connected components that provide enriched and structured context.
| 40 | + |
| 41 | +### System Architecture |
| 42 | +<img width="164" height="164" alt="image" src="https://github.com/user-attachments/assets/c9fc6cdd-8be9-4a1d-a825-ab5b7db10a28" /> |
| 43 | + |
| 44 | +SODA Contexture derives enriched context based on the OCS (Open Context Specification) implementation
| 45 | +for the specific data sources and addresses the issue of “Missing Context”. For each input query, it builds
| 46 | +the best possible context using its OCS-based context building engine. Using this enhanced context,
| 47 | +AI models can understand the query better and fetch the right data (or data sets) to provide accurate
| 48 | +inferences and insights.
| 49 | + |
| 50 | +<img width="368" height="146" alt="image" src="https://github.com/user-attachments/assets/0529f34d-fa4f-44a4-8846-7ed973a4c0f6" /> |
| 51 | + |
| 52 | +#### SODA Contexture Ecosystem Comprises:
| 53 | +- SODA Contexture Engine: The core component that processes user requests and orchestrates context generation. |
| 54 | +- Open Context Specification: The specification which details the operational context building attributes for various types of data. |
| 55 | +- Data Connectors: Logical connectors to different types of data sources, such as Prometheus, SQL, and S3, that capture the nature of data storage and layout. They give SODA Contexture the information it needs to apply OCS and build better context. Each connector targets a specific data source.
| 56 | +- Context Providers: Sources that provide enriched context information (e.g., Istio, Kubernetes). |
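
The way these components fit together can be illustrated with a minimal Python sketch. This is not the project's implementation: the class names `DataConnector`, `ContextProvider`, `StaticConnector`, `KeywordProvider`, and `ContextEngine`, their methods, and the shape of the returned dictionary are all hypothetical, chosen only to show an engine collecting source descriptions from connectors and enrichment from context providers for a given query.

```python
from typing import Dict, List, Protocol


class DataConnector(Protocol):
    """Hypothetical interface: describes how one data source stores its data."""
    def describe(self) -> Dict[str, str]: ...


class ContextProvider(Protocol):
    """Hypothetical interface: returns extra context relevant to a query."""
    def enrich(self, query: str) -> Dict[str, str]: ...


class StaticConnector:
    """Toy connector that returns a fixed description of a data source."""
    def __init__(self, name: str, layout: str) -> None:
        self.name = name
        self.layout = layout

    def describe(self) -> Dict[str, str]:
        return {"source": self.name, "layout": self.layout}


class KeywordProvider:
    """Toy provider that returns the facts whose keyword appears in the query."""
    def __init__(self, facts: Dict[str, str]) -> None:
        self.facts = facts

    def enrich(self, query: str) -> Dict[str, str]:
        lowered = query.lower()
        return {key: fact for key, fact in self.facts.items() if key in lowered}


class ContextEngine:
    """Toy engine that merges connector and provider output into one context."""
    def __init__(self, connectors: List[DataConnector],
                 providers: List[ContextProvider]) -> None:
        self.connectors = connectors
        self.providers = providers

    def build_context(self, query: str) -> Dict[str, object]:
        enrichment: Dict[str, str] = {}
        for provider in self.providers:
            enrichment.update(provider.enrich(query))
        return {
            "query": query,
            "sources": [connector.describe() for connector in self.connectors],
            "enrichment": enrichment,
        }


engine = ContextEngine(
    connectors=[StaticConnector("prometheus", "time series")],
    providers=[KeywordProvider({"cluster": "a cluster groups several nodes"})],
)
context = engine.build_context("Which cluster has highest CPU utilisation?")
```

In this sketch the resulting `context` dictionary is what would be handed to a model alongside the user query; a real engine would structure it according to OCS rather than an ad hoc dictionary.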
| 57 | + |
| 58 | +### Open Context Specification (OCS) |
| 59 | +OCS (Open Context Specification) specifies the operational data context for different kinds of data sources. It defines the key attributes needed to derive the best possible context, enabling AI to provide more accurate results.
| 60 | +
| 61 | +OCS defines the following key attributes to build the operational context:
| 62 | +- Identity and Origin (The "Who" and "Where") |
| 63 | + - Defines the unique fingerprint of the data source. |
| 64 | + - AI needs this to distinguish between similar metrics from different environments.
| 65 | +- Dimensionality & Topology (The "Relationship")
| 66 | + - Defines how this metric relates to other components.
| 67 | + - This is the most critical part for AI reasoning.
| 68 | +- Metric Semantics (The "What")
| 69 | + - Defines what the number actually represents.
| 70 | + - This prevents the AI from comparing unrelated metrics.
| 71 | +- Temporal Context (The "When") |
| 72 | + - AI needs to know if it's looking at a "point-in-time" value or a trend. |
| 73 | + - Examples: interval, duration, timestamp.
| 74 | +- Operational Constraints (The "How")
| 75 | + - This tells the AI how to interpret the health of the metric.
| 76 | + - Examples: threshold, polarity, aggregation.
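
To make the five attribute groups above concrete, here is a minimal Python sketch of how one metric's operational context might be represented. The text here does not give the actual OCS schema, so every field name, default value, and the reading of "polarity" as "is a lower or a higher value better" is an assumption made for illustration only.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class OperationalContext:
    """Hypothetical container for the five OCS attribute groups."""
    # Identity and Origin (the "who" and "where")
    source: str
    environment: str
    # Metric Semantics (the "what")
    metric: str
    unit: str
    # Dimensionality & Topology (the "relationship")
    related_components: List[str] = field(default_factory=list)
    # Temporal Context (the "when")
    temporal_kind: str = "point-in-time"  # assumed values: "point-in-time" or "trend"
    interval_seconds: Optional[int] = None
    # Operational Constraints (the "how")
    threshold: Optional[float] = None
    polarity: str = "lower_is_better"  # assumed values: "lower_is_better" or "higher_is_better"
    aggregation: str = "avg"

    def is_healthy(self, value: float) -> Optional[bool]:
        """Interpret a value against the threshold; None if no threshold is set."""
        if self.threshold is None:
            return None
        if self.polarity == "lower_is_better":
            return value <= self.threshold
        return value >= self.threshold


cpu_context = OperationalContext(
    source="prometheus",
    environment="prod-cluster-a",
    metric="cpu_utilisation",
    unit="ratio",
    related_components=["node", "pod"],
    temporal_kind="trend",
    interval_seconds=60,
    threshold=0.8,
)

# A plain dictionary form that could be serialised and passed to a model.
context_as_dict = asdict(cpu_context)
```

Keeping the constraints next to the metric semantics means a consumer can tell whether a value such as `0.95` is a problem without guessing the unit or direction.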
| 77 | + |
| 78 | +### Progress |
| 79 | +We are actively developing the project. If you would like to contribute to the design, OCS, or other components, please join us!
| 80 | + |
| 81 | +### How to join the development? |
| 82 | + - [GitHub](https://github.com/sodafoundation/contexture) |
| 83 | + - [SODA Slack](https://sodafoundation.slack.com) |