
Commit ea0c1d6

Revise README for SODA Contexture project
Updated README to reflect new project structure and key features of SODA Contexture.
1 parent 1926b99 commit ea0c1d6

1 file changed

Lines changed: 76 additions & 120 deletions

README.md

@@ -2,126 +2,82 @@
Unchanged (lines 2-4):
<img src="https://sodafoundation.io/wp-content/uploads/2025/10/SODA_logo_outline_c.png" alt="SODA Foundation Logo" width="100"/>
<p align="left">
<br/>
Removed (old lines 5-6):
<b>SODA Contexture</b> is a new initiative by the SODA Foundation. <br/>
The <b>SODA TS AI Agent</b> serves as the first prototype of this project, demonstrating our vision for intelligent data management.
Unchanged (old lines 7-9, new lines 5-7):
</p>
</div>

Removed (old lines 10-127):
### Welcome to SODA Contexture
### 🚧 Status: Research & Development in Progress
We are currently in the active R&D phase, exploring new possibilities and building the foundations of Contexture. Things are moving fast, and we are excited about the road ahead!
### 🤝 Join Us!
We warmly welcome more contributors and partners to join us on this journey. Whether you are a developer, researcher, or organization interested in data management and AI, there is a place for you here.
- **Contribute**: Join the development of SODA Contexture on [GitHub](https://github.com/sodafoundation/contexture)
- **Connect**: Have questions or want to get involved? Chat with us on Slack: [Join SODA Foundation Slack](https://sodafoundation.io/slack)
---
<div align="center">
<sub>Part of the <a href="https://sodafoundation.io">SODA Foundation</a> ecosystem.</sub>
</div>


# SODA TS AI Agent is part of the SODA Contexture project. It is the first prototype, built for development and research.
---
# soda-ts-ai-agent

soda-ts-ai-agent is an open source AI agent for time series data.
It is an initial implementation of the TSDB copilot, built to test multiple frameworks and their effect on the quality of answers. It currently sends direct HTTP requests to the Prometheus endpoints with LLM-generated PromQL, along with a dynamically generated prompt for each user query.

## 1. Prerequisites
### A running Time Series Database (TSDB) with accessible endpoints
- Currently, this project uses **Prometheus** as the TSDB backend.
- Prometheus must be up and running, and its HTTP API endpoint must be reachable.

**Setting up Prometheus**
Follow the official Prometheus installation guide to get started:
<https://prometheus.io/docs/prometheus/latest/getting_started/>
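The agent reaches Prometheus through its HTTP API. The following is a minimal, standard-library-only sketch and is not the project's own client. It builds an instant query against Prometheus's documented `/api/v1/query` endpoint and parses an instant-vector response. The `localhost:9090` base URL and the helper names are assumptions.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus on its default port; use the endpoint from your own config.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_vector(payload: dict) -> list:
    """Return (labels, value) pairs from an instant-vector response body."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "Prometheus query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample value is [unix_timestamp, "value encoded as a string"].
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


def instant_query(promql: str, base_url: str = PROMETHEUS_URL, timeout: float = 10.0) -> list:
    """Run a PromQL instant query; needs a reachable Prometheus server."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

With a running server, `instant_query("up")` returns one `(labels, value)` pair per scrape target. For a malformed query, Prometheus answers with a non-2xx status, so `urlopen` raises `urllib.error.HTTPError` before parsing.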

### An available LLM served through Ollama
- This project relies on Ollama to run large language models locally.
- Install Ollama by following the instructions at <https://docs.ollama.com/linux>.
- Make sure the Ollama service is running (default endpoint: `http://127.0.0.1:11434`).
- The model you plan to use must be pulled and available in Ollama.
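The sketch below checks both Ollama prerequisites from Python. It reads Ollama's `/api/tags` listing of locally pulled models and shows a non-streaming `/api/generate` call. It uses only the standard library. The model names are placeholders, and the tag normalization is a simplification that does not handle names containing a registry host with a port.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default endpoint noted above


def normalize(name: str) -> str:
    """Ollama treats an untagged model name as '<name>:latest' (simplified)."""
    return name if ":" in name else f"{name}:latest"


def model_is_available(tags_payload: dict, model: str) -> bool:
    """Check a /api/tags response body for a locally pulled model."""
    wanted = normalize(model)
    return any(normalize(m["name"]) == wanted for m in tags_payload.get("models", []))


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """List local models; fails fast if the Ollama service is not running."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send one non-streaming generation request and return the reply text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/generate", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

If `model_is_available(fetch_tags(), "llama3")` is false, running `ollama pull llama3` downloads that (example) model.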

## 2. Install Dependencies

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## 3. Prepare Your Config

- **TSDB endpoint**: Set in `config/prometheus_config.yaml`
- **Ollama endpoint**: Set in `config/ollama_config.yaml`
- **Agent modes**: Set in `config/agent_modes.yaml`

## 4. Agent Modes
Currently available:
- `DYNAMIC_PROMPT`: Advanced prompt building with context and examples -> generate PromQL -> HTTP request to the Prometheus endpoint

Additional modes, such as MCP, can be added.
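The `DYNAMIC_PROMPT` flow described above can be sketched as a small pipeline: build a prompt with context and examples, generate PromQL, then query Prometheus. This is an illustrative outline, not the code in `pkg/copilot`. The prompt wording, the `Example` type, and the convention that the model wraps PromQL in a fenced `promql` block are all assumptions. The LLM call and the Prometheus call are passed in as functions. This keeps the flow testable and lets another mode supply its own backend.

```python
import re
from dataclasses import dataclass
from typing import Callable

FENCE = "`" * 3  # a Markdown code fence, built here to keep this listing readable
# Assumption: the model is asked to answer with a fenced block, optionally tagged 'promql'.
PROMQL_BLOCK = re.compile(FENCE + r"(?:promql)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


@dataclass
class Example:
    """A hypothetical golden example: a question paired with known-good PromQL."""
    question: str
    promql: str


def build_prompt(question: str, metric_context: list, examples: list) -> str:
    """Assemble a per-query prompt from retrieved metric context and examples."""
    lines = ["Translate the question into a single PromQL query.", "", "Relevant metrics:"]
    lines += [f"- {item}" for item in metric_context]
    lines += ["", "Examples:"]
    for ex in examples:
        lines += [f"Q: {ex.question}", f"PromQL: {ex.promql}"]
    lines += ["", f"Q: {question}", "PromQL:"]
    return "\n".join(lines)


def extract_promql(reply: str) -> str:
    """Take the first fenced block from the reply, or the whole reply if none."""
    match = PROMQL_BLOCK.search(reply)
    return (match.group(1) if match else reply).strip()


def answer(question: str, metric_context: list, examples: list,
           llm: Callable[[str], str], run_query: Callable[[str], object]) -> dict:
    """One round trip: prompt -> LLM -> PromQL -> Prometheus."""
    reply = llm(build_prompt(question, metric_context, examples))
    promql = extract_promql(reply)
    return {"ollama_response": reply, "promql": promql, "result": run_query(promql)}
```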

## 5. Run CLI

#### Prerequisite
Configure the agent mode before running the CLI:
- `DYNAMIC_PROMPT`: Follow [Configure DYNAMIC_PROMPT](#8-dynamic-prompt-mode)

#### Running the solution

```bash
python pkg/cli.py --query-set test/query_sets/example1.yaml --copilot DYNAMIC_PROMPT --prometheus-config config/prometheus_config.yaml
```
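For orientation, here is a hypothetical `argparse` parser that accepts the three flags used in the command above. It is not the actual `pkg/cli.py`, which may define additional options and different defaults. Restricting `--copilot` to known modes is a design assumption: it turns a mistyped mode name into an early, readable error.

```python
import argparse

# Assumption: DYNAMIC_PROMPT is the only mode listed in the README so far.
SUPPORTED_MODES = ["DYNAMIC_PROMPT"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring only the flags shown in the README command."""
    parser = argparse.ArgumentParser(description="Run a query set through the TS agent.")
    parser.add_argument("--query-set", required=True,
                        help="YAML file containing a 'queries' list")
    parser.add_argument("--copilot", choices=SUPPORTED_MODES, default="DYNAMIC_PROMPT",
                        help="agent mode to use")
    parser.add_argument("--prometheus-config", default="config/prometheus_config.yaml",
                        help="path to the Prometheus endpoint config")
    return parser
```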

## 6. Query Set Format

```yaml
queries:
  - "Which cluster has highest CPU utilisation?"
  - "Which cluster has the highest memory allocation?"
```

## 7. Output Format

Output is saved as YAML in the specified output directory.

```yaml
"Your question here":
  final: "Final human-readable summary or conclusion"
  ollama_response: "Detailed step-by-step reasoning or intermediate generation from LLM"
  promql: "raw PromQL query"
  result: "Output results of PromQL execution"
  error: "Optional error message if something went wrong"
```

Or on error:
```yaml
Which cluster has highest CPU utilisation in last month?:
  error: timed out
```
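The two shapes above suggest a simple rule. Each question maps to a record containing whichever fields were produced, and a failure produces a record with only `error`. Here is a minimal sketch of that rule. `answer_fn` is a hypothetical per-question function, for instance one that returns `promql`, `result`, and a `final` summary.

```python
from typing import Any, Callable

# Field order follows the output format documented above.
FIELDS = ("final", "ollama_response", "promql", "result", "error")


def make_record(**values: Any) -> dict:
    """Keep only the documented fields that actually have a value."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise TypeError(f"unexpected fields: {sorted(unknown)}")
    return {name: values[name] for name in FIELDS if values.get(name) is not None}


def run_query_set(questions: list, answer_fn: Callable[[str], dict]) -> dict:
    """Answer each question in order, recording an error entry when one fails."""
    output = {}
    for question in questions:
        try:
            output[question] = make_record(**answer_fn(question))
        except Exception as exc:  # e.g. a timeout talking to Ollama or Prometheus
            output[question] = make_record(error=str(exc))
    return output
```

A YAML library such as PyYAML could then write the mapping. `yaml.safe_dump(output, sort_keys=False)` keeps the questions in their original order.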

## 8. Dynamic Prompt Mode

To onboard domain knowledge for better prompts:

```bash
# Export the current set of metrics from Prometheus so their mappings can be embedded.
curl <prometheus_url>/metrics > ./config/metrics.txt

python pkg/copilot/DP_logic/DynamicPrompt/onboarding_cli.py
```
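The `/metrics` output saved above uses the Prometheus text exposition format. Metric families are described by `# HELP` and `# TYPE` lines, followed by samples such as `name{label="v"} 1`. The sketch below is not `onboarding_cli.py`. It is a simplified parser that collects the help text, type, and label names of each family, which is the kind of information an embedding step could draw on. HELP escapes are left as-is. Note that if `<prometheus_url>` points at the Prometheus server itself, `/metrics` returns that server's own metrics. Metadata for scraped metrics is also available from the HTTP API's `/api/v1/metadata` endpoint.

```python
import re
from dataclasses import dataclass, field

# Label names inside {...}; values may contain escaped quotes.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="(?:[^"\\]|\\.)*"')
# Sample suffixes that belong to a histogram or summary family.
SUFFIXES = ("_bucket", "_sum", "_count")


@dataclass
class MetricFamily:
    name: str
    help: str = ""
    type: str = "untyped"
    labels: set = field(default_factory=set)


def family_name(sample_name: str, families: dict) -> str:
    """Map e.g. 'x_bucket' back to family 'x' when 'x' is a histogram/summary."""
    for suffix in SUFFIXES:
        stem = sample_name[: -len(suffix)]
        if (sample_name.endswith(suffix) and stem in families
                and families[stem].type in ("histogram", "summary")):
            return stem
    return sample_name


def parse_exposition(text: str) -> dict:
    """Collect HELP, TYPE, and label names per metric family (simplified)."""
    families: dict = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split(maxsplit=3)  # ['#', 'HELP' or 'TYPE', name, rest]
            if len(parts) >= 3 and parts[1] in ("HELP", "TYPE"):
                fam = families.setdefault(parts[2], MetricFamily(parts[2]))
                value = parts[3] if len(parts) == 4 else ""
                if parts[1] == "HELP":
                    fam.help = value
                else:
                    fam.type = value
            continue
        # Sample line: name{labels} value [timestamp]
        brace = line.find("{")
        name = line[:brace] if brace != -1 else line.split()[0]
        key = family_name(name, families)
        fam = families.setdefault(key, MetricFamily(key))
        if brace != -1:
            block = line[brace + 1 : line.rfind("}")]
            fam.labels.update(m.group(1) for m in LABEL_RE.finditer(block))
    return families
```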

The onboarding script creates vector embeddings for the metric context.
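The README does not say how these embeddings are used at query time. A common approach is nearest-neighbour retrieval, and it is assumed here rather than taken from the project. The user's question is embedded with the same model, and the metrics whose vectors have the highest cosine similarity are selected. The dependency-free sketch below uses toy two-dimensional vectors in place of real embeddings, which the project stores in `embeddings.npz`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank indexed metrics by similarity to the query vector, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For example, `top_k([1.0, 0.1], {"cpu": [1.0, 0.0], "mem": [0.0, 1.0], "both": [1.0, 1.0]}, k=2)` ranks `cpu` first, then `both`. With NumPy, `numpy.load` on an `.npz` file returns its stored arrays, and the same ranking can be vectorized. The array names inside the project's file are not documented here.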

### **Important:** Update the following paths in your `.env` file:

```env
EMBEDDING_PATH=/Path to your/pkg/copilot/DP_logic/DynamicPrompt/config/embeddings/embeddings.npz
TEMPLATE_PATH=/Path to your/pkg/copilot/DP_logic/DynamicPrompt/config/template_sections
OVERRIDE_PATH=/Path to your/pkg/copilot/DP_logic/DynamicPrompt/config/overrides.json
EXAMPLES_PATH=/Path to your/pkg/copilot/DP_logic/DynamicPrompt/config/golden_examples.json
INFO_PATH=/Path to your/pkg/copilot/DP_logic/DynamicPrompt/config/additional_context.json
```
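A quick sanity check before running can catch placeholder paths left in `.env`. Libraries such as `python-dotenv` (`dotenv_values`) parse these files. The standard-library-only sketch below instead uses a simplified `KEY=VALUE` parser with no variable expansion and no multi-line values. It reports required keys that are unset or point to a missing path. The key list mirrors the block above.

```python
from pathlib import Path

# Keys from the README's .env block.
REQUIRED_KEYS = ("EMBEDDING_PATH", "TEMPLATE_PATH", "OVERRIDE_PATH", "EXAMPLES_PATH", "INFO_PATH")


def parse_env(text: str) -> dict:
    """Minimal KEY=VALUE parser: skips blanks and comments, splits on the first '='."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Paths may contain spaces, so only surrounding quotes are removed.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_paths(env: dict) -> list:
    """Return required keys that are unset or point at a non-existent path."""
    return [key for key in REQUIRED_KEYS if not env.get(key) or not Path(env[key]).exists()]
```

Running `missing_paths(parse_env(Path(".env").read_text()))` before the CLI lists any entry that still holds a placeholder such as `/Path to your/...`.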

## Demo Videos
TS Agent flow with Kubernetes metrics: <https://www.youtube.com/watch?v=al3kg0OENMo>
Added (new lines 8-83):
### SODA Contexture
The Open Context Engine for AI

### What is SODA Contexture?
SODA Contexture is an open source project under the SODA Foundation (a sub-foundation of the Linux Foundation).
It is an open context-building engine for AI.
SODA Contexture provides a platform for building enriched operational context for AI agents across various data sources, at scale.
It is designed to significantly improve the accuracy, efficiency, and speed of data inferences and insights.

The project defines the Open Context Specification (OCS) to describe data in a structured way. The specification provides guidelines for implementing context.
SODA Contexture builds context using internal context agents based on OCS, as well as third-party context sources.

### The key problems it solves
There is no standard way to communicate with AI to get things done!
As a result, data inferences and insights suffer from:
- Low Accuracy
  - The accuracy of results varies drastically with the nature of the data and inputs.
  - Mixing guesses and different sources of knowledge confuses AI.
- Inconsistency
  - Hallucination is a key known issue with AI.
- High Latency
  - Depending on the type of query and the volume of data, AI fails to return results on time.
- Huge Cost
  - The iterations needed to get close results, plus verification, add cost.
- Lack of Scale
  - It works for a small amount of data or a single agent but fails at scale.
- Low Reliability
  - Because results are uncertain, AI is not fully dependable.

One solution to these problems is to give the AI the right context, so that it understands the request well enough to fetch the right pieces of data and derive the right inference.
However, this is not easy, because data relationships and types vary. That is why SODA Contexture tackles the problem of “Missing Context” through OCS and a set of connected components that provide enriched, structured context.

### System Architecture
<img width="164" height="164" alt="image" src="https://github.com/user-attachments/assets/c9fc6cdd-8be9-4a1d-a825-ab5b7db10a28" />

SODA Contexture derives enriched context from OCS (Open Context Specification) implementations for specific data sources, filling the gap of “Missing Context”. Its context-building engine uses OCS to build the best possible context for each input query. With this enhanced context, AI models can understand the request better and fetch the right data (or data sets) to provide accurate inferences and insights.

<img width="368" height="146" alt="image" src="https://github.com/user-attachments/assets/0529f34d-fa4f-44a4-8846-7ed973a4c0f6" />

#### The SODA Contexture Ecosystem Comprises:
- SODA Contexture Engine: The core component that processes user requests and orchestrates context generation.
- Open Context Specification: The specification that details the operational context-building attributes for various types of data.
- Data Connectors: Logical connectors to different types of data, such as Prometheus, SQL, and S3, that capture how each source stores and lays out its data. These connectors give SODA Contexture a better idea of how to apply OCS when building context for a specific data source.
- Context Providers: Sources that provide enriched context information (e.g., Istio, Kubernetes).
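The sketch below makes the division of roles concrete. It models the engine, data connectors, and context providers as Python protocols plus a small orchestrator. All names and the shape of the context dictionary are hypothetical, because the README does not define these interfaces.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class DataConnector(Protocol):
    """Logical connector: knows how one kind of source stores and lays out data."""
    source_type: str

    def describe(self, query: str) -> dict: ...


class ContextProvider(Protocol):
    """External system that contributes enrichment (e.g. Kubernetes, Istio)."""

    def context_for(self, query: str) -> dict: ...


@dataclass
class ContextEngine:
    """Hypothetical orchestrator that merges connector and provider output."""
    connectors: List[DataConnector] = field(default_factory=list)
    providers: List[ContextProvider] = field(default_factory=list)

    def build_context(self, query: str) -> dict:
        context = {"query": query, "sources": {}, "enrichment": {}}
        for connector in self.connectors:
            context["sources"][connector.source_type] = connector.describe(query)
        for provider in self.providers:
            # Later providers win on key clashes in this simplified merge.
            context["enrichment"].update(provider.context_for(query))
        return context
```

Because the protocols are structural, any class with matching attributes and methods can plug in without inheriting from a base class.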

### Open Context Specification (OCS)
OCS (Open Context Specification) specifies the operational data context for different kinds of data sources. It defines the key attributes needed to derive the best possible context, so that AI can provide more accurate results.

OCS defines the following key attributes for building the operational context:
- Identity and Origin (The "Who" and "Where")
  - Defines the unique fingerprint of the data source.
  - AI needs this to distinguish between similar metrics from different environments.
- Dimensionality & Topology (The "Relationship")
  - Defines how this metric relates to other components.
  - This is the most critical part for AI reasoning.
- Metric Semantics (The "What")
  - Defines what the number actually represents.
  - This prevents the AI from comparing unrelated metrics.
- Temporal Context (The "When")
  - AI needs to know whether it is looking at a "point-in-time" value or a trend.
  - Interval, Duration, Timestamp, etc.
- Operational Constraints (The "How")
  - This tells the AI how to interpret the health of the metric.
  - Threshold, Polarity, Aggregation
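As an illustration only, the five attribute groups could be captured in a record like the one below and rendered into text for a prompt. OCS itself is still being designed, so every field name here is a hypothetical stand-in and not part of the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetricContext:
    """Hypothetical OCS-style record; field names are not from the spec."""
    # Identity and Origin ("Who" and "Where")
    source_id: str
    environment: str
    # Metric Semantics ("What")
    metric: str
    description: str
    unit: str
    # Dimensionality & Topology ("Relationship")
    dimensions: List[str] = field(default_factory=list)
    related_to: List[str] = field(default_factory=list)
    # Temporal Context ("When")
    kind: str = "point-in-time"  # or "trend"
    interval_s: Optional[int] = None
    # Operational Constraints ("How")
    threshold: Optional[float] = None
    higher_is_worse: bool = True  # polarity
    aggregation: str = "avg"

    def to_prompt(self) -> str:
        """Render the record as plain lines an LLM prompt can include."""
        temporal = self.kind + (f", interval {self.interval_s}s" if self.interval_s else "")
        polarity = "higher is worse" if self.higher_is_worse else "lower is worse"
        health = f"aggregate with {self.aggregation}; {polarity}"
        if self.threshold is not None:
            health += f"; threshold {self.threshold}"
        return "\n".join([
            f"metric: {self.metric} ({self.unit}): {self.description}",
            f"origin: {self.source_id} in {self.environment}",
            f"dimensions: {', '.join(self.dimensions) or 'none'}",
            f"related to: {', '.join(self.related_to) or 'none'}",
            f"temporal: {temporal}",
            f"health: {health}",
        ])
```

Keeping identity and semantics as required fields, and the rest optional, reflects the idea that an AI cannot safely use a metric without knowing where it comes from and what it measures.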

### Progress
We are actively developing the project. If you would like to contribute to the design, OCS, or other components, please join us!

### How to join the development
- [GitHub](https://github.com/sodafoundation/contexture)
- [SODA Slack](https://sodafoundation.slack.com)
