
Commit 7402221

Adjust proposal to Agentic workflows
Signed-off-by: Pavol Loffay <[email protected]>
1 parent 597282f commit 7402221

1 file changed (+64, -42 lines)

projects/mcp-server.md

Lines changed: 64 additions & 42 deletions
@@ -1,14 +1,14 @@
-# OpenTelemetry Collector Model Context Protocol (MCP) Server
+# OpenTelemetry Agentic Workflows
 
 ## Background and description
 
 The OpenTelemetry project consists of a large number of components, including collector, SDKs, and instrumentation libraries, which are often configured and managed separately. This distribution of components poses a major operational challenge which is universally recognized by the community [1](https://opentelemetry.io/blog/2025/otel-rocks/), [2](https://www.youtube.com/watch?v=xEu8_Aeo_-o).
 
-Large language models (LLMs), present a significant opportunity to simplify the adoption, implementation, and management of the OpenTelemetry stack. An AI agent could, for example, analyze telemetry data, facilitate configuration changes, or assist and simplify the instrumentation process.
+Large language models (LLMs) and Agentic Workflows present a significant opportunity to simplify the adoption, implementation, and management of the OpenTelemetry stack. An AI agent could, for example, facilitate configuration changes, resolve deployment issues, or assist with and simplify the instrumentation process.
 
-However, to support this process, a standardized interface is required for LLMs to interact with the OpenTelemetry ecosystem. The Model Context Protocol (MCP) provides an idiomatic approach for this interaction.
+At the moment, the OpenTelemetry project does not have official support for these workflows. This has led to the creation of several independent, open-source projects (MCP servers) to fill the gap.
 
-At the moment, the OpenTelemetry project does not have an official, standard MCP server. This has led to the creation of several independent, open-source projects to fill the gap.
+As AI tooling becomes a standard part of developer workflows, users who want to extend their agents with tooling optimized for OpenTelemetry have no easy way to discover what's available in the ecosystem. There's no central place to learn which MCP servers or other tools exist, what capabilities they offer, or where to file issues and requests.
 
 ### Existing OpenTelemetry MCP Servers
 
@@ -22,51 +22,78 @@ The proliferation of these projects demonstrates strong community interest and t
 * [traceloop/opentelemetry-mcp-server](https://github.com/traceloop/opentelemetry-mcp-server): Provides data profiling by connecting to Jaeger, Tempo and Traceloop.
 
 Each of these servers uses a different approach, particularly for collector configuration and data profiling.
-This fragmentation creates confusion for users in terms of installation and configuration. It is less effective
-as multiple competing tools fill up context window and provide overlapping functionality.
+This fragmentation creates confusion for users regarding installation and configuration. Furthermore, using multiple competing tools is inefficient because they fill the context window with overlapping functionality.
+
+### Current challenges
+
+Adopting OpenTelemetry presents several significant challenges. Many users lack deep observability expertise, and observability is often treated as an afterthought.
+
+The sheer size and velocity of the OpenTelemetry ecosystem add to this difficulty. The project encompasses instrumentation for over 12 languages and includes diverse components like the Collector, OpAMP, and Weaver. Each component is released independently with its own setup requirements and release schedule. For example, the Collector is released every two weeks, while auto-instrumentation libraries follow different schedules.
+
+Maintenance is also complex. The ecosystem evolves rapidly, introducing frequent breaking changes. Our analysis of the Collector changelogs indicates that approximately 29% of changes are breaking. Keeping up with these updates requires significant manual effort to review release notes, update configuration files, and modify code.
 
 ## Project Scope and Architecture
 
-The scope of this project is to create OpenTelemetry MCP server(s) to simplify deployment and day-2 operations.
-The MCP server should also provide data profiling/intelligence capabilities to support the day-2 operation use-cases.
+The scope of this project is to enable **Agentic Workflows** for OpenTelemetry to simplify deployment, configuration, and day-2 operations across the OpenTelemetry project (collectors, SDKs, instrumentation, semantic conventions). To support this process, agents and LLMs need a standardized interface to interact with the OpenTelemetry ecosystem. For instance, the [Model Context Protocol (MCP)](https://github.com/model-context-protocol/model-context-protocol) and [Agent Skills](https://github.com/agent-skills/agent-skills) provide idiomatic approaches for this interaction.
 
 ### Goals, objectives, and requirements
 
-The goals are divided into categories. However, some common goals apply to all created MCP(s):
+Our goals are grouped by the OpenTelemetry component that each one integrates with agentic workflows. Because deploying a telemetry pipeline typically involves multiple components (instrumentation, semantic conventions, the Collector, etc.), agentic workflows must be able to span all of them.
 
-* A common installation and configuration
+The initial focus will be on integrating the following areas:
 
 #### Collector
 
-* Deployment, configuration and management
-* Simplify writing OpenTelemetry Transformation Language (OTTL)
-* Simplify writing PII rules based on the received data
+The Collector follows a fast two-week release cadence, which requires constant maintenance to stay up to date and keep up with breaking changes. Additionally, configuring the Collector correctly and writing valid OTTL statements are essential for effective usage, but both require domain expertise and are not always trivial. General-purpose coding agents struggle here because they lack up-to-date knowledge of recent releases and aren't specialized for Collector workflows.
+
+* Enable agents to read and write valid Collector configuration.
+* Enable agents to handle API breaking changes (e.g. deprecations, removals, renamings) in the Collector configuration and Go API.
+* Enable agents to upgrade the Collector.
+* Enable agents to write valid OpenTelemetry Transformation Language (OTTL) statements.
+* Enable agents to troubleshoot Collector issues.
+
+#### Semantic Conventions
+
+The Semantic Conventions registry contains a large number of entries, which can be hard to grasp, easy to miss, and sometimes difficult to find. An agent can provide concrete recommendations about which attributes to use and which to avoid, but this requires tooling that condenses the registry into context-optimized pieces to avoid polluting the context window.
+
+* Provide context-optimized querying of the Semantic Conventions registry.
+* Enable agents to assist with maintaining codebases to add and update semantic conventions, potentially integrating with [Weaver](https://github.com/open-telemetry/weaver).
+
+#### Instrumentation & SDKs
 
-#### Instrumentation
+Instrumentation involves SDK setup, configuration, and code, and each step comes with its own challenges and complexity. OpenTelemetry's documentation covers these topics extensively, but not in a format from which AI agents can efficiently retrieve the relevant information. Surfacing the right documentation alongside code analysis can make the instrumentation process easier and help produce valid code.
 
-* SDK configuration
-* Auto-instrumentation configuration
-* Identify instrumentation issues: single span traces, broken traces, high cardinality attributes
+* Enable agents to discover and configure SDK and auto-instrumentation.
+* Enable agents to analyze instrumentation quality (detecting broken traces, missing context).
+* Enable agents to surface relevant documentation during instrumentation workflows.
 
-#### Semantic conventions
+#### Documentation and distribution
 
-* Weaver schema generation
-* Context optimized querying of the official semantic conventions registry
+Coherent documentation and distribution of the agentic workflows are required to enable users to efficiently manage the context window and avoid overlapping functionality.
+
+* Introduce documentation for the Agentic Workflows.
+* Align distribution and installation of the components with the Agentic Workflows.
 
 ### Non Goals
 
-* MCP servers should not implement any telemetry backend related use-cases.
-* MCP servers should not have a shadow knowledge base or documentation, they will pull this information from docs, upstream repositories, and [ecosystem explorer](https://github.com/open-telemetry/community/pull/3000).
+* The project will not implement any telemetry backends.
+* The project will not maintain a separate documentation knowledge base; it will leverage existing OpenTelemetry documentation.
 
 ## Deliverables
 
-* Collector MCP server
-* Configuration use-cases
-* Data profiling use-cases: writing PII rules, high cardinality attributes, broken traces, single span traces
-* Standalone MCP Server
-* Instrumentation use-cases
-* Collector provisioning and configuration use-cases
-* Understanding changes in released artifacts
+The following deliverables may change based on project progress, community feedback, and validation of the agentic workflows.
+The deliverables are listed in order of priority, as assessed by the project team.
+
+### 1. Collector
+* MCP server or agentic skill to facilitate deployment, configuration, and day-2 operations of the Collector.
+* MCP server or agentic skill to troubleshoot Collector issues.
+
+### 2. Semantic Conventions
+* MCP server or agentic skill to query the Semantic Conventions registry.
+
+### 3. Instrumentation & SDKs
+* MCP server or agentic skill to discover and configure SDK and auto-instrumentation.
+* MCP server or agentic skill to analyze instrumentation quality (detecting broken traces, missing context).
 
 ## Staffing / Help Wanted
 
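The Semantic Conventions goal in this hunk calls for tooling that condenses the registry into context-optimized pieces. A minimal Go sketch of that idea follows; the `Attribute` struct and the `condense` function are hypothetical, and the three entries are a small hand-written, paraphrased sample rather than data loaded from the actual registry. It filters entries by namespace and renders one compact line per attribute, so an agent only loads the slice of the registry relevant to its task.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Attribute is a hypothetical, trimmed-down view of a registry entry.
// The real registry carries more fields (notes, examples, deprecation info, ...).
type Attribute struct {
	Name      string
	Type      string
	Stability string
	Brief     string
}

// condense returns one compact line per attribute in the given namespace,
// sorted by attribute name, instead of the full registry definition.
func condense(registry []Attribute, namespace string) []string {
	var out []string
	for _, a := range registry {
		if strings.HasPrefix(a.Name, namespace+".") {
			out = append(out, fmt.Sprintf("%s (%s, %s): %s", a.Name, a.Type, a.Stability, a.Brief))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Illustrative sample only; not loaded from the real registry.
	registry := []Attribute{
		{"http.response.status_code", "int", "stable", "HTTP response status code."},
		{"http.request.method", "string", "stable", "HTTP request method."},
		{"server.address", "string", "stable", "Server domain name, IP address, or Unix domain socket name."},
	}
	for _, line := range condense(registry, "http") {
		fmt.Println(line)
	}
}
```

Filtering by namespace keeps the output proportional to the task rather than to the size of the registry; a real tool would presumably also surface deprecations so agents can recommend which attributes to avoid.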
@@ -117,27 +144,22 @@ There will be [OpenTelemetry MCP call for contributors post](https://github.com/
 This timeline assumes project approval and resource allocation as outlined in the staffing section. Until staffing is
 confirmed and expected time commitments are known, this timeline is in flux.
 
-Phase 1: Static collector configuration (Months 1-2)
-- OpenTelemetry collector configuration
-
-Phase 2: Data profiling via collector (Months 1-2)
-- OpenTelemetry collector extension which provides API to query and profile the processed telemetry data
-- Data volume attribution
-
-Phase 3: Instrumentation (Months 1-2)
-- Identify broken traces
-- Identify single span traces
-- Identify high cardinality metrics
-- Instrumentation configuration
+Phase 1: Collector use-cases
+Phase 2: Semantic conventions use-cases
+Phase 3: Instrumentation use-cases
 
 ## Labels
 
-`mcp` for all PRs and issues related to this project.
+`agentic-workflow`, `mcp` for all PRs and issues related to this project.
 
 ## GitHub Project (Post-Approval)
 
 TBD
 
+## GitHub Repository
+
+* Request: https://github.com/open-telemetry/community/issues/3198
+
 ## SIG Meetings, Roadmap, and Other Info (Post-Approval)
 
 [Developer Experience SIG](https://github.com/open-telemetry/community?tab=readme-ov-file#sig-devex)
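For the instrumentation-quality goal in this diff (detecting broken traces, missing context), one simple heuristic is to flag spans whose parent span ID points to a span that is absent from the same trace. A minimal Go sketch follows, assuming a hypothetical `Span` type and an in-memory batch of spans; it does not use any real Collector, SDK, or backend API.

```go
package main

import "fmt"

// Span is a hypothetical, minimal span record; real OTLP spans carry many more fields.
type Span struct {
	TraceID      string
	SpanID       string
	ParentSpanID string // empty for root spans
}

// findOrphans returns, per trace ID, the IDs of spans whose parent span is not
// present in the same trace. Orphans are a common symptom of broken context
// propagation, but can also come from dropped or not-yet-received spans.
func findOrphans(spans []Span) map[string][]string {
	// Index every span ID by its trace.
	present := make(map[string]map[string]bool)
	for _, s := range spans {
		if present[s.TraceID] == nil {
			present[s.TraceID] = make(map[string]bool)
		}
		present[s.TraceID][s.SpanID] = true
	}
	// A non-root span whose parent is missing from its trace is an orphan.
	orphans := make(map[string][]string)
	for _, s := range spans {
		if s.ParentSpanID != "" && !present[s.TraceID][s.ParentSpanID] {
			orphans[s.TraceID] = append(orphans[s.TraceID], s.SpanID)
		}
	}
	return orphans
}

func main() {
	spans := []Span{
		{"t1", "a", ""},
		{"t1", "b", "a"},
		{"t2", "c", "missing"}, // parent never arrived
	}
	fmt.Println(findOrphans(spans)) // map[t2:[c]]
}
```

This heuristic can report false positives for traces that are still in flight or were partially sampled, so a real tool would likely evaluate only completed traces.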
