projects/mcp-server.md: 64 additions & 42 deletions
@@ -1,14 +1,14 @@
1
-
# OpenTelemetry Collector Model Context Protocol (MCP) Server
1
+
# OpenTelemetry Agentic Workflows
2
2
3
3
## Background and description
4
4
5
5
The OpenTelemetry project consists of a large number of components, including collector, SDKs, and instrumentation libraries, which are often configured and managed separately. This distribution of components poses a major operational challenge which is universally recognized by the community [1](https://opentelemetry.io/blog/2025/otel-rocks/), [2](https://www.youtube.com/watch?v=xEu8_Aeo_-o).
6
6
7
-
Large language models (LLMs), present a significant opportunity to simplify the adoption, implementation, and management of the OpenTelemetry stack. An AI agent could, for example, analyze telemetry data, facilitate configuration changes, or assist and simplify the instrumentation process.
7
+
Large language models (LLMs) and Agentic Workflows present a significant opportunity to simplify the adoption, implementation, and management of the OpenTelemetry stack. An AI agent could, for example, facilitate configuration changes, resolve deployment issues, or assist and simplify the instrumentation process.
8
8
9
-
However, to support this process, a standardized interface is required for LLMs to interact with the OpenTelemetry ecosystem. The Model Context Protocol (MCP) provides an idiomatic approach for this interaction.
9
+
At the moment, the OpenTelemetry project does not have official support for these workflows. This has led to the creation of several independent, open-source projects (MCP servers) to fill the gap.
10
10
11
-
At the moment, the OpenTelemetry project does not have an official, standard MCP server. This has led to the creation of several independent, open-source projects to fill the gap.
11
+
As AI tooling becomes a standard part of developer workflows, users who want to extend their agents with tooling optimized for OpenTelemetry have no easy way to discover what's available in the ecosystem. There's no central place to learn which MCP servers or other tools exist, what capabilities they offer, or where to file issues and requests.
12
12
13
13
### Existing OpenTelemetry MCP Servers
14
14
@@ -22,51 +22,78 @@ The proliferation of these projects demonstrates strong community interest and t
22
22
* [traceloop/opentelemetry-mcp-server](https://github.com/traceloop/opentelemetry-mcp-server): Provides data profiling by connecting to Jaeger, Tempo and Traceloop.
23
23
24
24
Each of these servers uses a different approach, particularly for collector configuration and data profiling.
25
-
This fragmentation creates confusion for users in terms of installation and configuration. It is less effective
26
-
as multiple competing tools fill up context window and provide overlapping functionality.
25
+
This fragmentation creates confusion for users regarding installation and configuration. Furthermore, using multiple competing tools is inefficient as they consume the context window with overlapping functionality.
26
+
27
+
### Current challenges
28
+
29
+
Adopting OpenTelemetry presents several significant challenges. Many users lack deep observability expertise, and enabling it is often treated as an afterthought.
30
+
31
+
The sheer size and velocity of the OpenTelemetry ecosystem add to this difficulty. The project encompasses instrumentation for over 12 languages and includes diverse components like the Collector, OpAMP, and Weaver. Each component is released independently with its own setup requirements and release schedule. For example, the Collector is released bi-weekly, while auto-instrumentation libraries follow different schedules.
32
+
33
+
Maintenance is also complex. The ecosystem evolves rapidly, introducing frequent breaking changes. Our analysis of the Collector changelogs indicates that approximately 29% of changes are breaking. Keeping up with these updates requires significant manual effort to review release notes, update configuration files, and modify code.
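A share like the one quoted above could be estimated with a small script. The following is a minimal sketch, not the analysis the authors ran: it assumes a changelog layout in which entries are `- ` bullets under `### ` section headings and breaking entries sit under a heading containing "Breaking changes". The function name `breaking_share` and the sample text are hypothetical.

```python
def breaking_share(changelog: str) -> float:
    """Return the fraction of bullet entries listed under a breaking-changes heading."""
    total = 0
    breaking = 0
    in_breaking = False
    for raw in changelog.splitlines():
        line = raw.strip()
        if line.startswith("### "):
            # A new section starts; remember whether it lists breaking changes.
            in_breaking = "breaking changes" in line.lower()
        elif line.startswith("## "):
            # A new release starts; no section is active yet.
            in_breaking = False
        elif line.startswith("- "):
            total += 1
            if in_breaking:
                breaking += 1
    return breaking / total if total else 0.0


# Hypothetical changelog excerpt, used only to exercise the function.
SAMPLE = """\
## v0.0.2

### Breaking changes

- `example`: rename a configuration key

### Enhancements

- `example`: add a new option
- `other`: improve startup logging

### Bug fixes

- `example`: fix a crash on empty input
"""

print(f"{breaking_share(SAMPLE):.0%} of entries are breaking")
```

An agent with access to the same parsed sections could go one step further and match each breaking entry against the components a user actually has configured.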
27
34
28
35
## Project Scope and Architecture
29
36
30
-
The scope of this project is to create OpenTelemetry MCP server(s) to simplify deployment and day-2 operations.
31
-
The MCP server should also provide data profiling/intelligence capabilities to support the day-2 operation use-cases.
37
+
The scope of this project is to enable **Agentic Workflows** for OpenTelemetry to simplify deployment, configuration, and day-2 operations across the OpenTelemetry project (collectors, SDKs, instrumentation, semantic conventions). To support this, agents and LLMs need a standardized interface to the OpenTelemetry ecosystem. For instance, the [Model Context Protocol (MCP)](https://github.com/model-context-protocol/model-context-protocol) and [Agent Skills](https://github.com/agent-skills/agent-skills) each provide an idiomatic approach for this interaction.
32
38
33
39
### Goals, objectives, and requirements
34
40
35
-
The goals are divided into categories. However, some common goals apply to all created MCP(s):
41
+
Our goals are grouped by the OpenTelemetry component that the agentic workflows integrate with. Since deploying a telemetry pipeline typically involves multiple components (instrumentation, semantic conventions, collector, etc.), agentic workflows must be able to span across them.
36
42
37
-
* A common installation and configuration
43
+
The initial focus will be on integrating the following areas:
38
44
39
45
#### Collector
40
46
41
-
* Deployment, configuration and management
42
-
* Simplify writing OpenTelemetry Transformation Language (OTTL)
43
-
* Simplify writing PII rules based on the received data
47
+
The Collector follows a fast two-week release cadence, which requires constant maintenance to stay up to date and handle breaking changes. Additionally, configuring the Collector correctly and writing valid OTTL statements are important for effective usage, but they require domain expertise and aren't always trivial. General-purpose coding agents struggle here because they lack up-to-date knowledge of recent releases and aren't specialized for Collector workflows.
48
+
49
+
* Enable agents to read and write valid Collector configuration.
50
+
* Enable agents to handle breaking API changes (e.g. deprecations, removals, renamings) in the configuration and the Collector Go API.
51
+
* Enable agents to upgrade the Collector.
52
+
* Enable agents to write valid OpenTelemetry Transformation Language (OTTL).
53
+
* Enable agents to troubleshoot collector issues.
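One check an agent-facing tool might run before accepting a generated configuration is that every component referenced in a pipeline is actually defined. The sketch below assumes the configuration has already been parsed into a dictionary with the usual `receivers`, `processors`, `exporters` and `service.pipelines` sections. It ignores connectors and extensions, and the function name `find_undefined_components` and the example configuration are hypothetical.

```python
from typing import Any

# Top-level sections that hold component definitions in a Collector config.
COMPONENT_SECTIONS = ("receivers", "processors", "exporters")


def find_undefined_components(config: dict[str, Any]) -> list[str]:
    """Return one message per pipeline reference that has no matching definition."""
    problems: list[str] = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for section in COMPONENT_SECTIONS:
            defined = config.get(section) or {}
            for component_id in pipeline.get(section, []):
                if component_id not in defined:
                    # section[:-1] turns e.g. "processors" into "processor".
                    problems.append(
                        f"pipeline '{pipeline_name}' references undefined "
                        f"{section[:-1]} '{component_id}'"
                    )
    return problems


# Hypothetical parsed configuration: 'batch' is used in a pipeline but never defined.
example_config = {
    "receivers": {"otlp": {"protocols": {"grpc": {}}}},
    "processors": {},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["debug"],
            }
        }
    },
}

for problem in find_undefined_components(example_config):
    print(problem)
```

Returning plain messages rather than raising keeps the result easy to hand back to an agent, which can then repair the configuration and retry.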
54
+
55
+
#### Semantic Conventions
56
+
57
+
The Semantic Convention registry contains a large number of entries. They can be hard to grasp, easy to miss, and sometimes difficult to find. An agent can provide concrete recommendations about which attributes to use and which to avoid, but this requires tooling that condenses the registry into context-optimized pieces to avoid polluting the context window.
58
+
59
+
* Provide context-optimized querying of the Semantic Conventions registry.
60
+
* Enable agents to assist with maintaining codebases to add and update semantic conventions, potentially integrating with [Weaver](https://github.com/open-telemetry/weaver).
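"Context-optimized" querying can be pictured as filtering the registry and returning only compact lines that fit a size budget. The following is a minimal sketch under stated assumptions: the `REGISTRY` list is a tiny illustrative excerpt with paraphrased descriptions, the character budget stands in for a token budget, and `Attribute` and `query_registry` are hypothetical names rather than part of Weaver or any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    brief: str
    deprecated: bool = False


# Illustrative excerpt only; a real tool would load the full registry.
REGISTRY = [
    Attribute("http.request.method", "HTTP request method."),
    Attribute("http.response.status_code", "HTTP response status code."),
    Attribute("http.method", "Replaced by http.request.method.", deprecated=True),
    Attribute("server.address", "Server domain name or IP address."),
]


def query_registry(keyword: str, max_chars: int = 200) -> str:
    """Return matching non-deprecated attributes as compact lines within a character budget."""
    lines: list[str] = []
    used = 0
    for attr in REGISTRY:
        if attr.deprecated or keyword.lower() not in attr.name.lower():
            continue
        line = f"{attr.name}: {attr.brief}"
        # Count the newline that joins this line to the previous ones.
        if used + len(line) + 1 > max_chars:
            lines.append("(truncated)")
            break
        lines.append(line)
        used += len(line) + 1
    return "\n".join(lines)


print(query_registry("http"))
```

Dropping deprecated entries by default reflects the goal of recommending which attributes to use and which to avoid; a fuller version could instead return them with their replacements.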
61
+
62
+
#### Instrumentation & SDKs
44
63
45
-
#### Instrumentation
64
+
Instrumentation involves SDK setup, configuration, and code. Each step has its own challenges and complexity. OpenTelemetry's documentation covers these topics extensively, but it isn't in an agent-friendly format that provides this information efficiently. Surfacing the right documentation alongside code analysis can make the instrumentation process easier and help produce valid code.
46
65
47
-
* SDK configuration
48
-
* Auto-instrumentation configuration
49
-
* Identify instrumentation issues: single span traces, broken traces, high cardinality attributes
66
+
* Enable agents to discover and configure SDK and auto-instrumentation.
* Enable agents to surface relevant documentation during instrumentation workflows.
50
69
51
-
#### Semantic conventions
70
+
#### Documentation and distribution
52
71
53
-
* Weaver schema generation
54
-
* Context optimized querying of the official semantic conventions registry
72
+
Coherent documentation and distribution of the agentic workflows are required to enable users to efficiently manage the context window and avoid overlapping functionality.
73
+
74
+
* Introduce documentation for the Agentic Workflows.
75
+
* Align distribution and installation of the components with the Agentic Workflows.
55
76
56
77
### Non Goals
57
78
58
-
* MCP servers should not implement any telemetry backend related use-cases.
59
-
* MCP servers should not have a shadow knowledge base or documentation, they will pull this information from docs, upstream repositories, and [ecosystem explorer](https://github.com/open-telemetry/community/pull/3000).
79
+
* The project will not implement any telemetry backends.
80
+
* The project will not maintain a separate documentation knowledge base; it will leverage existing OpenTelemetry documentation.
60
81
61
82
## Deliverables
62
83
63
-
* Collector MCP server
64
-
* Configuration use-cases
65
-
* Data profiling use-cases: writing PII rules, high cardinality attributes, broken traces, single span traces
66
-
* Standalone MCP Server
67
-
* Instrumentation use-cases
68
-
* Collector provisioning and configuration use-cases
69
-
* Understanding changes in released artifacts
84
+
The following deliverables may change based on project progress, community feedback, and validation of the agentic workflows.
85
+
The deliverables are listed in priority order, as assessed by the project team.
86
+
87
+
### 1. Collector
88
+
* MCP server or agentic skill to facilitate deployment, configuration and day-2 operations of the collector.
89
+
* MCP server or agentic skill to troubleshoot collector issues.
90
+
91
+
### 2. Semantic Conventions
92
+
* MCP server or agentic skill to query the Semantic Conventions registry.
93
+
94
+
### 3. Instrumentation & SDKs
95
+
* MCP server or agentic skill to discover and configure SDK and auto-instrumentation.
96
+
* MCP server or agentic skill to analyze instrumentation quality (detecting broken traces, missing context).
70
97
71
98
## Staffing / Help Wanted
72
99
@@ -117,27 +144,22 @@ There will be [OpenTelemetry MCP call for contributors post](https://github.com/
117
144
This timeline assumes project approval and resource allocation as outlined in the staffing section. Until staffing is
118
145
confirmed and expected time commitments are known, this timeline is in flux.