Adjust proposal to Agentic workflows

pavolloffay · pavolloffay · commit 7402221f7126 · 2026-01-15T13:31:14.000+01:00
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
diff --git a/projects/mcp-server.md b/projects/mcp-server.md
@@ -1,14 +1,14 @@
-# OpenTelemetry Collector Model Context Protocol (MCP) Server
+# OpenTelemetry Agentic Workflows
 
 ## Background and description
 
 The OpenTelemetry project consists of a large number of components, including collector, SDKs, and instrumentation libraries, which are often configured and managed separately. This distribution of components poses a major operational challenge which is universally recognized by the community [1](https://opentelemetry.io/blog/2025/otel-rocks/), [2](https://www.youtube.com/watch?v=xEu8_Aeo_-o).
 
-Large language models (LLMs), present a significant opportunity to simplify the adoption, implementation, and management of the OpenTelemetry stack. An AI agent could, for example, analyze telemetry data, facilitate configuration changes, or assist and simplify the instrumentation process.
+Large language models (LLMs) and Agentic Workflows present a significant opportunity to simplify the adoption, implementation, and management of the OpenTelemetry stack. An AI agent could, for example, facilitate configuration changes, resolve deployment issues, or assist and simplify the instrumentation process.
 
-However, to support this process, a standardized interface is required for LLMs to interact with the OpenTelemetry ecosystem. The Model Context Protocol (MCP) provides an idiomatic approach for this interaction.
+At the moment, the OpenTelemetry project does not have official support for these workflows. This has led to the creation of several independent, open-source projects (MCP servers) to fill the gap.
 
-At the moment, the OpenTelemetry project does not have an official, standard MCP server. This has led to the creation of several independent, open-source projects to fill the gap.
+As AI tooling becomes a standard part of developer workflows, users who want to extend their agents with tooling optimized for OpenTelemetry have no easy way to discover what's available in the ecosystem. There's no central place to learn which MCP servers or other tools exist, what capabilities they offer, or where to file issues and requests.
 
 ### Existing OpenTelemetry MCP Servers
 
@@ -22,51 +22,78 @@ The proliferation of these projects demonstrates strong community interest and t
 * [traceloop/opentelemetry-mcp-server](https://github.com/traceloop/opentelemetry-mcp-server): Provides data profiling by connecting to Jaeger, Tempo and Traceloop.
 
 Each of these servers uses a different approach, particularly for collector configuration and data profiling.
-This fragmentation creates confusion for users in terms of installation and configuration. It is less effective
-as multiple competing tools fill up context window and provide overlapping functionality.
+This fragmentation creates confusion for users regarding installation and configuration. Furthermore, using multiple competing tools is inefficient as they consume the context window with overlapping functionality.
+
+### Current challenges
+
+Adopting OpenTelemetry presents several significant challenges. Many users lack deep observability expertise, and enabling it is often treated as an afterthought.
+
+The sheer size and velocity of the OpenTelemetry ecosystem add to this difficulty. The project encompasses instrumentation for over 12 languages and includes diverse components like the Collector, OpAMP, and Weaver. Each component is released independently with its own setup requirements and release schedule. For example, the Collector is released bi-weekly, while auto-instrumentation libraries follow different schedules.
+
+Maintenance is also complex. The ecosystem evolves rapidly, introducing frequent breaking changes. Our analysis of the Collector changelogs indicates that approximately 29% of changes are breaking. Keeping up with these updates requires significant manual effort to review release notes, update configuration files, and modify code.
 
 ## Project Scope and Architecture
 
-The scope of this project is to create OpenTelemetry MCP server(s) to simplify deployment and day-2 operations.
-The MCP server should also provide data profiling/intelligence capabilities to support the day-2 operation use-cases.
+The scope of this project is to enable **Agentic Workflows** for OpenTelemetry to simplify deployment, configuration, and day-2 operations across the OpenTelemetry project (collectors, SDKs, instrumentation, semantic conventions). To support this process, a standardized interface is required for agents and LLMs to interact with the OpenTelemetry ecosystem. For instance, the [Model Context Protocol (MCP)](https://github.com/model-context-protocol/model-context-protocol) or [Agent Skills](https://github.com/agent-skills/agent-skills) provide an idiomatic approach for this interaction.
 
 ### Goals, objectives, and requirements
 
-The goals are divided into categories. However, some common goals apply to all created MCP(s):
+Our goals are categorized by the OpenTelemetry components they integrate with agentic workflows. Since deploying a telemetry pipeline typically involves multiple components (instrumentation, semantic conventions, collector, etc.), agentic workflows must be able to span across them.
 
-* A common installation and configuration
+The initial focus will be on integrating the following areas:
 
 #### Collector
 
-* Deployment, configuration and management
-* Simplify writing OpenTelemetry Transformation Language (OTTL)
-* Simplify writing PII rules based on the received data
+The Collector follows a fast two-week release cadence, which requires constant maintenance to stay up to date and keep pace with breaking changes. Additionally, configuring the Collector correctly and writing valid OTTL statements are important for effective usage, but they require domain expertise and aren't always trivial. General-purpose coding agents struggle here because they lack up-to-date knowledge of recent releases and aren't specialized for Collector workflows.
+
+* Enable agents to read and write valid Collector configuration.
+* Enable agents to handle breaking API changes (e.g. deprecations, removals, renamings) in the configuration and the Collector Go API.
+* Enable agents to upgrade the Collector.
+* Enable agents to write valid OpenTelemetry Transformation Language (OTTL).
+* Enable agents to troubleshoot collector issues.
+
+#### Semantic Conventions
+
+The Semantic Conventions registry contains a large number of entries. They can be hard to grasp, easy to miss, and sometimes difficult to find. An agent can provide concrete recommendations about which attributes to use and which to avoid, but this requires tooling that condenses the registry into context-optimized pieces to avoid polluting the context window.
+
+* Provide context-optimized querying of the Semantic Conventions registry.
+* Enable agents to assist with maintaining codebases to add and update semantic conventions, potentially integrating with [Weaver](https://github.com/open-telemetry/weaver).
+
+#### Instrumentation & SDKs
 
-#### Instrumentation
+Instrumentation involves SDK setup, configuration, and code. Each step has its own challenges and complexity. OpenTelemetry's documentation covers these topics extensively, but it isn't in an AI-agent-friendly format that provides this information efficiently. Surfacing the right documentation alongside code analysis can make the instrumentation process easier and assist with producing valid code.
 
-* SDK configuration
-* Auto-instrumentation configuration
-* Identify instrumentation issues: single span traces, broken traces, high cardinality attributes
+* Enable agents to discover and configure SDK and auto-instrumentation.
+* Enable agents to analyze instrumentation quality (detecting broken traces, missing context).
+* Enable agents to surface relevant documentation during instrumentation workflows.
 
-#### Semantic conventions
+#### Documentation and distribution
 
-* Weaver schema generation
-* Context optimized querying of the official semantic conventions registry
+Coherent documentation and distribution of the agentic workflows are required to enable users to efficiently manage the context window and avoid overlapping functionality.
+
+* Introduce documentation for the Agentic Workflows.
+* Align distribution and installation of the components with the Agentic Workflows.
 
 ### Non Goals
 
-* MCP servers should not implement any telemetry backend related use-cases.
-* MCP servers should not have a shadow knowledge base or documentation, they will pull this information from docs, upstream repositories, and [ecosystem explorer](https://github.com/open-telemetry/community/pull/3000).
+* The project will not implement any telemetry backends.
+* The project will not maintain a separate documentation knowledge base; it will leverage existing OpenTelemetry documentation.
 
 ## Deliverables
 
-* Collector MCP server
-  * Configuration use-cases
-  * Data profiling use-cases: writing PII rules, high cardinality attributes, broken traces, single span traces
-* Standalone MCP Server
-  * Instrumentation use-cases
-  * Collector provisioning and configuration use-cases
-  * Understanding changes in released artifacts
+The following deliverables may change based on project progress, community feedback, and validation of the agentic workflows.
+The deliverables are listed in the priority order assigned by the project team.
+
+### 1. Collector
+* MCP server or agentic skill to facilitate deployment, configuration and day-2 operations of the collector.
+* MCP server or agentic skill to troubleshoot collector issues.
+
+### 2. Semantic Conventions
+* MCP server or agentic skill to query the Semantic Conventions registry.
+
+### 3. Instrumentation & SDKs
+* MCP server or agentic skill to discover and configure SDK and auto-instrumentation.
+* MCP server or agentic skill to analyze instrumentation quality (detecting broken traces, missing context).
 
 ## Staffing / Help Wanted
 
@@ -117,27 +144,22 @@ There will be [OpenTelemetry MCP call for contributors post](https://github.com/
 This timeline assumes project approval and resource allocation as outlined in the staffing section. Until staffing is
 confirmed and expected time commitments are known, this timeline is in flux.
 
-Phase 1: Static collector configuration (Months 1-2)
-- OpenTelemetry collector configuration
-
-Phase 2: Data profiling via collector (Months 1-2)
-- OpenTelemetry collector extension which provides API to query and profile the processed telemetry data
-- Data volume attribution
-
-Phase 3: Instrumentation (Months 1-2)
-- Identify broken traces
-- Identify single span traces
-- Identify high cardinality metrics
-- Instrumentation configuration
+* Phase 1: Collector use-cases
+* Phase 2: Semantic conventions use-cases
+* Phase 3: Instrumentation use-cases
 
 ## Labels
 
-`mcp` for all PRs and issues related to this project.
+`agentic-workflow`, `mcp` for all PRs and issues related to this project.
 
 ## GitHub Project (Post-Approval)
 
 TBD
 
+## GitHub Repository
+
+* Request: https://github.com/open-telemetry/community/issues/3198
+
 ## SIG Meetings, Roadmap, and Other Info (Post-Approval)
 
 [Developer Experience SIG](https://github.com/open-telemetry/community?tab=readme-ov-file#sig-devex)