|
| 1 | +# OTel Blueprints |
| 2 | + |
| 3 | +## Background and description |
| 4 | + |
| 5 | +This project aims to deliver a set of architecture blueprints, with the goal of facilitating and guiding adoption of best practices when deploying OpenTelemetry on a defined set of common environments. |
| 6 | +We'd like these blueprints to be backed by evidence in the form of reference architectures shared by end users. |
| 7 | + |
| 8 | +The end-goal is to provide holistic, incremental, high-level guidance that any adopter can apply across their environments, resulting in mature architectures ready for production use, at scale. |
| 9 | + |
| 10 | +### Current challenges |
| 11 | + |
| 12 | +These are some of the high-level adoption challenges that this project aims to help with, some of which are unique to OpenTelemetry as a cross-cutting concern. |
| 13 | + |
| 14 | +#### Adopting OpenTelemetry is a cross-functional effort, likely involving many roles |
| 15 | + |
| 16 | +Adopting OpenTelemetry implies changes in multiple parts of an organization. |
| 17 | +The components required for a complete implementation are naturally distributed across different areas of responsibility. |
| 18 | + |
| 19 | +For instance, application teams or library maintainers interact with the OpenTelemetry API to add domain-specific instrumentation. |
| 20 | +Platform teams often aim to provide consistent SDK configuration, while supporting centralized telemetry pipelines. |
| 21 | +Infrastructure teams, meanwhile, may be responsible for ensuring telemetry from hosts and other devices is collected in a standard fashion. |
| 22 | + |
| 23 | +When these efforts are not coordinated, the resulting telemetry can become fragmented and adoption suffers, failing to deliver the end-to-end observability that is part of OTel's core promise. |
| 24 | + |
| 25 | +#### There is no "one-size-fits-all" architecture |
| 26 | + |
| 27 | +While OTel adoption may trigger new conversations about platform engineering strategy (e.g. a [Reverse Conway Maneuver](https://www.agileanalytics.cloud/blog/team-topologies-the-reverse-conway-manoeuvre)), the project's goal is to cater to all organizational structures, not to force a specific one. |
| 28 | + |
| 29 | +The resulting architectures will (and should) look different depending on the organization's model. |
| 30 | +For instance: |
| 31 | + |
| 32 | +- A company with federated, autonomous teams might favour a pattern of team-level Collectors routing to a central gateway. |
| 33 | +- An organization with a strong central platform team might provide a "paved road" via a fully managed Collector layer and base SDK configurations. |
| 34 | + |
| 35 | +Both are valid approaches. |
| 36 | +The challenge is that our guidance must be flexible enough to present these different patterns, acknowledging that a "one-size-fits-all" deployment model will not work. |
| 37 | + |
| 38 | +#### Documentation is typically focused on specific solutions, not challenges |
| 39 | + |
| 40 | +The existing OpenTelemetry user documentation is rightly focused on providing and describing solutions. |
| 41 | +It's great at explaining what a specific component is, how to configure an SDK, or how to deploy a Collector at scale. |
| 42 | +This is essential for a technical project. |
| 43 | + |
| 44 | +The gap, however, is in connecting these solutions to a path forward for common adoption challenges. |
| 45 | +Adopters often start with a problem, such as "How do I provide stable SDK config across multiple languages?" or "How do I build a scalable, multi-tenant gateway?". |
| 46 | + |
| 47 | +Blueprints must bridge this gap, starting from the problem and mapping it to a set of principles and actionable patterns. |
| 48 | + |
| 49 | +#### Feedback is often component-specific, not strategic |
| 50 | + |
| 51 | +Currently, feedback in OpenTelemetry is mainly gathered via surveys and interviews conducted by the End-User SIG. |
| 52 | +These are normally focused on specific components, or helping specific SIGs prioritize work. |
| 53 | + |
| 54 | +This creates a risk that development efforts in different parts of OTel are not always informed by the most pressing optimizations from the perspective of adoption. |
| 55 | +We may be optimizing components in a silo, while a user's main pain point is connecting them. |
| 56 | +These blueprints, by capturing common patterns, can serve as that feedback mechanism to help guide the project's priorities. |
| 57 | + |
| 58 | +#### Sharing learnings from highly regulated environments |
| 59 | + |
| 60 | +Signing up to an _OTel in Practice_ or _OTel Me_ session organized by the End-User SIG is not always easy, or even an option, for end users in highly regulated environments. |
| 61 | +This is due to the lack of a framework or standard format for these sessions, paired with rules and regulations in these organizations that restrict publicly sharing sensitive information. |
| 62 | + |
| 63 | +### Goals, objectives, and requirements |
| 64 | +#### Goals |
| 65 | + |
| 66 | +The high-level goals of this project are to: |
| 67 | + |
| 68 | +- Enable scalable adoption of OpenTelemetry by providing clear, challenge-oriented guidance. |
| 69 | +- Improve feedback loops from end-users to maintainers, capturing common patterns and challenges to help guide future development. |
| 70 | +- Provide a set of templates to capture reference architectures and design blueprints, allowing end users to easily communicate to stakeholders in their organization the type of information that will be publicly shared. |
| 71 | + |
| 72 | +#### Objectives |
| 73 | + |
| 74 | +To achieve these goals, this project will: |
| 75 | + |
| 76 | +- Define a standard, repeatable process for capturing and publishing end-user reference architectures. |
| 77 | +- Define a standard, strategic template for authoring blueprints that map common challenges to OTel-based solutions. |
| 78 | +- Publish an initial set of 5 reference architectures from end users that have successfully adopted OpenTelemetry at scale. |
| 79 | +- Identify the 3 most common environments and challenges as the basis for an initial set of blueprints. |
| 80 | +- Publish this initial set of 3 blueprints, collating best practices as seen in the field. |
| 81 | +- Establish a clear, discoverable location for this content on the OpenTelemetry website, managed by the End-User SIG. |
| 82 | + |
| 83 | +**Note:** The DevEx SIG has already been documenting reference architectures with end users. |
| 84 | +They have so far conducted 4 interviews and documented them as reference architectures. |
| 85 | +Ideally, all these reference architectures will be hosted in the same space as others. |
| 86 | + |
| 87 | +#### Why now? |
| 88 | + |
| 89 | +OpenTelemetry has successfully moved past the "early adopter" stage. |
| 90 | +New waves of adopters are typically composed of platform teams in large organizations. |
| 91 | +They require common, vendor-neutral guidance to piece together a large-scale strategy from low-level component documentation. |
| 92 | +They need a "paved road" and a set of proven best practices. |
| 93 | +Providing this guidance is one of the biggest levers we can pull to accelerate widespread, successful adoption. |
| 94 | + |
| 95 | +## Deliverables |
| 96 | + |
| 97 | +This project will output two types of deliverables: |
| 98 | + |
| 99 | +- **Reference architectures**: Similar to [CNCF reference architectures](https://architecture.cncf.io/architectures), scoped to OpenTelemetry (potentially cross-shared between these). |
| 100 | +These will share how different companies or institutions, under different organizational structures and technology stacks, are approaching OpenTelemetry adoption, and the outcomes it has delivered. |
| 101 | +- **Blueprints**: Focused on a given environment, these will give specific guidance to solve common challenges. |
| 102 | +The format of these blueprints will be discussed as part of this project; however, the general proposal is to follow popular forms of [strategic documentation](https://itsadeliverything.com/good-strategy-bad-strategy-the-difference-and-why-it-matters-by-richard-rumelt). |
| 103 | +For each of them, we'll identify: |
| 104 | + 1. The main challenges the blueprint will solve, and the scope it applies to. |
| 105 | + 2. The guiding principles and best practices that solve these challenges. |
| 106 | + 3. Individual actions to implement these best practices, linking to more specific guidance in order to avoid duplication of existing parts of the OpenTelemetry documentation (e.g. getting started, SDK config, Collector deployment patterns, etc). |
| 107 | + |
| 108 | +For both of these, this project aims to define templates and processes that make it easier to contribute new reference architectures and blueprints. |
| 109 | + |
| 110 | +After this project is complete, the End-User SIG will expand the library of reference architectures and blueprints as part of its business-as-usual (BAU) operation. |
| 111 | + |
| 112 | +## Staffing / Help Wanted |
| 113 | + |
| 114 | +### Industry outreach |
| 115 | + |
| 116 | +End users were contacted during KubeCon NA; they gave very positive feedback on this initiative and expressed willingness to contribute. |
| 117 | + |
| 118 | +Solutions/observability architects/consultants from organizations like New Relic, Splunk and Grafana were contacted and are interested in joining this effort. |
| 119 | + |
| 120 | +We will also reach out to past guests of sessions organized by the End-User SIG to encourage their participation. |
| 121 | + |
| 122 | +### SIG |
| 123 | +End-User SIG & DevEx SIG |
| 124 | + |
| 125 | +### Required staffing |
| 126 | +See [Project Staffing](/project-management.md#project-staffing) |
| 127 | + |
| 128 | +#### Project Lead(s) |
| 129 | +Dan Gomez Blanco (@danielgblanco) |
| 130 | +Damien Mathieu (@dmathieu) |
| 131 | + |
| 132 | +#### Other Staffing |
| 133 | + |
| 134 | +- Contributors/architects willing to help coordinate with end-users, create templates, analyze reference architectures, and write up blueprints: |
| 135 | + - Jacob Aronoff (@jaronoff97) |
| 136 | + - Lukasz Ciukaj (@luke6Lh43) |
| 137 | + - Alain Pham (@alainpham) |
| 138 | + - ChaosKyle (@ChaosKyle) |
| 139 | + - Brad Schmitt (@bpschmitt) |
| 140 | +- End-Users willing to contribute reference architectures: |
| 141 | + - Neil Fordyce, Skyscanner |
| 142 | +- Maintainers/approvers from Comms SIG to help with reviewing and copy editing: |
| 143 | + - Tiffany Hrabusa (@tiffany76) |
| 144 | +- Others |
| 145 | + |
| 146 | +### Sponsorship |
| 147 | +See [Project Sponsorship](/project-management.md#project-sponsorship) |
| 148 | + |
| 149 | +#### TC Sponsor |
| 150 | +Reiley Yang (@reyang) |
| 151 | + |
| 152 | +#### Delegated TC Sponsor (Optional) |
| 153 | +TBD |
| 154 | + |
| 155 | +#### GC Liaison |
| 156 | +Marylia Gutierrez (@maryliag) |
| 157 | + |
| 158 | +## Expected Timeline |
| 159 | + |
| 160 | +- 1 month: Decide on initial format for reference architectures and blueprint documents, and which verticals/architecture types to write blueprints for. |
| 161 | +- 3-6 months: Gather and document reference architectures from end users, identify the most common challenges, and collate blueprints. |
| 162 | + |
| 163 | +## Labels |
| 164 | + |
| 165 | +`otel-blueprints` |
| 166 | + |
| 167 | +## GitHub Project (Post-Approval) |
| 168 | + |
| 169 | +**TO-DO** |
| 170 | + |
| 171 | +## SIG Meetings, Roadmap, and Other Info (Post-Approval) |
| 172 | + |
| 173 | +* Slack channel: [#otel-sig-end-user](https://cloud-native.slack.com/archives/C01RT3MSWGZ) |
| 174 | +* Meeting notes: [End-User SIG Meeting Notes](https://docs.google.com/document/d/1e-UNZA3Tuno9b53RQbe--whUcO0VIXF3P81oXsrBK6g) |
| 175 | +* Meeting times: Every other Thursday at 10:00 PT |
| 176 | + |
| 177 | +**TO-DO**: Roadmap item will be added after new GH project is created. |