@jaydeluca (Member) commented Sep 19, 2025

This PR contains a project proposal for a new standalone "ecosystem explorer" documentation website.

POC can be seen here: https://jaydeluca.github.io/instrumentation-explorer/

Note: This project proposal is dependent on identifying collaborators for the staffing bit. We will use this project proposal doc to socialize the effort in hopes of finding people interested in participating.

Related:

@svrnm (Member) left a comment

Thanks for leading this @jaydeluca -- this also replaces #2246

@mx-psi (Member) left a comment

Is the Collector intentionally out of scope? From the discussions I have had with @svrnm, it seems like this should be similar enough (instead of libraries we would be talking about components), so I feel like it would be a good idea to include it as well.

@jaydeluca (Member Author) commented

> Is the Collector intentionally out of scope? From the discussions I have had with @svrnm, it seems like this should be similar enough (instead of libraries we would be talking about components), so I feel like it would be a good idea to include it as well.

@mx-psi yes and no. It's not entirely out of scope (see here): I have added evaluating the feasibility of this approach for both the Collector and JavaScript as within scope, but I personally cannot commit to doing all the legwork for those as well. If we were to commit to a more concrete deliverable for those two projects, I think we would need a larger team.

Do you think the language I used around this isn't clear, or do you think it should be changed?

@mx-psi (Member) commented Sep 19, 2025

> > Is the Collector intentionally out of scope? From the discussions I have had with @svrnm, it seems like this should be similar enough (instead of libraries we would be talking about components), so I feel like it would be a good idea to include it as well.
>
> @mx-psi yes and no. It's not entirely out of scope (see here): I have added evaluating the feasibility of this approach for both the Collector and JavaScript as within scope, but I personally cannot commit to doing all the legwork for those as well. If we were to commit to a more concrete deliverable for those two projects, I think we would need a larger team.
>
> Do you think the language I used around this isn't clear, or do you think it should be changed?

I guess my concern is with the naming. Something like "Ecosystem Documentation" or "Ecosystem Explorer" would make people think that the Collector is (potentially) included; the current naming seems more focused on language libraries.

@jaydeluca (Member Author) commented

> I guess my concern is with the naming. Something like "Ecosystem Documentation" or "Ecosystem Explorer" would make people think that the Collector is (potentially) included; the current naming seems more focused on language libraries.

Ah yes, that makes sense, I can update. Thanks @mx-psi

@jaydeluca changed the title from "Project Proposal: Instrumentation Documentation" to "Project Proposal: Ecosystem Explorer" Sep 19, 2025
@svrnm (Member) commented Sep 19, 2025

@jaydeluca can you add a section about the existing registry and how this project and the registry relate to each other? For me it would be totally fine to say that this is going to replace the registry eventually.

@jaydeluca marked this pull request as ready for review September 19, 2025 14:31
@thompson-tomo (Contributor) commented Oct 1, 2025

So I have come here from the comment in the backstage channel.

Thinking about this, what are the thoughts on building a Backstage plugin to add support for OpenTelemetry, in a similar fashion to APIs (https://github.com/backstage/backstage/blob/master/plugins/api-docs/README.md)? We could then use Backstage rather than building a tool for the ecosystem explorer.

I would see the development steps as being:

  • Introduce a telemetry section to list the signals and show their definitions in Backstage.
  • Allow components (libraries) to define which signals they produce and link them to the definitions, just like they can for APIs.
  • Provide a way to show technology-specific docs (e.g. oracledb) which provide descriptive info and the corresponding signals.

I also think it would be beneficial to ensure that weaver can be used to generate the file to add to the ecosystem explorer, so that a user has one tool to use.

A nice thing I foresee is that we could potentially focus the sem conv specification section of the website on defining the base signals, with the informative info captured in the ecosystem explorer.

@svrnm added the area/project-proposal label Oct 1, 2025
@jaydeluca (Member Author) commented

> Thinking about this, what are the thoughts on building a Backstage plugin to add support for OpenTelemetry, in a similar fashion to APIs (https://github.com/backstage/backstage/blob/master/plugins/api-docs/README.md)? We could then use Backstage rather than building a tool for the ecosystem explorer.

Thanks for raising this @thompson-tomo, it is something we also considered and explored a bit.

I think the main concern is the tradeoff in complexity. Backstage brings a lot of functionality, but it also requires provisioning infrastructure, ongoing upgrades, and database maintenance. In my experience, it often needs a dedicated team to keep it running smoothly. For our use case, much of that extra functionality isn't really essential, and could actually slow us down.

The initial POC focused on a static-site approach which is much simpler to operate, has almost no ongoing maintenance burden, and still meets our immediate needs. Personally, I'd much prefer a solution that requires as little operational overhead as possible.

@trask (Member) commented Oct 1, 2025

+1 on optimizing for low operational overhead since that's an area where we struggle as an Open Source project

@thompson-tomo (Contributor) commented

I am all for lower operational overhead, but I get worried when I see suggestions to build something ourselves, as that can create tech debt, especially when it is not a key part of the organisation's business or objectives. Hence the suggestion to use an established product.

With the objective of low operational effort and static site generation in mind, we should explore how weaver can contribute to the solution.

What I could foresee is:

  • User defines a weaver registry file which imports signals from the sem conv registry.
  • User refines the imported signals based on what they are implementing.
  • User either defines a separate implementation metadata file or adds the metadata to the registry file (TBD).
  • Weaver codegen runs, i.e. `weaver implementation generate`, for the refined signals and metadata. This would produce boilerplate code which is used in the implementation, and export a YAML definition file (resolved schema).
  • The exported YAML could be added to the ecosystem explorer, followed by `weaver implementation describe` to emit the static content.

All of that would be reusable especially if the repo/package readme could also be generated in the same manner.

@jack-berg (Member) commented

When I read this proposal, I see this project as an attempt to build a new registry (registry 2.0). I'm very supportive of this as I've been complaining about various deficiencies of the registry for some time.

Significantly, the proposal is to launch the new effort in parallel to the existing registry, allowing the contributors to iterate quickly and avoid the friction of a bunch of up front data model design work and consensus gathering.

Later, once we've worked out the sharp edges, we can merge registry 1.0 into registry 2.0, and EOL registry 1.0.

Some thoughts:

  • The registry already has a data model for capturing metadata about components (schema here), but it's insufficient. As you note, there's all sorts of data which is important to capture and visualize for end users, and which as of now is very difficult for users to discover. The key examples are: configuration schema, schema of telemetry emitted, and instructions on how to get started.
  • The registry already has tools for searching for components and displaying bits of information about them, but they're insufficient. More than anything else, the registry is an index of links, requiring the user to follow the link and find installation / configuration / telemetry schema details in the linked website / repository. Except this information is not always available, and when it is, there's no standardization about the display. A richer registry schema would allow a richer / more useful visualization experience.
  • The registry already has tooling for scanning project repositories and creating / updating registry component entries. The problem is, it's all centralized in the opentelemetry.io project and standardized, meaning that it's only possible to scrape the types of metadata which can be scraped from a central script. In his prototype, @jaydeluca has built tooling which leverages domain knowledge specific to the opentelemetry-java-instrumentation project, and is therefore able to generate much richer metadata than is captured by the registry today. Any effort for a better registry is probably going to require some coordination between the repos where components live (i.e. SDK, instrumentation, contrib, collector, etc.) and the opentelemetry.io repo. Some shared tooling can / should be centralized in opentelemetry.io, but the repos should also do a better job of publishing the types of metadata that can't be generated through simple scanner scripts.
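
To make the "richer schema" idea concrete, here is a minimal Python sketch of what a per-component record might carry for the three gaps named above (configuration schema, telemetry emitted, getting-started instructions). Every name in it (`ComponentEntry`, `ConfigOption`, `EmittedSignal`, `missing_sections`, `to_json`, and the example values) is hypothetical and chosen for illustration only; it is not the existing registry schema, nor the format used by the prototype.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ConfigOption:
    """One configuration setting a component accepts (hypothetical shape)."""
    name: str
    type: str
    default: str
    description: str = ""


@dataclass
class EmittedSignal:
    """One piece of telemetry a component emits (hypothetical shape)."""
    kind: str  # e.g. "span", "metric", or "log"
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class ComponentEntry:
    """Illustrative record only; not the actual registry data model."""
    name: str
    repository: str
    getting_started: str
    configuration: list[ConfigOption] = field(default_factory=list)
    telemetry: list[EmittedSignal] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the sections a central scanner could not fill in."""
        checks = {
            "getting_started": bool(self.getting_started.strip()),
            "configuration": bool(self.configuration),
            "telemetry": bool(self.telemetry),
        }
        return [section for section, present in checks.items() if not present]


def to_json(entry: ComponentEntry) -> str:
    """Serialize an entry so a static site (or other consumer) can read it."""
    return json.dumps(asdict(entry), indent=2)


# Made-up example data; the component, URL, and signal names are placeholders.
example = ComponentEntry(
    name="example-http-client",
    repository="https://example.com/example-http-client",
    getting_started="Add the dependency and enable the instrumentation.",
    configuration=[
        ConfigOption(name="example.capture-headers", type="boolean", default="false"),
    ],
    telemetry=[
        EmittedSignal(kind="metric", name="example.request.duration",
                      attributes=["example.method"]),
    ],
)

if __name__ == "__main__":
    print(to_json(example))
    print("missing:", example.missing_sections())
```

A check like `missing_sections` hints at how per-repository tooling and a central site could split the work: each repo publishes what only it knows, and the central tooling reports which entries are still incomplete.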

@svrnm (Member) commented Nov 18, 2025

> When I read this proposal, I see this project as an attempt to build a new registry (registry 2.0). I'm very supportive of this as I've been complaining about various deficiencies of the registry for some time.

It absolutely is! That's why I am in favour of this project so much. The registry has been a building block of our project from day 1, but besides a few cosmetic changes we never lifted it to a proper solution.

> Significantly, the proposal is to launch the new effort in parallel to the existing registry, allowing the contributors to iterate quickly and avoid the friction of a bunch of up front data model design work and consensus gathering.

Yes 🙌

> Later, once we've worked out the sharp edges, we can merge registry 1.0 into registry 2.0, and EOL registry 1.0.

I can barely wait for it ;)

> * The registry already has a data model for capturing metadata about components (schema [here](https://github.com/open-telemetry/opentelemetry.io)), but it's insufficient. As you note, there's all sorts of data which is important to capture and visualize for end users, and which as of now is very difficult for users to discover. The key examples are: configuration schema, schema of telemetry emitted, and instructions on how to get started.

Yes, we can learn and take from the existing schema, but overall it's insufficient.

> * The [registry](https://opentelemetry.io/ecosystem/registry/) already has tools for searching for components and displaying bits of information about them, but they're insufficient. More than anything else, the registry is an index of links, requiring the user to follow the link and find installation / configuration / telemetry schema details in the linked website / repository. Except this information is not always available, and when it is, there's no standardization about the display. A richer registry schema would allow a richer / more useful visualization experience.

The registry as of today does what it was intended to do: provide a quick glance into all things OpenTelemetry. But as the project evolved, the registry didn't, and as you outlined, it's missing a lot of details and nuances, so this project aims to help with that.

> * The registry already has [tooling](https://github.com/open-telemetry/opentelemetry.io/blob/main/scripts/registry-scanner/index.mjs) for scanning project repositories and creating / updating registry component entries. The problem is, it's all centralized in the `opentelemetry.io` project and standardized, meaning that it's only possible to scrape the types of metadata which can be scraped from a central script. In his prototype, @jaydeluca has built tooling which leverages domain knowledge specific to the `opentelemetry-java-instrumentation` project, and is therefore able to generate much richer metadata than is captured by the registry today. Any effort for a better registry is probably going to require some coordination between the repos where components live (i.e. SDK, instrumentation, contrib, collector, etc.) and the `opentelemetry.io` repo. Some shared tooling can / should be centralized in `opentelemetry.io`, but the repos should also do a better job of publishing the types of metadata that can't be generated through simple scanner scripts.

Those scripts (like most moving pieces of the registry) are hotfixes and bandages, so having dedicated tooling and people taking care of it is part of this proposal. I don't think it necessarily belongs in the dotio repo. In my mind, we would have a repository that holds all the tooling to crawl that data and makes it accessible to consumers like opentelemetrydotio, but maybe others as well (think https://artifacthub.io/, Backstage, CLI tools, etc.).

jaydeluca and others added 2 commits November 20, 2025 14:06
Co-authored-by: Severin Neumann <severin.neumann@altmuehlnet.de>
@thompson-tomo (Contributor) commented

As another thought, have we considered leveraging Artifact Hub from the CNCF and potentially just embedding an OpenTelemetry Collector plugin category, as mentioned is possible in https://artifacthub.io/docs/topics/embedding_artifacts/?

@svrnm (Member) commented Dec 1, 2025

> As another thought, have we considered leveraging Artifact Hub from the CNCF and potentially just embedding an OpenTelemetry Collector plugin category, as mentioned is possible in artifacthub.io/docs/topics/embedding_artifacts?

Yes, we have considered that, it's one of the potential upstream receivers of this project.

@trask (Member) commented Dec 1, 2025

@jaydeluca updating from main should resolve the link failure

@trask added this pull request to the merge queue Dec 1, 2025
Merged via the queue into open-telemetry:main with commit 62dc8f7 Dec 1, 2025
7 checks passed
@svrnm (Member) commented Dec 2, 2025

🎉

Labels

area/project-proposal Submitting a filled out project template
