Add OpenTelemetry tracing provider to the container (disabled by default) #37227
Merged
…ult)

Introduces the no-op-by-default infrastructure for native OpenTelemetry (OTel) tracing in the container. No tracing happens until explicitly enabled via the opentelemetry-sdk feature flag.

- container-opentelemetry: new Felix bundle embedding the OTel SDK + OTLP HTTP exporter (jdk sender, no okhttp/kotlin); exports the api/context/sdk packages. Pre-installed into the container.
- OpenTelemetryProvider (container-disc): Provider<OpenTelemetry> registered in all container types. Hands out OpenTelemetry.noop() when disabled; builds the real SDK (OTLP/HTTP, batching, parent-based ratio sampling, W3C propagation) only when enabled. Builds the OTel Resource from the resource-attribute map.
- telemetry.def: enabled / endpoint / samplingRatio + a resourceAttribute map. ContainerCluster fills the map with deployment identity available in the model (application, tenant, cluster.type, cluster.id).
- opentelemetry-sdk JSON feature flag (enabled/endpoint/samplingRatio), threaded through ModelContext -> ContainerCluster.getConfig. Takes effect at redeployment.
- dependency-enforcer + abi-spec updated accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
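The parent-based ratio sampling mentioned in this commit can be illustrated with a small stdlib-only model. This is a sketch under assumptions, not the OTel SDK classes the provider actually wires up: the class name `ParentBasedRatioSketch`, the `Parent` enum, and the way the trace id is reduced to a `long` are all hypothetical.

```java
/** Hypothetical, stdlib-only model of parent-based ratio sampling (not the OTel SDK). */
final class ParentBasedRatioSketch {

    /** Sampling state of the parent span; NONE means the span is a root span. */
    enum Parent { NONE, SAMPLED, NOT_SAMPLED }

    private final double ratio;

    ParentBasedRatioSketch(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        this.ratio = ratio;
    }

    /**
     * @param parent            sampling state inherited from the caller, if any
     * @param traceIdRandomPart pseudo-random bits derived from the trace id (assumption)
     */
    boolean shouldSample(Parent parent, long traceIdRandomPart) {
        // A child span follows its parent, so a trace is kept or dropped as a whole.
        if (parent == Parent.SAMPLED) return true;
        if (parent == Parent.NOT_SAMPLED) return false;
        // A root span is kept when its id falls in the lowest `ratio` fraction of the id space.
        if (ratio == 0.0) return false;
        if (ratio == 1.0) return true;
        long upperBound = (long) (ratio * Long.MAX_VALUE);
        return (traceIdRandomPart & Long.MAX_VALUE) < upperBound; // mask clears the sign bit
    }

    public static void main(String[] args) {
        ParentBasedRatioSketch sampler = new ParentBasedRatioSketch(0.5);
        System.out.println(sampler.shouldSample(Parent.NONE, 1L));
        System.out.println(sampler.shouldSample(Parent.NOT_SAMPLED, 1L));
    }
}
```

Because the root decision depends only on the trace id, every service that sees the same id with the same ratio would reach the same decision in this model.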
…void null endpoint

- container-dev: add the OpenTelemetry SDK + OTLP exporter artifacts so the in-process container test harness (StandaloneContainerApplication) can load OpenTelemetryProvider, which is now registered in every container. In production these come from the pre-installed container-opentelemetry bundle via OSGi; container-dev is the flat test classpath, mirroring vespa-3party-bundles / container-onnxruntime.
- Never hand back a null endpoint when disabled: the telemetry.def endpoint field is mandatory, so the generated config builder rejects null. OpenTelemetryConfiguration.disabled() and OpenTelemetrySettings now use an empty string (still no localhost default, so nothing is mistakenly sent).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
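The empty-string convention described above can be sketched as follows. This is a minimal illustration under assumptions: `TelemetrySettingsSketch` and its `shouldExport()` helper are hypothetical names, not the actual OpenTelemetrySettings or generated TelemetryConfig classes.

```java
import java.util.Objects;

/** Hypothetical stand-in for the settings object; field names follow telemetry.def. */
final class TelemetrySettingsSketch {

    final boolean enabled;
    final String endpoint;
    final double samplingRatio;

    TelemetrySettingsSketch(boolean enabled, String endpoint, double samplingRatio) {
        this.enabled = enabled;
        // Mirrors a config builder that rejects null for a mandatory field.
        this.endpoint = Objects.requireNonNull(endpoint, "endpoint must not be null");
        this.samplingRatio = samplingRatio;
    }

    /** Disabled settings carry an empty endpoint: never null, and never a localhost default. */
    static TelemetrySettingsSketch disabled() {
        return new TelemetrySettingsSketch(false, "", 0.0);
    }

    /** Hypothetical guard: export only when enabled and an endpoint was actually supplied. */
    boolean shouldExport() {
        return enabled && !endpoint.isEmpty();
    }

    public static void main(String[] args) {
        TelemetrySettingsSketch off = TelemetrySettingsSketch.disabled();
        System.out.println("endpoint='" + off.endpoint + "' export=" + off.shouldExport());
    }
}
```

An empty string keeps the mandatory field satisfied while still giving an exporter nowhere to send data.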
The SDK must not leak onto the 3rd-party container classpath (container-dependencies-enforcer). Keep the public surface to the OTel API only:

- Split OpenTelemetrySdkBuilder out of OpenTelemetryProvider. The provider now references only the OTel API; the SDK-building class is loaded lazily, only when tracing is enabled. So the in-process container test harness needs only the API to instantiate the disabled, no-op provider.
- container-dev: depend on opentelemetry-api only (was sdk + exporter). The SDK is supplied at runtime by the pre-installed container-opentelemetry bundle.
- container-dependencies-enforcer: whitelist opentelemetry-api/context/common (provided). These are genuinely provided at runtime by the bundle.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
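The split between provider and SDK builder relies on the JVM touching a class only when code that uses it actually runs. The following stdlib-only sketch makes that observable through class initialization, which serves here as a proxy for the lazy loading described above. All names (`LazySdkLoadingSketch`, `Tracing`, `SdkBuilder`) are hypothetical stand-ins, not the real OpenTelemetryProvider or OpenTelemetrySdkBuilder.

```java
/** Hypothetical model of a provider that touches the SDK-building class only when enabled. */
final class LazySdkLoadingSketch {

    /** Flipped by SdkBuilder's static initializer so initialization can be observed. */
    static boolean sdkBuilderInitialized = false;

    /** Stand-in for the API-level OpenTelemetry type. */
    interface Tracing {
        String describe();
    }

    /** Stand-in for the SDK-building class; in a real setup it would reference SDK types. */
    static final class SdkBuilder {
        static {
            sdkBuilderInitialized = true; // runs once, on first active use of this class
        }

        static Tracing build(String endpoint) {
            return () -> "sdk -> " + endpoint;
        }
    }

    /** Returns a no-op when disabled; only the enabled branch ever uses SdkBuilder. */
    static Tracing provide(boolean enabled, String endpoint) {
        if (!enabled) {
            return () -> "noop";
        }
        return SdkBuilder.build(endpoint);
    }
}
```

Calling `provide(false, "")` leaves `sdkBuilderInitialized` false; it flips to true only after the first call with `enabled` set. The endpoint passed in is an arbitrary illustrative value.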
…onal

The bundle failed to resolve in a real container (system tests): bnd had generated mandatory Import-Package entries for packages the container does not provide (io.grpc, guava, the OTel incubator/autoconfigure SPI, jspecify, sun.misc). The earlier assumption that bnd marks these optional automatically was wrong.

Everything the bundle actually uses at runtime is embedded (private) in the jar. Add an explicit Import-Package keeping only com.fasterxml.jackson.core mandatory (provided by the container) and marking all other computed imports resolution:=optional, so the bundle resolves without the unavailable deps it never exercises.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
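One way to check the outcome described above is to list which Import-Package clauses of a built bundle are still mandatory. The sketch below is a simplified, hypothetical helper (`ImportPackageSketch` is not part of the PR): it assumes one package per clause and ignores clauses that share attributes across several packages.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that classifies Import-Package clauses as mandatory or optional. */
final class ImportPackageSketch {

    /** Splits a header on commas that are outside double quotes (version ranges contain commas). */
    static List<String> clauses(String header) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (char c : header.toCharArray()) {
            if (c == '"') quoted = !quoted;
            if (c == ',' && !quoted) {
                out.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) out.add(current.toString().trim());
        return out;
    }

    /** Returns the package names of clauses not marked resolution:=optional. */
    static List<String> mandatory(String header) {
        List<String> out = new ArrayList<>();
        for (String clause : clauses(header)) {
            if (clause.isEmpty()) continue;
            boolean optional = clause.replace(" ", "").contains(";resolution:=optional");
            if (!optional) out.add(clause.split(";", 2)[0].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        // The version range below is an illustrative value, not taken from the bundle.
        String header = "com.fasterxml.jackson.core;version=\"[2.15,3)\","
                + "io.grpc;resolution:=optional,sun.misc;resolution:=optional";
        System.out.println(mandatory(header));
    }
}
```

For a bundle built as this commit describes, such a check would be expected to report only com.fasterxml.jackson.core.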
bjorncs reviewed Jun 17, 2026
Comment on lines +90 to +93
<include>io.opentelemetry:opentelemetry-api:${opentelemetry.vespa.version}:provided</include>
<include>io.opentelemetry:opentelemetry-context:${opentelemetry.vespa.version}:provided</include>
<include>io.opentelemetry:opentelemetry-common:${opentelemetry.vespa.version}:provided</include>
Member
We should not expose OTel libraries unless applications strictly require them.
onurkaracali commented Jun 17, 2026
<version>${project.version}</version>
<type>pom</type>
</dependency>
<!-- OpenTelemetry API only: this is what the in-process container test harness needs to instantiate the
Contributor
Author
Add exclusion to container
…ontainer classpath

- configdefinitions: export the generated ai.vespa.telemetry package (@ExportPackage package-info) so container-disc can import TelemetryConfig at runtime in a real container (OSGi).
- container: exclude io.opentelemetry:* from the container-dev dependency so the public 3rd-party container classpath does not expose OpenTelemetry. It stays internal to the platform (container-disc), provided at runtime by the container-opentelemetry bundle.
- container-dependencies-enforcer: drop the OpenTelemetry whitelist entries (no longer exposed).

container-dev keeps opentelemetry-api so the in-process container test harness still resolves the no-op OpenTelemetryProvider.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…endencies

The maven-bundle-plugin did not reliably embed deep transitive OTel modules across build environments (e.g. opentelemetry-common, 4 levels deep via sdk->api->context->common). When a module was not embedded, bnd emitted a mandatory versioned Import-Package for it, so the bundle failed OSGi resolution in the real container (system tests), cascading to container-disc and standalone-container.

Declare all 12 OTel modules directly so Embed-Dependency embeds them as first-level deps instead of relying on Embed-Transitive. Same workaround pattern as container-apache-http-client-bundle. Verified: the built bundle imports only com.fasterxml.jackson.core and no io.opentelemetry packages.

Also keep Import-Package limited to com.fasterxml.jackson.core (everything else is embedded).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t bundle

Inlined upstream OpenTelemetry MR-JARs carry their own OSGI-INF/MANIFEST.MF (notably under META-INF/versions/9/OSGI-INF/), which survive into the assembled fat bundle. OSGi resolvers only read META-INF/MANIFEST.MF, so the extra files are inert noise that can mislead anyone inspecting the bundle.

Replace the built-in jar-with-dependencies descriptorRef with a custom descriptor that mirrors the built-in but excludes OSGI-INF/MANIFEST.MF and META-INF/versions/*/OSGI-INF/MANIFEST.MF when unpacking dependencies.
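The two excluded paths can be expressed as a single check over jar entry names, which is handy for confirming that an assembled bundle no longer carries stray manifests. This is a hypothetical verification helper, not the assembly descriptor itself. It uses a regular expression rather than the descriptor's own pattern syntax.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical check for leftover OSGI-INF manifests among a jar's entry names. */
final class StrayManifestSketch {

    // Matches OSGI-INF/MANIFEST.MF at the root or under any META-INF/versions/<n>/ directory.
    private static final Pattern STRAY =
            Pattern.compile("(META-INF/versions/[^/]+/)?OSGI-INF/MANIFEST\\.MF");

    /** Returns the entries that the exclusion rules described above are meant to drop. */
    static List<String> strayManifests(List<String> entryNames) {
        return entryNames.stream()
                .filter(name -> STRAY.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative entry names; a real check would read them from the built jar.
        List<String> entries = List.of(
                "META-INF/MANIFEST.MF",
                "OSGI-INF/MANIFEST.MF",
                "META-INF/versions/9/OSGI-INF/MANIFEST.MF");
        System.out.println(strayManifests(entries));
    }
}
```

The real META-INF/MANIFEST.MF never matches, so an empty result would indicate that only the manifest OSGi actually reads remains.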
bjorncs approved these changes Jun 17, 2026
What
Introduces the no-op-by-default infrastructure for native OpenTelemetry (OTel) tracing in the container. Nothing is traced until explicitly enabled via the opentelemetry-sdk feature flag, so it is safe to ship disabled and roll out gradually.

This is the foundation PR; request instrumentation (server spans in the Jetty/jdisc layer, context propagation, child spans, RPC propagation to content nodes) follows separately.
Changes
- container-opentelemetry: new Felix bundle embedding the OTel SDK + OTLP/HTTP exporter (jdk sender; no okhttp/kotlin pulled in). Exports the io.opentelemetry.api/context/sdk/exporter.otlp packages. Pre-installed into the container.
- OpenTelemetryProvider (container-disc): Provider<OpenTelemetry> registered in all container types via ContainerCluster. Hands out OpenTelemetry.noop() when disabled; builds the real SDK (OTLP/HTTP, batch span processor, parent-based ratio sampling, W3C trace-context propagation) only when enabled. Builds the OTel Resource from the resource-attribute map.
- telemetry.def: enabled/endpoint/samplingRatio plus a resourceAttribute{} map. No defaults on the scalar fields, so config is always exactly what the model supplies (no silent fallback endpoint).
- ContainerCluster: fills the resource-attribute map from deployment identity available in the model: application, tenant, cluster.type, cluster.id.
- opentelemetry-sdk feature flag: JSON flag (enabled/endpoint/samplingRatio), threaded through ModelContext → ModelContextImpl → ContainerCluster.getConfig. Takes effect at redeployment.
- config-model-api abi-spec updated accordingly.

Behavior
- OpenTelemetry.noop(): no SDK constructed, no exporter threads, no connections, no telemetry.
- […] deconstruct()).

Notes / follow-ups
- service.instance.id (per-node identity) is intentionally not set: it's a metrics-centric attribute; for traces, per-node attribution (if needed) is better added at runtime as host.name or enriched by the Alloy agent.
- cluster.type is currently "container" for all container clusters (the model has no stored ClusterSpec.Type here).

🤖 Generated with Claude Code