
Commit 4fc286b

onurkaracali, claude, and bjorncs authored
Add OpenTelemetry tracing provider to the container (disabled by default) (#37227)
* Add OpenTelemetry tracing provider to the container (disabled by default)

Introduces the no-op-by-default infrastructure for native OpenTelemetry (OTel) tracing in the container. No tracing happens until explicitly enabled via the opentelemetry-sdk feature flag.

- container-opentelemetry: new Felix bundle embedding the OTel SDK + OTLP HTTP exporter (jdk sender, no okhttp/kotlin); exports the api/context/sdk packages. Pre-installed into the container.
- OpenTelemetryProvider (container-disc): Provider<OpenTelemetry> registered in all container types. Hands out OpenTelemetry.noop() when disabled; builds the real SDK (OTLP/HTTP, batching, parent-based ratio sampling, W3C propagation) only when enabled. Builds the OTel Resource from the resource-attribute map.
- telemetry.def: enabled / endpoint / samplingRatio + a resourceAttribute map. ContainerCluster fills the map with deployment identity available in the model (application, tenant, cluster.type, cluster.id).
- opentelemetry-sdk JSON feature flag (enabled/endpoint/samplingRatio), threaded through ModelContext -> ContainerCluster.getConfig. Takes effect at redeployment.
- dependency-enforcer + abi-spec updated accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Make OpenTelemetry classes available to in-process container tests; avoid null endpoint

- container-dev: add the OpenTelemetry SDK + OTLP exporter artifacts so the in-process container test harness (StandaloneContainerApplication) can load OpenTelemetryProvider, which is now registered in every container. In production these come from the pre-installed container-opentelemetry bundle via OSGi; container-dev is the flat test classpath, mirroring vespa-3party-bundles / container-onnxruntime.
- Never hand back a null endpoint when disabled: the telemetry.def endpoint field is mandatory, so the generated config builder rejects null. OpenTelemetryConfiguration.disabled() and OpenTelemetrySettings now use an empty string (still no localhost default, so nothing is mistakenly sent).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Expose only the OpenTelemetry API on the public container classpath

The SDK must not leak onto the 3rd-party container classpath (container-dependencies-enforcer). Keep the public surface to the OTel API only:

- Split OpenTelemetrySdkBuilder out of OpenTelemetryProvider. The provider now references only the OTel API; the SDK-building class is loaded lazily, only when tracing is enabled. So the in-process container test harness needs only the API to instantiate the disabled, no-op provider.
- container-dev: depend on opentelemetry-api only (was sdk + exporter). The SDK is supplied at runtime by the pre-installed container-opentelemetry bundle.
- container-dependencies-enforcer: whitelist opentelemetry-api/context/common (provided). These are genuinely provided at runtime by the bundle.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Fix container-opentelemetry OSGi resolution: mark unused imports optional

The bundle failed to resolve in a real container (system tests): bnd had generated mandatory Import-Package entries for packages the container does not provide (io.grpc, guava, the OTel incubator/autoconfigure SPI, jspecify, sun.misc). The earlier assumption that bnd marks these optional automatically was wrong.

Everything the bundle actually uses at runtime is embedded (private) in the jar. Add an explicit Import-Package keeping only com.fasterxml.jackson.core mandatory (provided by the container) and marking all other computed imports resolution:=optional, so the bundle resolves without the unavailable deps it never exercises.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Address review: export ai.vespa.telemetry, keep OTel off the public container classpath

- configdefinitions: export the generated ai.vespa.telemetry package (@ExportPackage package-info) so container-disc can import TelemetryConfig at runtime in a real container (OSGi).
- container: exclude io.opentelemetry:* from the container-dev dependency so the public 3rd-party container classpath does not expose OpenTelemetry. It stays internal to the platform (container-disc), provided at runtime by the container-opentelemetry bundle.
- container-dependencies-enforcer: drop the OpenTelemetry whitelist entries (no longer exposed).

container-dev keeps opentelemetry-api so the in-process container test harness still resolves the no-op OpenTelemetryProvider.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Remove optional imports

* Fix bundle embedding: declare all OpenTelemetry modules as direct dependencies

The maven-bundle-plugin did not reliably embed deep transitive OTel modules across build environments (e.g. opentelemetry-common, 4 levels deep via sdk->api->context->common). When a module was not embedded, bnd emitted a mandatory versioned Import-Package for it, so the bundle failed OSGi resolution in the real container (system tests), cascading to container-disc and standalone-container.

Declare all 12 OTel modules directly so Embed-Dependency embeds them as first-level deps instead of relying on Embed-Transitive. Same workaround pattern as container-apache-http-client-bundle. Verified: the built bundle imports only com.fasterxml.jackson.core and no io.opentelemetry packages.

Also keep Import-Package limited to com.fasterxml.jackson.core (everything else is embedded).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: strip stray OSGI-INF/MANIFEST.MF from container-opentelemetry fat bundle

Inlined upstream OpenTelemetry MR-JARs carry their own OSGI-INF/MANIFEST.MF (notably under META-INF/versions/9/OSGI-INF/), which survive into the assembled fat bundle. OSGi resolvers only read META-INF/MANIFEST.MF, so the extra files are inert noise that can mislead anyone inspecting the bundle.

Replace the built-in jar-with-dependencies descriptorRef with a custom descriptor that mirrors the built-in but excludes OSGI-INF/MANIFEST.MF and META-INF/versions/*/OSGI-INF/MANIFEST.MF when unpacking dependencies.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Bjørn Christian Seime <bjorn.christian@seime.no>
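The split between OpenTelemetryProvider and OpenTelemetrySdkBuilder described above relies on the JVM initializing a class only when it is first used. The following is a minimal, stdlib-only sketch of that pattern, not the code from this commit: `Tracing`, `LoadProbe`, `HeavySdkBuilder`, `LazyTracingProvider` and the collector URL are all hypothetical stand-ins. The static initializer probe observes class initialization, which the Java language specification defines as lazy; how early a given JVM loads the class file is implementation dependent.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the API-level type (the real provider hands out an OpenTelemetry instance). */
interface Tracing {
    String describe();
}

/** Records whether the heavy builder class has been initialized; exists only for this illustration. */
final class LoadProbe {
    static final AtomicBoolean heavyBuilderInitialized = new AtomicBoolean(false);
    private LoadProbe() {}
}

/** Hypothetical stand-in for the SDK-building class. Its static initializer runs on first use only. */
final class HeavySdkBuilder {
    static {
        LoadProbe.heavyBuilderInitialized.set(true);
    }
    private HeavySdkBuilder() {}

    static Tracing build(String endpoint) {
        return () -> "sdk -> " + endpoint;
    }
}

/** Hypothetical provider: touches HeavySdkBuilder only on the enabled path. */
final class LazyTracingProvider {
    private static final Tracing NOOP = () -> "noop";
    private LazyTracingProvider() {}

    static Tracing get(boolean enabled, String endpoint) {
        if (!enabled) return NOOP;              // disabled: the builder class is never initialized
        return HeavySdkBuilder.build(endpoint); // enabled: first use triggers initialization
    }
}

final class LazyLoadingDemo {
    public static void main(String[] args) {
        System.out.println(LazyTracingProvider.get(false, "").describe());
        System.out.println("builder initialized: " + LoadProbe.heavyBuilderInitialized.get());
        // The endpoint below is an assumed example value, not taken from the commit.
        System.out.println(LazyTracingProvider.get(true, "http://collector.example:4318/v1/traces").describe());
        System.out.println("builder initialized: " + LoadProbe.heavyBuilderInitialized.get());
    }
}
```

Keeping every reference to the heavy dependency inside one class means a disabled provider can be instantiated on a classpath that only contains the API, which is the situation the commit message describes for the in-process test harness.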
1 parent 8ee4f35 commit 4fc286b

23 files changed

Lines changed: 492 additions & 2 deletions

CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -117,6 +117,7 @@ add_subdirectory(container-apache-http-client-bundle)
 add_subdirectory(container-disc)
 add_subdirectory(container-messagebus)
 add_subdirectory(container-onnxruntime)
+add_subdirectory(container-opentelemetry)
 add_subdirectory(container-llama)
 add_subdirectory(container-search)
 add_subdirectory(container-search-and-docproc)

config-model-api/abi-spec.json

Lines changed: 1 addition & 0 deletions
@@ -1415,6 +1415,7 @@
       "public double feedNiceness()",
       "public int maxUnCommittedMemory()",
       "public boolean containerDumpHeapOnShutdownTimeout()",
+      "public com.yahoo.config.provision.OpenTelemetryConfiguration opentelemetrySdk()",
       "public int heapSizePercentage(java.util.Optional)",
       "public java.util.List allowedAthenzProxyIdentities()",
       "public int maxActivationInhibitedOutOfSyncGroups()",

config-model-api/src/main/java/com/yahoo/config/model/api/ModelContext.java

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@
 import com.yahoo.config.provision.DockerImage;
 import com.yahoo.config.provision.HostName;
 import com.yahoo.config.provision.NodeResources.Architecture;
+import com.yahoo.config.provision.OpenTelemetryConfiguration;
 import com.yahoo.config.provision.SharedHosts;

 import java.io.File;
@@ -96,6 +97,7 @@ interface FeatureFlags {
         @ModelFeatureFlag(owners = {"hmusum"}) default double feedNiceness() { return 0.0; }
         @ModelFeatureFlag(owners = {"hmusum"}) default int maxUnCommittedMemory() { return 130000; }
         @ModelFeatureFlag(owners = {"bjorncs"}) default boolean containerDumpHeapOnShutdownTimeout() { return false; }
+        @ModelFeatureFlag(owners = {"onur"}) default OpenTelemetryConfiguration opentelemetrySdk() { return OpenTelemetryConfiguration.disabled(); }
         @ModelFeatureFlag(owners = {"hmusum"}) default int heapSizePercentage(Optional<String> clusterId) { return 0;}
         @ModelFeatureFlag(owners = {"bjorncs"}) default List<String> allowedAthenzProxyIdentities() { return List.of(); }
         @ModelFeatureFlag(owners = {"vekterli"}) default int maxActivationInhibitedOutOfSyncGroups() { return 0; }

config-model/src/main/java/com/yahoo/vespa/model/container/ContainerCluster.java

Lines changed: 37 additions & 2 deletions
@@ -1,6 +1,7 @@
 // Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
 package com.yahoo.vespa.model.container;

+import ai.vespa.telemetry.TelemetryConfig;
 import com.yahoo.cloud.config.ClusterInfoConfig;
 import com.yahoo.cloud.config.ConfigserverConfig;
 import com.yahoo.cloud.config.CuratorConfig;
@@ -13,6 +14,7 @@
 import com.yahoo.config.model.producer.AnyConfigProducer;
 import com.yahoo.config.model.producer.TreeConfigProducer;
 import com.yahoo.config.provision.ClusterSpec;
+import com.yahoo.config.provision.OpenTelemetryConfiguration;
 import com.yahoo.config.provision.Zone;
 import com.yahoo.container.ComponentsConfig;
 import com.yahoo.container.QrSearchersConfig;
@@ -109,7 +111,8 @@ public abstract class ContainerCluster<CONTAINER extends Container>
         ClusterInfoConfig.Producer,
         ConfigserverConfig.Producer,
         CuratorConfig.Producer,
-        SchemaInfoConfig.Producer
+        SchemaInfoConfig.Producer,
+        TelemetryConfig.Producer
 {

     /**
@@ -168,6 +171,8 @@ public abstract class ContainerCluster<CONTAINER extends Container>

     private volatile boolean deferChangesUntilRestart = false;
     private final boolean applyOnRestartForApplicationMetadataConfigEnabled;
+    private final OpenTelemetryConfiguration opentelemetrySdk;
+    private final Map<String, String> telemetryResourceAttributes;
     private boolean clientsLegacyMode;
     private List<Client> clients = List.of();

@@ -179,7 +184,9 @@ public ContainerCluster(TreeConfigProducer<?> parent, String configSubId, String
         this.zooKeeperLocalhostAffinity = zooKeeperLocalhostAffinity;
         this.compressionType = "zstd";
         applyOnRestartForApplicationMetadataConfigEnabled = deployState.featureFlags().applyOnRestartForApplicationMetadataConfig();
-
+        opentelemetrySdk = deployState.featureFlags().opentelemetrySdk();
+        telemetryResourceAttributes = telemetryResourceAttributes(deployState, clusterId);
+
         componentGroup = new ComponentGroup<>(this, "component");

         addCommonVespaBundles();
@@ -199,6 +206,8 @@ public ContainerCluster(TreeConfigProducer<?> parent, String configSubId, String
         addSimpleComponent(com.yahoo.container.handler.ClustersStatus.class.getName());
         addSimpleComponent("com.yahoo.container.jdisc.DisabledConnectionLogProvider");
         addSimpleComponent(com.yahoo.jdisc.http.server.jetty.Janitor.class);
+        // OpenTelemetry tracing provider: present in all container types; hands out a no-op instance unless enabled.
+        addSimpleComponent("com.yahoo.container.jdisc.telemetry.OpenTelemetryProvider");
     }

     protected abstract boolean messageBusEnabled();
@@ -566,6 +575,32 @@ public void getConfig(QrSearchersConfig.Builder builder) {
         if (containerSearch != null) containerSearch.getConfig(builder);
     }

+    @Override
+    public void getConfig(TelemetryConfig.Builder builder) {
+        builder.enabled(opentelemetrySdk.enabled())
+               .endpoint(opentelemetrySdk.endpoint())
+               .samplingRatio(opentelemetrySdk.samplingRatio());
+        builder.resourceAttribute.putAll(telemetryResourceAttributes);
+    }
+
+    /**
+     * Resource attributes describing this container service, filled from the deployment identity available
+     * in the model. Used by the OpenTelemetry provider to build the OTel {@code Resource}.
+     */
+    private Map<String, String> telemetryResourceAttributes(DeployState deployState, String clusterId) {
+        var applicationId = deployState.getProperties().applicationId();
+
+        Map<String, String> attributes = new LinkedHashMap<>();
+        attributes.put("application", applicationId.application().value());
+        attributes.put("tenant", applicationId.tenant().value());
+        attributes.put("zone", this.zone.systemLocalValue());
+        attributes.put("environment", this.zone.environment().value());
+        attributes.put("cloud", this.zone.cloud().toString());
+        attributes.put("cluster.type", "container");
+        attributes.put("cluster.id", clusterId);
+        return attributes;
+    }
+
     @Override
     public void getConfig(QrStartConfig.Builder builder) {
         builder.jvm
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+// Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
+package com.yahoo.config.provision;
+
+/**
+ * Settings for Vespa's OpenTelemetry SDK (tracing), controlled by the {@code opentelemetry-sdk}
+ * feature flag and produced into the container's telemetry config.
+ *
+ * @author onur
+ */
+public interface OpenTelemetryConfiguration {
+
+    boolean enabled();
+    String endpoint();
+    double samplingRatio();
+
+    /** The default, disabled configuration: produces a no-op OpenTelemetry. */
+    static OpenTelemetryConfiguration disabled() {
+        return new OpenTelemetryConfiguration() {
+            @Override public boolean enabled() { return false; }
+            @Override public String endpoint() { return ""; }
+            @Override public double samplingRatio() { return 1.0; }
+        };
+    }
+
+}
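The interface above only ships a disabled() factory in this part of the diff. As a rough illustration of what an enabled value could look like, here is a self-contained sketch. The interface is repeated so the snippet compiles on its own, and `FixedOpenTelemetryConfiguration` together with its validation rules (non-blank endpoint when enabled, ratio within [0, 1]) are assumptions for illustration, not code from the commit.

```java
import java.util.Objects;

/** Copy of the interface from the diff, repeated here only so the sketch compiles on its own. */
interface OpenTelemetryConfiguration {
    boolean enabled();
    String endpoint();
    double samplingRatio();

    static OpenTelemetryConfiguration disabled() {
        return new OpenTelemetryConfiguration() {
            @Override public boolean enabled() { return false; }
            @Override public String endpoint() { return ""; }
            @Override public double samplingRatio() { return 1.0; }
        };
    }
}

/** Hypothetical immutable implementation; the checks below are assumed rules, not taken from the commit. */
record FixedOpenTelemetryConfiguration(boolean enabled, String endpoint, double samplingRatio)
        implements OpenTelemetryConfiguration {

    FixedOpenTelemetryConfiguration {
        // The commit notes that the generated config builder rejects null, so fail early here too.
        Objects.requireNonNull(endpoint, "endpoint");
        if (enabled && endpoint.isBlank())
            throw new IllegalArgumentException("endpoint is required when tracing is enabled");
        // Written as a negated range check so that NaN is rejected as well.
        if (!(samplingRatio >= 0.0 && samplingRatio <= 1.0))
            throw new IllegalArgumentException("samplingRatio must be in [0, 1], got " + samplingRatio);
    }
}
```

A record keeps the value immutable and gives the three accessors for free, so it satisfies the interface without any hand-written getters.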
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+// Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
+@ExportPackage
+package ai.vespa.telemetry;
+
+import com.yahoo.osgi.annotation.ExportPackage;
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
+# Configuration for the container OpenTelemetry tracing provider.
+# Disabled by default: when enabled=false the provider hands out a no-op OpenTelemetry that produces nothing.
+package=ai.vespa.telemetry
+
+enabled bool
+endpoint string
+samplingRatio double
+
+# Resource attributes describing the emitting service, filled by the config-model
+# from whatever deployment identity is available (application, tenant, cluster type, cluster id).
+resourceAttribute{} string
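The samplingRatio field feeds the parent-based ratio sampling mentioned in the commit message. As a simplified sketch of the idea (not the OpenTelemetry SDK's implementation, whose details differ): a span with a parent follows the parent's sampling decision, and a root span is sampled when a value derived from its trace ID falls below a threshold proportional to the ratio. The names `ParentState` and `ParentBasedRatioSketch` are hypothetical.

```java
/** Whether the span being started has a parent, and if so whether that parent was sampled. */
enum ParentState { NONE, SAMPLED, NOT_SAMPLED }

/** Hypothetical, simplified model of parent-based ratio sampling; not the SDK's own code. */
final class ParentBasedRatioSketch {
    private final double ratio;
    private final long idUpperBound;

    ParentBasedRatioSketch(double ratio) {
        if (!(ratio >= 0.0 && ratio <= 1.0))
            throw new IllegalArgumentException("ratio must be in [0, 1], got " + ratio);
        this.ratio = ratio;
        // Threshold over the non-negative long range; the cast saturates at Long.MAX_VALUE.
        this.idUpperBound = (long) (ratio * Long.MAX_VALUE);
    }

    /** Child spans inherit the parent's decision; root spans fall through to the ratio check. */
    boolean shouldSample(ParentState parent, long traceIdRandomBits) {
        switch (parent) {
            case SAMPLED: return true;
            case NOT_SAMPLED: return false;
            default: return sampleRoot(traceIdRandomBits);
        }
    }

    /** Deterministic in the trace ID, so every participant makes the same decision for one trace. */
    private boolean sampleRoot(long traceIdRandomBits) {
        if (ratio >= 1.0) return true;                          // sample everything
        long nonNegative = traceIdRandomBits & Long.MAX_VALUE;  // clear the sign bit
        return nonNegative < idUpperBound;
    }
}
```

Deriving the decision from the trace ID instead of a fresh random number is what keeps a trace either fully recorded or fully dropped across services that use the same ratio.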

configserver/src/main/java/com/yahoo/vespa/config/server/deploy/ModelContextImpl.java

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,7 @@
 import com.yahoo.config.provision.DockerImage;
 import com.yahoo.config.provision.HostName;
 import com.yahoo.config.provision.NodeResources.Architecture;
+import com.yahoo.config.provision.OpenTelemetryConfiguration;
 import com.yahoo.config.provision.SharedHosts;
 import com.yahoo.vespa.flags.Flag;
 import com.yahoo.vespa.flags.FlagSource;
@@ -223,6 +224,7 @@ private <T, F extends Flag<T, F>, U extends UnboundFlag<T, F, U>> ModelContext.F
         @Override public double feedConcurrency() { return flag(PermanentFlags.FEED_CONCURRENCY).value(); }
         @Override public double feedNiceness() { return flag(PermanentFlags.FEED_NICENESS).value(); }
         @Override public int mbusNetworkThreads() { return flag(Flags.MBUS_NUM_NETWORK_THREADS).value(); }
+        @Override public OpenTelemetryConfiguration opentelemetrySdk() { return flag(Flags.OPENTELEMETRY_SDK).value(); }
         @Override public List<String> allowedAthenzProxyIdentities() { return flag(PermanentFlags.ALLOWED_ATHENZ_PROXY_IDENTITIES).value(); }
         @Override public int maxActivationInhibitedOutOfSyncGroups() { return flag(PermanentFlags.MAX_ACTIVATION_INHIBITED_OUT_OF_SYNC_GROUPS).value(); }
         @Override public double resourceLimitDisk() { return flag(PermanentFlags.RESOURCE_LIMIT_DISK).value(); }

container-dependency-versions/pom.xml

Lines changed: 7 additions & 0 deletions
@@ -36,6 +36,13 @@

     <dependencyManagement>
         <dependencies>
+            <dependency>
+                <groupId>io.opentelemetry</groupId>
+                <artifactId>opentelemetry-bom</artifactId>
+                <version>${opentelemetry.vespa.version}</version>
+                <type>pom</type>
+                <scope>import</scope>
+            </dependency>
             <dependency>
                 <groupId>aopalliance</groupId>
                 <artifactId>aopalliance</artifactId>

container-dev/pom.xml

Lines changed: 8 additions & 0 deletions
@@ -187,6 +187,14 @@
             <version>${project.version}</version>
             <type>pom</type>
         </dependency>
+        <!-- OpenTelemetry API only: this is what the in-process container test harness needs to instantiate the
+             (disabled, no-op) OpenTelemetryProvider, and all that 3rd-party container projects should compile
+             against. The SDK is loaded lazily by OpenTelemetrySdkBuilder and supplied at runtime by the
+             pre-installed container-opentelemetry bundle. -->
+        <dependency>
+            <groupId>io.opentelemetry</groupId>
+            <artifactId>opentelemetry-api</artifactId>
+        </dependency>
         <dependency>
             <groupId>com.yahoo.vespa</groupId>
             <artifactId>vespajlib</artifactId>
