Skip to content

Commit b81ebce

Browse files
authored
Upgrade Kubernetes clients and related dependencies in kubernetes-overlord-extensions and druid-kubernetes-extensions (#19071)
* fabric8 bump checkpoint * expose webclientoptions in configuration and override additionalconfig for vertx * document new power * Remove unnecessary complication of the druid vertx factory wrapper * use junit5 * bump okhttp to try and fix a dependency enforcer * update licenses.yaml * nit spacing * min dep enforcer requires bump of kotlin-stdlib * tinkering with licenses yaml * Bump kubernetes-extensions k8s dep so we can align on okhttp also get a bunch of other licenses in line * cleanup object mapper for k8s-ol-ext * minor dep updates + exclude protobuf from druid-kubernetes-extensions * Keep working on getting these dependencies to play nicely together * Update aws sdk in licenses * fix kubernetes-extensions pom * Working on static checks still * k8s overlord ext pom
1 parent 760e7e4 commit b81ebce

14 files changed

Lines changed: 456 additions & 74 deletions

File tree

docs/development/extensions-core/k8s-jobs.md

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -911,6 +911,35 @@ was [okhttp](https://github.com/fabric8io/kubernetes-client/tree/main/httpclient
911911
|`druid.indexer.runner.k8sAndWorker.http.vertx.eventLoopPoolSize`|`Integer`|...|`2 * number cores`|No|
912912
|`druid.indexer.runner.k8sAndWorker.http.vertx.internalBlockingPoolSize`|`Integer`|...|20|No|
913913

914+
##### `WebClientOptions` pass-through
915+
916+
The vert.x HTTP client also supports a generic pass-through for any property on the underlying Vert.x [`WebClientOptions`](https://vertx.io/docs/apidocs/io/vertx/ext/web/client/WebClientOptions.html) object (which extends [`HttpClientOptions`](https://vertx.io/docs/apidocs/io/vertx/core/http/HttpClientOptions.html)). This gives operators full control over connection pool behavior, timeouts, and other low-level HTTP client settings without requiring new Druid configuration fields.
917+
918+
Set properties using the prefix `druid.indexer.runner.k8sAndWorker.http.vertx.webClientOptions.<propertyName>`, where `<propertyName>` matches the corresponding setter on `WebClientOptions` (e.g., `setMaxPoolSize` maps to `maxPoolSize`).
919+
920+
**Tuning connection keep-alive timeouts**
921+
922+
This pass-through is particularly useful for environments where an intermediate network component (such as an AWS ALB, or service mesh sidecar) closes idle connections before the client expects it. When this happens, the client may attempt to reuse a connection that the server has already closed, resulting in unexpected connection errors.
923+
924+
To mitigate this, set the client-side keep-alive and idle timeouts to a value **lower** than the shortest server-side timeout in your network path. For example, if your AWS ALB has a 60 second idle timeout (the default), setting the client to 30 seconds ensures the client retires idle connections well before the server does:
925+
926+
```properties
927+
druid.indexer.runner.k8sAndWorker.http.vertx.webClientOptions.keepAliveTimeout=30
928+
druid.indexer.runner.k8sAndWorker.http.vertx.webClientOptions.idleTimeout=30
929+
```
930+
931+
**Commonly useful properties**
932+
933+
|Property|Type|Description|Default|
934+
|--------|----|-----------|-------|
935+
|`keepAliveTimeout`|`Integer`|Seconds an idle HTTP keep-alive connection is retained before the client closes it.|`60`|
936+
|`idleTimeout`|`Integer`|Seconds a connection can remain idle before being closed. Acts as a general safety net alongside `keepAliveTimeout`.|`0` (disabled)|
937+
|`maxPoolSize`|`Integer`|Maximum number of connections in the pool per endpoint.|`5`|
938+
|`connectTimeout`|`Integer`|Milliseconds to wait when establishing a new connection.|`60000`|
939+
|`poolCleanerPeriod`|`Integer`|Milliseconds between pool sweeps that evict connections exceeding the above timeouts.|`1000`|
940+
941+
Any property with a public setter on `WebClientOptions` or its parent classes (`HttpClientOptions`, `ClientOptionsBase`, `TCPSSLOptions`, `NetworkOptions`) can be set through this mechanism.
942+
914943
#### OkHttp Client
915944

916945
|Property| Possible Values |Description| Default |required|

embedded-tests/pom.xml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -572,7 +572,6 @@
572572
<dependency>
573573
<groupId>io.kubernetes</groupId>
574574
<artifactId>client-java-api</artifactId>
575-
<version>19.0.0</version>
576575
<scope>test</scope>
577576
</dependency>
578577
<dependency>

extensions-core/kubernetes-extensions/pom.xml

Lines changed: 40 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -34,23 +34,6 @@
3434
<relativePath>../../pom.xml</relativePath>
3535
</parent>
3636

37-
<properties>
38-
<kubernetes.client.version>19.0.0</kubernetes.client.version>
39-
</properties>
40-
41-
42-
<dependencyManagement>
43-
<dependencies>
44-
<!-- This is an indirect dependency of io.kubernetes.client-java
45-
update to address vulnerability in transitive dependency okio used by okhttp -->
46-
<dependency>
47-
<groupId>com.squareup.okhttp3</groupId>
48-
<artifactId>okhttp</artifactId>
49-
<version>4.12.0</version>
50-
</dependency>
51-
</dependencies>
52-
</dependencyManagement>
53-
5437
<dependencies>
5538
<dependency>
5639
<groupId>org.apache.druid</groupId>
@@ -68,17 +51,53 @@
6851
<dependency>
6952
<groupId>io.kubernetes</groupId>
7053
<artifactId>client-java</artifactId>
71-
<version>${kubernetes.client.version}</version>
54+
<exclusions>
55+
<!-- client-java-proto brings in protobuf-java 4.x which Druid does not yet support project-wide.
56+
The Kubernetes API supports multiple wire formats (JSON, YAML, Protobuf, etc.) but this extension
57+
is not currently using the proto client so exclusion is safe -->
58+
<exclusion>
59+
<groupId>io.kubernetes</groupId>
60+
<artifactId>client-java-proto</artifactId>
61+
</exclusion>
62+
<exclusion>
63+
<groupId>com.google.protobuf</groupId>
64+
<artifactId>protobuf-java</artifactId>
65+
</exclusion>
66+
<!-- Azure AD auth library and its transitive OIDC/JWT stack — only needed for AKS token-based
67+
authentication which this extension does not use -->
68+
<exclusion>
69+
<groupId>com.microsoft.azure</groupId>
70+
<artifactId>adal4j</artifactId>
71+
</exclusion>
72+
</exclusions>
7273
</dependency>
7374
<dependency>
7475
<groupId>io.kubernetes</groupId>
7576
<artifactId>client-java-extended</artifactId>
76-
<version>${kubernetes.client.version}</version>
77+
<exclusions>
78+
<!-- client-java-proto brings in protobuf-java 4.x which Druid does not yet support project-wide.
79+
The Kubernetes API supports multiple wire formats (JSON, YAML, Protobuf, etc.) but this extension
80+
is not currently using the proto client so exclusion is safe -->
81+
<exclusion>
82+
<groupId>io.kubernetes</groupId>
83+
<artifactId>client-java-proto</artifactId>
84+
</exclusion>
85+
</exclusions>
7786
</dependency>
7887
<dependency>
7988
<groupId>io.kubernetes</groupId>
8089
<artifactId>client-java-api</artifactId>
81-
<version>${kubernetes.client.version}</version>
90+
<exclusions>
91+
<!-- jakarta.ws.rs-api 4.0 conflicts with the javax.ws.rs 1.x used elsewhere in Druid -->
92+
<exclusion>
93+
<groupId>jakarta.ws.rs</groupId>
94+
<artifactId>jakarta.ws.rs-api</artifactId>
95+
</exclusion>
96+
</exclusions>
97+
</dependency>
98+
<dependency>
99+
<groupId>com.squareup.okhttp3</groupId>
100+
<artifactId>okhttp-jvm</artifactId>
82101
</dependency>
83102

84103
<!-- Tests -->
@@ -140,7 +159,7 @@
140159
<artifactId>maven-dependency-plugin</artifactId>
141160
<configuration>
142161
<!-- analyze incorrectly flags this dependency as missing when omitted, and unused when declared -->
143-
<ignoredDependencies>io.kubernetes:client-java-api-fluent:jar:19.0.0</ignoredDependencies>
162+
<ignoredDependencies>io.kubernetes:client-java-api-fluent:jar:${kubernetes.client.version}</ignoredDependencies>
144163
</configuration>
145164
</plugin>
146165
</plugins>

extensions-core/kubernetes-extensions/src/main/java/org/apache/druid/k8s/discovery/DefaultK8sApiClient.java

Lines changed: 45 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@
3030
import io.kubernetes.client.openapi.apis.CoreV1Api;
3131
import io.kubernetes.client.openapi.models.V1Pod;
3232
import io.kubernetes.client.openapi.models.V1PodList;
33+
import io.kubernetes.client.util.PatchUtils;
3334
import io.kubernetes.client.util.Watch;
3435
import org.apache.druid.discovery.DiscoveryDruidNode;
3536
import org.apache.druid.discovery.NodeRole;
@@ -65,7 +66,22 @@ public DefaultK8sApiClient(ApiClient realK8sClient, @Json ObjectMapper jsonMappe
6566
public void patchPod(String podName, String podNamespace, String jsonPatchStr)
6667
{
6768
try {
68-
coreV1Api.patchNamespacedPod(podName, podNamespace, new V1Patch(jsonPatchStr), "true", null, null, null, null);
69+
PatchUtils.patch(
70+
V1Pod.class,
71+
() -> coreV1Api.patchNamespacedPodCall(
72+
podName,
73+
podNamespace,
74+
new V1Patch(jsonPatchStr),
75+
"true",
76+
null,
77+
null,
78+
null,
79+
null,
80+
null
81+
),
82+
V1Patch.PATCH_FORMAT_JSON_PATCH,
83+
realK8sClient
84+
);
6985
}
7086
catch (ApiException ex) {
7187
throw new RE(ex, "Failed to patch pod[%s/%s], code[%d], error[%s].", podNamespace, podName, ex.getCode(), ex.getResponseBody());
@@ -80,7 +96,20 @@ public DiscoveryDruidNodeList listPods(
8096
)
8197
{
8298
try {
83-
V1PodList podList = coreV1Api.listNamespacedPod(podNamespace, null, null, null, null, labelSelector, 0, null, null, null, null, null);
99+
V1PodList podList = coreV1Api.listNamespacedPod(
100+
podNamespace,
101+
null,
102+
null,
103+
null,
104+
null,
105+
labelSelector,
106+
null,
107+
null,
108+
null,
109+
null,
110+
null,
111+
null
112+
);
84113
Preconditions.checkState(podList != null, "WTH: NULL podList");
85114

86115
Map<String, DiscoveryDruidNode> allNodes = new HashMap();
@@ -113,8 +142,20 @@ public WatchResult watchPods(String namespace, String labelSelector, String last
113142
Watch<V1Pod> watch =
114143
Watch.createWatch(
115144
realK8sClient,
116-
coreV1Api.listNamespacedPodCall(namespace, null, true, null, null,
117-
labelSelector, null, lastKnownResourceVersion, null, null, 0, true, null
145+
coreV1Api.listNamespacedPodCall(
146+
namespace,
147+
null,
148+
true,
149+
null,
150+
null,
151+
labelSelector,
152+
null,
153+
lastKnownResourceVersion,
154+
null,
155+
null,
156+
null,
157+
true,
158+
null
118159
),
119160
new TypeReference<Watch.Response<V1Pod>>()
120161
{

extensions-core/kubernetes-overlord-extensions/pom.xml

Lines changed: 10 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,9 @@
3434
<relativePath>../../pom.xml</relativePath>
3535
</parent>
3636

37+
<properties>
38+
<vertx.version>4.5.24</vertx.version>
39+
</properties>
3740

3841
<dependencies>
3942
<dependency>
@@ -138,13 +141,17 @@
138141
</dependency>
139142
<dependency>
140143
<groupId>com.squareup.okhttp3</groupId>
141-
<artifactId>okhttp</artifactId>
142-
<version>4.12.0</version>
144+
<artifactId>okhttp-jvm</artifactId>
143145
</dependency>
144146
<dependency>
145147
<groupId>io.vertx</groupId>
146148
<artifactId>vertx-core</artifactId>
147-
<version>4.5.24</version>
149+
<version>${vertx.version}</version>
150+
</dependency>
151+
<dependency>
152+
<groupId>io.vertx</groupId>
153+
<artifactId>vertx-web-client</artifactId>
154+
<version>${vertx.version}</version>
148155
</dependency>
149156
<dependency>
150157
<groupId>javax.ws.rs</groupId>

extensions-core/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesOverlordModule.java

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,7 @@
3939
import org.apache.druid.guice.JsonConfigurator;
4040
import org.apache.druid.guice.LazySingleton;
4141
import org.apache.druid.guice.PolyBind;
42+
import org.apache.druid.guice.annotations.Json;
4243
import org.apache.druid.guice.annotations.LoadScope;
4344
import org.apache.druid.guice.annotations.Self;
4445
import org.apache.druid.guice.annotations.Smile;
@@ -386,17 +387,19 @@ public RunnerStrategy get()
386387
private static class VertxHttpClientFactoryProvider implements Provider<DruidKubernetesHttpClientFactory>
387388
{
388389
private DruidKubernetesVertxHttpClientConfig config;
390+
private ObjectMapper jsonMapper;
389391

390392
@Inject
391-
public void inject(DruidKubernetesVertxHttpClientConfig config)
393+
public void inject(DruidKubernetesVertxHttpClientConfig config, @Json ObjectMapper jsonMapper)
392394
{
393-
this.config = config; // Guice injects the Vertx-specific config
395+
this.config = config;
396+
this.jsonMapper = jsonMapper;
394397
}
395398

396399
@Override
397400
public DruidKubernetesHttpClientFactory get()
398401
{
399-
return new DruidKubernetesVertxHttpClientFactory(config);
402+
return new DruidKubernetesVertxHttpClientFactory(config, jsonMapper);
400403
}
401404
}
402405

extensions-core/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/common/httpclient/vertx/DruidKubernetesVertxHttpClientConfig.java

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,9 @@
2222
import com.fasterxml.jackson.annotation.JsonProperty;
2323
import io.vertx.core.VertxOptions;
2424

25+
import java.util.Collections;
26+
import java.util.Map;
27+
2528
public class DruidKubernetesVertxHttpClientConfig
2629
{
2730
@JsonProperty
@@ -33,6 +36,9 @@ public class DruidKubernetesVertxHttpClientConfig
3336
@JsonProperty
3437
private int internalBlockingPoolSize = VertxOptions.DEFAULT_INTERNAL_BLOCKING_POOL_SIZE;
3538

39+
@JsonProperty
40+
private Map<String, Object> webClientOptions = Collections.emptyMap();
41+
3642
public int getWorkerPoolSize()
3743
{
3844
return workerPoolSize;
@@ -47,4 +53,9 @@ public int getInternalBlockingPoolSize()
4753
{
4854
return internalBlockingPoolSize;
4955
}
56+
57+
public Map<String, Object> getWebClientOptions()
58+
{
59+
return webClientOptions;
60+
}
5061
}

extensions-core/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/common/httpclient/vertx/DruidKubernetesVertxHttpClientFactory.java

Lines changed: 31 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -19,31 +19,55 @@
1919

2020
package org.apache.druid.k8s.overlord.common.httpclient.vertx;
2121

22-
import io.fabric8.kubernetes.client.vertx.VertxHttpClientBuilder;
22+
import com.fasterxml.jackson.databind.ObjectMapper;
2323
import io.fabric8.kubernetes.client.vertx.VertxHttpClientFactory;
2424
import io.vertx.core.Vertx;
2525
import io.vertx.core.VertxOptions;
2626
import io.vertx.core.file.FileSystemOptions;
2727
import io.vertx.core.spi.resolver.ResolverProvider;
28+
import io.vertx.ext.web.client.WebClientOptions;
29+
import org.apache.druid.java.util.common.logger.Logger;
2830
import org.apache.druid.k8s.overlord.common.httpclient.DruidKubernetesHttpClientFactory;
2931

3032
/**
3133
* Similar to {@link VertxHttpClientFactory} but allows us to override thread pool configurations.
3234
*/
33-
public class DruidKubernetesVertxHttpClientFactory implements DruidKubernetesHttpClientFactory
35+
public class DruidKubernetesVertxHttpClientFactory extends VertxHttpClientFactory implements DruidKubernetesHttpClientFactory
3436
{
3537
public static final String TYPE_NAME = "vertx";
36-
private final Vertx vertx;
3738

38-
public DruidKubernetesVertxHttpClientFactory(final DruidKubernetesVertxHttpClientConfig httpClientConfig)
39+
private static final Logger LOG = new Logger(DruidKubernetesVertxHttpClientFactory.class);
40+
41+
private final DruidKubernetesVertxHttpClientConfig httpClientConfig;
42+
private final ObjectMapper objectMapper;
43+
44+
public DruidKubernetesVertxHttpClientFactory(
45+
DruidKubernetesVertxHttpClientConfig httpClientConfig,
46+
ObjectMapper objectMapper
47+
)
3948
{
40-
this.vertx = createVertxInstance(httpClientConfig);
49+
super(createVertxInstance(httpClientConfig));
50+
this.httpClientConfig = httpClientConfig;
51+
this.objectMapper = objectMapper;
4152
}
4253

4354
@Override
44-
public VertxHttpClientBuilder<DruidKubernetesVertxHttpClientFactory> newBuilder()
55+
protected void additionalConfig(WebClientOptions options)
4556
{
46-
return new VertxHttpClientBuilder<>(this, vertx);
57+
if (!httpClientConfig.getWebClientOptions().isEmpty()) {
58+
try {
59+
LOG.info("Applying additional WebClientOptions from configuration: %s", httpClientConfig.getWebClientOptions());
60+
objectMapper.updateValue(options, httpClientConfig.getWebClientOptions());
61+
}
62+
catch (Exception e) {
63+
throw new RuntimeException(
64+
"Failed to apply webClientOptions to WebClientOptions. "
65+
+ "Check that all property names and values are valid. "
66+
+ "Properties provided: " + httpClientConfig.getWebClientOptions(),
67+
e
68+
);
69+
}
70+
}
4771
}
4872

4973
/**

extensions-core/kubernetes-overlord-extensions/src/test/java/org/apache/druid/k8s/overlord/KubernetesTaskRunnerFactoryTest.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -67,7 +67,7 @@ public void setup()
6767
Config config = new ConfigBuilder().build();
6868

6969
druidKubernetesClient =
70-
new DruidKubernetesClient(new DruidKubernetesVertxHttpClientFactory(new DruidKubernetesVertxHttpClientConfig()), config);
70+
new DruidKubernetesClient(new DruidKubernetesVertxHttpClientFactory(new DruidKubernetesVertxHttpClientConfig(), new ObjectMapper()), config);
7171
druidKubernetesCachingClient = null;
7272
taskAdapter = new TestTaskAdapter();
7373
kubernetesTaskRunnerConfig = new KubernetesTaskRunnerEffectiveConfig(kubernetesTaskRunnerStaticConfig, () -> null);

0 commit comments

Comments
 (0)