druid-kubernetes-overlord-extensions Fabric8 KubernetesClient HTTP Client Issues #18629

@capistrant

Fabric8 KubernetesClient Overview

Fabric8 KubernetesClient is the client library we use in the druid-kubernetes-overlord-extensions.

Underlying HTTP Client

Fabric8 uses an underlying HTTP client for client/server interaction with the K8s cluster. This HTTP client is pluggable. As of this writing, Fabric8 supports four clients: vert.x, okhttp, jetty, and native-jdk. vert.x is currently Fabric8's default.
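To make the idea of a pluggable client concrete, here is a minimal sketch of how a configured client name could be mapped to one of the four options, falling back to vert.x when nothing is set. The `HttpClientTypeSketch` class, the `HttpClientType` enum, and the `parse` method are hypothetical names used for illustration only; they are not Druid's or Fabric8's actual API.

```java
import java.util.Locale;

public class HttpClientTypeSketch {

    // Hypothetical enum; the labels are the four client names listed above.
    enum HttpClientType {
        VERTX("vert.x"),
        OKHTTP("okhttp"),
        JETTY("jetty"),
        NATIVE_JDK("native-jdk");

        final String label;

        HttpClientType(String label) {
            this.label = label;
        }

        // Maps a configured string to a client type, defaulting to vert.x
        // when the value is missing or blank.
        static HttpClientType parse(String value) {
            if (value == null || value.trim().isEmpty()) {
                return VERTX;
            }
            String wanted = value.trim().toLowerCase(Locale.ROOT);
            for (HttpClientType type : values()) {
                if (type.label.equals(wanted)) {
                    return type;
                }
            }
            throw new IllegalArgumentException("Unknown HTTP client: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(HttpClientType.parse(null));
        System.out.println(HttpClientType.parse("okhttp"));
    }
}
```

Rejecting unknown names outright, rather than silently falling back, is one possible design choice; it surfaces typos in operator configuration early.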

Druid History with the Fabric8 client

#17913 switched Druid to use vert.x.

#18013 brought Druid up to date with the latest Fabric8 versions.

Druid's Path Forward

This issue exists because Druid operators have reported problems with both the vert.x and okhttp clients in production clusters.

  • vert.x: Communication with the API server fails because of unhealthy connections in the connection pool, leading to sporadic task failures.
  • okhttp: The client creates a large number of threads, which increases memory pressure when many tasks are being launched.

The Druid developer community wants to identify a stable default HTTP client and configuration, which would simplify configuration and distribution packaging. In the interim, Druid operators can select the HTTP client and configure some of its parameters. This lets operators tailor the HTTP client to their use case and give the Druid developer community feedback on what works well in practice.

Known Issues

vert.x

  • Issues with K8s API requests failing with ConnectionClosed exceptions due to unhealthy connections in the underlying connection pool.
    • This can lead to sporadic task failures.
    • The issue appears to be due to connections being closed on the server side, but the client side not cleaning them up before trying to use them in future requests.
    • A related issue has been opened with Fabric8 to investigate exposing more configuration knobs for tuning the vert.x connection pool.
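The failure mode described above can be illustrated with a deliberately simplified model of a connection pool. This is a sketch only, not vert.x or Fabric8 code: `Conn`, `Pool`, and the `validateOnBorrow` flag are hypothetical names, and real clients detect closed connections in more involved ways. The sketch shows that a pool which hands out an idle connection without checking it will fail the next request if the server closed that connection in the meantime, whereas a pool that discards stale connections on borrow will not.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StalePoolSketch {

    // Hypothetical stand-in for a pooled HTTP connection.
    static final class Conn {
        private boolean open = true;

        void closeByServer() {
            open = false;
        }

        boolean isOpen() {
            return open;
        }

        String send(String request) {
            if (!open) {
                throw new IllegalStateException("connection closed");
            }
            return "ok: " + request;
        }
    }

    // Hypothetical pool; validateOnBorrow toggles the stale-connection check.
    static final class Pool {
        private final Deque<Conn> idle = new ArrayDeque<>();
        private final boolean validateOnBorrow;

        Pool(boolean validateOnBorrow) {
            this.validateOnBorrow = validateOnBorrow;
        }

        void release(Conn conn) {
            idle.push(conn);
        }

        Conn borrow() {
            while (!idle.isEmpty()) {
                Conn conn = idle.pop();
                if (!validateOnBorrow || conn.isOpen()) {
                    return conn;
                }
                // Stale connection: drop it and look at the next idle one.
            }
            return new Conn();
        }
    }

    static String request(Pool pool, String req) {
        Conn conn = pool.borrow();
        String response = conn.send(req);
        pool.release(conn);
        return response;
    }

    public static void main(String[] args) {
        for (boolean validate : new boolean[] {false, true}) {
            Pool pool = new Pool(validate);
            Conn conn = new Conn();
            pool.release(conn);
            conn.closeByServer(); // server drops the idle connection
            try {
                System.out.println("validate=" + validate + " -> " + request(pool, "GET /pods"));
            } catch (IllegalStateException e) {
                System.out.println("validate=" + validate + " -> failed: " + e.getMessage());
            }
        }
    }
}
```

Without validation the request reuses the closed connection and fails; with validation the stale connection is discarded and a fresh one is used.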

okhttp

  • With the default configuration, the client creates a large number of threads, which can lead to memory issues if there are many tasks being launched.
    • The underlying issue appears to be related to an unbounded thread pool being used by the client. We are exposing experimental configuration knobs to tune the thread pool size to attempt to mitigate this issue.
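The effect of an unbounded thread pool can be reproduced with the JDK alone. The sketch below is an analogy, not okhttp's code or the configuration Druid exposes: the class name, the task count of 100, and the cap of 8 are illustrative assumptions. It submits the same number of blocking tasks to an unbounded pool and to a bounded one, and reports the largest number of threads each pool held.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThreadPoolGrowthSketch {

    // Submits `tasks` blocking jobs and returns the largest number of
    // threads the pool held at any one time.
    static int largestPoolSize(ThreadPoolExecutor pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // simulate a slow API call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int largest = pool.getLargestPoolSize();
        release.countDown(); // let every task finish
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return largest;
    }

    // No upper limit on threads; each task that finds no idle thread gets a new one.
    static ThreadPoolExecutor unboundedPool() {
        return new ThreadPoolExecutor(
            0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
    }

    // At most `maxThreads` threads; extra tasks wait in the queue.
    static ThreadPoolExecutor boundedPool(int maxThreads) {
        return new ThreadPoolExecutor(
            maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) {
        System.out.println("unbounded: " + largestPoolSize(unboundedPool(), 100));
        System.out.println("bounded:   " + largestPoolSize(boundedPool(8), 100));
    }
}
```

The unbounded pool grows to one thread per in-flight task, while the bounded pool stays at its cap and queues the rest. The trade-off of a cap is that queued work waits longer when many tasks launch at once.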
