
gardener-extension-otelcol

The gardener-extension-otelcol repo provides a Gardener Extension for an OpenTelemetry Collector, which runs in the shoot control-plane namespace and forwards observability signals for control-plane components to a remote OpenTelemetry Collector receiver.

Warning

This extension is in an early stage of development. Do not use it in a production environment.

High Level Overview

Requirements

Code structure

The project repo uses the following code structure.

Package Description
cmd Command-line application of the extension
pkg/admission Implementations for the Gardener extension admission Validator and Mutator interfaces
pkg/apis Extension API types, e.g. configuration spec, etc.
pkg/actuator Implementations for the Gardener Extension Actuator interfaces
pkg/controller Utility wrappers for creating Kubernetes reconcilers for Gardener Actuators
pkg/imagevector Image vector for container images
pkg/heartbeat Utility wrappers for creating heartbeat reconcilers for Gardener extensions
pkg/metrics Metrics emitted by the extension
pkg/mgr Utility wrappers for creating controller-runtime managers using functional options API
pkg/version Version metadata information about the extension
internal/tools Go-based tools used for testing and linting the project
charts Helm charts for deploying the extension
examples Example Kubernetes resources, which can be used in a dev environment
test Various files (e.g. schemas, CRDs, etc.), used during testing

Usage

You can enable the extension for a Gardener Shoot cluster by updating the .spec.extensions of your shoot manifest.

The following example shoot manifest snippet enables the extension and configures the OpenTelemetry Collector to emit the signals for the shoot control-plane components via the Debug Exporter.

...

spec:
  extensions:
    - type: otelcol
      providerConfig:
        apiVersion: otelcol.extensions.gardener.cloud/v1alpha1
        kind: CollectorConfig
        spec:
          exporters:
            debug:
              enabled: true
              verbosity: basic  # basic, normal or detailed

However, this configuration is only useful while developing or troubleshooting the collector, because signals are not actually forwarded to a remote OpenTelemetry Collector receiver.

The following configuration snippet enables the extension for a shoot and configures it to forward the signals of the control-plane components to a remote collector using the OTLP HTTP exporter.

...

spec:
  extensions:
    - type: otelcol
      providerConfig:
        apiVersion: otelcol.extensions.gardener.cloud/v1alpha1
        kind: CollectorConfig
        spec:
          exporters:
            # OTLP HTTP exporter settings
            otlp_http:
              enabled: true
              endpoint: "https://opentelemetry-receiver.example.org"

The following example snippet expands on the previous one by adding TLS configuration settings and Bearer token authentication with the remote collector.

...

spec:
  extensions:
    - type: otelcol
      providerConfig:
        apiVersion: otelcol.extensions.gardener.cloud/v1alpha1
        kind: CollectorConfig
        spec:
          exporters:
            # OTLP HTTP exporter settings
            otlp_http:
              enabled: true
              endpoint: "https://opentelemetry-receiver.example.org"
              token:
                resourceRef:
                  name: otelcol-bearer-token
                  dataKey: token
              tls:
                ca:
                  resourceRef:
                    name: otelcol-tls
                    dataKey: ca.crt
                cert:
                  resourceRef:
                    name: otelcol-tls
                    dataKey: client.crt
                key:
                  resourceRef:
                    name: otelcol-tls
                    dataKey: client.key
  resources:
  - name: otelcol-bearer-token
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: my-otelcol-bearer-token
  - name: otelcol-tls
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: my-otelcol-tls

To provide the otelcol-tls and otelcol-bearer-token secrets from the example above to the extension, first create the respective secrets in the shoot project namespace, then reference them via Gardener Referenced Resources.
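For illustration, such secrets might look like the following manifests. The project namespace garden-local and the placeholder data values are assumptions; adjust them to your setup.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-otelcol-bearer-token
  namespace: garden-local  # your shoot project namespace (assumption)
type: Opaque
stringData:
  token: <your-bearer-token>
---
apiVersion: v1
kind: Secret
metadata:
  name: my-otelcol-tls
  namespace: garden-local  # your shoot project namespace (assumption)
type: Opaque
stringData:
  ca.crt: |
    <PEM-encoded CA certificate>
  client.crt: |
    <PEM-encoded client certificate>
  client.key: |
    <PEM-encoded client key>
```

The dataKey values in the providerConfig (token, ca.crt, client.crt, client.key) must match the keys used in these secrets.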

This example snippet enables the extension to forward the signals of the control-plane components to a remote collector using the OTLP gRPC exporter.

spec:
  extensions:
    - type: otelcol
      providerConfig:
        apiVersion: otelcol.extensions.gardener.cloud/v1alpha1
        kind: CollectorConfig
        spec:
          # Exporters settings
          exporters:
            # OTLP gRPC exporter settings
            otlp_grpc:
              enabled: true
              endpoint: "https://opentelemetry-receiver.default.svc.cluster.local:4317"
              token:
                resourceRef:
                  name: otelcol-bearer-token
                  dataKey: token
              tls:
                ca:
                  resourceRef:
                    name: otelcol-tls
                    dataKey: ca.crt
                cert:
                  resourceRef:
                    name: otelcol-tls
                    dataKey: client.crt
                key:
                  resourceRef:
                    name: otelcol-tls
                    dataKey: client.key

For additional configuration settings that can be provided to the extension, please refer to the OTel Extension API spec documentation.

Development

In order to build a binary of the extension, you can use the following command.

make build

The resulting binary can be found in bin/extension.

In order to build a Docker image of the extension, you can use the following command.

make docker-build

Run the following command to get usage info about the available Makefile targets.

make help

For local development of the gardener-extension-otelcol it is recommended that you set up a development Gardener environment.

Please refer to the next sections for more information about deploying and testing the extension in a Gardener development environment.

Development Environment with Gardener Operator

The extension can also be deployed via the Gardener Operator.

In order to start a local development environment with the Gardener Operator, please refer to the following documentation.

In summary, these are the steps you need to follow in order to start a local development environment with the Gardener Operator. Please make sure to read the documents above for additional details.

make kind-up gardener-up

Before you continue with the next steps, make sure that you configure your KUBECONFIG to point to the kubeconfig file of the cluster, which runs the Gardener Operator.

Two kubeconfig files will be created for you after the dev environment is up.

Path Description
/path/to/gardener/dev-setup/kubeconfigs/runtime/kubeconfig The runtime cluster (gardener-operator runs in it)
/path/to/gardener/dev-setup/kubeconfigs/virtual-garden/kubeconfig The virtual garden cluster

Throughout this document we will refer to the kubeconfigs for runtime and virtual clusters as $KUBECONFIG_RUNTIME and $KUBECONFIG_VIRTUAL respectively.
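For example, assuming the paths from the table above, you can export these variables like this:

```shell
# Convenience variables for the two dev kubeconfigs (paths from the table above).
export KUBECONFIG_RUNTIME=/path/to/gardener/dev-setup/kubeconfigs/runtime/kubeconfig
export KUBECONFIG_VIRTUAL=/path/to/gardener/dev-setup/kubeconfigs/virtual-garden/kubeconfig
```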

Before deploying the extension we need to target the runtime cluster, since this is where the extension resources for gardener-operator reside.

export KUBECONFIG=$KUBECONFIG_RUNTIME

In order to deploy the extension, execute the following command.

make deploy-operator

The deploy-operator target takes care of the following.

  1. Builds a Docker image of the extension
  2. Loads the image into the kind cluster nodes
  3. Packages the Helm charts and pushes them to the local registry
  4. Deploys the Extension (from group operator.gardener.cloud/v1alpha1) to the runtime cluster

Verify that we have successfully created the Extension (from group operator.gardener.cloud/v1alpha1) resource.

$ kubectl --kubeconfig $KUBECONFIG_RUNTIME get extop otelcol
NAME      INSTALLED   REQUIRED RUNTIME   REQUIRED VIRTUAL   AGE
otelcol   True        False              False              13s

Verify that the respective ControllerRegistration and ControllerDeployment resources have been created by the gardener-operator in the virtual garden cluster.

$ kubectl --kubeconfig $KUBECONFIG_VIRTUAL get controllerregistrations,controllerdeployments otelcol
NAME                                                 RESOURCES           AGE
controllerregistration.core.gardener.cloud/otelcol   Extension/otelcol   42s

NAME                                               AGE
controllerdeployment.core.gardener.cloud/otelcol   42s

Finally, we can create an example shoot with our extension enabled. The examples/shoot.yaml file provides a ready-to-use shoot manifest with the extension enabled and configured.

The provided example shoot references secrets from the project namespace, which are used to configure the TLS settings between the exporter and a local dev receiver, running in the default namespace.

The following commands will create the TLS secrets, a dev OpenTelemetry receiver in the default namespace, and a dev shoot, configured with the extension.

kubectl --kubeconfig $KUBECONFIG_RUNTIME apply -f examples/opentelemetry-receiver.yaml
kubectl --kubeconfig $KUBECONFIG_VIRTUAL apply -f examples/secret-tls.yaml
kubectl --kubeconfig $KUBECONFIG_VIRTUAL apply -f examples/secret-bearer-token.yaml
kubectl --kubeconfig $KUBECONFIG_VIRTUAL apply -f examples/shoot.yaml

If you have an existing, running shoot for which you want to enable the extension, simply follow the instructions from the previous sections in order to enable and configure the extension manually.

Once we create the shoot cluster, gardenlet will start deploying our gardener-extension-otelcol, since it is required by our shoot.

Verify that the extension has been successfully installed by checking the corresponding ControllerInstallation resource for our extension.

$ kubectl --kubeconfig $KUBECONFIG_VIRTUAL get controllerinstallations
NAME                      REGISTRATION        SEED    VALID   INSTALLED   HEALTHY   PROGRESSING   AGE
otelcol-8rvmn             otelcol             local   True    True        True      False         64s

After your shoot cluster has been successfully created and reconciled, verify that the extension is healthy.

$ kubectl --kubeconfig $KUBECONFIG_RUNTIME --namespace shoot--local--local get extensions otelcol
NAME                INSTALLED   REQUIRED RUNTIME   REQUIRED VIRTUAL   AGE
otelcol             True        False              True               13m

Verify that the ManagedResource created by the extension is healthy as well.

$ kubectl --kubeconfig $KUBECONFIG_RUNTIME --namespace shoot--local--local get managedresource external-otelcol
NAME               CLASS   APPLIED   HEALTHY   PROGRESSING   AGE
external-otelcol   seed    True      True      False         6m20s

After successful reconciliation we should see the following OpenTelemetry collectors in the shoot control-plane namespace.

$ kubectl --kubeconfig $KUBECONFIG_RUNTIME --namespace shoot--local--local get otelcol external-otelcol
NAME                      MODE          VERSION   READY   AGE     IMAGE                                                                                                                          MANAGEMENT
external-otelcol          statefulset   0.141.0   1/1     6m45s   europe-docker.pkg.dev/gardener-project/releases/3rd/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.141.0   managed

We should also see that the Collector and Target Allocator are running and healthy.

$ kubectl --kubeconfig $KUBECONFIG_RUNTIME --namespace shoot--local--local get sts external-otelcol-collector
NAME                         READY   AGE
external-otelcol-collector   1/1     3m30s

$ kubectl --kubeconfig $KUBECONFIG_RUNTIME --namespace shoot--local--local get deployment external-otelcol-targetallocator
NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
external-otelcol-targetallocator   1/1     1            1           3m38s

In order to trigger reconciliation of the extension you can annotate the extension resource.

kubectl --kubeconfig $KUBECONFIG_RUNTIME --namespace shoot--local--local annotate extensions otelcol gardener.cloud/operation=reconcile

In order to delete the dev shoot, TLS secrets and dev OpenTelemetry receiver you can run the following commands.

kubectl --kubeconfig $KUBECONFIG_VIRTUAL --namespace garden-local annotate shoot local confirmation.gardener.cloud/deletion=true --overwrite
kubectl --kubeconfig $KUBECONFIG_VIRTUAL delete -f examples/shoot.yaml --ignore-not-found=true --wait=false
kubectl --kubeconfig $KUBECONFIG_RUNTIME delete -f examples/opentelemetry-receiver.yaml --ignore-not-found=true --wait=false
kubectl --kubeconfig $KUBECONFIG_VIRTUAL delete -f examples/secret-tls.yaml --ignore-not-found=true --wait=false
kubectl --kubeconfig $KUBECONFIG_VIRTUAL delete -f examples/secret-bearer-token.yaml --ignore-not-found=true --wait=false

Troubleshooting

This section provides some hints related to troubleshooting the OpenTelemetry Collector, which is managed by the Gardener extension.

Check the official OpenTelemetry Troubleshooting Guides

Make sure that you check the official OpenTelemetry troubleshooting documentation.

Check the logs of the OpenTelemetry Collector and Target Allocator

Check the logs of the deployment/external-otelcol-targetallocator and statefulset/external-otelcol-collector, e.g.

kubectl --namespace shoot--local--local logs -f deployments/external-otelcol-targetallocator
kubectl --namespace shoot--local--local logs -f statefulset/external-otelcol-collector

Verify that there are ServiceMonitors in the shoot control-plane namespace

The Target Allocator deployed by the extension is configured to discover ServiceMonitor resources with the following labels:

  • prometheus=shoot

Confirm that ServiceMonitors with these labels exist in the shoot control-plane namespace, e.g.

kubectl --namespace shoot--local--local get servicemonitors -l prometheus=shoot
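As a minimal sketch (the resource name and selector below are hypothetical), a ServiceMonitor that the Target Allocator would discover looks like this:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-component            # hypothetical name
  namespace: shoot--local--local
  labels:
    prometheus: shoot           # the label the Target Allocator selects on
spec:
  selector:
    matchLabels:
      app: my-component         # hypothetical service selector
  endpoints:
    - port: metrics
```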

Check the configuration of the Collector and Target Allocator

The Target Allocator and Collector configmaps are labeled with observability.gardener.cloud/app=external-otelcol. Check and confirm that the configuration settings in these configmaps are correct.

$ kubectl --namespace shoot--local--local get cm -l observability.gardener.cloud/app=external-otelcol
NAME                                      DATA   AGE
external-otelcol-collector-c30d03f4       1      13m
external-otelcol-targetallocator-config   1      13m

Verify that the Target Allocator discovers scrape targets

The communication between the Target Allocator and the Collector happens over mTLS, so we need the collector's client certificate in order to confirm that the Target Allocator has discovered targets for scraping.

First, get the secret which contains the client certificate used by the Collector, e.g.

kubectl --namespace shoot--local--local get secret -l name=otelcol-collector-client

Save the client TLS secret locally:

mkdir client-cert
for k in tls.key tls.crt; do
  kubectl --namespace shoot--local--local get secret -l name=otelcol-collector-client -o yaml | \
    yq ".items[0].data.\"${k}\"" | base64 -d > "./client-cert/${k}"
done

Next, we need to port-forward the Target Allocator service locally.

kubectl --namespace shoot--local--local port-forward service/external-otelcol-targetallocator-https 8443:443

Now we can query the Target Allocator and review the jobs and the scrape targets it can dispatch to collectors.

curl -k --cert client-cert/tls.crt --key client-cert/tls.key -X GET 'https://localhost:8443/jobs' | jq '.'

The following command will query the Target Allocator for the scrape configs.

curl -k --cert client-cert/tls.crt --key client-cert/tls.key -X GET 'https://localhost:8443/scrape_configs' | jq '.'

In addition to the API paths served by the Target Allocator you can also inspect the configuration via /debug endpoints using your browser.

In order to do that we can use mitmproxy. Keep in mind that mitmproxy expects to find the key and certificate in a single PEM-encoded file.

cat client-cert/tls.crt client-cert/tls.key > client-cert/mitmproxy.pem

Now we can start the mitmproxy.

mitmproxy -k \
          --listen-port=8080 \
          --set client_certs=client-cert/mitmproxy.pem \
          --mode upstream:https://localhost:8443

Open up your browser at http://localhost:8080/ in order to view the Target Allocator jobs, scrape configs, and assigned collectors.

Tests

In order to run the tests use the command below:

make test

In order to test the Helm chart and the manifests provided by it you can run the following command.

make check-helm

In order to test the example resources from the examples/ directory you can run the following command.

make check-examples

Documentation

Make sure to check the following documents for more information about Gardener Extensions and the available extensions API.

Contributing

gardener-extension-otelcol is hosted on GitHub.

Please contribute by reporting issues, suggesting features or by sending patches using pull requests.

License

This project is Open Source and licensed under Apache License 2.0.
