diff --git a/README.md b/README.md
index cf26d254..9d7e7caf 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,10 @@ These pages detail the components and how to configure the EDOT Collector.
 - [Manual configurations](docs/manual-configuration.md): Manually configure the EDOT Collector to send data to Elastic Observability.
 - [Limitations](docs/collector-limitations.md): Understand the current limitations of the EDOT Collector.
 
+## Kubernetes Observability using the EDOT Collector
+
+- [Kubernetes guided onboarding](docs/kubernetes/operator/README.md): Use the guided onboarding to send Kubernetes logs, metrics, and application traces to Elasticsearch using the EDOT Collector and the [OpenTelemetry Operator](https://github.com/open-telemetry/opentelemetry-operator/).
+
 ## Collect application data using the EDOT language SDKs
 
 Elastic offers several Distributions that extend [OpenTelemetry language SDKs](https://opentelemetry.io/docs/languages/). The following languages are currently available:
diff --git a/docs/kubernetes/operator/README.md b/docs/kubernetes/operator/README.md
new file mode 100644
index 00000000..be62ea2b
--- /dev/null
+++ b/docs/kubernetes/operator/README.md
@@ -0,0 +1,198 @@
+# Get started with OpenTelemetry for Kubernetes Observability
+
+This guide describes how to:
+
+- Install the [OpenTelemetry Operator](https://github.com/open-telemetry/opentelemetry-operator/) using the [kube-stack Helm Chart](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-kube-stack).
+- Use the EDOT Collector to send Kubernetes logs, metrics, and application traces to an Elasticsearch cluster.
+- Use the operator for application [auto-instrumentation](https://opentelemetry.io/docs/kubernetes/operator/automatic/) in all supported languages.
+
+## Table of Contents
+
+- [Prerequisites](#prerequisites)
+- [Compatibility Matrix](#compatibility-matrix)
+- [Components description](#components-description)
+- [Deploying components using Kibana Onboarding UX](#deploying-components-using-kibana-onboarding-ux)
+- [Manual deployment of all components](#manual-deployment-of-all-components)
+- [Installation verification](#installation-verification)
+- [Instrumenting applications](#instrumenting-applications)
+- [Limitations](#limitations)
+
+## Prerequisites
+
+- Elastic Stack (self-managed or [Elastic Cloud](https://www.elastic.co/cloud)) version 8.16.0 or higher, or an [Elasticsearch serverless](https://www.elastic.co/docs/current/serverless/elasticsearch/get-started) project.
+
+- A Kubernetes version supported by the OpenTelemetry Operator (refer to the operator's [compatibility matrix](https://github.com/open-telemetry/opentelemetry-operator?#compatibility-matrix) for more details).
+
+## Compatibility Matrix
+
+The minimum supported version of the Elastic Stack for OpenTelemetry-based monitoring on Kubernetes is `8.16.0`. Each Elastic Stack release supports specific versions of the [kube-stack Helm Chart](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-kube-stack).
+
+The following is the current list of supported versions:
+
+| Stack Version | Helm Chart Version | Values file |
+|---------------|--------------------|-------------|
+| Serverless    | 0.3.0              | values.yaml |
+| 8.16.0        | 0.3.0              | values.yaml |
+
+When [installing the release](#manual-deployment-of-all-components), ensure you use the right `--version` and `-f` parameters.
+Values files are available in the [resources directory](/resources/kubernetes/operator/helm).
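+
+For example, installing chart version `0.3.0` with its corresponding values file (this mirrors the full command shown in [Manual deployment of all components](#manual-deployment-of-all-components)):
+
+```
+$ helm upgrade --install opentelemetry-kube-stack open-telemetry/opentelemetry-kube-stack \
+    --namespace opentelemetry-operator-system \
+    -f ./resources/kubernetes/operator/helm/values.yaml \
+    --version 0.3.0
+```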
+
+## Components description
+
+### OpenTelemetry Operator
+
+The OpenTelemetry Operator is a [Kubernetes Operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) implementation designed to manage OpenTelemetry resources in a Kubernetes environment. It defines and oversees the following Custom Resource Definitions (CRDs):
+
+- [OpenTelemetry Collectors](https://github.com/open-telemetry/opentelemetry-collector): Agents responsible for receiving, processing, and exporting telemetry data such as logs, metrics, and traces.
+- [Instrumentation](https://opentelemetry.io/docs/kubernetes/operator/automatic): Used for the automatic instrumentation of workloads by leveraging OpenTelemetry instrumentation libraries.
+
+All signals, including logs, metrics, and traces, are processed by the collectors and sent directly to Elasticsearch via the ES exporter. A collector's processor pipeline replaces the traditional APM server functionality for handling application traces.
+
+### Kube-stack Helm Chart
+
+The [kube-stack Helm Chart](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-kube-stack) is used to manage the installation of the operator (including its CRDs) and to configure a suite of collectors, which instrument various Kubernetes components to enable comprehensive observability and monitoring.
+
+The chart is installed with a provided default `values.yaml` file that can be customized when needed.
+
+### DaemonSet collectors
+
+The OpenTelemetry components deployed within the DaemonSet collectors are responsible for observing specific signals from each node. To ensure complete data collection, these components must be deployed on every node in the cluster. Failing to do so will result in partial and potentially incomplete data.
+
+The DaemonSet collectors handle the following data:
+
+- Host Metrics: Collects host metrics (`hostmetrics` receiver) specific to each node.
+- Kubernetes Metrics: Captures metrics related to the Kubernetes infrastructure on each node.
+- Logs: Utilizes a `filelog` receiver to gather logs from all Pods running on the respective node.
+- OTLP Traces Receiver: Opens an HTTP and a gRPC port on the node to receive OTLP trace data.
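+
+As a rough illustration of these responsibilities, the following is a hypothetical, trimmed-down `OpenTelemetryCollector` resource in `daemonset` mode wiring such receivers to a placeholder exporter. It is a sketch for orientation only; the actual collectors are defined by the chart's `values.yaml`, which exports to Elasticsearch instead:
+
+```yaml
+apiVersion: opentelemetry.io/v1beta1
+kind: OpenTelemetryCollector
+metadata:
+  name: daemon-sketch              # illustrative name
+spec:
+  mode: daemonset                  # one collector instance per node
+  config:
+    receivers:
+      hostmetrics:                 # per-node host metrics
+        scrapers:
+          cpu: {}
+          memory: {}
+      filelog:                     # logs of the Pods on this node
+        include: [/var/log/pods/*/*/*.log]
+      otlp:                        # OTLP traces over gRPC and HTTP
+        protocols:
+          grpc: {}
+          http: {}
+    exporters:
+      debug: {}                    # placeholder; the provided values.yaml ships data to Elasticsearch
+    service:
+      pipelines:
+        metrics:
+          receivers: [hostmetrics]
+          exporters: [debug]
+        logs:
+          receivers: [filelog]
+          exporters: [debug]
+        traces:
+          receivers: [otlp]
+          exporters: [debug]
+```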
+
+### Deployment collector
+
+The OpenTelemetry components deployed within a Deployment collector focus on gathering data at the cluster level rather than from individual nodes. Unlike DaemonSet collectors, which need to be deployed on every node, a Deployment collector operates as a standalone instance.
+
+The Deployment collector handles the following data:
+
+- Kubernetes Events: Monitors and collects events occurring across the entire Kubernetes cluster.
+- Cluster Metrics: Captures metrics that provide insights into the overall health and performance of the Kubernetes cluster.
+
+### Auto-instrumentation
+
+The Helm Chart is configured to enable zero-code instrumentation using the [Operator's Instrumentation resource](https://github.com/open-telemetry/opentelemetry-operator/?tab=readme-ov-file#opentelemetry-auto-instrumentation-injection) for the following programming languages:
+
+- Go
+- Java
+- Node.js
+- Python
+- .NET
+
+## Deploying components using Kibana Onboarding UX
+
+The preferred method for deploying all components is through the Kibana Onboarding UX. Follow these steps:
+
+1. Navigate in Kibana to **Observability** --> **Add data**.
+2. Select **Kubernetes**, then choose **Kubernetes monitoring with EDOT Collector**.
+3. Follow the on-screen instructions to install the OpenTelemetry Operator using the Helm Chart and the provided `values.yaml`.
+
+Notes:
+- If the `elastic_endpoint` shown by the UI is not valid for your environment, replace it with the correct Elasticsearch endpoint.
+- The displayed `elastic_api_key` corresponds to an API key that is automatically generated when the onboarding process is initiated.
+
+## Manual deployment of all components
+
+### Elastic Stack preparations
+
+Before installing the operator, complete the following actions:
+
+1. Create an [API Key](https://www.elastic.co/guide/en/kibana/current/api-keys.html) and make note of its value (one way to create it is sketched at the end of this section).
+(TBD: details of API key permissions).
+
+2. Install the following integrations in Kibana:
+   - `System`
+   - `Kubernetes`
+   - `Kubernetes OpenTelemetry Assets`
+
+Notes:
+- When using the [Kibana onboarding UX](#deploying-components-using-kibana-onboarding-ux), the previous actions are automatically handled by Kibana.
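+
+As a sketch of step 1, an API key can be created from the Kibana Dev Tools console with the Elasticsearch security API. The key name is illustrative, and no role descriptors are specified here, so the key inherits the privileges of the user creating it (the exact permissions to grant are TBD above):
+
+```
+POST /_security/api_key
+{
+  "name": "edot-collector-key"
+}
+```
+
+The response includes an `encoded` field, which is typically the form used as `YOUR_ELASTICSEARCH_API_KEY` in the steps below.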
+
+### Operator Installation
+
+1. Create the `opentelemetry-operator-system` Kubernetes namespace:
+```
+$ kubectl create namespace opentelemetry-operator-system
+```
+
+2. Create a secret in Kubernetes with the following command.
+   ```
+   kubectl create -n opentelemetry-operator-system secret generic elastic-secret-otel \
+     --from-literal=elastic_endpoint='YOUR_ELASTICSEARCH_ENDPOINT' \
+     --from-literal=elastic_api_key='YOUR_ELASTICSEARCH_API_KEY'
+   ```
+   Don't forget to replace:
+   - `YOUR_ELASTICSEARCH_ENDPOINT`: your Elasticsearch endpoint (*with* the `https://` prefix, for example: `https://1234567.us-west2.gcp.elastic-cloud.com:443`).
+   - `YOUR_ELASTICSEARCH_API_KEY`: your Elasticsearch API Key.
+
+3. Execute the following commands to deploy the Helm Chart.
+
+```
+$ helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
+$ helm repo update
+$ helm upgrade --install --namespace opentelemetry-operator-system opentelemetry-kube-stack open-telemetry/opentelemetry-kube-stack --values ./resources/kubernetes/operator/helm/values.yaml --version 0.3.0
+```
+
+## Installation verification
+
+Regardless of the installation method followed, perform the following checks to verify that everything is running properly:
+
+1. **Check Pods Status**
+   - Ensure the following components are running without errors:
+     - **Operator Pod**
+     - **DaemonSet Collector Pod**
+     - **Deployment Collector Pod**
+
+2. **Validate Instrumentation Object**
+   - Confirm that the **Instrumentation object** is deployed and configured with a valid **endpoint**.
+
+3. **Kibana Dashboard Check**
+   - Verify that the **[OTEL][Metrics Kubernetes] Cluster Overview** dashboard in **Kibana** is displaying data correctly.
+
+4. **Log Data Availability in Kibana**
+   - In **Kibana Discover**, confirm the availability of data under the `logs-*` data view.
+
+5. **Metrics Data Availability in Kibana**
+   - In **Kibana Discover**, ensure data is available under the `metrics-*` data view.
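+
+The first two checks can be performed from the command line. For example (resource names and namespace match the default installation above):
+
+```
+$ kubectl get pods -n opentelemetry-operator-system
+$ kubectl get instrumentation -n opentelemetry-operator-system
+```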
+
+## Instrumenting Applications
+
+To enable auto-instrumentation, add the corresponding annotation to the Pods of existing deployments (`spec.template.metadata.annotations`), or to the desired namespace (to auto-instrument all Pods in the namespace):
+
+```yaml
+metadata:
+  annotations:
+    instrumentation.opentelemetry.io/inject-<language>: "opentelemetry-operator-system/elastic-instrumentation"
+```
+
+where `<language>` is one of: `go`, `java`, `nodejs`, `python`, `dotnet`.
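+
+For example, a minimal (hypothetical) Java Deployment with the annotation in place would look as follows. Note that the annotation goes on the Pod template, not on the Deployment's own metadata:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: my-java-app               # illustrative name
+spec:
+  selector:
+    matchLabels:
+      app: my-java-app
+  template:
+    metadata:
+      labels:
+        app: my-java-app
+      annotations:
+        # the operator injects the Elastic Java agent into these Pods
+        instrumentation.opentelemetry.io/inject-java: "opentelemetry-operator-system/elastic-instrumentation"
+    spec:
+      containers:
+        - name: app
+          image: my-java-app:latest   # illustrative image
+```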
+
+For detailed instructions and examples on how to instrument applications in Kubernetes using the OpenTelemetry Operator, refer to this guide (TBD-add link and document).
+
+## Limitations
+
+### Cert manager
+
+In Kubernetes, in order for the API server to communicate with the webhook component (created by the Operator), the webhook requires a TLS certificate that the API server is configured to trust. The previously provided configuration sets the Helm Chart to auto-generate the required TLS certificates with an expiration policy of 365 days. These certificates **won't be renewed** if the Helm Chart's release is not manually updated. For production environments, it is highly recommended to use a certificate manager like [cert-manager](https://cert-manager.io/docs/installation/).
+
+If `cert-manager` CRDs are already present in your Kubernetes environment, you can configure the Operator to use them with the following modifications in the values file:
+
+```diff
+opentelemetry-operator:
+  manager:
+    extraArgs:
+      - --enable-go-instrumentation
+  admissionWebhooks:
+    certManager:
+-      enabled: false
++      enabled: true
+
+-autoGenerateCert:
+-  enabled: true
+-  recreate: true
+```
diff --git a/docs/onboarding/8_16/operator/README.md b/docs/onboarding/8_16/operator/README.md
deleted file mode 100644
index f9940547..00000000
--- a/docs/onboarding/8_16/operator/README.md
+++ /dev/null
@@ -1,104 +0,0 @@
-# Kubernetes Onboarding 8.16
-
-This guide will help you get up and running with Kubernetes by walking through the setup and integration of key components, starting with the [OpenTelemetry Operator](https://github.com/open-telemetry/opentelemetry-operator/). The OpenTelemetry Operator is an implementation of a [Kubernetes Operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/).
-
-The operator manages:
-
-- [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector)
-- [auto-instrumentation](https://opentelemetry.io/docs/concepts/instrumentation/automatic/) of the workloads using OpenTelemetry instrumentation libraries
-
-## OpenTelemetry Operator
-
-The [kube-stack Helm Chart](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-kube-stack) will be utilized to manage the installation of the Operator's Custom Resource Definitions (CRDs) alongside the configuration of a suite of collectors, which will instrument various components of the Kubernetes environment to enable comprehensive observability and monitoring.
-
-### Daemonset collectors
-
-The OpenTelemetry components deployed within the DaemonSet collectors are responsible for observing specific signals from each node. To ensure complete data collection, these components must be deployed on every node in the cluster. Failing to do so will result in partial and potentially incomplete data.
-
-- Host Metrics: Collects host metrics (hostmetrics receiver) specific to each node.
-- Kubernetes Metrics: Captures metrics related to the Kubernetes infrastructure on each node.
-- Logs: Utilizes a filelog receiver to gather logs from all Pods running on the respective node.
-- OTLP Traces Receiver: Opens an HTTP and a GRPC port on the node to receive OTLP trace data.
-
-### Deployment collector
-
-The OpenTelemetry components deployed within a Deployment collector focus on gathering data at the cluster level rather than at individual nodes. Unlike DaemonSet collectors, which need to be deployed on every node, a Deployment collector operates as a standalone instance.
-
-- Kubernetes Events: Monitors and collects events occurring across the entire Kubernetes cluster.
-- Cluster Metrics: Captures metrics that provide insights into the overall health and performance of the Kubernetes cluster.
-
-### Auto-instrumentation
-
-The Helm Chart is configured to enable zero-code instrumentation using the [Operator's Instrumentation resource](https://github.com/open-telemetry/opentelemetry-operator/?tab=readme-ov-file#opentelemetry-auto-instrumentation-injection) for the following programming languages:
-
-- Go
-- Java
-- Node.js
-- Python
-- .NET
-
-Auto-instrumentation is enabled by adding the corresponding annotation to the deployment (or namespace to auto-instrument all pods in the namespace)
-
-```yaml
-metadata:
-  annotations:
-    instrumentation.opentelemetry.io/inject-<language>: "opentelemetry-operator-system/elastic-instrumentation"
-```
-
-where <language> is one of: `go` , `java`, `nodejs`, `python`, `dotnet`
-
-
-## Configuration
-
-Depending on the deployment model (i.e. self-managed, ESS, serverless), different configuration will be needed.
-
-### Installation
-
-All signals including logs, metrics, traces/APM go through the collector directly into Elasticsearch using the ES exporter, a collector's processor pipeline will be used to replace the APM server functionality.
-
-1. Create the `opentelemetry-operator-system` Kubernetes namespace:
-```
-$ kubectl create namespace opentelemetry-operator-system
-```
-
-2. Create a secret in Kubernetes with the following command.
-   ```
-   kubectl create -n opentelemetry-operator-system secret generic elastic-secret-otel \
-     --from-literal=elastic_endpoint='YOUR_ELASTICSEARCH_ENDPOINT' \
-     --from-literal=elastic_api_key='YOUR_ELASTICSEARCH_API_KEY'
-   ```
-   Don't forget to replace
-   - `YOUR_ELASTICSEARCH_ENDPOINT`: your Elasticsearch endpoint (*with* `https://` prefix example: `https://1234567.us-west2.gcp.elastic-cloud.com:443`).
-   - `YOUR_ELASTICSEARCH_API_KEY`: your Elasticsearch API Key
-
-3. Execute the following commands to deploy the Helm Chart.
-
-```
-$ helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
-$ helm repo update
-$ helm upgrade --install --namespace opentelemetry-operator-system opentelemetry-kube-stack open-telemetry/opentelemetry-kube-stack --values ./resources/kubernetes/operator/helm/values.yaml --version 0.3.0
-```
-
-## Limitations
-
-### Cert manager
-
-In Kubernetes, in order for the API server to communicate with the webhook component (created by the Operator), the webhook requires a TLS certificate that the API server is configured to trust. The previous provided configurations sets the Helm Chart to auto generate the required TLS certificates with an expiration policy of 365 days. These certificates **won't be renewed** if the Helm Chart's release is not manually updated. For production environments, it is highly recommended to use a certificate manger like [cert-manager](https://cert-manager.io/docs/installation/).
-
-If `cert-manager` CRDs are already present in your Kubernetes environment, you can configure the Operator to use them with the following modifications in the values file:
-
-
-```diff
-opentelemetry-operator:
-  manager:
-    extraArgs:
-      - --enable-go-instrumentation
-  admissionWebhooks:
-    certManager:
--      enabled: false
-+      enabled: true
-
--autoGenerateCert:
--  enabled: true
--  recreate: true
-```
diff --git a/docs/onboarding/8_16/operator/troubleshoot-auto-instrumentation.md b/docs/onboarding/8_16/operator/troubleshoot-auto-instrumentation.md
new file mode 100644
index 00000000..28276985
--- /dev/null
+++ b/docs/onboarding/8_16/operator/troubleshoot-auto-instrumentation.md
@@ -0,0 +1,85 @@
+# Troubleshooting auto-instrumentation
+
+1. Check the operator is running, e.g.
+```
+$ kubectl get pods -n opentelemetry-operator-system
+NAME                                                              READY   STATUS    RESTARTS   AGE
+opentelemetry-kube-stack-opentelemetry-operator-7b8684cfbdbv4hj   2/2     Running   0          58s
+...
+```
+
+2. Check the `Instrumentation` object has been deployed, e.g.
+```
+$ kubectl describe Instrumentation -n opentelemetry-operator-system
+Name:         elastic-instrumentation
+Namespace:    opentelemetry-operator-system
+  ...
+Kind:         Instrumentation
+Metadata:
+  ...
+Spec:
+  Dotnet:
+    Image:  docker.elastic.co/observability/elastic-otel-dotnet:edge
+  Go:
+    Image:  ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.14.0-alpha
+  Java:
+    Image:  docker.elastic.co/observability/elastic-otel-javaagent:1.0.0
+  Nodejs:
+    Image:  docker.elastic.co/observability/elastic-otel-node:edge
+  Python:
+    Image:  docker.elastic.co/observability/elastic-otel-python:edge
+  ...
+```
+
+3. Check your pod is running, e.g. (using an example running in the `banana` namespace)
+```
+$ kubectl get pods -n banana
+NAME               READY   STATUS    RESTARTS   AGE
+example-otel-app   1/1     Running   0          104s
+```
+
+4. Check the pod has had the instrumentation init container installed (for Go, a regular container rather than an init container) and that the events show the Docker image was successfully pulled and the containers started:
+```
+$ kubectl describe pod/example-otel-app -n banana
+Name:             example-otel-app
+Namespace:        banana
+...
+Annotations:      instrumentation.opentelemetry.io/inject-java: opentelemetry-operator-system/elastic-instrumentation
+Init Containers:
+  opentelemetry-auto-instrumentation-java:
+    Container ID:  docker://7ecdf3954263d591b994ed1c0519d16322479b1515b58c1fbbe51d3066210d99
+    Image:         docker.elastic.co/observability/elastic-otel-javaagent:1.0.0
+    Image ID:      docker-pullable://docker.elastic.co/observability/elastic-otel-javaagent@sha256:28d65d04a329c8d5545ed579d6c17f0d74800b7b1c5875e75e0efd29e210566a
+    ...
+Containers:
+  example-otel-app:
+...
+Events:
+  Type    Reason     Age    From               Message
+  ----    ------     ----   ----               -------
+  Normal  Scheduled  5m3s   default-scheduler  Successfully assigned banana/example-otel-app to docker-desktop
+  Normal  Pulled     5m3s   kubelet            Container image "docker.elastic.co/observability/elastic-otel-javaagent:1.0.0" already present on machine
+  Normal  Created    5m3s   kubelet            Created container opentelemetry-auto-instrumentation-java
+  Normal  Started    5m3s   kubelet            Started container opentelemetry-auto-instrumentation-java
+  Normal  Pulling    5m2s   kubelet            Pulling image "docker.elastic.co/demos/apm/k8s-webhook-test"
+  Normal  Pulled     5m1s   kubelet            Successfully pulled image "docker.elastic.co/demos/apm/k8s-webhook-test" in 1.139s (1.139s including waiting). Image size: 406961626 bytes.
+  Normal  Created    5m1s   kubelet            Created container example-otel-app
+  Normal  Started    5m1s   kubelet            Started container example-otel-app
+```
+
+5. (a) Check your pod logs and look for agent output, e.g.
+```
+$ kubectl logs example-otel-app -n banana
+...
+[otel.javaagent 2024-10-11 13:32:44:127 +0000] [main] INFO io.opentelemetry.javaagent.tooling.VersionLogger - opentelemetry-javaagent - version: 1.0.0
+...
+```
+
+5. (b) If there is no obvious agent log output, restart the pod with the agent log level set to debug and look for agent debug output. Setting the agent to debug differs between the language agents:
+- All languages: add/set the environment variable `OTEL_LOG_LEVEL` to `debug`, e.g.
+```
+  env:
+    - name: OTEL_LOG_LEVEL
+      value: "debug"
+```
+- Java: add/set the environment variable `OTEL_JAVAAGENT_DEBUG` to `true`.
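+
+If the workload is managed by a Deployment, one convenient way to set the variable is `kubectl set env`, which updates the Pod template and restarts the Pods. This is an illustrative command (the standalone example pod above would instead need its manifest edited and re-applied):
+```
+$ kubectl set env deployment/<your-deployment> -n banana OTEL_LOG_LEVEL=debug
+```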
diff --git a/resources/kubernetes/operator/helm/values.yaml b/resources/kubernetes/operator/helm/values.yaml
index 092ad3c7..06f114db 100644
--- a/resources/kubernetes/operator/helm/values.yaml
+++ b/resources/kubernetes/operator/helm/values.yaml
@@ -71,11 +71,11 @@ collectors:
       resourcedetection/gcp:
         detectors: [env, gcp]
         timeout: 2s
-        override: false
+        override: true
       resourcedetection/aks:
         detectors: [env, aks]
         timeout: 2s
-        override: false
+        override: true
       aks:
         resource_attributes:
           k8s.cluster.name:
@@ -447,7 +447,7 @@ collectors:
         mode: ecs
     processors:
       batch: {}
-      elastictrace:
+      elastictrace: {}
       lsminterval:
         intervals:
           - duration: 1m
@@ -480,11 +480,11 @@ collectors:
       resourcedetection/gcp:
         detectors: [env, gcp]
         timeout: 2s
-        override: false
+        override: true
      resourcedetection/aks:
         detectors: [env, aks]
         timeout: 2s
-        override: false
+        override: true
       aks:
         resource_attributes:
           k8s.cluster.name:
diff --git a/test/operator/elastic-instrumentation.yml b/test/operator/elastic-instrumentation.yml
index d3f3f956..ff1218ee 100644
--- a/test/operator/elastic-instrumentation.yml
+++ b/test/operator/elastic-instrumentation.yml
@@ -16,10 +16,10 @@ spec:
   java:
     image: docker.elastic.co/observability/elastic-otel-javaagent:1.0.0
   nodejs:
-    image: docker.elastic.co/observability/elastic-otel-node:edge
+    image: docker.elastic.co/observability/elastic-otel-node:0.4.1
   dotnet:
     image: docker.elastic.co/observability/elastic-otel-dotnet:edge
   python:
-    image: docker.elastic.co/observability/elastic-otel-python:edge
+    image: docker.elastic.co/observability/elastic-otel-python:0.3.0
   go:
     image: ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.14.0-alpha