# Agent Check: Kueue

## Overview

This check monitors Kueue through the Datadog Agent.

Kueue is a Kubernetes-native job queueing system. It decides when workloads are admitted to run and when they are preempted, based on resource quotas and priorities, so that workloads share cluster capacity fairly and efficiently. This integration collects metrics from the Kueue controller manager's metrics endpoint to help you monitor the health and performance of Kueue in your cluster.

## Setup

Follow the instructions below to install and configure this check in a Kubernetes cluster that runs the Datadog Agent and the Datadog Cluster Agent. For general information on configuring integrations in containerized environments, see the [Autodiscovery Integration Templates][3].

### Installation

The Kueue check is included in the [Datadog Agent][2] package.
No additional installation is required on your server.

### Configuration

Kueue is a cluster-level service. Configure this integration as a Cluster Agent cluster check so that only one Agent instance scrapes the Kueue metrics endpoint.
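
Cluster checks are scheduled by the Cluster Agent, so both the Cluster Agent and cluster check dispatching must be enabled before the steps below take effect. The following is a minimal sketch of the relevant Datadog Helm chart values, assuming you deploy the Agent with Helm; recent chart versions may already enable both settings by default.

```yaml
# Sketch of Datadog Helm chart values (assumes a Helm-based Agent deployment).
datadog:
  clusterChecks:
    # Allow the Cluster Agent to dispatch cluster checks, such as kueue.
    enabled: true
clusterAgent:
  # The Cluster Agent is required to schedule cluster checks.
  enabled: true
```

Optionally, set `clusterChecksRunner.enabled: true` to run cluster checks on dedicated Cluster Checks Runner pods instead of on the node Agents.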

1. (Optional) To collect ClusterQueue resource metrics, such as `kueue.cluster_queue.resource_usage.gpu`, set `metrics.enableClusterQueueResources: true` in the Kueue controller manager configuration, then restart the Kueue controller manager.

2. Provide a [cluster check configuration][10] to the Cluster Agent. For file- or ConfigMap-based configuration, set `cluster_check: true` at the top level of the check configuration, next to `init_config` and `instances`. For example, with the Datadog Helm chart, add the check under `clusterAgent.confd`:

   ```yaml
   clusterAgent:
     confd:
       kueue.yaml: |-
         cluster_check: true
         init_config:
         instances:
           - openmetrics_endpoint: http://kueue-controller-manager-metrics-service.kueue-system.svc:8080/metrics
   ```

3. Alternatively, instead of providing a configuration file, add an Autodiscovery annotation to the Kueue metrics Service. With the `ad.datadoghq.com/endpoints.checks` annotation, the Cluster Agent dispatches the check as an endpoint check for the endpoints backing the Service:

   ```yaml
   ad.datadoghq.com/endpoints.checks: |
     {
       "kueue": {
         "instances": [
           {
             "openmetrics_endpoint": "http://%%host%%:%%port%%/metrics"
           }
         ]
       }
     }
   ```
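
For step 1, the setting belongs to Kueue's own `Configuration` object, not to the Datadog configuration. The fragment below is a sketch that assumes a default Kueue installation, where this file is stored under the `controller_manager_config.yaml` key of the `kueue-manager-config` ConfigMap in the `kueue-system` namespace. Verify the ConfigMap name and API version for your Kueue release, and keep your other existing fields unchanged.

```yaml
# Fragment of the Kueue controller manager configuration.
# Assumption: default install, stored in the kueue-manager-config ConfigMap
# (key controller_manager_config.yaml); leave all other fields as they are.
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
metrics:
  # Report per-ClusterQueue resource usage and quota metrics.
  enableClusterQueueResources: true
```

After editing the ConfigMap, restart the controller manager so that it reloads the file, for example with `kubectl -n kueue-system rollout restart deployment/kueue-controller-manager` (the Deployment name assumes a default installation).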

See the [sample kueue.d/conf.yaml][4] for all available configuration options.
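
If you deploy the Agent with the Datadog Operator instead of Helm, the cluster check configuration from step 2 can be passed through the `DatadogAgent` resource. The following is a minimal sketch, assuming the `datadoghq.com/v2alpha1` API and an existing `DatadogAgent` named `datadog` (a placeholder); merge these fields into your own resource rather than replacing it.

```yaml
# Sketch: Datadog Operator equivalent of the Helm clusterAgent.confd example.
# Assumes the v2alpha1 DatadogAgent API; "datadog" is a placeholder name.
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  features:
    clusterChecks:
      # Allow the Cluster Agent to dispatch cluster checks.
      enabled: true
  override:
    clusterAgent:
      extraConfd:
        configDataMap:
          # Same check configuration as in the Helm example.
          kueue.yaml: |-
            cluster_check: true
            init_config:
            instances:
              - openmetrics_endpoint: http://kueue-controller-manager-metrics-service.kueue-system.svc:8080/metrics
```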

### Validation

[Run the Cluster Agent's `clusterchecks` subcommand][11] and look for `kueue` under the Checks section.

## Data Collected

### Metrics

See [metadata.csv][7] for a list of metrics provided by this integration.

### Events

The Kueue integration does not include any events.

## Troubleshooting

Need help? Contact [Datadog support][8].

[2]: https://app.datadoghq.com/account/settings/agent/latest
[3]: https://docs.datadoghq.com/containers/kubernetes/integrations/
[4]: https://github.com/DataDog/integrations-core/blob/master/kueue/datadog_checks/kueue/data/conf.yaml.example
[5]: https://docs.datadoghq.com/agent/configuration/agent-commands/#start-stop-and-restart-the-agent
[6]: https://docs.datadoghq.com/agent/configuration/agent-commands/#agent-status-and-information
[7]: https://github.com/DataDog/integrations-core/blob/master/kueue/metadata.csv
[8]: https://docs.datadoghq.com/help/
[10]: https://docs.datadoghq.com/containers/cluster_agent/clusterchecks/?tab=helm#configuration-from-configuration-files
[11]: https://docs.datadoghq.com/containers/troubleshooting/cluster-and-endpoint-checks/#dispatching-logic-in-the-cluster-agent