
docs: updated metrics in documentation #7454

Merged
merged 3 commits into from
Nov 29, 2024
8 changes: 4 additions & 4 deletions website/content/en/docs/reference/metrics.md
@@ -252,19 +252,19 @@ Total number of adds handled by workqueue

### `operator_status_condition_transitions_total`
The count of transitions of a given object, type and status.
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. e.g. Alarm := P99(Updated=False) > 5 minutes
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Alarm := P99(Updated=Unknown) > 5 minutes
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_count`
The number of conditions for a given object, type and status. e.g. Alarm := Available=False > 0
- Stability Level: BETA
- Stability Level: DEPRECATED
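
The `Alarm := Available=False > 0` shorthand above can be read as "fire when the count of objects whose `Available` condition is `False` is greater than zero". A minimal sketch of that check, assuming the metric is scraped in the Prometheus text exposition format and that the labels are named `type` and `status` (the label keys, the `kind` label, and the sample values are assumptions for illustration, not taken from this diff):

```python
import re

# Matches one simple exposition-format sample: name{labels} value
# (no timestamps or escaped quotes; enough for a sketch).
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical scrape output; label names and values are assumed.
SAMPLE = """\
# TYPE operator_status_condition_count gauge
operator_status_condition_count{kind="NodePool",type="Available",status="False"} 1
operator_status_condition_count{kind="NodePool",type="Available",status="True"} 3
"""


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simple parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def available_false_alarm(text, metric="operator_status_condition_count"):
    """True when the summed count of Available=False conditions exceeds zero."""
    total = sum(
        value
        for name, labels, value in parse_samples(text)
        if name == metric
        and labels.get("type") == "Available"
        and labels.get("status") == "False"
    )
    return total > 0


if __name__ == "__main__":
    print(available_false_alarm(SAMPLE))
```

The same function could be pointed at one of the resource-specific metrics by passing a different `metric` name, provided the label names match what the deployment actually exports.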

## Client Go Metrics

112 changes: 107 additions & 5 deletions website/content/en/preview/reference/metrics.md
@@ -38,6 +38,30 @@ Number of nodeclaims disrupted in total by Karpenter. Labeled by reason the node
Number of nodeclaims created in total by Karpenter. Labeled by reason the nodeclaim was created and the owning nodepool.
- Stability Level: STABLE

### `operator_nodeclaim_status_condition_transitions_total`
The count of transitions of a nodeclaim, type and status. Labeled by the type, reason, and status.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the nodeclaim, and the namespace.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_count`
The number of conditions for a nodeclaim, type and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodeclaim_termination_current_time_seconds`
The current amount of time in seconds that a nodeclaim has been in terminating state. Labeled by name, and namespace.
- Stability Level: BETA

### `operator_nodeclaim_termination_duration_seconds`
The amount of time taken by a nodeclaim to terminate completely.
- Stability Level: BETA

## Nodes Metrics

### `karpenter_nodes_total_pod_requests`
@@ -92,6 +116,34 @@ Number of nodes created in total by Karpenter. Labeled by owning nodepool.
Node allocatable are the resources allocatable by nodes.
- Stability Level: BETA

### `operator_node_status_condition_transitions_total`
The count of transitions of a node, type and status.
- Stability Level: BETA

### `operator_node_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the nodeclaim, and the namespace.
- Stability Level: BETA

### `operator_node_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_node_status_condition_count`
The number of conditions for a node, type and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_node_termination_current_time_seconds`
The current amount of time in seconds that a node has been in terminating state. Labeled by name, and namespace.
- Stability Level: BETA

### `operator_node_termination_duration_seconds`
The amount of time taken by a node to terminate completely.
- Stability Level: BETA

### `operator_node_event_count`
The number of events for a node.
- Stability Level: BETA

## Pods Metrics

### `karpenter_pods_state`
@@ -110,7 +162,7 @@ The amount of time taken by an object to terminate completely.

### `operator_termination_current_time_seconds`
The current amount of time in seconds that an object has been in terminating state.
- Stability Level: ALPHA
- Stability Level: DEPRECATED

## Voluntary Disruption Metrics

@@ -158,6 +210,56 @@ Limits specified on the nodepool that restrict the quantity of resources provisi
The number of nodes for a given NodePool that can be concurrently disrupting at a point in time. Labeled by NodePool. Note that allowed disruptions can change very rapidly, as new nodes may be created and others may be deleted at any point.
- Stability Level: ALPHA

### `operator_nodepool_status_condition_transitions_total`
The count of transitions of a nodepool, type and status. Labeled by the type, reason, and status.
- Stability Level: BETA

### `operator_nodepool_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the nodeclaim, and the namespace.
- Stability Level: BETA

### `operator_nodepool_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodepool_status_condition_count`
The number of conditions for a nodepool, type and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodepool_termination_current_time_seconds`
The current amount of time in seconds that a nodepool has been in terminating state. Labeled by name, and namespace.
- Stability Level: BETA

### `operator_nodepool_termination_duration_seconds`
Duration of NodePool termination in seconds.
- Stability Level: BETA

## EC2NodeClass Metrics

### `operator_ec2nodeclass_status_condition_transitions_total`
The count of transitions of an ec2nodeclass, type and status. Labeled by the type, reason, and status.
- Stability Level: BETA

### `operator_ec2nodeclass_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the nodeclaim, and the namespace.
- Stability Level: BETA

### `operator_ec2nodeclass_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_ec2nodeclass_status_condition_count`
The number of conditions for an ec2nodeclass, type and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_ec2nodeclass_termination_current_time_seconds`
The current amount of time in seconds that an ec2nodeclass has been in terminating state. Labeled by name, and namespace.
- Stability Level: BETA

### `operator_ec2nodeclass_termination_duration_seconds`
Duration of ec2nodeclass termination in seconds.
- Stability Level: BETA

## Interruption Metrics

### `karpenter_interruption_received_messages_total`
@@ -292,19 +394,19 @@ Total number of adds handled by workqueue

### `operator_status_condition_transitions_total`
The count of transitions of a given object, type and status.
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. e.g. Alarm := P99(Updated=False) > 5 minutes
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Alarm := P99(Updated=Unknown) > 5 minutes
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_count`
The number of conditions for a given object, type and status. e.g. Alarm := Available=False > 0
- Stability Level: BETA
- Stability Level: DEPRECATED

## Client Go Metrics

1 change: 1 addition & 0 deletions website/content/en/preview/upgrading/upgrade-guide.md
@@ -39,6 +39,7 @@ WHEN CREATING A NEW SECTION OF THE UPGRADE GUIDANCE FOR NEWER VERSIONS, ENSURE T
* Note: This userData configuration is _only_ valid on Bottlerocket v1.22.0+. If you are using an earlier version of a Bottlerocket image (< v1.22.0) with `amiFamily: Bottlerocket` and `instanceStorePolicy: RAID0`, nodes will fail to join the cluster.
* The AWS Neuron accelerator well known name label (`karpenter.k8s.aws/instance-accelerator-name`) values now reflect their correct names of `trainium`, `inferentia`, and `inferentia2`. Previously, all Neuron accelerators were assigned the label name of `inferentia`.
* Karpenter drops the internal `karpenter.k8s.aws/cluster` tag used for launch template management in favor of `eks:eks-cluster-name` and consistency with other Karpenter-provisioned resources
* Generic operator metrics have been deprecated and replaced by resource-specific metrics.
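
When updating dashboards or alerts for this change, the replacement names appear to follow the pattern `operator_<resource>_<suffix>`, judging from the metric names added in this diff. A minimal sketch of that renaming, assuming the pattern holds for every listed suffix (the helper `resource_specific_name` is hypothetical and not part of Karpenter, and the generic `operator_termination_duration_seconds` name is inferred rather than shown in the diff):

```python
# Hypothetical helper for migrating queries; not part of Karpenter itself.
GENERIC_PREFIX = "operator_"

# Resources that gain their own operator metrics in this diff.
RESOURCES = ("nodeclaim", "node", "nodepool", "ec2nodeclass")

# Metric suffixes that the diff lists for each of those resources.
SUFFIXES = (
    "status_condition_transitions_total",
    "status_condition_transition_seconds",
    "status_condition_current_status_seconds",
    "status_condition_count",
    "termination_current_time_seconds",
    "termination_duration_seconds",
)


def resource_specific_name(generic_name: str, resource: str) -> str:
    """Map a deprecated generic metric name to its resource-specific form.

    Example: ("operator_status_condition_count", "nodepool")
    becomes "operator_nodepool_status_condition_count".
    """
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    if not generic_name.startswith(GENERIC_PREFIX):
        raise ValueError(f"not an operator metric: {generic_name!r}")
    suffix = generic_name[len(GENERIC_PREFIX):]
    if suffix not in SUFFIXES:
        raise ValueError(f"no known replacement for: {generic_name!r}")
    return f"{GENERIC_PREFIX}{resource}_{suffix}"


if __name__ == "__main__":
    for resource in RESOURCES:
        print(resource_specific_name("operator_status_condition_count", resource))
```

Raising on unknown names, rather than passing them through, keeps a bulk rewrite from silently producing metric names that do not exist.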

### Upgrading to `1.0.0`+
