feat: cloudwatch alarms #7825

Draft
wants to merge 4 commits into
base: main
7 changes: 4 additions & 3 deletions go.mod
@@ -7,9 +7,10 @@ require (
 github.com/PuerkitoBio/goquery v1.10.2
 github.com/avast/retry-go v3.0.0+incompatible
 github.com/aws/amazon-vpc-resource-controller-k8s v1.6.3
-github.com/aws/aws-sdk-go-v2 v1.36.2
+github.com/aws/aws-sdk-go-v2 v1.36.3
 github.com/aws/aws-sdk-go-v2/config v1.29.7
 github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.29
+github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.44.0
 github.com/aws/aws-sdk-go-v2/service/ec2 v1.203.1
 github.com/aws/aws-sdk-go-v2/service/eks v1.58.1
 github.com/aws/aws-sdk-go-v2/service/fis v1.32.1
@@ -52,8 +53,8 @@ require (
 github.com/Masterminds/semver/v3 v3.2.1 // indirect
 github.com/andybalholm/cascadia v1.3.3 // indirect
 github.com/aws/aws-sdk-go-v2/credentials v1.17.60 // indirect
-github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.33 // indirect
-github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.33 // indirect
+github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.34 // indirect
+github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.34 // indirect
 github.com/aws/aws-sdk-go-v2/internal/ini v1.8.3 // indirect
 github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.3 // indirect
 github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.14 // indirect
14 changes: 8 additions & 6 deletions go.sum
@@ -10,20 +10,22 @@ github.com/avast/retry-go v3.0.0+incompatible h1:4SOWQ7Qs+oroOTQOYnAHqelpCO0biHS
 github.com/avast/retry-go v3.0.0+incompatible/go.mod h1:XtSnn+n/sHqQIpZ10K1qAevBhOOCWBLXXy3hyiqqBrY=
 github.com/aws/amazon-vpc-resource-controller-k8s v1.6.3 h1:B4o15iZP8CQoyDjoNAoQiyEPabLsgxXLY5tv3uvvCic=
 github.com/aws/amazon-vpc-resource-controller-k8s v1.6.3/go.mod h1:k4zcf2Dz/Mvrgo8NVzAEWP5HK4USqbJTD93pVVDxvc0=
-github.com/aws/aws-sdk-go-v2 v1.36.2 h1:Ub6I4lq/71+tPb/atswvToaLGVMxKZvjYDVOWEExOcU=
-github.com/aws/aws-sdk-go-v2 v1.36.2/go.mod h1:LLXuLpgzEbD766Z5ECcRmi8AzSwfZItDtmABVkRLGzg=
+github.com/aws/aws-sdk-go-v2 v1.36.3 h1:mJoei2CxPutQVxaATCzDUjcZEjVRdpsiiXi2o38yqWM=
+github.com/aws/aws-sdk-go-v2 v1.36.3/go.mod h1:LLXuLpgzEbD766Z5ECcRmi8AzSwfZItDtmABVkRLGzg=
 github.com/aws/aws-sdk-go-v2/config v1.29.7 h1:71nqi6gUbAUiEQkypHQcNVSFJVUFANpSeUNShiwWX2M=
 github.com/aws/aws-sdk-go-v2/config v1.29.7/go.mod h1:yqJQ3nh2HWw/uxd56bicyvmDW4KSc+4wN6lL8pYjynU=
 github.com/aws/aws-sdk-go-v2/credentials v1.17.60 h1:1dq+ELaT5ogfmqtV1eocq8SpOK1NRsuUfmhQtD/XAh4=
 github.com/aws/aws-sdk-go-v2/credentials v1.17.60/go.mod h1:HDes+fn/xo9VeszXqjBVkxOo/aUy8Mc6QqKvZk32GlE=
 github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.29 h1:JO8pydejFKmGcUNiiwt75dzLHRWthkwApIvPoyUtXEg=
 github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.29/go.mod h1:adxZ9i9DRmB8zAT0pO0yGnsmu0geomp5a3uq5XpgOJ8=
-github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.33 h1:knLyPMw3r3JsU8MFHWctE4/e2qWbPaxDYLlohPvnY8c=
-github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.33/go.mod h1:EBp2HQ3f+XCB+5J+IoEbGhoV7CpJbnrsd4asNXmTL0A=
-github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.33 h1:K0+Ne08zqti8J9jwENxZ5NoUyBnaFDTu3apwQJWrwwA=
-github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.33/go.mod h1:K97stwwzaWzmqxO8yLGHhClbVW1tC6VT1pDLk1pGrq4=
+github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.34 h1:ZK5jHhnrioRkUNOc+hOgQKlUL5JeC3S6JgLxtQ+Rm0Q=
+github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.34/go.mod h1:p4VfIceZokChbA9FzMbRGz5OV+lekcVtHlPKEO0gSZY=
+github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.34 h1:SZwFm17ZUNNg5Np0ioo/gq8Mn6u9w19Mri8DnJ15Jf0=
+github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.34/go.mod h1:dFZsC0BLo346mvKQLWmoJxT+Sjp+qcVR1tRVHQGOH9Q=
 github.com/aws/aws-sdk-go-v2/internal/ini v1.8.3 h1:bIqFDwgGXXN1Kpp99pDOdKMTTb5d2KyU5X/BZxjOkRo=
 github.com/aws/aws-sdk-go-v2/internal/ini v1.8.3/go.mod h1:H5O/EsxDWyU+LP/V8i5sm8cxoZgc2fdNR9bxlOFrQTo=
+github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.44.0 h1:0cF07Fs0CT8XSLGGFqp0VNJD+sb447S8UQU7hz95xJo=
+github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.44.0/go.mod h1:HJlcOk+S/wjJuR/8jPa8GhnEKdKqqiQ5wjsE1PjuO1o=
 github.com/aws/aws-sdk-go-v2/service/ec2 v1.203.1 h1:ZgY9zeVAe+54Qa7o1GXKRNTez79lffCeJSSinhl+qec=
 github.com/aws/aws-sdk-go-v2/service/ec2 v1.203.1/go.mod h1:0naMk66LtdeTmE+1CWQTKwtzOQ2t8mavOhMhR0Pv1m0=
 github.com/aws/aws-sdk-go-v2/service/eks v1.58.1 h1:w/GEycBxTO4psb9Mw8g3b9/dktLE5GeYP1uj3nZ+85M=
160 changes: 160 additions & 0 deletions pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml
@@ -807,8 +807,168 @@ spec:
- zone
type: object
type: array
cloudWatchAlarms:
description: CloudWatchAlarms defines the CloudWatch alarms to be created for EC2 instances in this NodeClass
type: array
items:
type: object
properties:
actionsEnabled:
description: Indicates whether actions should be executed during any changes to the alarm state
type: boolean
alarmActions:
description: Actions to execute when this alarm transitions to the ALARM state
type: array
items:
type: string
pattern: ^arn:aws[a-z-]*:.*$
alarmDescription:
description: Description for the alarm
type: string
alarmName:
description: Name of the alarm
type: string
comparisonOperator:
description: Comparison operator used to compare the metric with the threshold
type: string
enum:
- GreaterThanThreshold
- GreaterThanOrEqualToThreshold
- LessThanThreshold
- LessThanOrEqualToThreshold
- LessThanLowerOrGreaterThanUpperThreshold
- LessThanLowerThreshold
- GreaterThanUpperThreshold
datapointsToAlarm:
description: Number of datapoints that must be breaching to trigger the alarm
type: integer
minimum: 1
dimensions:
description: Dimensions for the metric associated with the alarm
type: array
items:
type: object
required:
- name
- value
properties:
name:
description: Name of the dimension
type: string
value:
description: Value of the dimension
type: string
evaluateLowSampleCountPercentile:
description: Used only for alarms based on percentiles
type: string
enum:
- evaluate
- ignore
evaluationPeriods:
description: Number of periods over which data is compared to the threshold
type: integer
minimum: 1
extendedStatistic:
description: Percentile statistic for the metric
type: string
pattern: ^p(\d{1,2}(\.\d{0,2})?|100)$
insufficientDataActions:
description: Actions to execute when this alarm transitions to the INSUFFICIENT_DATA state
type: array
items:
type: string
pattern: ^arn:aws[a-z-]*:.*$
metricName:
description: Name of the metric associated with the alarm
type: string
namespace:
description: Namespace of the metric
type: string
okActions:
description: Actions to execute when this alarm transitions to the OK state
type: array
items:
type: string
pattern: ^arn:aws[a-z-]*:.*$
period:
description: Period in seconds over which the metric is evaluated
type: integer
minimum: 10
statistic:
description: Statistic for the metric
type: string
enum:
- SampleCount
- Average
- Sum
- Minimum
- Maximum
tags:
description: Tags to be attached to the alarm
type: object
additionalProperties:
type: string
threshold:
description: Threshold value for the metric
type: number
thresholdMetricId:
description: ID of a threshold metric
type: string
treatMissingData:
description: How to treat missing data points
type: string
enum:
- breaching
- notBreaching
- ignore
- missing
unit:
description: Unit of the metric
type: string
enum:
- Seconds
- Microseconds
- Milliseconds
- Bytes
- Kilobytes
- Megabytes
- Gigabytes
- Terabytes
- Bits
- Kilobits
- Megabits
- Gigabits
- Terabits
- Percent
- Count
- Bytes/Second
- Kilobytes/Second
- Megabytes/Second
- Gigabytes/Second
- Terabytes/Second
- Bits/Second
- Kilobits/Second
- Megabits/Second
- Gigabits/Second
- Terabits/Second
- Count/Second
- None
required:
- alarmName
- comparisonOperator
- evaluationPeriods
- threshold
type: object
type: object

served: true
storage: true
subresources:
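The `extendedStatistic` pattern in the schema above can be exercised outside the API server. The following is a minimal sketch, assuming the pattern is evaluated with RE2-compatible semantics (as Go's `regexp` package does); the helper name `matchesExtendedStatistic` is hypothetical and not part of this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// extendedStatisticPattern is copied verbatim from the extendedStatistic
// field of the CRD schema in this diff.
var extendedStatisticPattern = regexp.MustCompile(`^p(\d{1,2}(\.\d{0,2})?|100)$`)

// matchesExtendedStatistic (hypothetical helper) reports whether s would
// pass the schema's pattern check.
func matchesExtendedStatistic(s string) bool {
	return extendedStatisticPattern.MatchString(s)
}

func main() {
	// Print each candidate alongside the match result.
	for _, s := range []string{"p99", "p99.9", "p100", "p99.", "p101", "p100.5", "99"} {
		fmt.Printf("%-8q %v\n", s, matchesExtendedStatistic(s))
	}
}
```

One detail this surfaces: because the fractional part is written as `\.\d{0,2}`, a value with a trailing dot such as `p99.` is accepted. Tightening it to `\.\d{1,2}` would reject that form, if that is the intent.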
119 changes: 119 additions & 0 deletions pkg/apis/v1/ec2nodeclass.go
@@ -137,12 +137,131 @@ type EC2NodeClassSpec struct {
// +kubebuilder:default={"httpEndpoint":"enabled","httpProtocolIPv6":"disabled","httpPutResponseHopLimit":1,"httpTokens":"required"}
// +optional
MetadataOptions *MetadataOptions `json:"metadataOptions,omitempty"`
// CloudWatchAlarms defines the CloudWatch alarms to be created for EC2 instances in this NodeClass
// +optional
CloudWatchAlarms []*CloudWatchAlarms `json:"cloudWatchAlarms,omitempty"`
// Context is a Reserved field in EC2 APIs
// https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateFleet.html
// +optional
Context *string `json:"context,omitempty"`
}

// CloudWatchAlarms defines the configuration for CloudWatch alarms to be created for EC2 instances
type CloudWatchAlarms struct {
// ActionsEnabled indicates whether actions should be executed during any changes to the alarm state
// +optional
ActionsEnabled *bool `json:"actionsEnabled,omitempty"`

// AlarmActions is a list of ARNs to execute when this alarm transitions to the ALARM state
// +kubebuilder:validation:MaxItems=5
// +optional
AlarmActions []string `json:"alarmActions,omitempty"`

// AlarmDescription is a description for the alarm
// +optional
AlarmDescription *string `json:"alarmDescription,omitempty"`

// AlarmName is the name of the alarm
// +kubebuilder:validation:Required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=255
AlarmName string `json:"alarmName"`

// ComparisonOperator is the arithmetic operation to use when comparing the specified statistic and threshold
// +kubebuilder:validation:Required
// +kubebuilder:validation:Enum={"GreaterThanThreshold","GreaterThanOrEqualToThreshold","LessThanThreshold","LessThanOrEqualToThreshold","LessThanLowerOrGreaterThanUpperThreshold","LessThanLowerThreshold","GreaterThanUpperThreshold"}
ComparisonOperator string `json:"comparisonOperator"`

// DatapointsToAlarm is the number of datapoints that must be breaching to trigger the alarm
// +kubebuilder:validation:Minimum=1
// +optional
DatapointsToAlarm *int32 `json:"datapointsToAlarm,omitempty"`

// Dimensions for the metric associated with the alarm
// +optional
Dimensions []CloudwatchAlarmDimension `json:"dimensions,omitempty"`

// EvaluateLowSampleCountPercentile is used only for alarms based on percentiles
// +kubebuilder:validation:Enum={"evaluate","ignore"}
// +optional
EvaluateLowSampleCountPercentile *string `json:"evaluateLowSampleCountPercentile,omitempty"`

// EvaluationPeriods is the number of periods over which data is compared to the threshold
// +kubebuilder:validation:Required
// +kubebuilder:validation:Minimum=1
EvaluationPeriods int32 `json:"evaluationPeriods"`

// ExtendedStatistic is the percentile statistic for the metric
// +kubebuilder:validation:Pattern="^p(100|[0-9]{1,2}(\\.[0-9]{0,2})?)$"
// +optional
ExtendedStatistic *string `json:"extendedStatistic,omitempty"`

// InsufficientDataActions is a list of ARNs to execute when this alarm transitions to the INSUFFICIENT_DATA state
// +kubebuilder:validation:MaxItems=5
// +optional
InsufficientDataActions []string `json:"insufficientDataActions,omitempty"`

// MetricName is the name of the metric associated with the alarm
// +optional
MetricName *string `json:"metricName,omitempty"`

// Namespace is the namespace of the metric
// +optional
Namespace *string `json:"namespace,omitempty"`

// OKActions is a list of ARNs to execute when this alarm transitions to the OK state
// +kubebuilder:validation:MaxItems=5
// +optional
OKActions []string `json:"okActions,omitempty"`

// Period is the length of time in seconds over which the statistic is applied
// +kubebuilder:validation:Minimum=10
// +optional
Period *int32 `json:"period,omitempty"`

// Statistic is the statistic to apply to the alarm's associated metric
// +kubebuilder:validation:Enum={"SampleCount","Average","Sum","Minimum","Maximum"}
// +optional
Statistic *string `json:"statistic,omitempty"`

// Tags to be attached to the alarm
// +optional
Tags map[string]string `json:"tags,omitempty"`

// Threshold is the value against which the specified statistic is compared
// +kubebuilder:validation:Required
Threshold float64 `json:"threshold"`

// ThresholdMetricId is the ID of a threshold metric
// +optional
ThresholdMetricId *string `json:"thresholdMetricId,omitempty"`

// TreatMissingData specifies how to treat missing data points in the alarm evaluation
// +kubebuilder:validation:Enum={"breaching","notBreaching","ignore","missing"}
// +optional
TreatMissingData *string `json:"treatMissingData,omitempty"`

// Unit is the unit of the metric
// +kubebuilder:validation:Enum={"Seconds","Microseconds","Milliseconds","Bytes","Kilobytes","Megabytes","Gigabytes","Terabytes","Bits","Kilobits","Megabits","Gigabits","Terabits","Percent","Count","Bytes/Second","Kilobytes/Second","Megabytes/Second","Gigabytes/Second","Terabytes/Second","Bits/Second","Kilobits/Second","Megabits/Second","Gigabits/Second","Terabits/Second","Count/Second","None"}
// +optional
Unit *string `json:"unit,omitempty"`
}
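A rule such as "a few sub-minute values, or any multiple of 60" is awkward to express with `Minimum` and `MultipleOf` markers alone, so the `Period` field may need a code-level check in addition to schema validation. The sketch below assumes the accepted set is 10, 20, 30, or any positive multiple of 60 seconds, which should be confirmed against the PutMetricAlarm API reference; `validPeriod` is a hypothetical helper, not something this PR defines.

```go
package main

import "fmt"

// validPeriod (hypothetical helper) reports whether seconds is an acceptable
// alarm period. ASSUMPTION: the accepted values are 10, 20, 30, or any
// positive multiple of 60; verify against the CloudWatch documentation.
func validPeriod(seconds int32) bool {
	switch seconds {
	case 10, 20, 30:
		return true
	}
	return seconds > 0 && seconds%60 == 0
}

func main() {
	// Print the verdict for a handful of candidate periods.
	for _, p := range []int32{10, 30, 45, 60, 90, 300} {
		fmt.Printf("period=%d valid=%v\n", p, validPeriod(p))
	}
}
```

A check like this could also be written as a CEL `XValidation` rule on the field; the plain function is shown only because it is easy to unit test.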

// CloudwatchAlarmDimension represents a CloudWatch metric dimension
type CloudwatchAlarmDimension struct {
// Name of the dimension
// +kubebuilder:validation:Required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=255
Name string `json:"name"`

// Value of the dimension
// +kubebuilder:validation:Required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=255
Value string `json:"value"`
}
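The `Tags` field on `CloudWatchAlarms` is a `map[string]string`, while the CloudWatch API takes tags as a list of key/value pairs. Go map iteration order is not stable, so a conversion that sorts by key keeps the request payload deterministic between reconciles. This is a minimal sketch using a local stand-in `tag` type rather than the SDK's type; `sortedTags` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// tag is a local stand-in for a key/value tag pair; it is NOT the SDK type.
type tag struct {
	Key   string
	Value string
}

// sortedTags (hypothetical helper) converts a tag map into a slice ordered
// by key, so the same map always yields the same slice.
func sortedTags(m map[string]string) []tag {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	out := make([]tag, 0, len(keys))
	for _, k := range keys {
		out = append(out, tag{Key: k, Value: m[k]})
	}
	return out
}

func main() {
	fmt.Println(sortedTags(map[string]string{"team": "infra", "env": "prod"}))
}
```

The same approach would apply to `Dimensions` if those were ever modelled as a map; as a slice they already carry an explicit order.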

// SubnetSelectorTerm defines selection logic for a subnet used by Karpenter to launch nodes.
// If multiple fields are used for selection, the requirements are ANDed.
type SubnetSelectorTerm struct {
5 changes: 5 additions & 0 deletions pkg/aws/sdk.go
@@ -17,6 +17,7 @@ package sdk
import (
"context"

"github.com/aws/aws-sdk-go-v2/service/cloudwatch"
"github.com/aws/aws-sdk-go-v2/service/ec2"
"github.com/aws/aws-sdk-go-v2/service/eks"
"github.com/aws/aws-sdk-go-v2/service/iam"
@@ -73,3 +74,7 @@ type SQSAPI interface {
type TimestreamWriteAPI interface {
WriteRecords(ctx context.Context, params *timestreamwrite.WriteRecordsInput, optFns ...func(*timestreamwrite.Options)) (*timestreamwrite.WriteRecordsOutput, error)
}

type CloudWatchAPI interface {
PutMetricAlarm(ctx context.Context, params *cloudwatch.PutMetricAlarmInput, optFns ...func(*cloudwatch.Options)) (*cloudwatch.PutMetricAlarmOutput, error)
}
5 changes: 5 additions & 0 deletions pkg/controllers/nodeclass/cloudwatchalarms.go
@@ -0,0 +1,5 @@
package nodeclass

type CloudwatchAlarm struct {
cloudwatchAlarmProvider cloudwatchalarm.Provider
}