Monitoring API: Add AlertmanagerMainConfig #2148
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -17,6 +17,8 @@ limitations under the License. | |||
package v1alpha1 | ||||
|
||||
import ( | ||||
v1 "k8s.io/api/core/v1" | ||||
"k8s.io/apimachinery/pkg/api/resource" | ||||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||||
) | ||||
|
||||
|
@@ -72,15 +74,19 @@ type ClusterMonitoringList struct { | |||
} | ||||
|
||||
// ClusterMonitoringSpec defines the desired state of Cluster Monitoring Operator | ||||
// +required | ||||
type ClusterMonitoringSpec struct { | ||||
// userDefined sets the deployment mode for user-defined monitoring in addition to the default platform monitoring. | ||||
// +required | ||||
// userDefined is optional. | ||||
// +optional | ||||
UserDefined UserDefinedMonitoring `json:"userDefined"` | ||||
// alertmanagerConfig allows users to configure how the default Alertmanager instance | ||||
// should be deployed in the `openshift-monitoring` namespace. | ||||
// alertmanagerConfig is optional. | ||||
// +optional | ||||
AlertmanagerConfig AlertmanagerConfig `json:"alertmanagerConfig"` | ||||
} | ||||
|
||||
// UserDefinedMonitoring config for user-defined projects. | ||||
// +required | ||||
type UserDefinedMonitoring struct { | ||||
// mode defines the different configurations of UserDefinedMonitoring | ||||
// Valid values are Disabled and NamespaceIsolated | ||||
|
@@ -101,3 +107,222 @@ const ( | |||
// UserDefinedNamespaceIsolated enables monitoring for user-defined projects with namespace-scoped tenancy. This ensures that metrics, alerts, and monitoring data are isolated at the namespace level. | ||||
UserDefinedNamespaceIsolated UserDefinedMode = "NamespaceIsolated" | ||||
) | ||||
|
||||
// AlertmanagerConfig provides configuration options for the default Alertmanager instance | ||||
// that runs in the `openshift-monitoring` namespace. Use this configuration to control | ||||
// whether the default Alertmanager is deployed, how it logs, and how its pods are scheduled. | ||||
// | ||||
// +union | ||||
// +kubebuilder:validation:XValidation:rule="self.deploymentMode == 'Deployed' ? has(self.deployed) : !has(self.deployed)",message="deployed must be set when deploymentMode is Deployed, and must be unset otherwise" | ||||
type AlertmanagerConfig struct { | ||||
// deploymentMode determines whether the default Alertmanager instance should be deployed | ||||
// as part of the monitoring stack. | ||||
// Allowed values are Deployed and NotDeployed. | ||||
// When set to Deployed, the Cluster Monitoring Operator | ||||
// ensures that an Alertmanager instance is created and managed in the `openshift-monitoring` namespace. | ||||
// When set to NotDeployed, the operator will not deploy the Alertmanager instance. | ||||
// Use this field if you want to explicitly opt in or out of running a platform-level Alertmanager. | ||||
// | ||||
// deploymentMode is required. | ||||
// +unionDiscriminator | ||||
// +kubebuilder:validation:Enum=Deployed;NotDeployed | ||||
// +required | ||||
DeploymentMode string `json:"deploymentMode"` | ||||
|
||||
// deployed contains configuration options for the deployed Alertmanager instance. | ||||
// +optional | ||||
Deployed *AlertmanagerDeployedConfig `json:"deployed,omitempty"` | ||||
Comment on lines +132 to +134
For discriminated unions, this field must be set when the discriminator is set to Line 9 in 7318813
is correct now? |
||||
} | ||||
|
||||
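The XValidation rule on `AlertmanagerConfig` ties `deployed` to the `deploymentMode` discriminator. A minimal Go sketch of the same check follows, assuming local stand-in types rather than the real API package; `validateUnion` is a hypothetical helper and not part of this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// AlertmanagerDeployedConfig is a local stand-in for the API type in this diff.
// Its fields are omitted because the check only needs to know whether it is set.
type AlertmanagerDeployedConfig struct{}

// AlertmanagerConfig is a local stand-in carrying only the discriminator
// and the union member.
type AlertmanagerConfig struct {
	DeploymentMode string
	Deployed       *AlertmanagerDeployedConfig
}

// validateUnion is a hypothetical helper that mirrors the CEL rule:
// deployed must be set when deploymentMode is Deployed, and unset otherwise.
func validateUnion(c AlertmanagerConfig) error {
	switch c.DeploymentMode {
	case "Deployed":
		if c.Deployed == nil {
			return errors.New("deployed must be set when deploymentMode is Deployed")
		}
	case "NotDeployed":
		if c.Deployed != nil {
			return errors.New("deployed must be unset when deploymentMode is NotDeployed")
		}
	default:
		// The Enum marker already rejects other values; this branch only
		// keeps the sketch total.
		return fmt.Errorf("unsupported deploymentMode %q", c.DeploymentMode)
	}
	return nil
}

func main() {
	valid := AlertmanagerConfig{DeploymentMode: "Deployed", Deployed: &AlertmanagerDeployedConfig{}}
	invalid := AlertmanagerConfig{DeploymentMode: "NotDeployed", Deployed: &AlertmanagerDeployedConfig{}}
	fmt.Println(validateUnion(valid))
	fmt.Println(validateUnion(invalid))
}
```

In the real API the rule is enforced by the API server through CEL, so a controller would not normally need to repeat it; the sketch only spells out which combinations are accepted.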
// AlertmanagerDeployedConfig provides configuration options for the default Alertmanager instance | ||||
// that runs in the `openshift-monitoring` namespace. Use this configuration to control | ||||
// how the deployed Alertmanager logs and how its pods are scheduled. | ||||
// | ||||
// It must be set when deploymentMode is Deployed. | ||||
type AlertmanagerDeployedConfig struct { | ||||
// userModeConfig controls whether Alertmanager should process configurations from user-defined (non-platform) | ||||
// namespaces for AlertmanagerConfig lookups. | ||||
// When set to Selectable, Alertmanager will search for AlertmanagerConfig resources in user-defined namespaces. | ||||
// When set to None, user-defined namespaces cannot be selected for AlertmanagerConfig lookups. | ||||
// This field is only effective when the user workload Alertmanager instance is not enabled. | ||||
// If the user workload monitoring Alertmanager is enabled, this field is ignored. | ||||
// userModeConfig is optional. | ||||
// Allowed values are Selectable and None. | ||||
What happens when each of these values are specified? How is something like
empty is the same as none.
Should
well I did selectable and none instead of enable or disable, correct? |
||||
// Default value is None | ||||
// +kubebuilder:validation:Enum="";Selectable;None | ||||
// +optional | ||||
UserModeConfig UserAlertManagerModeConfig `json:"userModeConfig"` | ||||
// logLevel defines the verbosity of logs emitted by Alertmanager. | ||||
// This field allows users to control the amount and severity of logs generated, which can be useful | ||||
// for debugging issues or reducing noise in production environments. | ||||
// Allowed values are Error, Warn, Info, Debug, and omitted. | ||||
// When set to Error, only errors will be logged. | ||||
// When set to Warn, both warnings and errors will be logged. | ||||
// When set to Info, general information, warnings, and errors will all be logged. | ||||
// When set to Debug, detailed debugging information will be logged. | ||||
// When omitted, this means no opinion and the platform is left to choose a default that is subject to change over time. | ||||
// Currently, the default is Info. | ||||
// +optional | ||||
LogLevel LogLevel `json:"logLevel"` | ||||
// nodeSelector is the node selector applied to the Alertmanager Pods. | ||||
// nodeSelector is optional. | ||||
// | ||||
// When omitted, this means the user has no opinion and the platform is left | ||||
// to choose reasonable defaults. These defaults are subject to change over time. | ||||
// +optional | ||||
NodeSelector map[string]string `json:"nodeSelector,omitempty"` | ||||
|
||||
// resources defines the compute resource requests and limits for the Alertmanager container. | ||||
// This includes CPU, memory and HugePages constraints to help control scheduling and resource usage. | ||||
|
||||
// When not specified, defaults are used by the platform. Requests cannot exceed limits. | ||||
// This field is optional. | ||||
// More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ | ||||
// This is a simplified API that maps to Kubernetes ResourceRequirements. | ||||
// +optional | ||||
Resources *AlertmanagerContainerResources `json:"resources,omitempty"` | ||||
// secrets defines a list of secrets that need to be mounted into the Alertmanager. | ||||
// The secrets must reside within the same namespace as the Alertmanager object. | ||||
// They will be added as volumes named secret-<secret-name> and mounted at | ||||
// /etc/alertmanager/secrets/<secret-name> within the 'alertmanager' container of | ||||
// the Alertmanager Pods. | ||||
// This field is optional. | ||||
Comment on lines +181 to +186
You've explained what this does, but why would a user care to configure these secrets?
we assume that a user who uses OpenShift knows why to use secrets.
Yes, an OpenShift user likely understands the benefits of using secrets - how does adding secrets here help in configuring alertmanager? What things would it configure on alertmanager and/or allow alertmanager to do differently? These are the things I suspect users may not know about (nor do I know).
ok, let me think about it |
||||
// Maximum length for this list is 10 | ||||
// +optional | ||||
// +kubebuilder:validation:MaxItems=10 | ||||
|
||||
Secrets []SecretName `json:"secrets,omitempty"` | ||||
// tolerations is a list of tolerations applied to the Alertmanager Pods. | ||||
// tolerations is optional. | ||||
// | ||||
// When omitted, this means the user has no opinion and the platform is left | ||||
// to choose reasonable defaults. These defaults are subject to change over time. | ||||
// Maximum length for this list is 10 | ||||
// +kubebuilder:validation:MaxItems=10 | ||||
|
||||
// +optional | ||||
Tolerations []v1.Toleration `json:"tolerations,omitempty"` | ||||
|
||||
// topologySpreadConstraints defines rules for how Alertmanager Pods should be distributed | ||||
// across topology domains such as zones, nodes, or other user-defined labels. | ||||
// topologySpreadConstraints is optional. | ||||
// This helps improve high availability and resource efficiency by avoiding placing | ||||
// too many replicas in the same failure domain. | ||||
// | ||||
// When omitted, this means no opinion and the platform is left to choose a default, which is subject to change over time. | ||||
// This field maps directly to the `topologySpreadConstraints` field in the Pod spec. | ||||
// Maximum length for this list is 10 | ||||
// +kubebuilder:validation:MaxItems=10 | ||||
|
||||
// +optional | ||||
TopologySpreadConstraints []v1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"` | ||||
// volumeClaimTemplate defines persistent storage for Alertmanager. Use this setting to | ||||
// configure the persistent volume claim, including storage class, volume | ||||
// size, and name. | ||||
// If omitted, the Pod uses ephemeral storage and alert data will not persist | ||||
// across restarts. | ||||
// This field is optional. | ||||
// +optional | ||||
VolumeClaimTemplate *v1.PersistentVolumeClaim `json:"volumeClaimTemplate,omitempty"` | ||||
} | ||||
|
||||
// SecretName is a type that represents the name of a Secret in the same namespace. | ||||
// It must be at most 253 characters in length. | ||||
// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." | ||||
// +kubebuilder:validation:MaxLength=253 | ||||
type SecretName string | ||||
|
||||
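The `secrets` field documentation states that each secret becomes a volume named `secret-<secret-name>` mounted at `/etc/alertmanager/secrets/<secret-name>`. The following Go sketch shows that naming scheme, assuming a local `SecretName` stand-in; `secretMount` and `mountsFor` are hypothetical names used for illustration, not code from the operator.

```go
package main

import (
	"fmt"
	"path"
)

// SecretName is a local stand-in for the API type in this diff.
type SecretName string

// secretMount is a hypothetical pair describing where one secret ends up
// inside the 'alertmanager' container.
type secretMount struct {
	VolumeName string
	MountPath  string
}

// mountsFor derives volume names and mount paths from the secret names,
// following the convention described in the field comment.
func mountsFor(secrets []SecretName) []secretMount {
	mounts := make([]secretMount, 0, len(secrets))
	for _, s := range secrets {
		mounts = append(mounts, secretMount{
			VolumeName: "secret-" + string(s),
			MountPath:  path.Join("/etc/alertmanager/secrets", string(s)),
		})
	}
	return mounts
}

func main() {
	for _, m := range mountsFor([]SecretName{"smtp-credentials", "webhook-tls"}) {
		fmt.Printf("%s -> %s\n", m.VolumeName, m.MountPath)
	}
}
```

An Alertmanager configuration could then reference files under those paths, for example a credentials file for a receiver; which files are needed depends on the receivers being configured.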
// AlertManagerDeployMode defines the deployment state of the platform Alertmanager instance. | ||||
// | ||||
// Possible values: | ||||
// - "Deployed": The Alertmanager instance will be deployed and managed by the operator. | ||||
// - "NotDeployed": The operator will not deploy an Alertmanager instance. | ||||
type AlertManagerDeployMode string | ||||
|
||||
const ( | ||||
// AlertManagerDeployModeDeployed means the Alertmanager instance will be deployed and managed by the operator. | ||||
AlertManagerDeployModeDeployed AlertManagerDeployMode = "Deployed" | ||||
|
||||
// AlertManagerDeployModeNotDeployed means the operator will not deploy the Alertmanager instance. | ||||
AlertManagerDeployModeNotDeployed AlertManagerDeployMode = "NotDeployed" | ||||
) | ||||
|
||||
// UserAlertManagerModeConfig defines the mode for user-defined namespaces. | ||||
// | ||||
// Possible values: | ||||
// - "Selectable": User-defined namespaces can be selected for AlertmanagerConfig lookups. | ||||
So, where is the selector that the user would configure to determine which namespaces to use?
the selector is use alertmanagerMain or user defined config |
||||
// - "None": User-defined namespaces cannot be selected for AlertmanagerConfig lookups. | ||||
type UserAlertManagerModeConfig string | ||||
|
||||
const ( | ||||
// UserAlertManagerModeSelectable allows user-defined namespaces to be selected for `AlertmanagerConfig` lookups. This setting only | ||||
// applies if the user workload monitoring instance of Alertmanager is not enabled. | ||||
UserAlertManagerModeSelectable UserAlertManagerModeConfig = "Selectable" | ||||
// UserAlertManagerModeNone prevents user-defined namespaces from being selected for `AlertmanagerConfig` lookups. This setting only | ||||
// applies if the user workload monitoring instance of Alertmanager is not enabled. | ||||
UserAlertManagerModeNone UserAlertManagerModeConfig = "None" | ||||
) | ||||
|
||||
// LogLevel defines the verbosity of logs emitted by Alertmanager. | ||||
// Valid values are Error, Warn, Info and Debug. | ||||
// +kubebuilder:validation:Enum="";Error;Warn;Info;Debug | ||||
type LogLevel string | ||||
|
||||
|
||||
const ( | ||||
// LogLevelEmpty means no opinion; the platform chooses a default, which is currently Info. | ||||
LogLevelEmpty LogLevel = "" | ||||
// LogLevelError means only errors will be logged. | ||||
LogLevelError LogLevel = "Error" | ||||
// LogLevelWarn means both warnings and errors will be logged. | ||||
LogLevelWarn LogLevel = "Warn" | ||||
// LogLevelInfo means general information, warnings, and errors will all be logged. | ||||
LogLevelInfo LogLevel = "Info" | ||||
// LogLevelDebug means detailed debugging information will be logged. | ||||
LogLevelDebug LogLevel = "Debug" | ||||
) | ||||
|
||||
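The `LogLevel` values are capitalized API enum values, while Alertmanager itself takes lowercase levels through its `--log.level` flag. One way a controller might translate between the two is sketched below; `logLevelArg` is a hypothetical helper, and treating the empty value as Info follows the field documentation rather than any confirmed operator code.

```go
package main

import (
	"fmt"
	"strings"
)

// LogLevel is a local stand-in for the API type in this diff.
type LogLevel string

// logLevelArg is a hypothetical helper that turns the API value into an
// Alertmanager command-line argument. The empty value falls back to Info,
// which the field comment names as the current platform default.
func logLevelArg(l LogLevel) string {
	if l == "" {
		l = "Info"
	}
	return "--log.level=" + strings.ToLower(string(l))
}

func main() {
	for _, l := range []LogLevel{"", "Error", "Warn", "Info", "Debug"} {
		fmt.Println(logLevelArg(l))
	}
}
```

Keeping the default inside the controller, rather than in the CRD schema, matches the "no opinion" wording: the stored object stays empty and the platform remains free to change the default later.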
// ResourceSpec defines the requested and limited value of a resource. | ||||
type ResourceSpec struct { | ||||
// request is the minimum amount of the resource required (e.g. "2Mi", "1Gi"). | ||||
// This field is optional. | ||||
// +optional | ||||
Request resource.Quantity `json:"request,omitempty"` | ||||
|
||||
// limit is the maximum amount of the resource allowed (e.g. "2Mi", "1Gi"). | ||||
// This field is optional. | ||||
// +optional | ||||
Limit resource.Quantity `json:"limit,omitempty"` | ||||
} | ||||
|
||||
// AlertmanagerContainerResources defines simplified resource requirements for a container. | ||||
type AlertmanagerContainerResources struct { | ||||
// cpu defines the CPU resource limits and requests. | ||||
Why might a user care about setting this value? What happens if it is not set?
If it's not set, containers have no resource limits, which can be harmful to the system. Users configuring containers in OpenShift should be aware of this.
Because not setting this could be harmful to the system, are there any defaults that we set on a users behalf?
no |
||||
// This field is optional. | ||||
// +optional | ||||
|
||||
CPU *ResourceSpec `json:"cpu,omitempty"` | ||||
|
||||
// memory defines the memory resource limits and requests. | ||||
// This field is optional. | ||||
// +optional | ||||
Memory *ResourceSpec `json:"memory,omitempty"` | ||||
|
||||
// hugepages is a list of hugepage resource specifications by page size. | ||||
Why might a user care to set these? What happens if they don't?
same in other comments: |
||||
// This optional list contains unique configurations identified by their `size` field. | ||||
// A maximum of 10 items is allowed. | ||||
// The list is treated as a map, using `size` as the key. | ||||
// +optional | ||||
// +listType=map | ||||
// +listMapKey=size | ||||
// +kubebuilder:validation:MaxItems=10 | ||||
|
||||
HugePages []HugePageResource `json:"hugepages,omitempty"` | ||||
} | ||||
|
||||
// HugePageResource describes hugepages resources by page size (e.g. 2Mi, 1Gi). | ||||
type HugePageResource struct { | ||||
// size of the hugepage (e.g. "2Mi", "1Gi"). | ||||
// This field is required. | ||||
// +required | ||||
Size resource.Quantity `json:"size"` | ||||
|
||||
// request amount for this hugepage size. | ||||
// This field is optional. | ||||
// +optional | ||||
Request resource.Quantity `json:"request,omitempty"` | ||||
|
||||
// limit amount for this hugepage size. | ||||
// This field is optional. | ||||
// +optional | ||||
Limit resource.Quantity `json:"limit,omitempty"` | ||||
} |
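The comment on `resources` says this simplified API maps to Kubernetes ResourceRequirements, where hugepages appear as resources named `hugepages-<size>` (for example `hugepages-2Mi`). A rough Go sketch of that mapping for the `hugepages` list follows. It uses plain strings in place of `resource.Quantity` so that it stays self-contained, and `hugePageRequirements` is a hypothetical helper, not code from this PR.

```go
package main

import "fmt"

// HugePageResource is a local stand-in for the API type in this diff,
// with strings replacing resource.Quantity to avoid external imports.
type HugePageResource struct {
	Size    string
	Request string
	Limit   string
}

// hugePageRequirements is a hypothetical helper that builds the request and
// limit maps keyed by the Kubernetes resource name "hugepages-<size>".
// Empty request or limit values are skipped, mirroring the optional fields.
func hugePageRequirements(pages []HugePageResource) (requests, limits map[string]string) {
	requests = map[string]string{}
	limits = map[string]string{}
	for _, p := range pages {
		name := "hugepages-" + p.Size
		if p.Request != "" {
			requests[name] = p.Request
		}
		if p.Limit != "" {
			limits[name] = p.Limit
		}
	}
	return requests, limits
}

func main() {
	requests, limits := hugePageRequirements([]HugePageResource{
		{Size: "2Mi", Request: "64Mi", Limit: "64Mi"},
		{Size: "1Gi", Limit: "1Gi"},
	})
	fmt.Println(requests)
	fmt.Println(limits)
}
```

Because the list is declared with `+listMapKey=size`, each page size can appear only once, so the resource names produced here cannot collide.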
The parent struct is optional, so what does it mean when the parent is omitted?
The parent also does not have omitempty, nor is it a pointer, which means it is discoverable (++ for config API). However, this field being required is going to cause issues.
If I asked you to allow "" as a valid value for the enum, what would that mean to the controller?
I think AlertmanagerConfig should be required?