Monitoring API: Add AlertmanagerMainConfig #2148

Open
wants to merge 13 commits into
base: master
from
231 changes: 228 additions & 3 deletions config/v1alpha1/types_cluster_monitoring.go
@@ -17,6 +17,8 @@ limitations under the License.
package v1alpha1

import (
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

@@ -72,15 +74,19 @@ type ClusterMonitoringList struct {
}

// ClusterMonitoringSpec defines the desired state of Cluster Monitoring Operator
// +required
type ClusterMonitoringSpec struct {
// userDefined sets the deployment mode for user-defined monitoring in addition to the default platform monitoring.
// +required
// userDefined is optional.
// +optional
UserDefined UserDefinedMonitoring `json:"userDefined"`
// alertmanagerConfig allows users to configure how the default Alertmanager instance
// should be deployed in the `openshift-monitoring` namespace.
// alertmanagerConfig is optional.
// +optional
AlertmanagerConfig AlertmanagerConfig `json:"alertmanagerConfig"`
}

// UserDefinedMonitoring config for user-defined projects.
// +required
type UserDefinedMonitoring struct {
// mode defines the different configurations of UserDefinedMonitoring
// Valid values are Disabled and NamespaceIsolated
@@ -101,3 +107,222 @@ const (
// UserDefinedNamespaceIsolated enables monitoring for user-defined projects with namespace-scoped tenancy. This ensures that metrics, alerts, and monitoring data are isolated at the namespace level.
UserDefinedNamespaceIsolated UserDefinedMode = "NamespaceIsolated"
)

// alertmanagerConfig provides configuration options for the default Alertmanager instance
// that runs in the `openshift-monitoring` namespace. Use this configuration to control
// whether the default Alertmanager is deployed, how it logs, and how its pods are scheduled.
//
// +union
// +kubebuilder:validation:XValidation:rule="self.deploymentMode == 'Deployed' ? has(self.deployed) : !has(self.deployed)",message="deployed must be set when deploymentMode is Deployed, and must be unset otherwise"
type AlertmanagerConfig struct {
// deploymentMode determines whether the default Alertmanager instance should be deployed
// as part of the monitoring stack.
// Allowed values are Deployed and NotDeployed.
// When set to Deployed, the Cluster Monitoring Operator
// ensures that an Alertmanager instance is created and managed in the `openshift-monitoring` namespace.
// When set to NotDeployed, the operator will not deploy the Alertmanager instance.
// Use this field if you want to explicitly opt in or out of running a platform-level Alertmanager.
//
// deploymentMode is required.
// +unionDiscriminator
// +kubebuilder:validation:Enum=Deployed;NotDeployed
// +required
DeploymentMode string `json:"deploymentMode"`
Contributor

The parent struct is optional, so what does it mean when the parent is omitted?

The parent also does not have omitempty, nor is it a pointer. Which means it is discoverable (++ for config API), however, this field being required, is going to cause issues.

If I asked you to allow "" as a valid value for the enum, what would that mean to the controller?

Contributor Author

I think AlertmanagerConfig should be required?
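
To make the serialization point in this thread concrete, here is a minimal sketch. It uses local stand-in types (not the real `config/v1alpha1` package) that copy only the JSON tags shown in the diff; the `marshalSpec` helper is hypothetical. It suggests that a non-pointer struct field without `omitempty` is always emitted by `encoding/json`, so a spec that never set `alertmanagerConfig` still serializes it with an empty `deploymentMode`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Stand-in types for illustration only; they mirror the JSON tags in the diff.
type AlertmanagerDeployedConfig struct {
	LogLevel string `json:"logLevel"`
}

type AlertmanagerConfig struct {
	DeploymentMode string                      `json:"deploymentMode"`
	Deployed       *AlertmanagerDeployedConfig `json:"deployed,omitempty"`
}

type ClusterMonitoringSpec struct {
	// No pointer and no omitempty: the field is always serialized.
	AlertmanagerConfig AlertmanagerConfig `json:"alertmanagerConfig"`
}

// marshalSpec (hypothetical helper) returns the JSON encoding of spec.
func marshalSpec(spec ClusterMonitoringSpec) string {
	out, err := json.Marshal(spec)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// The zero value stands in for a user who omitted alertmanagerConfig.
	fmt.Println(marshalSpec(ClusterMonitoringSpec{}))
}
```

With these stand-in types, the zero-value spec still carries `"deploymentMode":""`, which is the value a required enum without `""` would reject.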


// deployed contains configuration options for the deployed Alertmanager instance.
// +optional
Deployed *AlertmanagerDeployedConfig `json:"deployed,omitempty"`
Comment on lines +132 to +134
Contributor

For discriminated unions, this field must be set when the discriminator is set to Deployed and unset otherwise. We have a pretty standard CEL expression we use for this:

// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'Filters' ? has(self.filters) : !has(self.filters)",message="filters is required when type is Filters, and forbidden otherwise"

Contributor Author

Is this correct now?

}
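
As a rough illustration of the discriminated-union rule on `AlertmanagerConfig`, the following sketch restates the CEL expression as plain Go. The lowercase types and the `validateUnion` function are hypothetical stand-ins, not part of the PR; in the real API the check is enforced by the CRD validation, not by Go code.

```go
package main

import (
	"errors"
	"fmt"
)

// deployedConfig stands in for AlertmanagerDeployedConfig (illustration only).
type deployedConfig struct{}

// alertmanagerConfig stands in for the union type in the diff.
type alertmanagerConfig struct {
	DeploymentMode string
	Deployed       *deployedConfig
}

// validateUnion mirrors the XValidation rule: deployed must be set when
// deploymentMode is Deployed, and must be unset otherwise. It also rejects
// values outside the enum, which the Enum marker handles in the real schema.
func validateUnion(c alertmanagerConfig) error {
	switch c.DeploymentMode {
	case "Deployed":
		if c.Deployed == nil {
			return errors.New("deployed must be set when deploymentMode is Deployed")
		}
	case "NotDeployed":
		if c.Deployed != nil {
			return errors.New("deployed must be unset when deploymentMode is NotDeployed")
		}
	default:
		return fmt.Errorf("unsupported deploymentMode %q", c.DeploymentMode)
	}
	return nil
}

func main() {
	fmt.Println(validateUnion(alertmanagerConfig{DeploymentMode: "Deployed"}))
	fmt.Println(validateUnion(alertmanagerConfig{DeploymentMode: "NotDeployed"}))
}
```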

// AlertmanagerDeployedConfig provides configuration options for the default Alertmanager instance
// that runs in the `openshift-monitoring` namespace. Use this configuration to control
// how the deployed Alertmanager logs, how its pods are scheduled, and which resources it uses.
//
// It must be specified when deploymentMode is Deployed.
type AlertmanagerDeployedConfig struct {
// userModeConfig controls whether Alertmanager should process configurations from user-defined (non-platform)
// namespaces for AlertmanagerConfig lookups.
// When set to Selectable, Alertmanager searches for AlertmanagerConfig resources in user-defined namespaces.
// When set to None or omitted, user-defined namespaces are not searched.
// This field is only effective when the user workload Alertmanager instance is not enabled.
// If the user workload monitoring Alertmanager is enabled, this field is ignored.
// userModeConfig is optional.
// Allowed values are Selectable and None.
Contributor

What happens when each of these values is specified? How is something like None different than leaving this field empty?

Contributor Author

Empty is the same as None.
Comment added.

Contributor

Should None even be an option then?

Contributor Author

Well, I used Selectable and None instead of Enabled and Disabled. Is that correct?

// Default value is None
// +kubebuilder:validation:Enum="";Selectable;None
// +optional
UserModeConfig UserAlertManagerModeConfig `json:"userModeConfig"`
// logLevel defines the verbosity of logs emitted by Alertmanager.
// This field allows users to control the amount and severity of logs generated, which can be useful
// for debugging issues or reducing noise in production environments.
// Allowed values are Error, Warn, Info, Debug, and omitted.
// When set to Error, only errors will be logged.
// When set to Warn, both warnings and errors will be logged.
// When set to Info, general information, warnings, and errors will all be logged.
// When set to Debug, detailed debugging information will be logged.
// When omitted, this means no opinion and the platform is left to choose a default that is subject to change over time.
// Currently, the default is Info.
// +optional
LogLevel LogLevel `json:"logLevel"`
// nodeSelector is the node selector applied to the Alertmanager pods.
// nodeSelector is optional.
//
// When omitted, this means the user has no opinion and the platform is left
// to choose reasonable defaults. These defaults are subject to change over time.
// +optional
NodeSelector map[string]string `json:"nodeSelector,omitempty"`
// resources defines the compute resource requests and limits for the Alertmanager container.
// This includes CPU, memory and HugePages constraints to help control scheduling and resource usage.
// When not specified, defaults are used by the platform. Requests cannot exceed limits.
// This field is optional.
// More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
// This is a simplified API that maps to Kubernetes ResourceRequirements.
// +optional
Resources *AlertmanagerContainerResources `json:"resources,omitempty"`
// secrets defines a list of secrets that need to be mounted into the Alertmanager.
// The secrets must reside within the same namespace as the Alertmanager object.
// They will be added as volumes named secret-<secret-name> and mounted at
// /etc/alertmanager/secrets/<secret-name> within the 'alertmanager' container of
// the Alertmanager Pods.
// This field is optional.
Comment on lines +181 to +186
Contributor

You've explained what this does, but why would a user care to configure these secrets?

Contributor Author

We assume that a user who uses OpenShift knows why to use secrets.

Contributor

@everettraven, May 8, 2025

Yes, an OpenShift user likely understands the benefits of using secrets - how does adding secrets here help in configuring alertmanager? What things would it configure on alertmanager and/or allow alertmanager to do differently?

These are the things I suspect users may not know about (nor do I know).

Contributor Author

OK, let me think about it.

// Maximum length for this list is 10
// +optional
// +kubebuilder:validation:MaxItems=10
Secrets []SecretName `json:"secrets,omitempty"`
// tolerations is a list of tolerations applied to the Alertmanager pods.
// tolerations is optional.
//
// When omitted, this means the user has no opinion and the platform is left
// to choose reasonable defaults. These defaults are subject to change over time.
// Maximum length for this list is 10
// +kubebuilder:validation:MaxItems=10
// +optional
Tolerations []v1.Toleration `json:"tolerations,omitempty"`
// topologySpreadConstraints defines rules for how Alertmanager Pods should be distributed
// across topology domains such as zones, nodes, or other user-defined labels.
// topologySpreadConstraints is optional.
// This helps improve high availability and resource efficiency by avoiding placing
// too many replicas in the same failure domain.
//
// When omitted, this means no opinion and the platform is left to choose a default, which is subject to change over time.
// This field maps directly to the `topologySpreadConstraints` field in the Pod spec.
// Maximum length for this list is 10
// +kubebuilder:validation:MaxItems=10
// +optional
TopologySpreadConstraints []v1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
// volumeClaimTemplate defines persistent storage for Alertmanager. Use this setting to
// configure the persistent volume claim, including storage class, volume
// size, and name.
// If omitted, the Pod uses ephemeral storage and alert data will not persist
// across restarts.
// This field is optional.
// +optional
VolumeClaimTemplate *v1.PersistentVolumeClaim `json:"volumeClaimTemplate,omitempty"`
}

// SecretName is a type that represents the name of a Secret in the same namespace.
// It must be at most 253 characters in length.
// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character."
// +kubebuilder:validation:MaxLength=253
type SecretName string
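
For readers unfamiliar with the constraint on `SecretName`, this is a minimal standard-library sketch of the same rule: a lowercase RFC 1123 subdomain of at most 253 characters. The `isValidSecretName` function and the regular expression written out here are assumptions for illustration; the PR relies on the CEL `format.dns1123Subdomain()` helper instead.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxSecretNameLength matches the MaxLength=253 marker on SecretName.
const maxSecretNameLength = 253

// dns1123Subdomain matches dot-separated labels of lowercase alphanumerics
// and '-', where each label starts and ends with an alphanumeric character.
var dns1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// isValidSecretName (hypothetical helper) reports whether name satisfies
// both the length limit and the subdomain pattern.
func isValidSecretName(name string) bool {
	return len(name) > 0 &&
		len(name) <= maxSecretNameLength &&
		dns1123Subdomain.MatchString(name)
}

func main() {
	names := []string{"alertmanager-tls", "smtp.credentials", "Bad_Name", "-leading-dash"}
	for _, name := range names {
		fmt.Printf("%q valid: %t\n", name, isValidSecretName(name))
	}
}
```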

// AlertManagerDeployMode defines the deployment state of the platform Alertmanager instance.
//
// Possible values:
// - "Deployed": The Alertmanager instance will be deployed and managed by the operator.
// - "NotDeployed": The operator will not deploy an Alertmanager instance.
type AlertManagerDeployMode string

const (
// AlertManagerDeployModeDeployed means the Alertmanager instance will be deployed and managed by the operator.
AlertManagerDeployModeDeployed AlertManagerDeployMode = "Deployed"

// AlertManagerDeployModeNotDeployed means the operator will not deploy the Alertmanager instance.
AlertManagerDeployModeNotDeployed AlertManagerDeployMode = "NotDeployed"
)

// UserAlertManagerModeConfig defines the mode for user-defined namespaces.
//
// Possible values:
// - "Selectable": User-defined namespaces can be selected for AlertmanagerConfig lookups.
Contributor

So, where is the selector that the user would configure to determine which namespaces to use?

Contributor Author

The selector is to use alertmanagerMain or the user-defined config.

// - "None": User-defined namespaces cannot be selected for AlertmanagerConfig lookups.
type UserAlertManagerModeConfig string

const (
// UserAlertManagerModeSelectable allows user-defined namespaces to be selected for `AlertmanagerConfig` lookups. This setting only
// applies if the user workload monitoring instance of Alertmanager is not enabled.
UserAlertManagerModeSelectable UserAlertManagerModeConfig = "Selectable"
// UserAlertManagerModeNone prevents user-defined namespaces from being selected for `AlertmanagerConfig` lookups. This setting only
// applies if the user workload monitoring instance of Alertmanager is not enabled.
UserAlertManagerModeNone UserAlertManagerModeConfig = "None"
)

// logLevel defines the verbosity of logs emitted by Alertmanager.
// Valid values are Error, Warn, Info and Debug.
// +kubebuilder:validation:Enum="";Error;Warn;Info;Debug
type LogLevel string

const (
// LogLevelEmpty means no opinion; the platform chooses a default, which is currently Info.
LogLevelEmpty LogLevel = ""
// LogLevelError means only errors will be logged.
LogLevelError LogLevel = "Error"
// LogLevelWarn means both warnings and errors will be logged.
LogLevelWarn LogLevel = "Warn"
// LogLevelInfo means general information, warnings, and errors will all be logged.
LogLevelInfo LogLevel = "Info"
// LogLevelDebug means detailed debugging information will be logged.
LogLevelDebug LogLevel = "Debug"
)
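
The sketch below shows one way a controller might consume `LogLevel`, assuming (this is not stated in the diff) that the operator passes the value to Alertmanager through its `--log.level` flag, which takes lowercase values. The `logLevelArg` function and `defaultLogLevel` constant are hypothetical; the only behavior taken from the PR is that an omitted value currently falls back to Info.

```go
package main

import (
	"fmt"
	"strings"
)

// LogLevel and its constants are copied from the diff for self-containment.
type LogLevel string

const (
	LogLevelEmpty LogLevel = ""
	LogLevelError LogLevel = "Error"
	LogLevelWarn  LogLevel = "Warn"
	LogLevelInfo  LogLevel = "Info"
	LogLevelDebug LogLevel = "Debug"
)

// defaultLogLevel reflects the documented current default (assumption:
// the controller, not the API server, applies it).
const defaultLogLevel = LogLevelInfo

// logLevelArg (hypothetical helper) turns an API value into a command-line
// argument, applying the default when the field was omitted.
func logLevelArg(level LogLevel) (string, error) {
	if level == LogLevelEmpty {
		level = defaultLogLevel
	}
	switch level {
	case LogLevelError, LogLevelWarn, LogLevelInfo, LogLevelDebug:
		return "--log.level=" + strings.ToLower(string(level)), nil
	}
	return "", fmt.Errorf("unsupported log level %q", level)
}

func main() {
	for _, level := range []LogLevel{LogLevelEmpty, LogLevelDebug, "Trace"} {
		arg, err := logLevelArg(level)
		fmt.Printf("%q -> %q (err: %v)\n", level, arg, err)
	}
}
```

Keeping the default in the controller rather than in the schema matches the "no opinion" wording of the field comment, since the platform can then change it without an API change.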

// ResourceSpec defines the requested and limited value of a resource.
type ResourceSpec struct {
// request is the minimum amount of the resource required (e.g. "2Mi", "1Gi").
// This field is optional.
// +optional
Request resource.Quantity `json:"request,omitempty"`

// limit is the maximum amount of the resource allowed (e.g. "2Mi", "1Gi").
// This field is optional.
// +optional
Limit resource.Quantity `json:"limit,omitempty"`
}

// AlertmanagerContainerResources defines simplified resource requirements for a container.
type AlertmanagerContainerResources struct {
// cpu defines the CPU resource limits and requests.
Contributor

Why might a user care about setting this value? What happens if it is not set?

Contributor Author

If it's not set, containers have no resource limits, which can be harmful to the system. Users configuring containers in OpenShift should be aware of this.

Contributor

Because not setting this could be harmful to the system, are there any defaults that we set on a user's behalf?

Contributor Author

No.

// This field is optional.
// +optional
CPU *ResourceSpec `json:"cpu,omitempty"`

// memory defines the memory resource limits and requests.
// This field is optional.
// +optional
Memory *ResourceSpec `json:"memory,omitempty"`

// hugepages is a list of hugepage resource specifications by page size.
Contributor

Why might a user care to set these? What happens if they don't?

Contributor Author

Same as in the other comments:
If it's not set, containers have no resource limits, which can be harmful to the system. Users configuring containers in OpenShift should be aware of this.

// It is an optional list of unique configurations identified by their `size` field.
// A maximum of 10 items is allowed.
// The list is treated as a map, using `size` as the key.
// +optional
// +listType=map
// +listMapKey=size
// +kubebuilder:validation:MaxItems=10
HugePages []HugePageResource `json:"hugepages,omitempty"`
}
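
To illustrate what the `+listType=map`, `+listMapKey=size`, and `MaxItems=10` markers on `hugepages` imply, here is a small sketch of the equivalent checks in plain Go. The `hugePageResource` stand-in uses a string for `Size` instead of `resource.Quantity`, and `validateHugePages` is a hypothetical helper; the real enforcement happens in the generated CRD schema.

```go
package main

import "fmt"

// hugePageResource stands in for HugePageResource; Size is a plain string
// here (assumption) rather than resource.Quantity.
type hugePageResource struct {
	Size string
}

// maxHugePages matches the MaxItems=10 marker.
const maxHugePages = 10

// validateHugePages (hypothetical helper) enforces the item limit and the
// map-list semantics: each size may appear at most once.
func validateHugePages(items []hugePageResource) error {
	if len(items) > maxHugePages {
		return fmt.Errorf("hugepages: at most %d items allowed, got %d", maxHugePages, len(items))
	}
	seen := make(map[string]struct{}, len(items))
	for i, item := range items {
		if _, dup := seen[item.Size]; dup {
			return fmt.Errorf("hugepages[%d]: duplicate size %q", i, item.Size)
		}
		seen[item.Size] = struct{}{}
	}
	return nil
}

func main() {
	ok := []hugePageResource{{Size: "2Mi"}, {Size: "1Gi"}}
	dup := []hugePageResource{{Size: "2Mi"}, {Size: "2Mi"}}
	fmt.Println(validateHugePages(ok))
	fmt.Println(validateHugePages(dup))
}
```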

// HugePageResource describes hugepages resources by page size (e.g. 2Mi, 1Gi).
type HugePageResource struct {
// size of the hugepage (e.g. "2Mi", "1Gi").
// This field is required.
// +required
Size resource.Quantity `json:"size"`

// request amount for this hugepage size.
// This field is optional.
// +optional
Request resource.Quantity `json:"request,omitempty"`

// limit amount for this hugepage size.
// This field is optional.
// +optional
Limit resource.Quantity `json:"limit,omitempty"`
}