-
Notifications
You must be signed in to change notification settings - Fork 541
Monitoring API: Add AlertmanagerMainConfig #2148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Monitoring API: Add AlertmanagerMainConfig #2148
Conversation
Hello @marioferh! Some important instructions when contributing to openshift/api: |
c0d2965
to
c35227f
Compare
/hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of the comments are centered around godoc. I'd recommend looking at https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#write-user-readable-documentation-in-godoc for more information on what makes a good godoc that is helpful to users.
Another thing that stood out was multiple fields related to pod spec configuration - you may want to group those into a separate struct to have a single field that clearly denotes that the sub-fields in that object map directly to pod spec fields.
c35227f
to
75fbb23
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: marioferh The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
450dd7b
to
a6d7bc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only got about halfway through the changes on this round, will circle back soon to review the rest.
// userMode controls whether Alertmanager should process configurations from user-defined (non-platform) | ||
// namespaces for AlertmanagerConfig lookups. | ||
// When set to true, Alertmanager will search for AlertmanagerConfig resources in user-defined namespaces. | ||
// This field is only effective when the user workload Alertmanager instance is not enabled. | ||
// If the user workload monitoring Alertmanager is enabled, this field is ignored. | ||
// Required: This field must be specified. | ||
// +kubebuilder:validation:Enum="";Enabled;Disabled | ||
// +required | ||
UserMode UserAlertManagerMode `json:"userMode"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple things:
- The godoc seems a bit outdated here, referencing "true" as value that can be set
- Is
UserMode
the most appropriate name here? Maybe this would make more sense as something likeConfigurationPolicy
with options likePlatformDefined
andUserDefined
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important to highlight how the monitoring stack works. We have the default stack and the user-defined mode, and with these two fields, I believe it becomes clearer than with a ConfigurationPolicy.
Also because this: // This field is only effective when the user workload Alertmanager instance is not enabled.
// This field is optional. | ||
// More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ | ||
// +optional | ||
Resources *v1.ResourceRequirements `json:"resources,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC we don't encourage the use of the ResourceRequirements
type and instead recommend using a list of a custom type that gets translated to the ResourceRequirements
you set on a Pod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall this conversation recently related to ResourceLists and because ResourceRequirements at least includes fields of type ResourceList the same principle applies: #2222 (comment)
@JoelSpeed Would probably have more explicit knowledge as to the limitations of using these types, but my understanding is that they don't align with our API conventions any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could work?
ee3b854
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something along those lines may be reasonable if you need explicit fields for each resource type. For example, that change would result in a UX of writing YAML like:
...
resources:
cpu:
requests: ...
limits: ...
memory:
requests: ...
limits: ...
hugepages:
requests: ...
limits: ...
size: ...
...
An alternative would be to do something like:
...
resources:
- name: cpu
requests: ...
limits: ...
- name: memory
requests: ...
limits: ...
- name: hugepages
requests: ...
limits: ...
...
Based on my understanding of the comment I linked to, I think the alternative I've shared is the preferred approach for something like this because it can more easily grow as your needs to support other resource types do.
I'll leave @JoelSpeed to explain any further and dictate what approach is actually preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case here? Are we just passing the data through to the pod template itself?
Are there any restrictions that we would be applying, eg, are there certain types of resources that we wouldn't want to support?
Does each type of resource support the same options, for example, in the comment above, it looks as though maybe hugepages has other options that CPU and memory do not support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the CRD passes these values through to the pod template.
AFAIK this is the info we have: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.limits.hugepages-<size>
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.requests.hugepages-<size>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we aren't interpreting the content of the field, and simply copying it through, it can be ok to re-use the types from upstream.
However, in this case, there are alpha fields in the struct that we would want to not expose, so creating a mirror that meets the parts of the API that we do want to expose makes sense. It also means we can exclude other resources that we do not support users configuring
e062ce8
to
ee3b854
Compare
// +optional | ||
Memory *ResourceSpec `json:"memory,omitempty"` | ||
|
||
// hugepages is a list of hugepage resource specifications by page size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why might a user care to set these? What happens if they don't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same in other comments:
If it's not set, containers have no resource limits, which can be harmful to the system. Users configuring containers in OpenShift should be aware of this.
// HugePageResource describes hugepages resources by page size (e.g. 2Mi, 1Gi). | ||
type HugePageResource struct { | ||
// size of the hugepage (e.g. "2Mi", "1Gi"). | ||
// +kubebuilder:validation:Pattern=`^[0-9]+(Ki|Mi|Gi)$` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally try to avoid using the kubebuilder:validation:Pattern
marker now in favor of writing a CEL expression that performs the regular expression evaluation. Using CEL expressions allows us to provide a more human readable error message than returned when using the pattern marker.
An example:
api/config/v1/types_kmsencryption.go
Line 34 in 108fecf
// +kubebuilder:validation:XValidation:rule="self.matches('^arn:aws:kms:[a-z0-9-]+:[0-9]{12}:key/[a-f0-9-]+$')",message="keyARN must follow the format `arn:aws:kms:<region>:<account_id>:key/<key_id>`. The account ID must be a 12 digit number and the region and key ID should consist only of lowercase hexadecimal characters and hyphens (-)." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, but I not sure if it is needed a validation pattern, remove it.
b324126
to
d9fba48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another round of comments. Additionally, I would like to see tests added to ensure the API and validations you have are working as expected.
// userDefined is optional. | ||
// +optional` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this moved to optional?
Also, minor typo:
// userDefined is optional. | |
// +optional` | |
// userDefined is optional. | |
// +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By making this optional, you now have no required fields in the spec.
Are we ok with folks creating an object like
spec: {}
What does that object mean? Should there be a required field or a minProperties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that with the latest changes, the user-defined field had to be optional. From the configmap // The default value is false. — so I have to make it optional, because was enable, disabled.
// A Boolean flag that enables or disables user-defined namespaces
// to be selected for `AlertmanagerConfig` lookups. This setting only
// applies if the user workload monitoring instance of Alertmanager
// is not enabled.
// The default value is `false`.
EnableUserAlertManagerConfig bool `json:"enableUserAlertmanagerConfig,omitempty"`
// deployed contains configuration options for the deployed Alertmanager instance. | ||
// +optional | ||
Deployed *AlertmanagerDeployedConfig `json:"deployed,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For discriminated unions, this field must be set when the discriminator is set to Deployed
and unset otherwise. We have a pretty standard CEL expression we use for this:
Line 9 in 7318813
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'Filters' ? has(self.filters) : !has(self.filters)",message="filters is required when type is Filters, and forbidden otherwise" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is correct now?
|
||
// AlertmanagerContainerResources defines simplified resource requirements for a container. | ||
type AlertmanagerContainerResources struct { | ||
// cpu defines the CPU resource limits and requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because not setting this could be harmful to the system, are there any defaults that we set on a users behalf?
// The list is treated as a map, using `size` as the key, which simplifies updates and replacements | ||
// of individual entries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an end-user, I believe this means that entries must be unique. I'm not sure an end-user cares about whether or not this simplifies updates and replacements of individual entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// userDefined is optional. | ||
// +optional` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By making this optional, you now have no required fields in the spec.
Are we ok with folks creating an object like
spec: {}
What does that object mean? Should there be a required field or a minProperties?
// should be deployed in the `openshift-monitoring` namespace. | ||
// alertmanagerMainConfig is optional. | ||
// +optional` | ||
AlertmanagerMainConfig AlertmanagerConfig `json:"alertmanagerMainConfig"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Main?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explained before, removed.
// +unionDiscriminator | ||
// +kubebuilder:validation:Enum=Deployed;NotDeployed | ||
// +kubebuilder:validation:Required | ||
DeploymentMode string `json:"deploymentMode"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parent struct is optional, so what does it mean when the parent is omitted?
The parent also does not have omitempty, nor is it a pointer. Which means it is discoverable (++ for config API), however, this field being required, is going to cause issues.
If I asked you to allow ""
as a valid value for the enum, what would that mean to the controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think AlertmanagerConfig should be required?
// | ||
// When omitted, this means the user has no opinion and the platform is left | ||
// to choose reasonable defaults. These defaults are subject to change over time. | ||
// The current default is `- operator: "Exists"` which means that all taints are tolerated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that safe? Not even an API question, but, tolerating all taints is generally not something we would do for control plane components. There are many valid taints (uninitialized for the CCM, network not ready) that I would expect this pod not to tolerate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me think about it
// SecretName is a type that represents the name of a Secret in the same namespace. | ||
// It must be at most 256 characters in length. | ||
// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." | ||
// +kubebuilder:validation:MaxLength=256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be 253
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// UserAlertManagerMode defines mode for user-defines namespaced | ||
// | ||
// Possible values: | ||
// - "Selectable": User-defined namespaces can be selected for AlertmanagerConfig lookups. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, where is the selector that the user would configure to determine which namespaces to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the selector is use alertmanagerMain or user defined config
// +kubebuilder:validation:MaxLength=24 | ||
// This filed is optional | ||
// +optional | ||
Request string `json:"request,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a quantity type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Continue tomorrow with last comments |
ea249c1
to
d60f672
Compare
Signed-off-by: Mario Fernandez <[email protected]>
Signed-off-by: Mario Fernandez <[email protected]>
Signed-off-by: Mario Fernandez <[email protected]>
Signed-off-by: Mario Fernandez <[email protected]>
commit suggestions, thanks Co-authored-by: Bryce Palmer <[email protected]>
Signed-off-by: Mario Fernandez <[email protected]>
Signed-off-by: Mario Fernandez <[email protected]>
Signed-off-by: Mario Fernandez <[email protected]>
d60f672
to
b758adf
Compare
@marioferh: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Every component will be in a separated PR in order to improve review process
First PR: #1929
Related: Enhancements Proposal openshift/enhancements#1627