Open
Labels
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
Description
Problem Statement
The current User Cluster MLA Admin Guide primarily documents the API-based approach for managing alert rules via REST endpoints. However, it doesn't document the CRD-based approach, which is more suitable for GitOps workflows.
While inspecting a cluster, I found that Kubermatic provides native Kubernetes CRDs for managing alerting rules:
- rulegroups.kubermatic.k8c.io - For defining Prometheus alert/recording rules
- alertmanagers.kubermatic.k8c.io - For configuring Alertmanager routing
These CRDs can be applied directly to user cluster namespaces (e.g., cluster-*) in the seed cluster, enabling GitOps-based alert management.
Proposed Documentation Addition
Add a new section: "Managing User Cluster Alerting via GitOps" to the User Cluster MLA documentation.
Content to Include:
1. RuleGroup CRD Overview
- Explain that RuleGroups can be created as Kubernetes resources in user cluster namespaces
- Document the CRD structure and fields
Example:

    apiVersion: kubermatic.k8c.io/v1
    kind: RuleGroup
    metadata:
      name: haproxy-alerts
      namespace: cluster-xxxxx # User cluster namespace in seed
    spec:
      cluster:
        name: xxxxx
      ruleGroupType: Metrics # or "Logs"
      isDefault: false
      data: |
        groups:
        - name: haproxy-service-specific-alerts
          rules:
          - alert: HighErrorRate
            expr: rate(http_requests_total{code=~"5.."}[5m]) > 0.05
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: High 5xx error rate detected

2. Alertmanager Configuration via Secret
- Document how to configure Alertmanager by updating the alertmanager secret
- Show the secret structure and key names
Example:

    apiVersion: v1
    kind: Secret
    metadata:
      name: alertmanager
      namespace: cluster-xxxxx
    type: Opaque
    stringData:
      alertmanager.yaml: |
        template_files: {}
        alertmanager_config: |
          route:
            receiver: 'default'
            group_by: ['alertname', 'cluster', 'service']
            routes:
            - receiver: 'slack-critical'
              match:
                severity: critical
          receivers:
          - name: 'default'
            slack_configs:
            - api_url: 'https://hooks.slack.com/services/XXX'
              channel: '#alerts'
          - name: 'slack-critical'
            slack_configs:
            - api_url: 'https://hooks.slack.com/services/XXX'
              channel: '#critical-alerts'

3. Alertmanager CRD Reference
- Document the alertmanagers.kubermatic.k8c.io CRD
- Explain its relationship with the secret
Example:

    apiVersion: kubermatic.k8c.io/v1
    kind: Alertmanager
    metadata:
      name: alertmanager
      namespace: cluster-xxxxx
    spec:
      configSecret:
        name: alertmanager # References the secret above

4. GitOps Workflow Examples
- Show how to structure alert rules in a Git repository
- Provide ArgoCD/Flux application examples
- Best practices for organizing rules by service/component
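As a starting point for the ArgoCD example, here is a minimal sketch of an Application that syncs one user cluster's alerting manifests into its namespace on the seed cluster. The repository URL, the directory layout (`clusters/xxxxx/alerting`), the Application name, and the `argocd` namespace are assumptions for illustration only; they are not taken from the KKP documentation.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-xxxxx-alerting        # hypothetical name
  namespace: argocd                   # assumes ArgoCD runs in its default namespace
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/kkp-alerting.git  # hypothetical repository
    targetRevision: main
    path: clusters/xxxxx/alerting     # hypothetical directory holding the RuleGroup manifests
  destination:
    server: https://kubernetes.default.svc  # assumes ArgoCD is installed on the seed cluster
    namespace: cluster-xxxxx                # user cluster namespace in seed
  syncPolicy:
    automated:
      prune: false    # conservative default: do not delete resources removed from Git
      selfHeal: true  # revert manual changes made to the synced resources
```

A Flux Kustomization pointing at the same directory would play the equivalent role. Keeping one directory per user cluster, with one RuleGroup file per service, is one possible way to organize the rules.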
5. API vs CRD Comparison Table
| Aspect | API Approach | CRD Approach |
|---|---|---|
| GitOps Support | Requires CI/CD integration | Native Kubernetes resources |
| Version Control | Manual API calls | Git history |
| Declarative | No | Yes |
| Access Control | KKP API permissions | Kubernetes RBAC |
| Tooling | curl, API clients | kubectl, ArgoCD, Flux |
| Use Case | Programmatic management | Infrastructure as Code |
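To illustrate the "Kubernetes RBAC" row, the following sketch shows a namespaced Role that would let a GitOps service account manage alerting resources in a single user cluster namespace. The Role name is hypothetical, and the sketch assumes the resource names match the CRD names listed above (rulegroups and alertmanagers in the kubermatic.k8c.io group); this should be verified against the installed CRDs.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: alerting-gitops               # hypothetical name
  namespace: cluster-xxxxx            # user cluster namespace in seed
rules:
# Manage the alerting custom resources in this namespace only
- apiGroups: ["kubermatic.k8c.io"]
  resources: ["rulegroups", "alertmanagers"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Allow updating only the Alertmanager configuration secret
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["alertmanager"]
  verbs: ["get", "update", "patch"]
```

The Role would still need a RoleBinding to the service account used by ArgoCD or Flux. Scoping it to one `cluster-*` namespace keeps alert management for different user clusters separated.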
Benefits
This documentation would:
- Enable GitOps workflows for alert management
- Provide a more Kubernetes-native approach
- Help teams already using ArgoCD/Flux for infrastructure
- Reduce the learning curve for Kubernetes users
- Fill a gap in current documentation
Additional Context
Current documentation focuses on:
- API endpoints: GET/POST/PUT/DELETE /api/v2/projects/{project_id}/clusters/{cluster_id}/rulegroups
- UI-based management via KKP dashboard
Missing:
- CRD-based declarative approach
- GitOps integration patterns
- Complete CRD specification examples
Related Documentation