Summary
The cluster currently has two parallel paths for getting Alertmanager notifications into Teams:
- Legacy: AM → prometheus-msteams proxy (Helm chart in infra/prometheus-msteams/) → Teams Power Automate webhook directly. The proxy formats the Adaptive Card itself.
- New: AM → homerun2-omni-pitcher /pitch/grafana → Redis stream messages → homerun2-notification-catcher → Teams. The catcher formats the Adaptive Card and applies YAML-driven filters/routing.
The new path is now end-to-end verified on platform-sthings:
- AM config in infra/kube-prometheus-stack/release.yaml:120-136 already points the msteams receiver at http://homerun2-omni-pitcher.<ns>.svc.cluster.local/pitch/grafana with Bearer auth (${HOMERUN2_OMNI_PITCHER_AUTH_TOKEN}).
- The omni-pitcher's transformer (homerun2-omni-pitcher/internal/handlers/grafana.go:93) maps the AM webhook payload to a homerun.Message with Author=grafana, System=<receiver>, Severity mapped from Prometheus labels, and Tags joined from the remaining labels.
- The catcher now has a teams-grafana-alerts route, added in stuttgart-things/stuttgart-things#2230, that filters on match: { author: grafana } plus severity_min: warning and dispatches to the configured Teams webhook.
Until the legacy proxy is removed, every warning/critical AM alert hits Teams twice — once via the proxy's direct webhook, once via the catcher's Adaptive Card. That's the cost of the cutover window.
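The entry point of the new path can also be exercised directly, without going through Alertmanager, by posting an Alertmanager-style webhook payload to /pitch/grafana. The following is a minimal sketch, not tooling from any of the repos involved: the function names, the alert name SyntheticCutoverTest, and the label and annotation values are made up for illustration. The payload follows the general shape of Alertmanager's version 4 webhook format, and the namespace is a required argument because it is not spelled out in this issue.

```shell
# Hypothetical helpers (names and alert values are illustrative only).

# Print a minimal Alertmanager-style (version 4) webhook payload.
# $1 = severity label (default: warning), $2 = receiver name (default: msteams)
build_am_payload() {
  local severity="${1:-warning}" receiver="${2:-msteams}"
  cat <<EOF
{
  "version": "4",
  "status": "firing",
  "receiver": "${receiver}",
  "groupLabels": {"alertname": "SyntheticCutoverTest"},
  "commonLabels": {"alertname": "SyntheticCutoverTest", "severity": "${severity}"},
  "commonAnnotations": {"summary": "Synthetic alert for the Teams cutover check"},
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "SyntheticCutoverTest", "severity": "${severity}"},
      "annotations": {"summary": "Synthetic alert for the Teams cutover check"},
      "startsAt": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    }
  ]
}
EOF
}

# POST the payload to the pitcher with Bearer auth.
# $1 = namespace of homerun2-omni-pitcher (required; deliberately not guessed)
send_test_pitch() {
  local ns="$1"
  if [ -z "$ns" ] || [ -z "$HOMERUN2_OMNI_PITCHER_AUTH_TOKEN" ]; then
    echo "usage: HOMERUN2_OMNI_PITCHER_AUTH_TOKEN=... send_test_pitch <namespace>" >&2
    return 64
  fi
  build_am_payload warning msteams | curl -fsS -X POST \
    -H "Authorization: Bearer ${HOMERUN2_OMNI_PITCHER_AUTH_TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary @- \
    "http://homerun2-omni-pitcher.${ns}.svc.cluster.local/pitch/grafana"
}
```

send_test_pitch has to run from somewhere that can resolve the cluster-internal service name, for example a pod inside the cluster.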
Scope of cleanup
- Remove infra/prometheus-msteams/ from this repo:
  - release.yaml (the HelmRelease)
  - kustomization.yaml
  - requirements.yaml (the GitRepository pointing at the upstream chart)
  - README.md
- Remove the prometheus-msteams entry from any parent kustomization that includes it (likely infra/kustomization.yaml or a profile-level kustomization).
- Remove the MSTEAMS_WEBHOOK_URL and PROMETHEUS_MSTEAMS_NAMESPACE substitute vars from any cluster overlay Kustomization that's currently feeding them (search stuttgart-things/clusters/**/*.yaml).
- Verify the monitoring namespace no longer hosts a prometheus-msteams Deployment after the next Flux reconciliation.
- Update infra/kube-prometheus-stack/release.yaml: drop the legacy-related comment block (lines 90-98 in the current file), since the migration narrative is no longer relevant.
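Finding every place that still mentions the legacy proxy or its substitute vars can be scripted. The sketch below is one possible approach, not an existing script: find_legacy_refs is a hypothetical name, and limiting the scan to YAML and Markdown files is an assumption about where references live.

```shell
# Hypothetical helper: list leftover references to the legacy proxy.
# $1 = root of a checkout to scan (defaults to the current directory)
# Prints file:line:match for each hit.
# Exit status is grep's: 0 if anything was found, 1 if the tree is clean.
find_legacy_refs() {
  local root="${1:-.}"
  grep -rnE \
    --exclude-dir=.git \
    --include='*.yaml' --include='*.yml' --include='*.md' \
    'prometheus-msteams|MSTEAMS_WEBHOOK_URL|PROMETHEUS_MSTEAMS_NAMESPACE' \
    "$root"
}
```

Run it once against this repo and once against a checkout containing the cluster overlays; each hit is either a file to delete or a line to remove.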
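Acceptance criteria
- kubectl get deploy -n monitoring prometheus-msteams → NotFound.
- No references to prometheus-msteams in this repo (grep -rn prometheus-msteams . returns only commit-history mentions).
- No leftover substitute vars (MSTEAMS_WEBHOOK_URL, PROMETHEUS_MSTEAMS_NAMESPACE) in cluster overlays.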
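The Deployment check can be wrapped so that a genuine NotFound is distinguished from an unrelated kubectl failure such as an unreachable API server. This is a minimal sketch under stated assumptions: legacy_deploy_absent is a hypothetical name, and matching on the string NotFound in kubectl's error output is an assumption about how the error is reported.

```shell
# Hypothetical helper: succeed only when the legacy Deployment is gone.
# $1 = namespace (default: monitoring), $2 = Deployment name (default: prometheus-msteams)
# Returns 0 if kubectl reports NotFound, 1 if the Deployment still exists,
# 2 on any other kubectl failure.
legacy_deploy_absent() {
  local ns="${1:-monitoring}" name="${2:-prometheus-msteams}" err
  # Capture stderr only; stdout (the Deployment listing) is discarded.
  if err="$(kubectl get deploy -n "$ns" "$name" 2>&1 >/dev/null)"; then
    echo "FAIL: deployment $name still exists in namespace $ns" >&2
    return 1
  fi
  case "$err" in
    *NotFound*)
      echo "OK: $name not found in $ns"
      return 0
      ;;
    *)
      echo "ERROR: unexpected kubectl failure: $err" >&2
      return 2
      ;;
  esac
}
```

Calling legacy_deploy_absent with no arguments after the Flux reconciliation covers the first criterion.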
Risk / sequencing
Low-risk. The new path has been observed delivering real Teams notifications during today's rollout. Suggested sequence:
- Land the catcher teams-grafana-alerts route (stuttgart-things/stuttgart-things#2230), done in parallel with filing this issue.
- Trigger a synthetic AM alert (e.g. amtool alert add, or pause a node briefly) and confirm the Teams channel shows both the legacy card and the catcher's Adaptive Card. That's the cutover-window evidence.
- Open the cleanup PR (this issue). Merge. Reconcile.
- Trigger another synthetic alert and confirm only one card appears.
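The two synthetic-alert steps can use the same command so that the before and after results are comparable. A minimal sketch with amtool, under stated assumptions: fire_synthetic_alert and the alert name SyntheticCutoverTest are hypothetical, the Alertmanager URL has to be supplied by the caller (for example via a local port-forward), and the cluster's Alertmanager routing is assumed to send an alert with these labels to the msteams receiver.

```shell
# Hypothetical helper: fire a synthetic alert through Alertmanager.
# $1 = Alertmanager base URL (required)
# $2 = severity label (default: warning, which matches the route's severity_min)
fire_synthetic_alert() {
  local am_url="$1" severity="${2:-warning}"
  if [ -z "$am_url" ]; then
    echo "usage: fire_synthetic_alert <alertmanager-url> [severity]" >&2
    return 64
  fi
  amtool --alertmanager.url="$am_url" alert add \
    --annotation=summary="Synthetic alert for the Teams cutover check" \
    alertname=SyntheticCutoverTest \
    severity="$severity"
}
```

Run it once before the cleanup PR merges (expect two cards in the channel) and once after the reconciliation (expect one).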
Related