
Remove prometheus-msteams proxy now that AM → omni-pitcher → notification-catcher path is wired #147

@patrick-hermann-sva


Summary

The cluster currently has two parallel paths for getting Alertmanager notifications into Teams:

  1. Legacy: AM → prometheus-msteams proxy (Helm chart in infra/prometheus-msteams/) → Teams Power Automate webhook directly. The proxy formats the Adaptive Card itself.
  2. New: AM → homerun2-omni-pitcher /pitch/grafana → Redis stream messages → homerun2-notification-catcher → Teams. The catcher formats the Adaptive Card and applies YAML-driven filters/routing.

The new path is now end-to-end verified on platform-sthings:

  • AM config in infra/kube-prometheus-stack/release.yaml:120-136 already points the msteams receiver at http://homerun2-omni-pitcher.<ns>.svc.cluster.local/pitch/grafana with Bearer auth (${HOMERUN2_OMNI_PITCHER_AUTH_TOKEN}).
  • The omni-pitcher's transformer (homerun2-omni-pitcher/internal/handlers/grafana.go:93) maps the AM webhook payload to a homerun.Message with Author=grafana, System=<receiver>, Severity mapped from the Prometheus labels, and Tags joined from the remaining labels.
  • The catcher now has a teams-grafana-alerts route added in stuttgart-things/stuttgart-things#2230 that filters on match: { author: grafana } + severity_min: warning and dispatches to the configured Teams webhook.

Until the legacy proxy is removed, every warning/critical AM alert reaches Teams twice: once via the proxy's direct webhook and once via the catcher's Adaptive Card. That is the cost of the cutover window.
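The duplication is limited to warning/critical because the catcher route only dispatches alerts at or above severity_min. A minimal Go sketch of that route check, assuming a three-level info < warning < critical scale; the catcher's real severity scale and route schema may differ, and `Route` and `severityRank` are hypothetical names.

```go
package main

import "fmt"

// severityRank orders severities. The exact scale used by the catcher is an ASSUMPTION.
var severityRank = map[string]int{
	"info":     0,
	"warning":  1,
	"critical": 2,
}

// Route is a HYPOTHETICAL in-memory form of a catcher route such as
// teams-grafana-alerts: match { author: grafana } plus severity_min: warning.
type Route struct {
	Name        string
	MatchAuthor string
	SeverityMin string
}

// Matches reports whether a message with the given author and severity
// would be dispatched by this route. Unknown severities never match.
func (r Route) Matches(author, severity string) bool {
	if author != r.MatchAuthor {
		return false
	}
	got, ok := severityRank[severity]
	if !ok {
		return false
	}
	return got >= severityRank[r.SeverityMin]
}

func main() {
	route := Route{Name: "teams-grafana-alerts", MatchAuthor: "grafana", SeverityMin: "warning"}
	for _, sev := range []string{"info", "warning", "critical"} {
		fmt.Printf("%s -> %v\n", sev, route.Matches("grafana", sev))
	}
}
```

Treating unknown severities as non-matching is a deliberate choice in this sketch: a typo in a label drops the card instead of paging the channel.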

Scope of cleanup

  1. Remove infra/prometheus-msteams/ from this repo:
    • release.yaml (the HelmRelease)
    • kustomization.yaml
    • requirements.yaml (the GitRepository pointing at the upstream chart)
    • README.md
  2. Remove the prometheus-msteams entry from any parent kustomization that includes it (likely infra/kustomization.yaml or a profile-level kustomization).
  3. Remove MSTEAMS_WEBHOOK_URL and PROMETHEUS_MSTEAMS_NAMESPACE substitute vars from any cluster overlay Kustomization that's currently feeding them (search stuttgart-things/clusters/**/*.yaml).
  4. Verify the monitoring namespace no longer hosts a prometheus-msteams Deployment after the next Flux reconciliation.
  5. Update infra/kube-prometheus-stack/release.yaml — drop the legacy-related comment block (lines 90-98 in the current file) since the migration narrative is no longer relevant.
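Step 2 usually amounts to deleting one line from a `resources:` list. A hedged Go sketch of that edit, assuming a plain block-style list with one entry per line; the sample kustomization content is hypothetical, and in practice this is a one-line manual change reviewed in the cleanup PR.

```go
package main

import (
	"fmt"
	"strings"
)

// dropResource removes block-style list items (lines like "  - prometheus-msteams")
// that mention name, and reports how many were removed. It is line-based and
// ASSUMES simple one-item-per-line lists; it is not a YAML parser.
func dropResource(kustomization, name string) (string, int) {
	lines := strings.Split(kustomization, "\n")
	kept := make([]string, 0, len(lines))
	removed := 0
	for _, l := range lines {
		trimmed := strings.TrimSpace(l)
		if strings.HasPrefix(trimmed, "- ") && strings.Contains(trimmed, name) {
			removed++
			continue
		}
		kept = append(kept, l)
	}
	return strings.Join(kept, "\n"), removed
}

func main() {
	// HYPOTHETICAL parent kustomization, e.g. infra/kustomization.yaml.
	in := "apiVersion: kustomize.config.k8s.io/v1beta1\n" +
		"kind: Kustomization\n" +
		"resources:\n" +
		"  - kube-prometheus-stack\n" +
		"  - prometheus-msteams\n"
	out, n := dropResource(in, "prometheus-msteams")
	fmt.Printf("removed %d\n%s", n, out)
}
```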

Acceptance criteria

  • kubectl get deploy -n monitoring prometheus-msteams → NotFound.
  • No further duplicate Teams cards for AM alerts (verified by triggering a synthetic critical alert).
  • No dangling references to prometheus-msteams in this repo (grep -rn --exclude-dir=.git prometheus-msteams . returns nothing; remaining mentions exist only in commit history).
  • No dangling substitute vars (MSTEAMS_WEBHOOK_URL, PROMETHEUS_MSTEAMS_NAMESPACE) in cluster overlays.

Risk / sequencing

Low-risk. The new path has been observed delivering real Teams notifications during today's rollout. Suggested sequence:

  1. Land the catcher teams-grafana-alerts route (stuttgart-things/stuttgart-things#2230) — done in parallel with filing this issue.
  2. Trigger a synthetic AM alert (e.g. amtool alert add or pause a node briefly) and confirm the Teams channel shows both the legacy card AND the catcher's Adaptive Card. That's the cutover-window evidence.
  3. Open the cleanup PR for this issue, merge it, and let Flux reconcile.
  4. Trigger another synthetic alert and confirm only one card appears.

Related

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
