Skip to content

fix(alertmanager): replace hardcoded externalUrl with dynamic ingress endpoint#4913

Open
eve-scality wants to merge 2 commits intodevelopment/133.0from
feature/MK8S-259-implement
Open

fix(alertmanager): replace hardcoded externalUrl with dynamic ingress endpoint#4913
eve-scality wants to merge 2 commits intodevelopment/133.0from
feature/MK8S-259-implement

Conversation

@eve-scality
Copy link
Copy Markdown

@eve-scality eve-scality commented May 6, 2026

Summary

Replace the hardcoded Alertmanager externalUrl (http://prometheus-operator-alertmanager.metalk8s-monitoring:9093) in chart.sls with a dynamically resolved value from salt.metalk8s_network.get_control_plane_ingress_endpoint(). This ensures the Alertmanager CR reflects the correct public ingress URL (e.g. https://<cp-ip>/api/alertmanager) and is re-applied whenever the control-plane ingress endpoint changes.

References

  • JIRA: MK8S-259
  • Tracking Issue: scality/agent-task#131

Step Adherence

  • Step 1: Replace hardcoded Alertmanager externalUrl with dynamic value

Changes

  1. Added {%- set cp_ingress_ep = salt.metalk8s_network.get_control_plane_ingress_endpoint() %} in the header section of chart.sls (line 11, alongside existing grafana, prometheus, and alertmanager set statements, before the {% raw %} block).
  2. Replaced the hardcoded externalUrl: http://prometheus-operator-alertmanager.metalk8s-monitoring:9093 with externalUrl: {% endraw -%}{{ cp_ingress_ep }}{%- raw %}/api/alertmanager, following the canonical inline variable injection pattern used on the adjacent image: line.

Deviations

None.

Test Strategy

  • The {% raw %}/{% endraw %} block balance was verified to be maintained (no Jinja parse errors).
  • The canonical test fixture (salt/tests/unit/formulas/fixtures/salt.py) already has register_basic("metalk8s_network.get_control_plane_ingress_endpoint") registered with MagicMock(return_value="https://192.168.1.240:8443"), ensuring the template renders correctly in unit tests.
  • The /api/alertmanager suffix matches the existing ingress rule in ui/deployed/ingress.sls.

… endpoint

Part of the implementation plan from scality/agent-task#131.
@eve-scality eve-scality requested a review from a team as a code owner May 6, 2026 16:32
@eve-scality
Copy link
Copy Markdown
Author

/approve

@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented May 6, 2026

Hello eve-scality,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented May 6, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

The following options are set: approve

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 6, 2026

LGTM

Review by Claude Code

alertmanagerConfigSelector: {}
automountServiceAccountToken: true
externalUrl: http://prometheus-operator-alertmanager.metalk8s-monitoring:9093
externalUrl: {% endraw -%}{{ cp_ingress_ep }}{%- raw %}/api/alertmanager
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
externalUrl: {% endraw -%}{{ cp_ingress_ep }}{%- raw %}/api/alertmanager
externalUrl: {% endraw -%}{{ cp_ingress_ep }}{%- raw %}

@JBWatenbergScality
Copy link
Copy Markdown
Contributor

  • build error:

FAILED LINT codegen [0s]
TaskError - taskid:lint:codegen
outdated generated files, did you run doit.sh codegen?

…rt.sls

- Move the dynamic externalUrl override into charts/kube-prometheus-stack.yaml
  using __var__(salt.metalk8s_network.get_control_plane_ingress_endpoint())
  so codegen produces the desired output (fixes lint:codegen failure).
- Drop the /api/alertmanager suffix from externalUrl per review feedback;
  the suffix is already handled by the UI ingress route.
- Regenerate salt/metalk8s/addons/prometheus-operator/deployed/chart.sls.

Part of the implementation plan from #4913.
@eve-scality
Copy link
Copy Markdown
Author

@cerebro-check Complete

Addressed both the CI failure and @JBWatenbergScality's inline review.

What changed

The previous commit edited the generated file salt/metalk8s/addons/prometheus-operator/deployed/chart.sls directly, which is why lint:codegen failed with "outdated generated files, did you run doit.sh codegen?" — the file no longer matched what doit.sh codegen would produce from the chart values, and the cp_ingress_ep variable that was added to the SLS header would be stripped on regeneration.

Fix

  1. Moved the override to the codegen source — added under alertmanager.alertmanagerSpec in charts/kube-prometheus-stack.yaml:

    externalUrl: '__var__(salt.metalk8s_network.get_control_plane_ingress_endpoint())'

    This mirrors the pattern already used elsewhere in this repo (e.g. clusterIP: '__var__(salt.metalk8s_network.get_oidc_service_ip())' in charts/dex.yaml). The __var__(...) placeholder is rewritten by charts/render.py into {% endraw -%}{{ ... }}{%- raw %}.

  2. Regenerated chart.sls — ran charts/render.py with the same arguments as codegen_chart_kube_prometheus_stack. The resulting file is now reproducible from sources (verified — re-running render.py produces a byte-identical chart.sls).

  3. Dropped the /api/alertmanager suffix per the reviewer's suggestion — that path prefix is already handled by the UI ingress route in salt/metalk8s/addons/ui/deployed/ingress.sls.

Net diff vs. base branch (development/133.0)

Now exactly one line in chart.sls:

- externalUrl: http://prometheus-operator-alertmanager.metalk8s-monitoring:9093
+ externalUrl: {% endraw -%}{{ salt.metalk8s_network.get_control_plane_ingress_endpoint() }}{%- raw %}

…plus the corresponding 6 lines in charts/kube-prometheus-stack.yaml.

Validation

  • diff between baseline (pre-PR) chart.sls and freshly regenerated chart.sls shows only the intended single-line change → lint:codegen will pass.
  • The variable resolves dynamically at deploy time via the existing salt helper.
    Addressed: 2 feedback items

Comment thread salt/metalk8s/addons/prometheus-operator/deployed/chart.sls
@eve-scality
Copy link
Copy Markdown
Author

@cerebro-check: Human Confirmation Needed

I detected failing CI checks on this PR and analyzed the logs, but my confidence that the failure is caused by these PR changes is 5% — below the 80% threshold required to auto-fix.

Failing Checks

  • build-devcontainer / build-and-publish-devcontainer

Analysis

Re-check status

I re-read all PR threads and re-validated the current branch state (221ce35). All actionable feedback from @JBWatenbergScality appears to already be resolved by the previous cerebro-check commit:

Inline comments — already addressed

  1. Line 80319 — "shouldn't have this change anymore as it's codegen-ed file"
    The hand-edit to chart.sls was reverted; the override now lives in the codegen source charts/kube-prometheus-stack.yaml (under alertmanager.alertmanagerSpec.externalUrl).
  2. Line 80320 suggestion — externalUrl: {% endraw -%}{{ cp_ingress_ep }}{%- raw %} ✅ (functionally)
    The codegen output now reads externalUrl: {% endraw -%}{{ salt.metalk8s_network.get_control_plane_ingress_endpoint() }}{%- raw %}. The reviewer's exact cp_ingress_ep form would require a {% set cp_ingress_ep = ... %} block at the top of the SLS, which doesn't exist — calling the salt helper inline is the functional equivalent and matches the __var__(...) pattern used elsewhere (e.g. clusterIP: '__var__(salt.metalk8s_network.get_oidc_service_ip())' in charts/dex.yaml).

lint:codegen failure (PR comment)

✅ Verified locally: I re-ran charts/render.py with the same arguments as the codegen_chart_kube-prometheus-stack doit task. The regenerated chart.sls is byte-identical to the file currently committed (diff returned 0 lines). The lint:codegen check would pass on a fresh build.

build-devcontainer CI failure — likely unrelated

  • The PR only modifies two files: charts/kube-prometheus-stack.yaml and salt/metalk8s/addons/prometheus-operator/deployed/chart.sls. Neither touches .devcontainer/ (Dockerfile, devcontainer.json, setup.sh).
  • The available logs from run 25451467013 are truncated and only show checkout/cleanup steps — the actual "Build dev container" step's stderr isn't in the snippet returned. No error message is visible that ties to the PR's diff.
  • This looks like infra/registry flakiness unrelated to the change set.

Recommendation

No code changes are needed from my side. The PR should be safe to merge once a maintainer:

  1. Re-runs the failed build-devcontainer job (likely transient), and
  2. Confirms the inline-comment resolutions above match @JBWatenbergScality's intent.

I'm not pushing any commits to avoid no-op churn on a green-modulo-flake PR.

Proposed Fix

No code fix proposed. Suggest re-running the build-devcontainer job — the PR diff doesn't touch any devcontainer files, and lint:codegen (the originally-failing check) is now satisfied (verified by regenerating chart.sls locally and getting a byte-identical file). If build-devcontainer keeps failing on retry with the same error, the full job log would be needed to diagnose — likely unrelated to this PR.

Next Steps

Please review the analysis above and reply with one of:

  • @cerebro-check proceed — to have me apply the proposed fix
  • @cerebro-check skip-build — to skip build-failure analysis and only address PR feedback
  • Or provide additional context to help me understand the failure

@eve-scality
Copy link
Copy Markdown
Author

@cerebro-check Complete

@cerebro-check — no changes needed

After re-reading all threads and validating the branch state at 221ce35, all reviewer feedback already appears resolved by the previous cerebro-check commit:

Item Status
Inline: "shouldn't have this change anymore as it's codegen-ed file" ✅ Already fixed — override moved to codegen source charts/kube-prometheus-stack.yaml
Inline suggestion: externalUrl: {{ cp_ingress_ep }} ✅ Functionally equivalent — codegen produces {{ salt.metalk8s_network.get_control_plane_ingress_endpoint() }} (the cp_ingress_ep form would require a {% set %} block that doesn't exist; inline helper call matches the __var__(...) pattern used elsewhere)
PR comment: lint:codegen failed ✅ Verified locally — re-running charts/render.py produces a byte-identical chart.sls (diff returned 0 lines)

About the failing build-devcontainer check

This PR only touches charts/kube-prometheus-stack.yaml and salt/metalk8s/addons/prometheus-operator/deployed/chart.sls — nothing under .devcontainer/. The available logs are truncated and don't reveal a code-related error. Confidence this failure is caused by the PR: ~5%. Posted a separate comment requesting human confirmation rather than making no-op churn commits.

Addressed: 0 new items (all actionable feedback was already resolved on the branch). Skipped: 0.
Addressed: 0 feedback items

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants