Add PodMonitor resource for Prometheus monitoring#85
Add PodMonitor resource for Prometheus monitoring#85RobuRishabh merged 1 commit intoopendatahub-io:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIntroduces Prometheus monitoring infrastructure for Spark operator metrics collection across ODH and RHOAI overlays. Adds a new Prometheus kustomization directory with a PodMonitor resource that scrapes Spark operator metrics from the Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Security & Architecture IssuesCWE-829 (Untrusted Source Control): The Missing RBAC/Authentication Context: The PodMonitor at
Pod Label Assumption Risk: The PodMonitor uses Indirect Manifest Patching: Using JSON6902 patches in overlays to apply labels post-resource-definition adds operational friction. Consider moving the 🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/overlays/odh/kustomization.yaml`:
- Around line 22-31: The JSON6902 patch adds /metadata/labels which can fail if
labels already exist; update the patch for the PodMonitor named
spark-operator-metrics-monitor to either use a strategic-merge patch (add a
patchesStrategicMerge entry that sets metadata.labels.opendatahub.io/monitoring:
"true" for kind PodMonitor, apiVersion monitoring.coreos.com/v1) or keep
JSON6902 but change the path to the escaped single-key path
/metadata/labels/opendatahub.io~1monitoring so only that label is added instead
of replacing the whole labels map.
In `@config/overlays/rhoai/kustomization.yaml`:
- Around line 22-31: The current JSON patch replaces the entire /metadata/labels
object which drops inherited labels; change the patch to a JSON6902-style
operation that adds only the single label key by escaping the slash in the key
(use path /metadata/labels/opendatahub.io~1monitoring) and set its value to
"true", keeping the same target (PodMonitor, name:
spark-operator-metrics-monitor) so the base labels (app.kubernetes.io/name,
app.kubernetes.io/managed-by) remain unchanged.
In `@config/prometheus/podmonitor.yaml`:
- Around line 11-12: Remove the redundant tlsConfig block that sets
insecureSkipVerify when the PodMonitor uses scheme: http: locate the PodMonitor
spec (look for scheme: http) and delete the tlsConfig: insecureSkipVerify: true
stanza (the tlsConfig/insecureSkipVerify setting is ignored for HTTP and only
confuses readers).
- Around line 13-15: The PodMonitor currently contains a hardcoded
namespaceSelector block (namespaceSelector / matchNames -> spark-operator) that
prevents discovery in overlay namespaces; remove the entire namespaceSelector
section from the PodMonitor manifest so it falls back to the default behavior of
selecting pods in the same namespace as the resource (allowing overlays that set
metadata.namespace to work), i.e., delete the namespaceSelector / matchNames
entries in the PodMonitor YAML so pods are discovered in the overlay-configured
namespace.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 22d3c37f-6ff6-49a2-a830-4821b8aa71bb
📒 Files selected for processing (5)
config/default/kustomization.yamlconfig/overlays/odh/kustomization.yamlconfig/overlays/rhoai/kustomization.yamlconfig/prometheus/kustomization.yamlconfig/prometheus/podmonitor.yaml
a0a33c6 to
9e5610c
Compare
9e5610c to
6f03f74
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/openshift/Makefile`:
- Around line 17-18: The Makefile currently sets CLI via "CLI ?= $(shell command
-v oc 2>/dev/null || command -v kubectl 2>/dev/null)" but doesn't guard when
both are missing; add a new phony target check-cli that verifies $(CLI) is
non-empty (or runs command -v oc/kubectl and exits non‑zero with a clear error
message when neither exists) and then add check-cli as a dependency to any
targets that use $(CLI) (the variable name CLI and the new check-cli target are
the unique symbols to modify) so the build fails fast with a helpful message
instead of crashing later.
- Around line 87-90: The e2e-kustomize-test target currently streams a remote
manifest via "$(CLI) apply -f $(PODMONITOR_CRD_URL)" which is unsafe and also
ignores failures; update the target to (1) use a pinned URL (tag or commit) for
PODMONITOR_CRD_URL or change it to a pinned variable, (2) download the manifest
first (curl/wget to a temp file referenced by PODMONITOR_CRD_URL), verify its
checksum/signature before applying, and (3) enable failure propagation by adding
set -e at the top of the recipe or explicitly check the download and apply exit
codes and exit non‑zero on error so the e2e-kustomize-test target (and CLI apply
invocation) cannot silently continue on failure.
- Line 23: The PODMONITOR_CRD_URL default currently points at the repository's
main branch; change the default value of the PODMONITOR_CRD_URL variable to
reference a specific Prometheus Operator release tag (replace "main" with a
stable release tag like vX.Y.Z) so it pins the CRD to a known immutable commit,
update the Makefile's PODMONITOR_CRD_URL assignment accordingly, and add a short
comment explaining the pin and the chosen tag so future maintainers know why it
is not tracking main.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 11c75127-e364-4237-b6dc-f53b38bda628
📒 Files selected for processing (6)
config/default/kustomization.yamlconfig/overlays/odh/kustomization.yamlconfig/overlays/rhoai/kustomization.yamlconfig/prometheus/kustomization.yamlconfig/prometheus/podmonitor.yamlexamples/openshift/Makefile
✅ Files skipped from review due to trivial changes (3)
- config/overlays/odh/kustomization.yaml
- config/prometheus/kustomization.yaml
- config/prometheus/podmonitor.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- config/default/kustomization.yaml
- config/overlays/rhoai/kustomization.yaml
examples/openshift/Makefile
Outdated
| # CLI tool: prefer oc if available, fall back to kubectl | ||
| CLI ?= $(shell command -v oc 2>/dev/null || command -v kubectl 2>/dev/null) |
There was a problem hiding this comment.
Missing validation when CLI tool is unavailable
If neither oc nor kubectl is installed, $(CLI) resolves to empty string and line 89 fails with an unhelpful error. Add a guard.
Proposed fix
# CLI tool: prefer oc if available, fall back to kubectl
CLI ?= $(shell command -v oc 2>/dev/null || command -v kubectl 2>/dev/null)
+
+.PHONY: check-cli
+check-cli:
+ifndef CLI
+ $(error "Neither oc nor kubectl found in PATH")
+endifThen add check-cli as a dependency to targets using $(CLI).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # CLI tool: prefer oc if available, fall back to kubectl | |
| CLI ?= $(shell command -v oc 2>/dev/null || command -v kubectl 2>/dev/null) | |
| # CLI tool: prefer oc if available, fall back to kubectl | |
| CLI ?= $(shell command -v oc 2>/dev/null || command -v kubectl 2>/dev/null) | |
| .PHONY: check-cli | |
| check-cli: | |
| ifndef CLI | |
| $(error "Neither oc nor kubectl found in PATH") | |
| endif |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/openshift/Makefile` around lines 17 - 18, The Makefile currently
sets CLI via "CLI ?= $(shell command -v oc 2>/dev/null || command -v kubectl
2>/dev/null)" but doesn't guard when both are missing; add a new phony target
check-cli that verifies $(CLI) is non-empty (or runs command -v oc/kubectl and
exits non‑zero with a clear error message when neither exists) and then add
check-cli as a dependency to any targets that use $(CLI) (the variable name CLI
and the new check-cli target are the unique symbols to modify) so the build
fails fast with a helpful message instead of crashing later.
examples/openshift/Makefile
Outdated
| # Configuration | ||
| KIND_CLUSTER_NAME ?= spark-operator | ||
| CLEANUP ?= true | ||
| PODMONITOR_CRD_URL ?= https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/jsonnet/prometheus-operator/podmonitors-crd.json |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What is the latest stable release version of prometheus-operator?
💡 Result:
The latest stable release version of prometheus-operator is v0.90.1.
Citations:
- 1: https://github.com/prometheus-operator/prometheus-operator
- 2: https://pkg.go.dev/github.com/prometheus-operator/prometheus-operator
- 3: https://releasealert.dev/github/prometheus-operator/prometheus-operator
- 4: https://github.com/prometheus-operator/prometheus-operator/actions/workflows/release.yaml
🏁 Script executed:
#!/bin/bash
# Verify the URL pattern works with prometheus-operator releases
# Check if the jsonnet file exists at specific release versions
# Test with the latest version from web search (v0.90.1)
curl -s -o /dev/null -w "v0.90.1 HTTP Status: %{http_code}\n" \
"https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.90.1/jsonnet/prometheus-operator/podmonitors-crd.json"
# Also test with the suggested version from the review (v0.75.0)
curl -s -o /dev/null -w "v0.75.0 HTTP Status: %{http_code}\n" \
"https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.75.0/jsonnet/prometheus-operator/podmonitors-crd.json"
# And verify main branch still works (to establish baseline)
curl -s -o /dev/null -w "main HTTP Status: %{http_code}\n" \
"https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/jsonnet/prometheus-operator/podmonitors-crd.json"Repository: opendatahub-io/spark-operator
Length of output: 144
Pin PODMONITOR_CRD_URL to a release tag instead of main branch
Referencing main creates a supply chain risk (CWE-829): any upstream change or repository compromise silently propagates to your build. Pin to a specific release tag.
Suggested fix
-PODMONITOR_CRD_URL ?= https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/jsonnet/prometheus-operator/podmonitors-crd.json
+PODMONITOR_CRD_URL ?= https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.90.1/jsonnet/prometheus-operator/podmonitors-crd.json📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| PODMONITOR_CRD_URL ?= https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/jsonnet/prometheus-operator/podmonitors-crd.json | |
| PODMONITOR_CRD_URL ?= https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.90.1/jsonnet/prometheus-operator/podmonitors-crd.json |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/openshift/Makefile` at line 23, The PODMONITOR_CRD_URL default
currently points at the repository's main branch; change the default value of
the PODMONITOR_CRD_URL variable to reference a specific Prometheus Operator
release tag (replace "main" with a stable release tag like vX.Y.Z) so it pins
the CRD to a known immutable commit, update the Makefile's PODMONITOR_CRD_URL
assignment accordingly, and add a short comment explaining the pin and the
chosen tag so future maintainers know why it is not tracking main.
There was a problem hiding this comment.
good catch. need to add checksum validation too.
There was a problem hiding this comment.
@RobuRishabh — Agreed. Here's a complete fix that pins the version and adds SHA256 integrity verification before apply. The maintainer must compute the real digest once and embed it:
-PODMONITOR_CRD_URL ?= https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/jsonnet/prometheus-operator/podmonitors-crd.json
+# Pin to a specific release; update both URL and SHA256 together when upgrading.
+PODMONITOR_CRD_URL ?= https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.90.1/jsonnet/prometheus-operator/podmonitors-crd.json
+PODMONITOR_CRD_SHA256 ?= <replace-with-sha256-of-v0.90.1-file>And in the e2e-kustomize-test target, replace the direct apply -f <url> with a download-verify-apply-cleanup sequence:
.PHONY: e2e-kustomize-test
e2e-kustomize-test:
`@echo` "Running Go e2e tests with Kustomize installation..."
- $(CLI) apply -f $(PODMONITOR_CRD_URL)
+ $(eval CRD_TMP := $(shell mktemp /tmp/podmonitors-crd.XXXXXX.json))
+ curl -fsSL $(PODMONITOR_CRD_URL) -o $(CRD_TMP)
+ echo "$(PODMONITOR_CRD_SHA256) $(CRD_TMP)" | sha256sum -c -
+ $(CLI) apply -f $(CRD_TMP)
+ rm -f $(CRD_TMP)
cd $(REPO_ROOT) && INSTALL_METHOD=kustomize go test ./examples/openshift/tests/e2e/ -v -ginkgo.v -timeout 30mTo compute the digest to embed:
curl -fsSL https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.90.1/jsonnet/prometheus-operator/podmonitors-crd.json | sha256sumTwo additional notes:
- macOS:
sha256sumis part of GNU coreutils; on macOS CI runners, useshasum -a 256 -cinstead or guard with aunamecheck. - Cleanup on failure: the
rm -fwon't run ifsha256sumfails andmakeaborts. If that's a concern, use a.ONESHELL:target or atrapin the recipe to guarantee cleanup.
examples/openshift/Makefile
Outdated
| e2e-kustomize-test: kind-setup ## Run Go e2e tests using Kustomize manifests for operator installation. | ||
| @echo "Running Go e2e tests with Kustomize installation..." | ||
| $(CLI) apply -f $(PODMONITOR_CRD_URL) | ||
| cd $(REPO_ROOT) && INSTALL_METHOD=kustomize go test ./examples/openshift/tests/e2e/ -v -ginkgo.v -timeout 30m |
There was a problem hiding this comment.
Remote URL applied without integrity check
$(CLI) apply -f $(PODMONITOR_CRD_URL) fetches and applies a remote manifest in one step—functionally equivalent to curl-pipe-shell. Combined with the unpinned URL, this is exploitable if the upstream is tampered with (CWE-829).
Mitigations:
- Pin the URL to a commit/tag (see prior comment).
- Optionally download first and verify checksum before applying.
Additionally, the target silently continues if the apply fails. Consider adding set -e behavior or explicit error handling.
Proposed fix with error handling
e2e-kustomize-test: kind-setup ## Run Go e2e tests using Kustomize manifests for operator installation.
`@echo` "Running Go e2e tests with Kustomize installation..."
- $(CLI) apply -f $(PODMONITOR_CRD_URL)
+ `@if` [ -z "$(CLI)" ]; then echo "ERROR: Neither oc nor kubectl found"; exit 1; fi
+ $(CLI) apply -f $(PODMONITOR_CRD_URL) || { echo "ERROR: Failed to apply PodMonitor CRD"; exit 1; }
cd $(REPO_ROOT) && INSTALL_METHOD=kustomize go test ./examples/openshift/tests/e2e/ -v -ginkgo.v -timeout 30m📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| e2e-kustomize-test: kind-setup ## Run Go e2e tests using Kustomize manifests for operator installation. | |
| @echo "Running Go e2e tests with Kustomize installation..." | |
| $(CLI) apply -f $(PODMONITOR_CRD_URL) | |
| cd $(REPO_ROOT) && INSTALL_METHOD=kustomize go test ./examples/openshift/tests/e2e/ -v -ginkgo.v -timeout 30m | |
| e2e-kustomize-test: kind-setup ## Run Go e2e tests using Kustomize manifests for operator installation. | |
| `@echo` "Running Go e2e tests with Kustomize installation..." | |
| `@if` [ -z "$(CLI)" ]; then echo "ERROR: Neither oc nor kubectl found"; exit 1; fi | |
| $(CLI) apply -f $(PODMONITOR_CRD_URL) || { echo "ERROR: Failed to apply PodMonitor CRD"; exit 1; } | |
| cd $(REPO_ROOT) && INSTALL_METHOD=kustomize go test ./examples/openshift/tests/e2e/ -v -ginkgo.v -timeout 30m |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/openshift/Makefile` around lines 87 - 90, The e2e-kustomize-test
target currently streams a remote manifest via "$(CLI) apply -f
$(PODMONITOR_CRD_URL)" which is unsafe and also ignores failures; update the
target to (1) use a pinned URL (tag or commit) for PODMONITOR_CRD_URL or change
it to a pinned variable, (2) download the manifest first (curl/wget to a temp
file referenced by PODMONITOR_CRD_URL), verify its checksum/signature before
applying, and (3) enable failure propagation by adding set -e at the top of the
recipe or explicitly check the download and apply exit codes and exit non‑zero
on error so the e2e-kustomize-test target (and CLI apply invocation) cannot
silently continue on failure.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
examples/openshift/Makefile (1)
89-90:⚠️ Potential issue | 🟡 MinorNo error propagation on CRD apply failure
If
$(CLI) applyfails (network issue, auth failure, invalid manifest), execution continues togo test, which will likely fail with confusing errors when PodMonitor CRD doesn't exist.Proposed fix
e2e-kustomize-test: `@echo` "Running Go e2e tests with Kustomize installation..." - $(CLI) apply -f $(PODMONITOR_CRD_URL) + `@if` [ -z "$(CLI)" ]; then echo "ERROR: Neither oc nor kubectl found in PATH"; exit 1; fi + $(CLI) apply -f $(PODMONITOR_CRD_URL) || { echo "ERROR: Failed to apply PodMonitor CRD"; exit 1; } cd $(REPO_ROOT) && INSTALL_METHOD=kustomize go test ./examples/openshift/tests/e2e/ -v -ginkgo.v -timeout 30m🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/openshift/Makefile` around lines 89 - 90, The Makefile runs $(CLI) apply -f $(PODMONITOR_CRD_URL) but does not propagate failures, so subsequent cd $(REPO_ROOT) && INSTALL_METHOD=kustomize go test ... will run even if the CRD apply failed; change the recipe to fail fast by ensuring the apply command's non-zero exit aborts the recipe (for example, add an explicit check like appending "|| exit 1" to $(CLI) apply -f $(PODMONITOR_CRD_URL) or chain the commands with "&&" so the go test only runs when the apply succeeds), keeping the existing $(CLI) and $(PODMONITOR_CRD_URL) tokens.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@examples/openshift/Makefile`:
- Around line 89-90: The Makefile runs $(CLI) apply -f $(PODMONITOR_CRD_URL) but
does not propagate failures, so subsequent cd $(REPO_ROOT) &&
INSTALL_METHOD=kustomize go test ... will run even if the CRD apply failed;
change the recipe to fail fast by ensuring the apply command's non-zero exit
aborts the recipe (for example, add an explicit check like appending "|| exit 1"
to $(CLI) apply -f $(PODMONITOR_CRD_URL) or chain the commands with "&&" so the
go test only runs when the apply succeeds), keeping the existing $(CLI) and
$(PODMONITOR_CRD_URL) tokens.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 09187bcd-a660-4a19-9125-e2b78e5f230c
📒 Files selected for processing (1)
examples/openshift/Makefile
a636e4e to
5e03427
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
config/prometheus/podmonitor.yaml (1)
11-14: Narrow the PodMonitor selector to one operator instance.The selector currently matches any controller pod with the same
name+componentlabels in the namespace. Addapp.kubernetes.io/instanceto prevent accidental cross-instance scraping.Suggested change
spec: podMetricsEndpoints: - port: metrics path: /metrics scheme: http selector: matchLabels: app.kubernetes.io/name: spark-operator + app.kubernetes.io/instance: spark-operator app.kubernetes.io/component: controllerAs per coding guidelines, this follows
**: REVIEW PRIORITIES:2. Architectural issues and anti-patternsand3. Bug-prone patterns and error handling gaps.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/prometheus/podmonitor.yaml` around lines 11 - 14, The PodMonitor's selector.matchLabels is too broad and should include the instance label to avoid scraping other operator instances; update the selector.matchLabels block (the selector and matchLabels entries) to add app.kubernetes.io/instance with the specific instance value for this operator so the PodMonitor only targets that single operator instance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@config/prometheus/podmonitor.yaml`:
- Around line 11-14: The PodMonitor's selector.matchLabels is too broad and
should include the instance label to avoid scraping other operator instances;
update the selector.matchLabels block (the selector and matchLabels entries) to
add app.kubernetes.io/instance with the specific instance value for this
operator so the PodMonitor only targets that single operator instance.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 38f824b2-5a96-4125-ad67-43bbb2323508
📒 Files selected for processing (5)
config/overlays/odh/kustomization.yamlconfig/overlays/rhoai/kustomization.yamlconfig/prometheus/kustomization.yamlconfig/prometheus/podmonitor.yamlexamples/openshift/Makefile
✅ Files skipped from review due to trivial changes (2)
- config/prometheus/kustomization.yaml
- config/overlays/rhoai/kustomization.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/openshift/Makefile
examples/openshift/Makefile
Outdated
| # Configuration | ||
| KIND_CLUSTER_NAME ?= spark-operator | ||
| CLEANUP ?= true | ||
| PODMONITOR_CRD_URL ?= https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/jsonnet/prometheus-operator/podmonitors-crd.json |
There was a problem hiding this comment.
good catch. need to add checksum validation too.
Add opendatahub.io/monitoring: 'true' label for odh/rhoai overlays. Signed-off-by: Ali Maredia <amaredia@redhat.com>
01e3aae to
23544c9
Compare
Add opendatahub.io/monitoring: 'true' label for odh/rhoai overlays.
Purpose of this PR
Proposed changes:
Change Category
Rationale
Checklist
Additional Notes
Summary by CodeRabbit
New Features
Tests