Add ovirt-engine-health codebundle #682
Open
theyashl wants to merge 7 commits into
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comprehensive SLI + runbook for oVirt/RHV/OLVM environments via the engine REST API. Authenticates with an SSO bearer token and checks engine reachability, host status, VM status, storage domain capacity, cluster health, recent critical events, and stale VM snapshots. Optional CA cert for TLS verification. Includes .runwhen templates and a lightweight .test harness (no infra provisioning, since oVirt is self-hosted).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
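As an illustration of the SSO bearer-token step, here is a minimal Python sketch that builds (but does not send) the password-grant request. The bundle itself does this in bash with curl; the helper names `build_token_request` and `bearer_headers`, the example host, and the credentials are hypothetical. The endpoint path, `grant_type`, and `scope` values are the ones named in this PR's description.

```python
import urllib.parse
import urllib.request


def build_token_request(engine_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (without sending) the oVirt SSO password-grant request."""
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "scope": "ovirt-app-api",
        "username": username,
        "password": password,
    }).encode("ascii")  # urlencode output is always ASCII
    return urllib.request.Request(
        engine_url.rstrip("/") + "/ovirt-engine/sso/oauth/token",
        data=form,
        headers={"Accept": "application/json"},
        method="POST",
    )


def bearer_headers(access_token: str) -> dict:
    """Headers for later /ovirt-engine/api calls once a token is obtained."""
    return {"Authorization": f"Bearer {access_token}", "Accept": "application/json"}


# Hypothetical engine URL and credentials, for illustration only.
req = build_token_request("https://engine.example.com", "admin@internal", "secret")
```

Sending `req` with `urllib.request.urlopen` and reading `access_token` from the JSON response would complete the flow; that part is left out so the sketch runs without an engine.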
Stdlib-only mock REST server serving the SSO token endpoint and all 7 API endpoints the bundle calls, with healthy/unhealthy scenarios and now-relative timestamps so event/snapshot windows behave realistically. Wired into the .test Taskfile (mock, test-mock, run-sli-mock) with a Dockerfile and README. Verified all check scripts end-to-end against both scenarios.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
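The idea of a stdlib-only mock with now-relative timestamps can be sketched roughly as follows. This is not the mock shipped in the PR: the function names, the payload shape, the five-minute offset, and the single `/ovirt-engine/api/events` route are assumptions chosen for illustration. Only the SSO token path comes from the PR text.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def events_payload(scenario: str, now_ms: int) -> dict:
    """Return mock events; timestamps are offsets from 'now', not fixed dates."""
    if scenario == "healthy":
        return {"event": []}
    five_minutes_ms = 5 * 60 * 1000
    return {"event": [{
        "severity": "error",
        "description": "mock host failure",  # illustrative text only
        "time": now_ms - five_minutes_ms,    # always inside a recent-events window
    }]}


def make_handler(scenario: str):
    """Create a request handler class bound to one scenario."""
    class Handler(BaseHTTPRequestHandler):
        def _send_json(self, body: dict) -> None:
            data = json.dumps(body).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def do_POST(self):
            # Drain the form body so the connection stays in a clean state.
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            if self.path == "/ovirt-engine/sso/oauth/token":
                self._send_json({"access_token": "mock-token", "token_type": "bearer"})
            else:
                self.send_error(404)

        def do_GET(self):
            if self.path.startswith("/ovirt-engine/api/events"):
                self._send_json(events_payload(scenario, int(time.time() * 1000)))
            else:
                self.send_error(404)

        def log_message(self, *args):
            pass  # keep test output quiet

    return Handler


def start_mock(scenario: str, port: int = 0) -> HTTPServer:
    """Start the mock on localhost in a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), make_handler(scenario))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_mock("unhealthy")` and pointing a check at `http://127.0.0.1:<server.server_port>` would exercise the failure path; because the event time is computed per request, the event never ages out of the lookback window between test runs.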
Importing OVIRT_CA_CERT as a required secret failed the entire suite
whenever no CA cert was configured (the common system-trust-store case).
Mark the import optional in both robots; when unset, Run Bash File skips
the non-Secret value and ovirt_auth.sh falls back to the system trust
store. Guard the secret entry in the SLI/taskset templates with an
{% if custom.ovirt_ca_cert %} so it is only referenced when provided.
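The fallback described above (custom CA when provided, system trust store otherwise) looks roughly like this in Python. It is a sketch under the assumption that `OVIRT_CA_CERT` holds a path to a PEM file, as `curl --cacert` would expect; the function name `tls_context` is hypothetical and the bundle's actual logic lives in ovirt_auth.sh.

```python
import os
import ssl
from typing import Mapping


def tls_context(env: Mapping[str, str] = os.environ) -> ssl.SSLContext:
    """Verify against a custom CA if OVIRT_CA_CERT is set, else the system store."""
    ca_path = env.get("OVIRT_CA_CERT", "").strip()
    if ca_path:
        # Assumption: the variable holds a path to a PEM bundle.
        return ssl.create_default_context(cafile=ca_path)
    # Unset or empty: fall back to the platform's default trusted CAs.
    return ssl.create_default_context()


ctx = tls_context({})  # no CA configured
```

In both branches certificate verification stays on; the optional variable only changes which CAs are trusted, it never disables checking.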
Verified end-to-end against the mock (with RW.Core/RW.platform installed):
- SLI healthy -> composite 1.0; unhealthy -> 0.14 with correct sub-scores
- Runbook unhealthy -> 7 issues (engine-reachable branch correctly skipped),
healthy -> 0 issues
- Both robots run cleanly with OVIRT_CA_CERT entirely absent
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
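The composite figures quoted above can be checked with a little arithmetic. Assuming each of the 7 sub-scores is 0 or 1 and the composite is their plain average, a run where only the engine-reachability check passes gives 1/7 ≈ 0.14. The check names below are hypothetical labels for the seven checks listed in this PR, and the real sub-scores may not be strictly binary.

```python
CHECKS = [
    "engine_reachable",
    "host_status",
    "vm_status",
    "storage_domain_capacity",
    "cluster_health",
    "critical_events",
    "stale_snapshots",
]


def composite_score(sub_scores: dict) -> float:
    """Average the sub-scores into a single 0-1 health value."""
    if not sub_scores:
        raise ValueError("at least one sub-score is required")
    return sum(sub_scores.values()) / len(sub_scores)


healthy = {name: 1.0 for name in CHECKS}

# Assumed unhealthy scenario: the engine answers, every other check fails.
unhealthy = {name: 0.0 for name in CHECKS}
unhealthy["engine_reachable"] = 1.0

healthy_score = composite_score(healthy)                 # 1.0
unhealthy_score = round(composite_score(unhealthy), 2)   # 0.14
```

This is consistent with the runbook result for the same scenario, where the engine-reachable branch is skipped and the remaining checks raise the issues.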
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
oVirt is not a RunWhen-discoverable platform type, so a generation rule has nothing to match and the commented placeholder file only produced a 'generation rules file does not contain any data' warning during workspace upload. Remove it; document that the SLX is created directly from the templates (config + secrets) rather than auto-generated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
oVirt has no discoverable resource of its own, so anchor the generation rule on the kubernetes 'cluster' resource purely as a trigger (qualifiers: [cluster]) -> one oVirt SLX per discovered cluster. All SLX/SLI/runbook content comes from workspaceInfo custom.* + workspace secrets, not the matched cluster. Mirrors the k8s-cluster-resource-health singleton pattern.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary

Adds a new ovirt-engine-health CodeBundle that monitors the health of an oVirt virtualization environment (oVirt / Red Hat Virtualization / Oracle Linux Virtualization Manager) via the oVirt engine REST API (/ovirt-engine/api).

Modeled on the existing jenkins-health bundle: an SLI that emits a composite 0–1 health score and a runbook that raises an actionable issue per problem, both backed by bash scripts using curl + jq.

Checks (7, shared across sli.robot and runbook.robot)

The SLI adds a Generate ... Health Score task that averages the 7 sub-scores.

Key decisions

- /ovirt-engine/sso/oauth/token (grant_type=password, scope=ovirt-app-api), shared ovirt_auth.sh helper.
- OVIRT_CA_CERT → curl --cacert, otherwise the system trust store (self-signed engine certs are common).
- gh-actions-health precedent: the generation rule is a commented "how it would look" template and the README is explicit that SLXs are config-driven, not auto-discovered.
- (try tonumber catch 0) so an unexpected date format can't crash a check or produce false-positive stale snapshots.

Testing

- .test/ ships a lightweight Taskfile (check-config, smoke-scripts, run-sli, run-runbook) + README; no infra provisioning, since oVirt is self-hosted.
- bash -n clean on all 8 scripts; all 7 jq filters validated against representative payloads; Robot dry-run parsed every task with no syntax errors (only the RW platform libraries are unavailable locally, as expected).

🤖 Generated with Claude Code
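The defensive-parsing decision (jq's `try tonumber catch 0`) can be mirrored in Python as a rough sketch. The function names, the `date` field holding epoch milliseconds, and the choice to skip snapshots whose date parsed to 0 are all assumptions for illustration; the PR does not show how the bundle's jq filters treat the fallback value.

```python
def to_number(value, default=0.0) -> float:
    """Python analogue of jq's `try tonumber catch 0`."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def stale_snapshots(snapshots: list, now_ms: int, max_age_days: int) -> list:
    """Return snapshots older than max_age_days, ignoring unparseable dates."""
    cutoff_ms = now_ms - max_age_days * 86_400 * 1000
    stale = []
    for snap in snapshots:
        created_ms = to_number(snap.get("date"))
        if created_ms <= 0:
            # Assumed policy: an unknown date is skipped, never reported as stale.
            continue
        if created_ms < cutoff_ms:
            stale.append(snap)
    return stale


NOW_MS = 1_700_000_000_000  # fixed reference time so the example is deterministic
DAY_MS = 86_400 * 1000
sample = [
    {"id": "old", "date": str(NOW_MS - 40 * DAY_MS)},   # numeric string, 40 days old
    {"id": "recent", "date": NOW_MS - 2 * DAY_MS},      # number, 2 days old
    {"id": "garbled", "date": "2024-01-01T00:00:00Z"},  # unexpected format
    {"id": "missing"},                                  # no date at all
]
result = [s["id"] for s in stale_snapshots(sample, NOW_MS, max_age_days=30)]
```

Without the skip on a zero value, a fallback of 0 would read as 1970 and flag every unparseable snapshot as stale, which is the false positive the PR description says it avoids.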