feat: add cadvisor as an opt-in additional service#1426

Open
barnabasbusa wants to merge 4 commits into main from feat/cadvisor

barnabasbusa (Collaborator) commented Jun 15, 2026

Summary

Closes #764.

Adds cAdvisor as an opt-in additional service for per-container resource metrics, plus a bundled Grafana dashboard. Modeled on how disruptoor is handled.

Important

Blocked on upstream Kurtosis. Per-container metrics require cAdvisor to run in the host cgroup namespace, which Kurtosis does not expose yet. This PR uses a new host_cgroup_namespace ServiceConfig field added in kurtosis-tech/kurtosis#3152. It will only work once that PR merges and a Kurtosis release containing it is out; until then, running cadvisor will fail interpretation with an unknown-argument error. Do not merge before bumping the minimum Kurtosis version to the release that includes #3152.

Why the cgroup namespace is needed

Without it, cAdvisor's container runs in Docker's default private cgroup namespace, so /sys/fs/cgroup only shows the container's own subtree. cAdvisor then reports just the root cgroup rather than per-container metrics (verified live: it only resolved id="/"). Kurtosis restricts bind_mounts host paths to /var/run/docker.sock, so mounting the host's /sys/fs/cgroup isn't an option. The fix is host_cgroup_namespace=True (Docker's --cgroupns=host), which makes /sys/fs/cgroup reflect the full host hierarchy so cAdvisor can see sibling containers' cgroups.
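
For anyone wanting to confirm which case a container is in, comparing cgroup namespace identifiers is one option. This is an illustrative Python sketch, not code from this PR. It relies on the Linux /proc/<pid>/ns/cgroup symlink, whose target has the form cgroup:[<inode>]; the helper names are hypothetical.

```python
import os
import re

# The target of /proc/<pid>/ns/cgroup looks like "cgroup:[4026531835]".
_NS_LINK = re.compile(r"^cgroup:\[(\d+)\]$")


def parse_cgroup_ns(link_target):
    """Return the namespace inode number from a /proc/<pid>/ns/cgroup link target."""
    match = _NS_LINK.match(link_target)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % link_target)
    return int(match.group(1))


def shares_cgroup_namespace(link_a, link_b):
    """Two processes share a cgroup namespace when both links carry the same inode."""
    return parse_cgroup_ns(link_a) == parse_cgroup_ns(link_b)


def read_cgroup_ns(pid="self"):
    """Read the link for a pid (hypothetical helper; needs Linux and permission to read /proc/<pid>/ns)."""
    return os.readlink("/proc/%s/ns/cgroup" % pid)
```

Assuming the container also shares the host PID namespace (so PID 1 is the host's init) and is allowed to read that link, shares_cgroup_namespace(read_cgroup_ns("self"), read_cgroup_ns(1)) would be expected to return True only when the host cgroup namespace is in effect.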

How it works

  • Opt-in via additional_services: [cadvisor] — off by default.
  • Privileged + host namespaces: runs with privileged: True, host_pid_namespace: True, host_cgroup_namespace: True, and bind mounts /var/run/docker.sock (the same bind_mounts mechanism disruptoor uses).
  • Docker-backend guard: like disruptoor, fails fast with a clear message on non-Docker backends.
  • Prometheus scraping: the launcher returns a metrics job (/metrics on :8080) appended to prometheus_additional_metrics_jobs, so cadvisor is auto-scraped when prometheus_grafana is enabled.
  • Grafana dashboard: cadvisor-dashboard.json is added to the always-provisioned dashboards dir; its panels only query cadvisor metrics, so they're empty unless the service runs.
  • Configurable via cadvisor_params (image + cpu/mem limits).
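
The flow in the list above can be sketched roughly as follows. This is a Python stand-in, not the Starlark in cadvisor_launcher.star: the function names, the "docker" backend string, the dict layouts (including the bind_mounts shape and the metrics-job keys), and the use of "cadvisor" as the scrape host are all assumptions made for illustration. Only the field names and the /metrics on :8080 endpoint come from the description.

```python
DOCKER_SOCKET = "/var/run/docker.sock"
METRICS_PORT = 8080
METRICS_PATH = "/metrics"


def ensure_docker_backend(backend):
    """Fail fast on non-Docker backends (the backend string is a placeholder)."""
    if backend != "docker":
        raise ValueError("cadvisor is only supported on the Docker backend, got: " + backend)


def cadvisor_service_settings(cadvisor_params):
    """Collect the settings the description lists; keys mirror the field names in the text."""
    return {
        "image": cadvisor_params["image"],
        "privileged": True,
        "host_pid_namespace": True,
        "host_cgroup_namespace": True,  # field added in kurtosis-tech/kurtosis#3152
        "bind_mounts": {DOCKER_SOCKET: DOCKER_SOCKET},  # assumed shape
        "metrics_port": METRICS_PORT,
    }


def cadvisor_metrics_job(service_name, host):
    """Describe the scrape target; this layout is hypothetical, not the package's job struct."""
    return {
        "name": service_name,
        "endpoint": "%s:%d" % (host, METRICS_PORT),
        "metrics_path": METRICS_PATH,
        "labels": {"service": service_name},
    }


def launch_additional_services(additional_services, backend, cadvisor_params, metrics_jobs):
    """Opt-in: do nothing unless cadvisor is requested, then guard, configure, and register the job."""
    if "cadvisor" not in additional_services:
        return None
    ensure_docker_backend(backend)
    settings = cadvisor_service_settings(cadvisor_params)
    metrics_jobs.append(cadvisor_metrics_job("cadvisor", "cadvisor"))
    return settings
```

Checking the backend before building any settings keeps the failure on non-Docker backends early and explicit, which is the behaviour the guard bullet describes.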

Files

  • src/cadvisor/cadvisor_launcher.star: new launcher.
  • main.star: import, Docker-backend guard, and the additional_services branch.
  • src/package_io/constants.star: DEFAULT_CADVISOR_IMAGE (gcr.io/cadvisor/cadvisor:v0.52.1).
  • src/package_io/input_parser.star: cadvisor_params defaults/struct/merge.
  • src/package_io/sanity_check.star: cadvisor service + cadvisor_params allowlist.
  • static_files/grafana-config/dashboards/cadvisor-dashboard.json: Grafana dashboard.
  • .github/tests/cadvisor.yaml: standalone example (kept out of the per-PR matrix because it needs --privileged, same as disruptoor.yaml).
  • README.md: docs for the service and cadvisor_params.
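
The defaults/merge and allowlist steps for cadvisor_params could look something like the sketch below. It is written in Python rather than the package's Starlark and is not the code in input_parser.star or sanity_check.star. Only the image key and its default are taken from the description; the min_cpu, max_cpu, min_mem, and max_mem key names and their zero defaults are hypothetical placeholders for the "cpu/mem limits".

```python
DEFAULT_CADVISOR_IMAGE = "gcr.io/cadvisor/cadvisor:v0.52.1"

# Only "image" is named in the description; the limit keys below are hypothetical.
DEFAULT_CADVISOR_PARAMS = {
    "image": DEFAULT_CADVISOR_IMAGE,
    "min_cpu": 0,
    "max_cpu": 0,
    "min_mem": 0,
    "max_mem": 0,
}


def check_cadvisor_params(user_params):
    """Allowlist check: reject any key that has no default."""
    unknown = sorted([key for key in user_params if key not in DEFAULT_CADVISOR_PARAMS])
    if unknown:
        raise ValueError("unknown cadvisor_params keys: " + ", ".join(unknown))


def merge_cadvisor_params(user_params):
    """Overlay user-supplied values on a copy of the defaults."""
    check_cadvisor_params(user_params)
    merged = dict(DEFAULT_CADVISOR_PARAMS)
    merged.update(user_params)
    return merged
```

Merging into a copy keeps the defaults untouched across calls, and running the allowlist check first means a mistyped key fails loudly instead of being silently ignored.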

Usage

kurtosis run --enclave cadvisor . --args-file .github/tests/cadvisor.yaml --privileged

additional_services:
  - cadvisor
  - prometheus_grafana

Tested

Ran with --privileged locally: cadvisor RUNNING and scraped (up{service="cadvisor"}=1), Grafana provisions the cAdvisor dashboard (10 panels). Per-container resolution depends on the host cgroup namespace landing upstream (#3152).
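
The up{service="cadvisor"}=1 check can also be scripted against the response of a Prometheus instant query (/api/v1/query). The Python sketch below only inspects an already-fetched JSON body; the function name and the sample payload (including the instance label and timestamp) are made up for illustration.

```python
def target_is_up(query_response, service):
    """Return True if an instant-query result for `up` reports 1 for the given service label."""
    if query_response.get("status") != "success":
        return False
    for sample in query_response.get("data", {}).get("result", []):
        labels = sample.get("metric", {})
        _timestamp, value = sample["value"]  # Prometheus encodes the sample value as a string
        if labels.get("service") == service and float(value) == 1.0:
            return True
    return False


# Hand-written example of the instant-vector response shape.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "service": "cadvisor", "instance": "cadvisor:8080"},
                "value": [1750000000.0, "1"],
            }
        ],
    },
}

print(target_is_up(SAMPLE_RESPONSE, "cadvisor"))  # prints True
```

This only confirms that the scrape target is healthy; it says nothing about per-container resolution, which still depends on the host cgroup namespace.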

Opt-in via additional_services: [cadvisor], modeled on disruptoor. Runs
privileged with bind mounts for the host docker socket, /sys and
/var/lib/docker, and is guarded to the Docker backend. Exposes /metrics on
:8080 and is auto-scraped by Prometheus when prometheus_grafana is enabled.
…ist)

Kurtosis only permits /var/run/docker.sock as a bind-mount host path; mounting
/var/run, /sys, /var/lib/docker is rejected at validation. Mount just the
socket and add host_pid_namespace, matching disruptoor.

Bundled in the always-provisioned dashboards dir, so it loads whenever
grafana runs. Panels query cadvisor metrics (CPU/memory/network/filesystem
per container), so they only populate when the cadvisor service is enabled.

Requires kurtosis-tech/kurtosis#3152 (host_cgroup_namespace) and a Kurtosis
release that includes it. Without the host cgroup namespace cAdvisor only sees
the root cgroup; with it /sys/fs/cgroup reflects the full host hierarchy so it
can report per-container metrics.

Development

Successfully merging this pull request may close these issues.

Add working node exporter to the default dashboards list of the package
