Kubernetes Pod Resource Scanner

Kubernetes resource monitoring made simple. Scan CPU, memory, and disk (ephemeral-storage) requests and limits, plus actual CPU/memory usage, across all namespaces and nodes. Export human-readable CSV and Google Sheets reports with scaling recommendations for capacity planning, cost optimization, and Kubernetes cluster visibility.

A lightweight, read-only Kubernetes tool that runs as a CronJob on AKS, GKE, EKS, or any other Kubernetes cluster. Each run appends to a single CSV (raw values for parsing) and can optionally update a Google Sheet, adding one Run <timestamp> tab per run for historical data and refreshing a Dashboard tab that visualizes the latest run.

✨ Features

☁️ Cluster-agnostic — Works on AKS, GKE, EKS, and any other Kubernetes cluster (1.21+)
🔒 Read-only — Makes no cluster changes; only reads pods, nodes, namespaces, workloads, ResourceQuotas, HPAs, and metrics
📁 Single CSV — One append-only file (all-resources.csv) with scan_date for long-term history
👁️ Human-readable — Memory/CPU/disk in Mi, Gi, cores, and % (no raw bytes or millicores)
💡 Recommendations — Scale up/down, limit changes, OOM risk, and growth alerts
📈 Week-over-Week Comparison — Tracks resource changes per namespace with growth alerts
📊 Actual Usage Metrics — CPU/memory actual usage via metrics-server (optional, degrades gracefully)
💀 OOM Kill Detection — Flags containers that were OOM-killed and emits oom_risk recommendations
📋 ResourceQuota Reporting — Scans namespace quotas (hard vs used) into resource-quotas.csv
🔄 HPA Awareness — Marks HPA-managed containers; annotates scale-down recs accordingly
💰 Cost Estimation — Estimates monthly cost per container based on CPU/memory requests
📡 Prometheus Textfile Export — Writes pod-scanner.prom for node_exporter scraping
🚫 Namespace Exclusion — Skip specific namespaces (e.g. kube-system,monitoring)
🧪 Dry-Run Mode — Full scan with no file writes; logs recommendations only
📋 Optional Google Sheet — Same data in a shareable spreadsheet: one Run <timestamp> tab per run plus a Dashboard tab
⏰ Helm + CronJob — Deploy once; runs on a schedule (e.g. weekly)
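Requests and limits arrive from the Kubernetes API as quantity strings such as 250m, 512Mi, or 128M. The repository ships a quantity.py module for this, but its API is not documented in this README, so the following is an independent sketch with hypothetical function names (parse_quantity, to_cores, to_mib) showing how such strings map to the cores and Mi/Gi values the reports use:

```python
from decimal import Decimal

# Binary (power-of-two) suffixes, e.g. 512Mi, 2Gi.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
# Decimal SI suffixes, e.g. 500m (millicores), 128M; "" means no suffix.
DECIMAL_SUFFIXES = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"), "": Decimal(1),
    "k": Decimal(10**3), "M": Decimal(10**6), "G": Decimal(10**9),
    "T": Decimal(10**12), "P": Decimal(10**15), "E": Decimal(10**18),
}


def parse_quantity(value: str) -> Decimal:
    """Convert a quantity string such as "500m", "512Mi" or "2" to a plain number."""
    value = value.strip()
    # Check two-character binary suffixes first so "Mi" is not mistaken for "M".
    for suffix, factor in BINARY_SUFFIXES.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    suffix = value[-1] if value and value[-1].isalpha() else ""
    if suffix not in DECIMAL_SUFFIXES:
        raise ValueError(f"unsupported quantity: {value!r}")
    return Decimal(value[: len(value) - len(suffix)]) * DECIMAL_SUFFIXES[suffix]


def to_cores(cpu: str) -> float:
    """CPU quantity in cores, e.g. "250m" -> 0.25."""
    return float(parse_quantity(cpu))


def to_mib(memory: str) -> float:
    """Memory or ephemeral-storage quantity in MiB, e.g. "1Gi" -> 1024.0."""
    return float(parse_quantity(memory) / BINARY_SUFFIXES["Mi"])
```

Decimal is used so that values like 250m convert exactly instead of accumulating binary floating-point error before the final conversion.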

🎯 Why Use This

Use case	How it helps
Capacity planning	See requested vs allocatable CPU/memory/disk per node and namespace.
Cost visibility	Export to CSV/Sheets for billing, showback, or chargeback.
Right-sizing	Get recommendations when limits are much higher than requests.
Multi-cluster	Set `config.clusterName` per cluster so each row carries a `cluster` value; one CSV or sheet can then hold all clusters.
Compliance & audit	Append-only history with `scan_date` for trend and audit.

📦 What It Collects

Area	Data
Pods / Containers	Namespace, pod, container, node, workload kind/name, replicas, CPU/memory/ephemeral-storage request & limit, status
Actual Usage	Per-container CPU/memory actual usage from metrics-server (`cpu_usage`, `memory_usage`) — optional
OOM Kills	Whether each container was OOM-killed in its last termination (`oom_killed` 0/1)
HPA	Whether each container's workload is managed by an HPA (`hpa_managed` 0/1)
Cost	Estimated monthly cost per container based on CPU/memory requests (`est_monthly_cost_usd`)
Nodes	Per-node CPU, memory, and disk (ephemeral-storage) capacity and allocatable
Node Usage	Actual CPU/memory usage per node from metrics-server (when available)
Utilization	Requested vs allocatable per node (CPU, memory, disk %)
Namespace	Pod count, container count, CPU/memory requested per namespace
ResourceQuotas	Hard and used values for CPU, memory, and pod count per namespace
Week-over-Week	CPU/memory/pod count changes vs previous scan with % growth
Recommendations	Scale up/down nodes, change limits (set or lower), OOM risk, growth alerts
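The OOM Kills row relies on the container status that Kubernetes records for the last termination. As a minimal sketch (not the scanner's own code path, which may use the Kubernetes Python client instead), the same check can be run over the JSON returned by `kubectl get pods -A -o json`; the function name oom_killed_containers is hypothetical:

```python
def oom_killed_containers(pod_list: dict) -> list[tuple[str, str, str]]:
    """Return (namespace, pod, container) for containers whose last termination was an OOM kill."""
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # containerStatuses can be missing (e.g. Pending pods), so default to an empty list.
        for status in pod.get("status", {}).get("containerStatuses") or []:
            terminated = (status.get("lastState") or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((meta.get("namespace", ""), meta.get("name", ""), status.get("name", "")))
    return hits
```

Usage: save the pod list with `kubectl get pods -A -o json > pods.json`, then call `oom_killed_containers(json.load(open("pods.json")))`.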

📊 Output

File	Description
`all-resources.csv`	Single append-only file; each run adds rows with `scan_date`. Contains pod/container/node/namespace data, usage, OOM, HPA, cost, and recommendations.
`resource-quotas.csv`	Append-only; namespace ResourceQuota hard and used values per run.
`pod-scanner.prom`	Prometheus textfile format for node_exporter scraping. Includes namespace CPU/memory requested, node utilization %, usage % (if metrics-server available), and recommendation counts.
`last_success.txt`	Timestamp and cluster name of the last successful scan (for monitoring).
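Because all-resources.csv is append-only, consumers usually want just the rows from the most recent run. A minimal sketch, assuming only the `scan_date` column documented above and that its values sort chronologically as strings (true for ISO 8601 timestamps); the function name is hypothetical:

```python
import csv


def latest_scan_rows(path: str) -> list[dict[str, str]]:
    """Return only the rows written by the most recent run of an append-only CSV."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        return []
    # Assumption: scan_date is ISO 8601, so string order equals chronological order.
    latest = max(row["scan_date"] for row in rows)
    return [row for row in rows if row["scan_date"] == latest]
```

If several clusters share one CSV, group by the `cluster` column first and take the latest `scan_date` per cluster.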

📋 Google Sheet (optional): Each run creates a new "Run <timestamp>" tab with summary tables (namespace, node utilization, recommendations) and 13-column container details, including usage, OOM, HPA, and cost. Only the last N run tabs are kept (configurable), and a Dashboard tab shows KPIs for the latest run, including total estimated monthly cost.

🚀 Quick Start

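# Clone the repository and install from the local chart
git clone https://github.com/clouddrove/pod-resource-scanner.git
cd pod-resource-scanner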
helm install pod-resource-scanner ./chart \
  --namespace pod-resource-scanner \
  --create-namespace \
  --set fullnameOverride=pod-resource-scanner \
  --set image.repository=ghcr.io/clouddrove/pod-resource-scanner \
  --set image.tag=latest

The CronJob runs weekly by default (Sunday 00:00 UTC). To run once manually:

JOB=manual-$(date +%s)
kubectl create job --from=cronjob/pod-resource-scanner "$JOB" -n pod-resource-scanner
kubectl logs -n pod-resource-scanner "job/$JOB" -f

📥 Installation (Helm)

1. Image

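Use the pre-built image from GitHub Container Registry (ghcr.io/clouddrove/pod-resource-scanner), or build and push your own to a registry you control and point image.repository and image.tag at it:

docker build -t <your-registry>/pod-resource-scanner:0.1.0 .
docker push <your-registry>/pod-resource-scanner:0.1.0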

2. Install

helm install pod-resource-scanner ./chart \
  --namespace pod-resource-scanner \
  --create-namespace \
  --set fullnameOverride=pod-resource-scanner \
  --set image.repository=ghcr.io/clouddrove/pod-resource-scanner \
  --set image.tag=latest

3. Override schedule and config

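helm upgrade pod-resource-scanner ./chart -n pod-resource-scanner \
  --reuse-values \
  --set config.clusterName=prod-us-east-1 \
  --set cronjob.schedule="0 9 * * 1"

The schedule "0 9 * * 1" runs every Monday at 09:00. --reuse-values keeps the values set at install time (such as image.repository and fullnameOverride); without it, helm upgrade --set falls back to chart defaults for every value not passed on the command line.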

See Configuration and chart/values.yaml for all options.

Useful commands

🔼 Upgrade: helm upgrade pod-resource-scanner ./chart -n pod-resource-scanner --reuse-values [--set ...]
🗑️ Uninstall: helm uninstall pod-resource-scanner -n pod-resource-scanner
✔️ Lint: helm lint ./chart

⚙️ Configuration

Env var / Helm value	Description	Default
`POD_SCANNER_OUTPUT_DIR`	Directory for output files (CSV, `pod-scanner.prom`, `last_success.txt`)	`/output`
`POD_SCANNER_CLUSTER_NAME`	Cluster identifier (for multi-cluster CSV/Sheet)	(empty)
`POD_SCANNER_EXCLUDE_NAMESPACES`	Comma-separated namespaces to skip (e.g. `kube-system,monitoring`)	(empty)
`POD_SCANNER_DRY_RUN`	Set to `true`/`1` to run a full scan without writing any files	`false`
`POD_SCANNER_METRICS_ENABLED`	`false` to skip metrics-server calls	`true`
`POD_SCANNER_COST_CPU_CORE_HOUR`	Estimated cost per CPU core per hour (USD)	`0.048`
`POD_SCANNER_COST_MEM_GB_HOUR`	Estimated cost per GiB memory per hour (USD)	`0.006`
`POD_SCANNER_UPDATE_GOOGLE_SHEET`	Set to `true`/`1` to update Google Sheet	unset
`POD_SCANNER_SHEET_ID`	Google Sheet ID (or use secret)	-
`POD_SCANNER_SHEET_RUN_TABS_KEEP`	Number of Run <timestamp> tabs to keep	`10`
`GOOGLE_APPLICATION_CREDENTIALS`	Path to service account JSON	-
`POD_SCANNER_UTIL_SCALE_UP_PCT`	Utilization % above which to recommend scale up	`75`
`POD_SCANNER_UTIL_SCALE_DOWN_PCT`	Utilization % below which to recommend scale down	`25`
`POD_SCANNER_GROWTH_ALERT_PCT`	Namespace growth % to trigger alert (week-over-week)	`20`
`POD_SCANNER_RETENTION_DAYS`	Delete snapshot CSVs older than N days (`0` = keep all)	`0`
`POD_SCANNER_LOG_LEVEL`	Logging level	`INFO`
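The threshold and cost settings above can be read as follows. This is a hedged sketch, not the scanner's implementation: it takes utilization as requested / allocatable (as in What It Collects), assumes strict comparisons against the thresholds, and assumes 730 hours per month (365 × 24 / 12) for the monthly cost. The function names are hypothetical.

```python
import os

# Defaults mirror the configuration table; the same environment variables override them.
SCALE_UP_PCT = float(os.environ.get("POD_SCANNER_UTIL_SCALE_UP_PCT", "75"))
SCALE_DOWN_PCT = float(os.environ.get("POD_SCANNER_UTIL_SCALE_DOWN_PCT", "25"))
CPU_CORE_HOUR = float(os.environ.get("POD_SCANNER_COST_CPU_CORE_HOUR", "0.048"))
MEM_GIB_HOUR = float(os.environ.get("POD_SCANNER_COST_MEM_GB_HOUR", "0.006"))
HOURS_PER_MONTH = 730  # assumption: 365 * 24 / 12, a common billing approximation


def utilization_pct(requested: float, allocatable: float) -> float:
    """Requested as a percentage of allocatable (0 when the node reports nothing allocatable)."""
    return 100.0 * requested / allocatable if allocatable else 0.0


def node_recommendation(requested: float, allocatable: float) -> str:
    """Map a node's request utilization to a scale_up / scale_down / ok label."""
    pct = utilization_pct(requested, allocatable)
    if pct > SCALE_UP_PCT:
        return "scale_up"
    if pct < SCALE_DOWN_PCT:
        return "scale_down"
    return "ok"


def est_monthly_cost_usd(cpu_cores: float, mem_gib: float) -> float:
    """Estimated monthly cost from CPU (cores) and memory (GiB) requests."""
    hourly = cpu_cores * CPU_CORE_HOUR + mem_gib * MEM_GIB_HOUR
    return round(hourly * HOURS_PER_MONTH, 2)
```

With the defaults, a container requesting 1 core and 4 GiB comes to (0.048 + 4 × 0.006) × 730 = 52.56 USD per month under these assumptions.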

RBAC: the chart creates a ClusterRole and ClusterRoleBinding (read-only) for pods, nodes, namespaces, workloads, resourcequotas, horizontalpodautoscalers, and metrics.k8s.io (for metrics-server).

📋 Google Sheet (Optional)

Google Cloud: Enable the Google Sheets API, create a service account, and download its JSON key.
Sheet: Create a sheet and share it with the service account email as Editor. Copy the Sheet ID from the URL: https://docs.google.com/spreadsheets/d/<SHEET_ID>/edit.

Create the secret (the name pod-resource-scanner-google below assumes fullnameOverride=pod-resource-scanner):

kubectl create secret generic pod-resource-scanner-google -n pod-resource-scanner \
  --from-literal=sheet-id="YOUR_SHEET_ID" \
  --from-file=credentials.json=/path/to/service-account-key.json
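Most 403/404 errors come from sharing the sheet with the wrong account or copying the wrong ID. The small helpers below (hypothetical names, not part of the scanner) print the two values the secret depends on: the `client_email` field of a service account JSON key, which is the address to share the sheet with, and the ID segment of a sheet URL, assuming IDs consist of letters, digits, `-` and `_`:

```python
import json
import re

# The Sheet ID is the path segment after /spreadsheets/d/.
_SHEET_URL = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def sheet_id_from_url(url: str) -> str:
    """Extract <SHEET_ID> from https://docs.google.com/spreadsheets/d/<SHEET_ID>/edit."""
    match = _SHEET_URL.search(url)
    if match is None:
        raise ValueError(f"not a Google Sheets URL: {url}")
    return match.group(1)


def service_account_email(key_path: str) -> str:
    """Return the client_email from a service account JSON key (share the sheet with it as Editor)."""
    with open(key_path, encoding="utf-8") as handle:
        return json.load(handle)["client_email"]
```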

Enable in Helm

helm upgrade pod-resource-scanner ./chart -n pod-resource-scanner --reuse-values --set googleSheet.enabled=true

On each run the job appends to all-resources.csv and updates the sheet: it adds a new Run <timestamp> tab (namespace summary, node utilization, recommendations, container details) and refreshes the Dashboard tab with KPIs for the latest run. Older run tabs are pruned so that only the last POD_SCANNER_SHEET_RUN_TABS_KEEP (default 10) remain.

💻 Running Locally

Without deploying to a cluster:

pip install -r requirements.txt
export POD_SCANNER_OUTPUT_DIR=./output
python scanner.py

Output goes to ./output/all-resources.csv. For Google Sheet, set GOOGLE_APPLICATION_CREDENTIALS, POD_SCANNER_SHEET_ID, and POD_SCANNER_UPDATE_GOOGLE_SHEET=true.

Docker (local kubeconfig)

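mkdir -p output
docker build -t pod-resource-scanner:local .
docker run --rm \
  -v ~/.kube:/home/appuser/.kube:ro \
  -v "$(pwd)/output":/output \
  pod-resource-scanner:local

Builds the image and runs the scanner using your local kubeconfig; the CSV is written to ./output. Creating ./output beforehand prevents Docker from creating it as root, which the non-root container user (UID 1000) may not be able to write to.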

🧪 Testing

pip install pytest
python3 -m pytest tests/ -v

No cluster or Google Sheets account required — all tests run with mocked dependencies.

✅ Production Checklist

🏷️ Use a tagged image (e.g. image.tag=0.1.0); avoid :latest in production.
☁️ Set config.clusterName for multi-cluster visibility.
📐 Override resources and cronjob.activeDeadlineSeconds for large clusters.
📡 Monitor CronJob failure (e.g. Prometheus or last_success.txt age).
🔑 For Google Sheet: use a dedicated service account; rotate keys periodically.

🐛 Troubleshooting

Issue	What to do
Permission denied on /output	Ensure `podSecurityContext.fsGroup: 1000` and image runs as UID 1000.
Google Sheet 403 / 404	Share the sheet with the service account email; check Sheet ID.
API timeout / connection refused	Increase `cronjob.activeDeadlineSeconds` or retry; check network policies.
Out of memory	Increase `resources.limits.memory` in Helm values.

Logs and one-off run

kubectl get jobs -n pod-resource-scanner --sort-by=.metadata.creationTimestamp
kubectl logs -n pod-resource-scanner job/<job-name> --tail=200
kubectl create job --from=cronjob/pod-resource-scanner manual-test -n pod-resource-scanner
kubectl logs -n pod-resource-scanner job/manual-test -f

The scanner writes last_success.txt in the output directory (timestamp=, cluster=) for monitoring.
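A monitoring probe can alert when that file gets too old. A minimal sketch, assuming the file holds `key=value` pairs (newline- or comma-separated) and that `timestamp` is ISO 8601 without spaces, optionally ending in `Z`; the exact format is not specified here, so treat both as assumptions, along with the hypothetical function names:

```python
import re
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def read_last_success(path: str) -> dict[str, str]:
    """Parse key=value pairs such as timestamp=... and cluster=... from last_success.txt."""
    text = Path(path).read_text(encoding="utf-8")
    return dict(re.findall(r"(\w+)=([^\s,]*)", text))


def is_stale(path: str, max_age: timedelta = timedelta(days=8), now: Optional[datetime] = None) -> bool:
    """True if the last successful scan is older than max_age (8 days leaves slack for a weekly CronJob)."""
    stamp = datetime.fromisoformat(read_last_success(path)["timestamp"].replace("Z", "+00:00"))
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    current = now or datetime.now(timezone.utc)
    return current - stamp > max_age
```

Usage: run `is_stale("/output/last_success.txt")` from a sidecar or external check and alert when it returns True; shorten max_age if you changed `cronjob.schedule` to run more often.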

📡 Grafana Dashboard

A pre-built Grafana dashboard is included at grafana/dashboard.json. Importing it gives you the panels listed below without any manual panel setup.

Prerequisites

Configure the node_exporter textfile collector to read pod-scanner.prom from the scanner's output directory, so Prometheus picks the metrics up when it scrapes node_exporter:

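# values.yaml for the prometheus-node-exporter chart
# (with kube-prometheus-stack, nest these keys under prometheus-node-exporter:)
extraArgs:
  - --collector.textfile.directory=/output

# The pod-scanner-output volume (PVC or hostPath shared with the scanner)
# must also be defined; the key for extra volumes depends on the chart version.
extraVolumeMounts:
  - name: pod-scanner-output
    mountPath: /output
    readOnly: true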

Or if running node_exporter directly:

node_exporter --collector.textfile.directory=/path/to/output

The scanner and node_exporter must share the same output directory (PVC or hostPath).
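node_exporter reads every `*.prom` file in the textfile directory, so a file must never be visible half-written. The usual pattern is to write to a temporary file in the same directory and atomically rename it. The sketch below shows that pattern together with the Prometheus text exposition format for one gauge; the function names and the `cluster`/`namespace` labels are illustrative assumptions, not the scanner's actual code or label set:

```python
import os
import tempfile


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote and newline in the text format.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one gauge metric family in Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{key}="{_escape(val)}"' for key, val in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def write_textfile(directory: str, filename: str, content: str) -> None:
    """Write atomically: the temp file lacks a .prom suffix, so node_exporter ignores it until renamed."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".pod-scanner-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; node_exporter may run as another user
        os.replace(tmp_path, os.path.join(directory, filename))
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Usage: `write_textfile("/output", "pod-scanner.prom", render_gauge("pod_scanner_namespace_cpu_requested_millicores", "CPU requested per namespace", [({"cluster": "prod", "namespace": "shop"}, 1500)]))`. The rename is only atomic when the temporary file and the target share a filesystem, which is why the temp file is created inside the output directory.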

Import

Open Grafana → Dashboards → Import
Upload grafana/dashboard.json or paste its contents
Select your Prometheus data source
Pick your cluster from the variable drop-down

Dashboard Panels

Panel	Metric
Last Scan Age	`pod_scanner_last_scan_timestamp_seconds`
Namespaces / Nodes / Recommendations / Cost	Stat cards
CPU Requested by Namespace	`pod_scanner_namespace_cpu_requested_millicores`
Memory Requested by Namespace	`pod_scanner_namespace_memory_requested_bytes`
CPU Week-over-Week Change %	`pod_scanner_namespace_cpu_change_pct`
Node CPU / Memory Utilization %	`pod_scanner_node_cpu_util_pct`, `pod_scanner_node_memory_util_pct`
Node CPU / Memory Actual Usage %	`pod_scanner_node_cpu_usage_pct`, `pod_scanner_node_memory_usage_pct`
Recommendations by Type	`pod_scanner_recommendations_total`
Namespace Cost, OOM Kills & CPU Change	`pod_scanner_namespace_est_monthly_cost_usd`, `pod_scanner_namespace_oom_killed_total`, `pod_scanner_namespace_cpu_change_pct`
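The week-over-week panels and growth alerts compare each namespace with the previous scan. A minimal sketch of that comparison, assuming the change is (current − previous) / previous × 100, that a namespace with no previous value has no defined change, and that an alert fires at or above POD_SCANNER_GROWTH_ALERT_PCT (default 20); the scanner's exact rounding and comparison may differ, and the function names are hypothetical:

```python
from typing import Optional

GROWTH_ALERT_PCT = 20.0  # default of POD_SCANNER_GROWTH_ALERT_PCT


def change_pct(previous: float, current: float) -> Optional[float]:
    """Percentage change vs the previous scan; None when there is no baseline to compare with."""
    if previous == 0:
        return None
    return round(100.0 * (current - previous) / previous, 1)


def growth_alerts(
    previous: dict[str, float], current: dict[str, float], threshold: float = GROWTH_ALERT_PCT
) -> dict[str, float]:
    """Namespaces whose value (e.g. CPU requested in millicores) grew by at least threshold %."""
    alerts = {}
    for namespace, now_value in current.items():
        pct = change_pct(previous.get(namespace, 0.0), now_value)
        if pct is not None and pct >= threshold:
            alerts[namespace] = pct
    return alerts
```

For example, a namespace whose CPU requests go from 1000 to 1250 millicores shows a 25.0% change and triggers an alert at the default threshold, while one going from 2000 to 2100 (5.0%) does not.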

Note: Node actual usage panels are populated only when metrics-server is installed in the cluster and POD_SCANNER_METRICS_ENABLED is true (the default).

🤝 Contributing

Contributions are welcome. Please open an issue or pull request on GitHub.

📄 License

See LICENSE in this repository.

Repository: github.com/clouddrove/pod-resource-scanner · Maintained by CloudDrove
