Stop overspending on Kubernetes! This Grafana dashboard helps you visualize and optimize resource allocation in your Kubernetes cluster, turning wasteful spending into measurable cost savings.
It provides a clear, actionable overview of the discrepancy between requested resources and actual usage, allowing you to identify both over-provisioned (wasteful) and under-provisioned (at-risk) workloads.
This dashboard was developed to solve a common problem: high resource requests trigger unnecessary alerts and inflate cloud bills while actual usage remains low. It gives you the transparency you need to make informed optimization decisions.
- Direct Cost Savings: Quickly identify and reduce wasted CPU and Memory resources to lower your cloud bills.
- Performance Stability: Proactively detect under-provisioned pods that are at risk of throttling or eviction, ensuring application stability.
- Namespace-level Overview: High-level graphs comparing CPU and Memory requests vs. actual usage for each namespace, providing a holistic view of efficiency.
- Top 10 Wasteful Pods: Prioritized tables that pinpoint the exact pods wasting the most CPU and Memory in absolute terms, enabling targeted optimization.
- Percentage-Based Waste Calculation: An intuitive column showing the percentage of requested resources being wasted, making it easy to spot the most inefficient workloads at a glance.
- Dynamic Namespace Filter: A convenient dropdown menu allows you to filter the entire dashboard to focus on specific namespaces or analyze the entire cluster.
The dashboard contains four key panels designed to guide your optimization efforts:
The namespace-level graphs provide a high-level, historical view of your cluster's efficiency trends.
- Request Line (Often High, Flat): Represents the total amount of CPU/Memory reserved by all pods within a namespace. This is what you're paying for.
- Usage Lines (Lower, Fluctuating): Show the actual, real-time CPU/Memory consumption.
- Key Insight: The significant gap between the request and usage lines visually represents the total amount of wasted resources for that namespace over time.
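
To make the gap between the request and usage lines concrete, here is a minimal sketch of the per-namespace aggregation the graphs represent. The sample pods and their values are hypothetical, and the function name `namespace_gap` is an illustration only, not something taken from the dashboard's own queries.

```python
from collections import defaultdict

# Hypothetical per-pod samples: (namespace, pod, requested CPU cores, used CPU cores).
SAMPLES = [
    ("payments", "api-7d9f", 1.0, 0.25),
    ("payments", "worker-x2", 0.5, 0.25),
    ("batch", "etl-0", 2.0, 0.5),
]


def namespace_gap(samples):
    """Sum requests and usage per namespace and compute the gap between them."""
    totals = defaultdict(lambda: {"request": 0.0, "usage": 0.0})
    for namespace, _pod, request, usage in samples:
        totals[namespace]["request"] += request
        totals[namespace]["usage"] += usage
    # The gap is the reserved-but-unused amount for the namespace.
    return {
        ns: {**t, "gap": t["request"] - t["usage"]}
        for ns, t in totals.items()
    }


for ns, row in sorted(namespace_gap(SAMPLES).items()):
    print(f"{ns}: request={row['request']:.2f} usage={row['usage']:.2f} gap={row['gap']:.2f}")
```

In this made-up data, the `batch` namespace reserves 2.0 cores and uses 0.5, so its gap of 1.5 cores is the area you would see between the two lines on the graph.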
The Top 10 Wasteful Pods tables are your primary tool for taking immediate action to optimize your cluster.
| Column | Description |
|---|---|
| Pod | The name of the Kubernetes pod. |
| Namespace | The Kubernetes namespace the pod belongs to. |
| Request | The amount of CPU/Memory reserved for the pod through its resource requests. |
| Usage | The actual amount of CPU/Memory the pod is consuming at this moment. |
| Waste | The absolute difference between Request and Usage (Request - Usage). A positive value indicates over-provisioning; a negative value indicates under-provisioning. |
| Wasted % | The relative waste calculated as (Waste / Request) * 100%. This is the most important column for prioritization, highlighting the percentage of requested resources that are unused. |
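
The Waste and Wasted % columns can be reproduced by hand from the two formulas in the table. The sketch below does that for two hypothetical pods; the function names and the handling of a zero request are assumptions made for illustration, not the dashboard's actual implementation.

```python
def waste_metrics(request, usage):
    """Return (waste, wasted_percent) using the table's definitions.

    waste          = request - usage
    wasted_percent = waste / request * 100
    With a zero request the percentage is undefined, so None is returned
    (an assumption of this sketch).
    """
    waste = request - usage
    if request == 0:
        return waste, None
    return waste, waste / request * 100


def provisioning_state(waste):
    """Map the sign of the waste value to the terms used in the table."""
    if waste > 0:
        return "over-provisioned"
    if waste < 0:
        return "under-provisioned"
    return "balanced"


# Hypothetical pods: (name, requested CPU cores, used CPU cores).
for name, request, usage in [("web", 2.0, 0.5), ("queue", 0.5, 1.25)]:
    waste, percent = waste_metrics(request, usage)
    print(f"{name}: waste={waste:+.2f} cores, wasted={percent:+.0f}%, {provisioning_state(waste)}")
```

Here `web` requests 2.0 cores and uses 0.5, giving a waste of 1.5 cores (75%), while `queue` uses more than it requested and ends up at -150%.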
How to Interpret "Wasted %"
- ✅ Positive Waste (e.g., +91%): Over-provisioned. The pod is reserving far more resources than it actually uses.
  - Action: This is a direct cost-saving opportunity. Decrease the pod's resource request to align with actual usage.
- ⚠️ Negative Waste (e.g., -177%): Under-provisioned. The pod is using significantly more resources than it requested (often referred to as "bursting").
  - Action: This is a stability risk. Increase the pod's resource request to reduce the risk of CPU throttling, OOM kills, or pod eviction under load, so that application performance stays consistent.
For this dashboard to function correctly, your Kubernetes environment must be properly configured with:
- Grafana: Version 9.0 or newer.
- Prometheus: A Prometheus data source configured in Grafana, actively scraping metrics from your Kubernetes cluster.
- kube-state-metrics: Must be deployed in your cluster and exposing metrics to Prometheus, specifically `kube_pod_container_resource_requests` and `kube_pod_info`.
- cAdvisor/kubelet Metrics: Must be exposing container usage metrics to Prometheus, such as `container_cpu_usage_seconds_total` and `container_memory_working_set_bytes`.
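
Before importing, it can help to confirm that Prometheus actually has series for the metrics listed above. The following is a minimal sketch that uses the Prometheus HTTP API instant-query endpoint (`/api/v1/query`); the helper names and the `http://localhost:9090` address in the usage comment are assumptions, so adapt them to your environment.

```python
import json
import urllib.parse
import urllib.request

# Metric names taken from the prerequisites list.
REQUIRED_METRICS = [
    "kube_pod_container_resource_requests",
    "kube_pod_info",
    "container_cpu_usage_seconds_total",
    "container_memory_working_set_bytes",
]


def http_query(base_url, promql, timeout=10):
    """Run an instant query against the Prometheus HTTP API and return the parsed JSON."""
    params = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def missing_metrics(run_query, metrics=REQUIRED_METRICS):
    """Return the metric names for which the query yields no series.

    run_query is any callable that takes a PromQL string and returns the
    decoded API response, which keeps this function usable without a network.
    """
    missing = []
    for name in metrics:
        body = run_query(f"count({name})")
        # count() over a metric with no series returns an empty result vector.
        if body.get("status") != "success" or not body["data"]["result"]:
            missing.append(name)
    return missing


# Example usage (assumed Prometheus address, not executed here):
#   print(missing_metrics(lambda q: http_query("http://localhost:9090", q)))
```

An empty list means all four metrics are present; any name that is returned points to a missing exporter or scrape configuration.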
- Download: Copy the entire contents of the `dashboard.json` file from this repository.
- Import in Grafana: In your Grafana instance, navigate to Dashboards -> Import.
- Paste & Load: Paste the copied JSON model into the text area. Click Load.
- Select Data Source: Choose your Prometheus data source from the dropdown menu.
- Import: Click Import.
The dashboard utilizes a dynamic namespace filter for focused analysis. Ensure the following dashboard variable is configured correctly:
- Name: `namespace`
- Type: `Query`
- Query Options:
  - Query Type: `Query result`
  - Query: `count by (namespace) (kube_pod_info)`
  - Regex: `/\"([^\"]+)\"/` (This extracts the namespace names from the query result.)
- Selection Options:
  - Enable Multi-value: `true`
  - Enable Include All option: `true`
  - Set Custom all value: `.*`
- Post-Import Check: After importing, go to Dashboard settings -> Variables to verify that this variable is correctly configured for your environment. Ensure all panel queries use `namespace=~"${namespace:regex}"` to support the multi-select and "All" options.
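
To see what the variable's Regex field does, the sketch below applies the same pattern to sample rows in Python. The row format (labels, value, timestamp) is an assumption about how Grafana renders a "Query result" for this query, and the sample namespaces are hypothetical; the pattern itself is the one from the variable definition, written without the `/.../` delimiters and without the redundant backslashes before the quotes.

```python
import re

# Same pattern as the variable's Regex field: capture the first quoted value.
NAMESPACE_PATTERN = re.compile(r'"([^"]+)"')

# Assumed shape of the query result rows for `count by (namespace) (kube_pod_info)`.
SAMPLE_ROWS = [
    '{namespace="default"} 12 1718000000000',
    '{namespace="kube-system"} 9 1718000000000',
]


def extract_namespaces(rows):
    """Return the first double-quoted value found in each row."""
    names = []
    for row in rows:
        match = NAMESPACE_PATTERN.search(row)
        if match:
            names.append(match.group(1))
    return names


print(extract_namespaces(SAMPLE_ROWS))  # ['default', 'kube-system']
```

Because the query groups by a single label, the first quoted value in each row is the namespace name, which is why a capture group this simple is enough.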
This dashboard is a powerful tool for identifying resource waste and inefficiency. However, implementing the necessary changes, refining your resource requests/limits, and building a truly cost-optimized and stable Kubernetes environment can be complex.
If your organization requires deeper insights, customized monitoring solutions, or hands-on assistance to implement the identified optimizations and achieve significant, measurable cloud cost reductions, I'm available for freelance consulting engagements.
As a CKA-certified Cloud/DevOps Engineer specializing in Kubernetes efficiency and observability, I can help you:
- Perform in-depth Kubernetes cost audits and identify precise saving opportunities.
- Develop tailored Grafana dashboards and monitoring solutions for your specific needs.
- Implement optimized resource requests and limits across your workloads.
- Automate cost management and performance monitoring.
- Improve overall cluster performance, stability, and reliability.
Let's turn insights into savings! Connect with me on LinkedIn to discuss how I can help your team.
This project is licensed under the MIT License. See the LICENSE file for details.
