fkaute7BD/k8s-resource-overhead-grafana

Kubernetes Resource Optimization Dashboard for Grafana

Stop overspending on Kubernetes! This Grafana dashboard helps you visualize and optimize resource allocation in your Kubernetes cluster, turning wasteful spending into measurable cost savings.

It provides a clear, actionable overview of the discrepancy between requested resources and actual usage, allowing you to identify both over-provisioned (wasteful) and under-provisioned (at-risk) workloads.

This dashboard was developed to solve a common problem: high resource requests trigger unnecessary alerts and inflate cloud bills while actual usage remains low. Gain the transparency you need to make informed optimization decisions.


📊 Features & Benefits

  • Direct Cost Savings: Quickly identify and reduce wasted CPU and Memory resources to lower your cloud bills.
  • Performance Stability: Proactively detect under-provisioned pods that use more than they request and are therefore at greater risk of resource contention or eviction, helping you keep applications stable.
  • Namespace-level Overview: High-level graphs comparing CPU and Memory requests vs. actual usage for each namespace, providing a holistic view of efficiency.
  • Top 10 Wasteful Pods: Prioritized tables that pinpoint the exact pods wasting the most CPU and Memory in absolute terms, enabling targeted optimization.
  • Percentage-Based Waste Calculation: An intuitive column showing the percentage of requested resources being wasted, making it easy to spot the most inefficient workloads at a glance.
  • Dynamic Namespace Filter: A convenient dropdown menu allows you to filter the entire dashboard to focus on specific namespaces or analyze the entire cluster.

🛠️ Panels Explained

The dashboard contains four key panels designed to guide your optimization efforts:

CPU/Memory Usage vs. Requests by Namespace (Time-series Graphs)

These graphs provide a high-level, historical view of your cluster's efficiency trends.

  • Request Line (Often High, Flat): Represents the total amount of CPU/Memory reserved by all pods within a namespace. This is what you're paying for.
  • Usage Lines (Lower, Fluctuating): Shows the actual, real-time CPU/Memory consumption.
  • Key Insight: The gap between the request line and the usage lines represents the resources that a namespace reserves but does not use over time.
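
As a rough illustration of what sits behind such graphs, the sketch below builds one PromQL expression for the request line and one for the usage line. It is not taken from dashboard.json: the helper name `namespace_queries`, the `container!=""` filter, the `resource` label (as exposed by kube-state-metrics v2), and the 5m rate window are all assumptions for illustration.

```python
def namespace_queries(resource: str, namespace_regex: str = ".*", window: str = "5m") -> dict:
    """Build illustrative PromQL for requests vs. usage, summed per namespace.

    Hypothetical helper: the queries shipped in dashboard.json may differ.
    """
    selector = f'namespace=~"{namespace_regex}"'
    if resource == "cpu":
        # CPU usage is a counter of seconds, so it is turned into cores with rate().
        usage = (
            "sum by (namespace) (rate(container_cpu_usage_seconds_total"
            f'{{{selector}, container!=""}}[{window}]))'
        )
    elif resource == "memory":
        # Memory working set is a gauge in bytes and can be summed directly.
        usage = (
            "sum by (namespace) (container_memory_working_set_bytes"
            f'{{{selector}, container!=""}})'
        )
    else:
        raise ValueError("resource must be 'cpu' or 'memory'")

    requests = (
        "sum by (namespace) (kube_pod_container_resource_requests"
        f'{{{selector}, resource="{resource}"}})'
    )
    return {"requests": requests, "usage": usage}


if __name__ == "__main__":
    for name, query in namespace_queries("cpu").items():
        print(f"{name}: {query}")
```

Plotting both expressions in one time-series panel gives the flat request line and the fluctuating usage line described above.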

Top 10 Pods with Wasted CPU/Memory (Tables)

These tables are your primary tool for taking immediate, actionable steps to optimize your cluster.

| Column | Description |
|---|---|
| Pod | The name of the Kubernetes pod. |
| Namespace | The Kubernetes namespace the pod belongs to. |
| Request | The amount of CPU/Memory the pod has guaranteed (requested). |
| Usage | The actual amount of CPU/Memory the pod is consuming at this moment. |
| Waste | The absolute difference between Request and Usage (Request - Usage). A positive value indicates over-provisioning; a negative value indicates under-provisioning. |
| Wasted % | The relative waste calculated as (Waste / Request) * 100%. This is the most important column for prioritization, highlighting the percentage of requested resources that are unused. |
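
The column definitions above can be reproduced in a few lines. The following is a minimal sketch, assuming made-up sample values in millicores; `PodRow`, `top_wasteful`, and `SAMPLE_ROWS` are hypothetical names, and skipping pods without a request is a choice of this sketch rather than documented dashboard behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodRow:
    pod: str
    namespace: str
    request: float  # e.g. CPU in millicores or memory in bytes
    usage: float    # same unit as request

    @property
    def waste(self) -> float:
        # Waste = Request - Usage (negative when the pod uses more than it requested)
        return self.request - self.usage

    @property
    def wasted_pct(self) -> float:
        # Wasted % = (Waste / Request) * 100; only meaningful when request > 0
        return self.waste * 100 / self.request


def top_wasteful(rows, n=10):
    """Return the n rows with the largest absolute waste, highest first."""
    rated = [r for r in rows if r.request > 0]  # skip pods without a request
    return sorted(rated, key=lambda r: r.waste, reverse=True)[:n]


# Illustrative sample data, not real cluster measurements.
SAMPLE_ROWS = [
    PodRow("api", "shop", request=1000, usage=90),
    PodRow("worker", "jobs", request=100, usage=277),
    PodRow("cache", "shop", request=500, usage=400),
]

if __name__ == "__main__":
    for row in top_wasteful(SAMPLE_ROWS):
        print(row.pod, row.namespace, row.waste, f"{row.wasted_pct:.0f}%")
```

Sorting by absolute waste surfaces the largest savings first, while the percentage column shows how far each request is from actual usage.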

How to Interpret "Wasted %"

  • ✅ Positive Waste (e.g., +91%): Over-provisioned. The pod is reserving far more resources than it actually uses.
    • Action: This is a direct cost-saving opportunity. Decrease the pod's resource request to align with actual usage.
  • ⚠️ Negative Waste (e.g., -177%): Under-provisioned. The pod is using significantly more resources than it requested (often referred to as "bursting").
    • Action: This is a stability risk. Increase the pod's resource request so the scheduler reserves enough capacity for it; this lowers the risk of CPU starvation on a contended node and of eviction under memory pressure. Review the pod's limits as well, since CPU throttling and container-level OOMKills are enforced against limits, not requests.
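
The interpretation rules above can be expressed as a small decision helper. This is a sketch only: the function names, the optional tolerance band, and the 20% headroom default are assumptions made for illustration, not values recommended by the dashboard.

```python
def interpret_wasted_pct(wasted_pct: float, tolerance_pct: float = 0.0) -> str:
    """Classify a 'Wasted %' value by its sign.

    tolerance_pct is a hypothetical dead band: values within +/- tolerance
    are reported as 'balanced' instead of triggering an action.
    """
    if wasted_pct > tolerance_pct:
        return "over-provisioned"   # consider decreasing the request
    if wasted_pct < -tolerance_pct:
        return "under-provisioned"  # consider increasing the request
    return "balanced"


def suggested_request(observed_peak_usage: float, headroom_pct: float = 20) -> float:
    """Propose a request as observed peak usage plus a safety margin.

    The margin is an assumption; choose it from your own load patterns.
    """
    return observed_peak_usage * (100 + headroom_pct) / 100


if __name__ == "__main__":
    for pct in (91.0, -177.0):
        print(pct, interpret_wasted_pct(pct))
```

Basing the new request on usage observed over a representative period, rather than on a single reading, avoids resizing a pod to a momentary dip or spike.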

📋 Prerequisites

For this dashboard to function correctly, your Kubernetes environment must be properly configured with:

  • Grafana: Version 9.0 or newer.
  • Prometheus: A Prometheus data source configured in Grafana, actively scraping metrics from your Kubernetes cluster.
  • kube-state-metrics: Must be deployed in your cluster and exposing metrics to Prometheus, specifically kube_pod_container_resource_requests and kube_pod_info.
  • cAdvisor/kubelet Metrics: Must be exposing container usage metrics to Prometheus, such as container_cpu_usage_seconds_total and container_memory_working_set_bytes.
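
One way to check these prerequisites is to compare the metric names your Prometheus knows about against the four named above. The sketch below assumes you have already fetched the list of metric names (for example from the Prometheus HTTP API endpoint `/api/v1/label/__name__/values`); the helper `missing_metrics` is hypothetical and performs no network calls itself.

```python
from typing import Iterable, List

# Metric names taken from the prerequisites list above.
REQUIRED_METRICS = (
    "kube_pod_container_resource_requests",
    "kube_pod_info",
    "container_cpu_usage_seconds_total",
    "container_memory_working_set_bytes",
)


def missing_metrics(available: Iterable[str]) -> List[str]:
    """Return the required metric names that are absent from `available`."""
    return sorted(set(REQUIRED_METRICS) - set(available))


if __name__ == "__main__":
    # Illustrative input: a cluster where kube-state-metrics is not scraped.
    names = ["container_cpu_usage_seconds_total", "container_memory_working_set_bytes", "up"]
    print("Missing:", missing_metrics(names))
```

An empty result suggests the data sources are in place; any listed name points to the exporter that still needs to be deployed or scraped.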

🚀 Installation Steps

  1. Download: Copy the entire contents of the dashboard.json file from this repository.
  2. Import in Grafana: In your Grafana instance, navigate to Dashboards -> Import.
  3. Paste & Load: Paste the copied JSON model into the text area. Click Load.
  4. Select Data Source: Choose your Prometheus data source from the dropdown menu.
  5. Import: Click Import.
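
If you prefer to script the import instead of using the UI, Grafana's HTTP API accepts a dashboard model via `POST /api/dashboards/db`. The sketch below only prepares the request body; `build_import_payload` is a hypothetical helper, and it assumes the JSON needs no further changes. If dashboard.json contains data source placeholders that the UI import resolves in step 4, they would have to be replaced before sending.

```python
import json


def build_import_payload(dashboard_json: str, overwrite: bool = True) -> dict:
    """Wrap a dashboard JSON document in the body expected by POST /api/dashboards/db."""
    dashboard = json.loads(dashboard_json)
    # A null id asks Grafana to create a new dashboard instead of
    # updating one by numeric id from another instance.
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": overwrite}


if __name__ == "__main__":
    # Illustrative stand-in for the contents of dashboard.json.
    sample = '{"id": 42, "title": "Example", "panels": []}'
    print(json.dumps(build_import_payload(sample), indent=2))
```

The resulting dictionary can be sent as the JSON body of an authenticated request to your Grafana instance with any HTTP client.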

⚙️ Configuration (Namespace Variable)

The dashboard utilizes a dynamic namespace filter for focused analysis. Ensure the following dashboard variable is configured correctly:

  • Name: namespace
  • Type: Query
  • Query Options:
    • Query Type: Query result
    • Query: count by (namespace) (kube_pod_info)
    • Regex: /\"([^\"]+)\"/ (This extracts the namespace names from the query result.)
  • Selection Options:
    • Enable Multi-value: true
    • Enable Include All option: true
    • Set Custom all value: .*
  • Post-Import Check: After importing, it's recommended to go to Dashboard settings -> Variables to verify that this variable is correctly configured for your environment. Remember to ensure all panel queries use namespace=~"${namespace:regex}" to support the multi-select and "All" options.
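
To make the variable settings above more concrete, the following sketch mimics the two steps involved: extracting a namespace from a query-result line with the configured regex, and joining selected values into a regex alternation for use in `namespace=~"..."`. The sample result line and both helper names are assumptions; Grafana performs these steps internally, and its escaping rules may differ in detail from Python's `re.escape`.

```python
import re
from typing import List, Optional

# Same pattern as the variable's Regex field: the first double-quoted value.
NAMESPACE_REGEX = re.compile(r'"([^"]+)"')


def extract_namespace(result_line: str) -> Optional[str]:
    """Pull the namespace label value out of one query-result line."""
    match = NAMESPACE_REGEX.search(result_line)
    return match.group(1) if match else None


def regex_matcher(selected: List[str]) -> str:
    """Approximate how a multi-value selection becomes a regex alternation."""
    escaped = [re.escape(value) for value in selected]
    if len(escaped) == 1:
        return escaped[0]
    return "(" + "|".join(escaped) + ")"


if __name__ == "__main__":
    # Assumed shape of a result line for: count by (namespace) (kube_pod_info)
    line = '{namespace="kube-system"} 12 1700000000000'
    print(extract_namespace(line))
    print(regex_matcher(["default", "monitoring"]))
```

With the custom all value set to `.*`, selecting "All" yields a matcher that accepts every namespace, which is why the panel queries need the `=~` operator rather than `=`.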

🤝 Need further optimization or hands-on implementation?

This dashboard is a powerful tool for identifying resource waste and inefficiency. However, implementing the necessary changes, refining your resource requests/limits, and building a truly cost-optimized and stable Kubernetes environment can be complex.

If your organization requires deeper insights, customized monitoring solutions, or hands-on assistance to implement the identified optimizations and achieve significant, measurable cloud cost reductions, I'm available for freelance consulting engagements.

As a CKA-certified Cloud/DevOps Engineer specializing in Kubernetes efficiency and observability, I can help you:

  • Perform in-depth Kubernetes cost audits and identify precise saving opportunities.
  • Develop tailored Grafana dashboards and monitoring solutions for your specific needs.
  • Implement optimized resource requests and limits across your workloads.
  • Automate cost management and performance monitoring.
  • Improve overall cluster performance, stability, and reliability.

Let's turn insights into savings! Connect with me on LinkedIn to discuss how I can help your team.


📜 License

This project is licensed under the MIT License. See the LICENSE file for details.
