Merge pull request #166 from ramachandranravi/remove_expose_cgroup_metrics

sthaha · web-flow · commit 702573fa8d50 · 2024-11-13T17:46:54.000+10:00
removed_cgroupfs_from_kepler_doc
diff --git a/docs/design/metrics.md b/docs/design/metrics.md
@@ -116,27 +116,6 @@ All the metrics specific to the Kepler Exporter are prefixed with `kepler`.
 !!! note
     You can enable/disable expose of those metrics through `expose-hardware-counter-metrics` Kepler execution option or `EXPOSE_HW_COUNTER_METRICS` environment value.
 
-### cGroups Metrics
-
-- **kepler_container_cgroupfs_cpu_usage_us_total**
-
-    This measures the total CPU time used by the container reading from cGroups stat.
-
-- **kepler_container_cgroupfs_memory_usage_bytes_total**
-
-    This measures the total memory in bytes used by the container reading from cGroups stat.
-
-- **kepler_container_cgroupfs_system_cpu_usage_us_total**
-
-    This measures the total CPU time in kernel space used by the container reading from cGroups stat.
-
-- **kepler_container_cgroupfs_user_cpu_usage_us_total**
-
-    This measures the total CPU time in userspace used by the container reading from cGroups stat.
-
-!!! note
-    You can enable/disable expose of those metrics through `EXPOSE_CGROUP_METRICS` environment value.
-
 ### IRQ Metrics
 
 - **kepler_container_bpf_net_tx_irq_total**
diff --git a/docs/hardwareengagement/index.md b/docs/hardwareengagement/index.md
@@ -24,9 +24,10 @@ Currently, we use power consumption API as RAPL or ACPI.
 For some of the devices, you may need to find your own way to get power consumption, and implement in golang for Kepler usage.
 For further plan, please ref [here](https://github.com/sustainable-computing-io/kepler/issues/644)
 
-### eBPF/cgroup data
+### eBPF data
 
-Currently, we relays on eBPF and cgroup to characterization a process/pod. Hence, you can ref to our dependency as BCC or cgroup. To test those golang package works well on your device.
+Currently, we rely on eBPF to obtain key CPU, IRQ, and perf information about a process.
+Hence, refer to the [cilium/ebpf](https://github.com/cilium/ebpf) documentation to test whether these Go packages work on your device.
 
 ## Stage 1 Integration with ratio
 
@@ -39,7 +40,7 @@ You should know the scope of the Power consumption API. How many API do you have
 ### Interval
 
 You should know the intervals of the Power consumption API.
-As Kepler collects eBPF and cgroup data in each 3s by default, you should know the interval and make them in same time slot.
+As Kepler collects eBPF data every 3s by default, you should know the interval of the power consumption API and align both into the same time slot.
 
 ### Verify
 
diff --git a/docs/index.md b/docs/index.md
@@ -2,7 +2,7 @@
 
 Kepler (Kubernetes-based Efficient Power Level Exporter) is a Prometheus exporter. It uses eBPF to probe CPU performance counters and Linux kernel tracepoints.
 
-These data and stats from cgroup and sysfs can then be fed into ML models to estimate energy consumption by Pods.
+This data, together with stats from sysfs, can then be fed into ML models to estimate the energy consumption of Pods.
 
 Check out the project on GitHub ➡️ [Kepler](https://github.com/sustainable-computing-io/kepler).
 
diff --git a/docs/kepler_model_server/pipeline.md b/docs/kepler_model_server/pipeline.md
@@ -43,13 +43,12 @@ for each defined resource utilization metric group as below.
 Group Name|Features|Kepler Metric Source(s)
 ---|---|---
 CounterOnly|COUNTER_FEATURES|[Hardware Counter](../design/metrics.md#hardware-counter-metrics)
-CgroupOnly|CGROUP_FEATURES|[cGroups](../design/metrics.md#cgroups-metrics)
 BPFOnly|BPF_FEATURES|[BPF](../design/metrics.md#base-metric)
 IRQOnly|IRQ_FEATURES|[IRQ](../design/metrics.md#irq-metrics)
 AcceleratorOnly|ACCELERATOR_FEATURES|[Accelerator](../design/metrics.md#Accelerator-metrics)
 CounterIRQCombined|COUNTER_FEATURES, IRQ_FEATURES|BPF and Hardware Counter
-Basic|COUNTER_FEATURES, CGROUP_FEATURES, BPF_FEATURES|All except IRQ and node information
-WorkloadOnly|COUNTER_FEATURES, CGROUP_FEATURES, BPF_FEATURES, IRQ_FEATURES, ACCELERATOR_FEATURES|All except node information
+Basic|COUNTER_FEATURES, BPF_FEATURES|All except IRQ and node information
+WorkloadOnly|COUNTER_FEATURES, BPF_FEATURES, IRQ_FEATURES, ACCELERATOR_FEATURES|All except node information
 Full|WORKLOAD_FEATURES, SYSTEM_FEATURES|All
 
 Node information refers to value from [kepler_node_info](../design/metrics.md#kepler-metrics-for-node-information)
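The updated table can be read as a mapping from each group to the feature sets it requires; with the cGroup features removed, `Basic` now needs only counter and BPF features. As a hypothetical sketch (the `featureGroups` map and `usableGroups` helper are illustrative and not part of the model server), the Go snippet below encodes the workload rows (Full is omitted) and lists which groups could be trained given the feature sets available on a node.

```go
package main

import (
	"fmt"
	"sort"
)

// featureGroups mirrors the workload rows of the table (Full is omitted).
// Each group lists the feature sets it requires.
var featureGroups = map[string][]string{
	"CounterOnly":        {"COUNTER_FEATURES"},
	"BPFOnly":            {"BPF_FEATURES"},
	"IRQOnly":            {"IRQ_FEATURES"},
	"AcceleratorOnly":    {"ACCELERATOR_FEATURES"},
	"CounterIRQCombined": {"COUNTER_FEATURES", "IRQ_FEATURES"},
	"Basic":              {"COUNTER_FEATURES", "BPF_FEATURES"},
	"WorkloadOnly":       {"COUNTER_FEATURES", "BPF_FEATURES", "IRQ_FEATURES", "ACCELERATOR_FEATURES"},
}

// usableGroups returns, sorted by name, the groups whose required
// feature sets are all present in available.
func usableGroups(available map[string]bool) []string {
	var groups []string
	for name, required := range featureGroups {
		ok := true
		for _, f := range required {
			if !available[f] {
				ok = false
				break
			}
		}
		if ok {
			groups = append(groups, name)
		}
	}
	sort.Strings(groups)
	return groups
}

func main() {
	// Example: a VM without hardware counters or accelerators.
	available := map[string]bool{"BPF_FEATURES": true, "IRQ_FEATURES": true}
	fmt.Println(usableGroups(available)) // [BPFOnly IRQOnly]
}
```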
diff --git a/docs/usage/deep_dive.md b/docs/usage/deep_dive.md
@@ -12,11 +12,11 @@ Kepler, Kubernetes-based Efficient Power Level Exporter, offers a way to estimat
 
 Kepler uses the following to collects power data:
 
-#### EBPF, Hardware Counters, cGroups
+#### eBPF, Hardware Counters
 
-Kepler can utilize a BPF program integrated into the kernel's pathway to extract process-related resource utilization metrics or use metrics from Hardware Counters or cGroups.
+Kepler can utilize a BPF program attached to kernel code paths to extract process-related resource utilization metrics, or use metrics from hardware counters.
 The type of metrics used to build the model can differ based on the system's environment.
-For example, it might use hardware counters, or metrics from tools like eBPF or cGroups, depending on what is available in the system that will use the model.
+For example, it might use hardware counters or eBPF metrics, depending on what is available in the system that will use the model.
 
 #### Real-time Component Power Meters
 
@@ -44,7 +44,7 @@ When creating the power model, the Model Server uses a regression algorithm. It
 
 Once trained, the Model Server makes these models accessible through a github repository, where any Kepler deployment can download the model from.
 Kepler then uses these models to calculate how much power a node (VM) consumes based on the way its resources are being used. The type of metrics used to build the model can differ based on the system's environment.
-For example, it might use hardware counters, or metrics from tools like eBPF or cGroups, depending on what is available in the system that will use the model.
+For example, it might use hardware counters or eBPF metrics, depending on what is available in the system that will use the model.
 
 ![Power model training](../fig/power_model_training.jpg)
 
diff --git a/docs/usage/general_config.md b/docs/usage/general_config.md
@@ -27,7 +27,6 @@ This is a list of configurable values of Kepler System. The configuration can be
 |Model Server Pod Environment (INITIAL_MODEL_NAMES.[`MODEL_TYPE`])|model-server.[`MODEL_TYPE`]|Name of default pipeline for each model type|-|
 |***CollectMetric CR*** (single item: default)||||
 |Kepler DaemonSet Environment (COUNTER_METRICS)|counter|List of performance metrics to enable from counter source| * (enable all available metrics from counter source)|
-|Kepler DaemonSet Environment (CGROUP_METRICS)|cgroup|List of performance metrics to enable from cgroup source| * (enable all available metrics from cgroup source)|
 |Kepler DaemonSet Environment (BPF_METRICS)|bpf|List of performance metrics to enable from bpf (aka. eBPF) source| * (enable all available metrics from bpf source)|
 |Kepler DaemonSet Environment (GPU_METRICS)|gpu|List of performance metrics to enable from gpu source| * (enable all available metrics from gpu source)|
 |***ExportMetric CR*** (single item: default)||||
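The table gives `*` as the default for `COUNTER_METRICS` and `BPF_METRICS`, meaning every metric the source offers is enabled. As a minimal sketch, assuming a non-`*` value is a comma-separated list of metric names and that an empty value enables nothing (both are assumptions, and `enabledMetrics` is a hypothetical helper, not Kepler's parser), the Go snippet below resolves such a value against the metrics a source actually provides. The counter names in `main` are examples only.

```go
package main

import (
	"fmt"
	"strings"
)

// enabledMetrics resolves a setting such as COUNTER_METRICS or BPF_METRICS.
// Assumption: the value is "*" (enable all) or a comma-separated list of names.
// Names the source does not offer are dropped; an empty value enables nothing.
func enabledMetrics(value string, available []string) []string {
	value = strings.TrimSpace(value)
	if value == "*" {
		return append([]string(nil), available...)
	}
	offered := make(map[string]bool, len(available))
	for _, m := range available {
		offered[m] = true
	}
	var out []string
	for _, name := range strings.Split(value, ",") {
		name = strings.TrimSpace(name)
		if name != "" && offered[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	counters := []string{"cpu_cycles", "cpu_instructions", "cache_miss"}
	fmt.Println(enabledMetrics("*", counters))                 // [cpu_cycles cpu_instructions cache_miss]
	fmt.Println(enabledMetrics("cpu_cycles, bogus", counters)) // [cpu_cycles]
	fmt.Println(len(enabledMetrics("", counters)))             // 0
}
```

Filtering against the offered list means a misspelled or removed metric name (for example, one of the dropped cgroup metrics) is ignored rather than causing an error; a real implementation might prefer to log such names.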
diff --git a/docs/usage/kepler_daemon.md b/docs/usage/kepler_daemon.md
@@ -15,7 +15,6 @@ To set environments by ConfigMap:
     data:
       MODEL_SERVER_ENABLE: true
       COUNTER_METRICS: '*'
-      CGROUP_METRICS: '*'
       BPF_METRICS: '*'
       # KUBELET_METRICS: ''
       # GPU_METRICS: ''
diff --git a/docs/usage/trouble_shooting.md b/docs/usage/trouble_shooting.md
@@ -28,28 +28,3 @@ apt install linux-headers-$(uname -r)
 ```
 
 On OpenShift, install the MachineConfiguration [here](https://github.com/sustainable-computing-io/kepler/tree/main/manifests/config/cluster-prereqs)
-
-## Kepler energy metrics are zeroes
-
-<!-- markdownlint-disable MD024 -->
-### Background
-
-Kepler uses RAPL counters on x86 platforms to read energy consumption.
-VMs do not have RAPL counters and thus Kepler estimates energy consumption based on the pre-trained
-ML models. The models use either hardware performance counters or cGroup stats to estimate energy
-consumed by processes. Currently the cGroup based models use cGroup v2 features such as
-`cgroupfs_cpu_usage_us`, `cgroupfs_memory_usage_bytes`, `cgroupfs_system_cpu_usage_us`,
-`cgroupfs_user_cpu_usage_us`, `bytes_read`, and `bytes_writes`.
-
-### Diagnose
-
-The Kepler metrics are zeroes, check if cGroup version on the node:
-
-```bash
-ls /sys/fs/cgroup/cgroup.controllers
-```
-
-### Solution
-<!-- markdownlint-enable MD024 -->
-
-Enable cGroup v2 on the node by following [these Kubernetes instruction](https://kubernetes.io/docs/concepts/architecture/cgroups/).