Commit 702573f

Merge pull request #166 from ramachandranravi/remove_expose_cgroup_metrics
removed_cgroupfs_from_kepler_doc
2 parents 293c205 + 3f4bb00 commit 702573f

8 files changed, +13 -59 lines changed

docs/design/metrics.md

-21
@@ -116,27 +116,6 @@ All the metrics specific to the Kepler Exporter are prefixed with `kepler`.
 !!! note
 You can enable/disable expose of those metrics through `expose-hardware-counter-metrics` Kepler execution option or `EXPOSE_HW_COUNTER_METRICS` environment value.
 
-### cGroups Metrics
-
-- **kepler_container_cgroupfs_cpu_usage_us_total**
-
-This measures the total CPU time used by the container reading from cGroups stat.
-
-- **kepler_container_cgroupfs_memory_usage_bytes_total**
-
-This measures the total memory in bytes used by the container reading from cGroups stat.
-
-- **kepler_container_cgroupfs_system_cpu_usage_us_total**
-
-This measures the total CPU time in kernel space used by the container reading from cGroups stat.
-
-- **kepler_container_cgroupfs_user_cpu_usage_us_total**
-
-This measures the total CPU time in userspace used by the container reading from cGroups stat.
-
-!!! note
-You can enable/disable expose of those metrics through `EXPOSE_CGROUP_METRICS` environment value.
-
 ### IRQ Metrics
 
 - **kepler_container_bpf_net_tx_irq_total**

docs/hardwareengagement/index.md

+6 -3
@@ -24,9 +24,12 @@ Currently, we use power consumption API as RAPL or ACPI.
 For some of the devices, you may need to find your own way to get power consumption, and implement in golang for Kepler usage.
 For further plan, please ref [here](https://github.com/sustainable-computing-io/kepler/issues/644)
 
-### eBPF/cgroup data
+### eBPF data
 
-Currently, we relays on eBPF and cgroup to characterization a process/pod. Hence, you can ref to our dependency as BCC or cgroup. To test those golang package works well on your device.
+Currently, we rely on eBPF to obtain key cpu, irq and perf information about a process.
+Hence, refer to the documentation of [cilium/ebpf](https://github.com/cilium/ebpf) to test whether these Go packages work well on your device.
+
+Please let us know if you need any further adjustments!
 
 ## Stage 1 Integration with ratio
 
@@ -39,7 +42,7 @@ You should know the scope of the Power consumption API. How many API do you have
 ### Interval
 
 You should know the intervals of the Power consumption API.
-As Kepler collects eBPF and cgroup data in each 3s by default, you should know the interval and make them in same time slot.
+As Kepler collects eBPF data in each 3s by default, you should know the interval and make them in same time slot.
 
 ### Verify
 
docs/index.md

+1 -1
@@ -2,7 +2,7 @@
 
 Kepler (Kubernetes-based Efficient Power Level Exporter) is a Prometheus exporter. It uses eBPF to probe CPU performance counters and Linux kernel tracepoints.
 
-These data and stats from cgroup and sysfs can then be fed into ML models to estimate energy consumption by Pods.
+These data and stats from sysfs can then be fed into ML models to estimate energy consumption by Pods.
 
 Check out the project on GitHub ➡️ [Kepler](https://github.com/sustainable-computing-io/kepler).
 

docs/kepler_model_server/pipeline.md

+2 -3
@@ -43,13 +43,12 @@ for each defined resource utilization metric group as below.
 Group Name|Features|Kepler Metric Source(s)
 ---|---|---
 CounterOnly|COUNTER_FEATURES|[Hardware Counter](../design/metrics.md#hardware-counter-metrics)
-CgroupOnly|CGROUP_FEATURES|[cGroups](../design/metrics.md#cgroups-metrics)
 BPFOnly|BPF_FEATURES|[BPF](../design/metrics.md#base-metric)
 IRQOnly|IRQ_FEATURES|[IRQ](../design/metrics.md#irq-metrics)
 AcceleratorOnly|ACCELERATOR_FEATURES|[Accelerator](../design/metrics.md#Accelerator-metrics)
 CounterIRQCombined|COUNTER_FEATURES, IRQ_FEATURES|BPF and Hardware Counter
-Basic|COUNTER_FEATURES, CGROUP_FEATURES, BPF_FEATURES|All except IRQ and node information
-WorkloadOnly|COUNTER_FEATURES, CGROUP_FEATURES, BPF_FEATURES, IRQ_FEATURES, ACCELERATOR_FEATURES|All except node information
+Basic|COUNTER_FEATURES, BPF_FEATURES|All except IRQ and node information
+WorkloadOnly|COUNTER_FEATURES, BPF_FEATURES, IRQ_FEATURES, ACCELERATOR_FEATURES|All except node information
 Full|WORKLOAD_FEATURES, SYSTEM_FEATURES|All
 
 Node information refers to value from [kepler_node_info](../design/metrics.md#kepler-metrics-for-node-information)
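Reviewer's sketch (not part of the commit): the group table as it stands after this change can be written out as a lookup, which makes it easy to confirm that no remaining group depends on `CGROUP_FEATURES`. The Go code below is only an illustration of the table; `featureGroups` and `groupsUsing` are hypothetical names, and this is not how the model server itself stores its groups.

```go
package main

import (
	"fmt"
	"sort"
)

// featureGroups transcribes the post-commit table: group name -> feature sets.
var featureGroups = map[string][]string{
	"CounterOnly":        {"COUNTER_FEATURES"},
	"BPFOnly":            {"BPF_FEATURES"},
	"IRQOnly":            {"IRQ_FEATURES"},
	"AcceleratorOnly":    {"ACCELERATOR_FEATURES"},
	"CounterIRQCombined": {"COUNTER_FEATURES", "IRQ_FEATURES"},
	"Basic":              {"COUNTER_FEATURES", "BPF_FEATURES"},
	"WorkloadOnly":       {"COUNTER_FEATURES", "BPF_FEATURES", "IRQ_FEATURES", "ACCELERATOR_FEATURES"},
	"Full":               {"WORKLOAD_FEATURES", "SYSTEM_FEATURES"},
}

// groupsUsing returns, in sorted order, the groups that list the given feature set.
func groupsUsing(feature string) []string {
	var groups []string
	for name, features := range featureGroups {
		for _, f := range features {
			if f == feature {
				groups = append(groups, name)
				break
			}
		}
	}
	sort.Strings(groups)
	return groups
}

func main() {
	fmt.Println("BPF_FEATURES:", groupsUsing("BPF_FEATURES"))
	fmt.Println("CGROUP_FEATURES:", groupsUsing("CGROUP_FEATURES"))
}
```

Looking up `CGROUP_FEATURES` yields no groups, matching the removal of the `CgroupOnly` row and of the cgroup entries in `Basic` and `WorkloadOnly`.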

docs/usage/deep_dive.md

+4 -4
@@ -12,11 +12,11 @@ Kepler, Kubernetes-based Efficient Power Level Exporter, offers a way to estimat
 
 Kepler uses the following to collects power data:
 
-#### EBPF, Hardware Counters, cGroups
+#### EBPF, Hardware Counters
 
-Kepler can utilize a BPF program integrated into the kernel's pathway to extract process-related resource utilization metrics or use metrics from Hardware Counters or cGroups.
+Kepler can utilize a BPF program integrated into the kernel's pathway to extract process-related resource utilization metrics or use metrics from Hardware Counters.
 The type of metrics used to build the model can differ based on the system's environment.
-For example, it might use hardware counters, or metrics from tools like eBPF or cGroups, depending on what is available in the system that will use the model.
+For example, it might use hardware counters, or metrics from tools like eBPF, depending on what is available in the system that will use the model.
 
 #### Real-time Component Power Meters
 
@@ -44,7 +44,7 @@ When creating the power model, the Model Server uses a regression algorithm. It
 
 Once trained, the Model Server makes these models accessible through a github repository, where any Kepler deployment can download the model from.
 Kepler then uses these models to calculate how much power a node (VM) consumes based on the way its resources are being used. The type of metrics used to build the model can differ based on the system's environment.
-For example, it might use hardware counters, or metrics from tools like eBPF or cGroups, depending on what is available in the system that will use the model.
+For example, it might use hardware counters, or metrics from tools like eBPF, depending on what is available in the system that will use the model.
 
 ![Power model training](../fig/power_model_training.jpg)
 

docs/usage/general_config.md

-1
@@ -27,7 +27,6 @@ This is a list of configurable values of Kepler System. The configuration can be
 |Model Server Pod Environment (INITIAL_MODEL_NAMES.[`MODEL_TYPE`])|model-server.[`MODEL_TYPE`]|Name of default pipeline for each model type|-|
 |***CollectMetric CR*** (single item: default)||||
 |Kepler DaemonSet Environment (COUNTER_METRICS)|counter|List of performance metrics to enable from counter source| * (enable all available metrics from counter source)|
-|Kepler DaemonSet Environment (CGROUP_METRICS)|cgroup|List of performance metrics to enable from cgroup source| * (enable all available metrics from cgroup source)|
 |Kepler DaemonSet Environment (BPF_METRICS)|bpf|List of performance metrics to enable from bpf (aka. eBPF) source| * (enable all available metrics from bpf source)|
 |Kepler DaemonSet Environment (GPU_METRICS)|gpu|List of performance metrics to enable from gpu source| * (enable all available metrics from gpu source)|
 |***ExportMetric CR*** (single item: default)||||

docs/usage/kepler_daemon.md

-1
@@ -15,7 +15,6 @@ To set environments by ConfigMap:
 data:
 MODEL_SERVER_ENABLE: true
 COUNTER_METRICS: '*'
-CGROUP_METRICS: '*'
 BPF_METRICS: '*'
 # KUBELET_METRICS: ''
 # GPU_METRICS: ''

docs/usage/trouble_shooting.md

-25
@@ -28,28 +28,3 @@ apt install linux-headers-$(uname -r)
 ```
 
 On OpenShift, install the MachineConfiguration [here](https://github.com/sustainable-computing-io/kepler/tree/main/manifests/config/cluster-prereqs)
-
-## Kepler energy metrics are zeroes
-
-<!-- markdownlint-disable MD024 -->
-### Background
-
-Kepler uses RAPL counters on x86 platforms to read energy consumption.
-VMs do not have RAPL counters and thus Kepler estimates energy consumption based on the pre-trained
-ML models. The models use either hardware performance counters or cGroup stats to estimate energy
-consumed by processes. Currently the cGroup based models use cGroup v2 features such as
-`cgroupfs_cpu_usage_us`, `cgroupfs_memory_usage_bytes`, `cgroupfs_system_cpu_usage_us`,
-`cgroupfs_user_cpu_usage_us`, `bytes_read`, and `bytes_writes`.
-
-### Diagnose
-
-The Kepler metrics are zeroes, check if cGroup version on the node:
-
-```bash
-ls /sys/fs/cgroup/cgroup.controllers
-```
-
-### Solution
-<!-- markdownlint-enable MD024 -->
-
-Enable cGroup v2 on the node by following [these Kubernetes instruction](https://kubernetes.io/docs/concepts/architecture/cgroups/).
