Commit 6d90144

committed
Final slides for 20231025
1 parent 8e0cbc5 commit 6d90144

17 files changed

+357
-277
lines changed

000_introduction/02_bio.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@
1313
- <span class="fa-li"><img src="images/TraefikLabs-icon-white.svg" style="height: 1em;" /></span> [tr&aelig;fik Ambassador][5] since 2021</li>
1414
- <span class="fa-li"><i class="fa fa-briefcase"></i></span> [Haufe Group][6] since 2016
1515
- <span class="fa-li"><i class="fa fa-person-chalkboard"></i></span> Self-employed [trainer][7] since 2020
16-
- <span class="fa-li"><i class="fa fa-person-chalkboard"></i></span> Initiator/maintainer of uniget[12] since 2023
16+
- <span class="fa-li"><i class="fa fa-user-helmet-safety"></i></span> Initiator/maintainer of [uniget][12] since 2021
1717

1818
<!-- .element: class="fa-ul" style="line-height: 175%;" -->
1919

100_monitoring/prometheus/application-level.md

+22-11
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,12 @@
11
## Application Level Monitoring
22

3-
XXX many apps ship with exporters
3+
Many apps ship with integrated exporters
44

5-
XXX many FOSS services have an exporter
5+
Many FOSS services have an exporter
66

7-
XXX collection just like system services
7+
Collection works just like for system services
88

9-
XXX if not, use special exporters
9+
If nothing is available, use generic exporters
1010

1111
### `blackbox_exporter` [](https://github.com/prometheus/blackbox_exporter)
1212

@@ -16,13 +16,17 @@ Probing of endpoints over HTTP, HTTPS, DNS, TCP, ICMP and gRPC
1616

1717
Scraping of remote JSON by JSONPath [](https://goessner.net/articles/JsonPath/)
1818

19+
Alternative: JSON API datasource [](https://grafana.com/grafana/plugins/marcusolsson-json-datasource/)
20+
1921
---
2022

21-
## Application Location
23+
## Application on the network
24+
25+
XXX datacenters, firewalls, policies
2226

23-
XXX network
27+
Check whether scraping is possible
2428

25-
XXX datacenters, firewalls, policies, pull vs. push
29+
Otherwise push metrics to gateway:
2630

2731
### `pushgateway` [](https://github.com/prometheus/pushgateway)
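
Pushing a metric amounts to an HTTP request with a body in the text exposition format. The following is a minimal sketch under assumptions: the gateway address, the job name and the metric name are placeholders, and the request itself is only shown as a comment so that nothing is sent.

```shell
# Assumed values: gateway address, job name and metric name are placeholders.
PUSHGATEWAY=http://localhost:9091
JOB=nightly_backup

# Render one gauge sample in the text exposition format.
payload() {
  printf '# TYPE backup_duration_seconds gauge\n'
  printf 'backup_duration_seconds %s\n' "$1"
}

# URL with the grouping key for the job.
push_url() {
  printf '%s/metrics/job/%s\n' "$PUSHGATEWAY" "$JOB"
}

payload 42.5
echo "would be sent to: $(push_url)"
# To really push (requires a reachable Pushgateway):
#   payload 42.5 | curl --data-binary @- "$(push_url)"
```

Prometheus then scrapes the gateway like any other target, so the pushed value stays visible until it is overwritten or deleted.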
2832

@@ -47,14 +51,21 @@ When resources on a node are depleted:
4751

4852
### How pods are "chosen"
4953

50-
Pods have a quiality-of-service based on resource requests and limits [](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/)
54+
Pods have a quality-of-service based on resource requests and limits [](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/)
5155

5256
- Best effort: No container has resource requests or limits
5357
- Burstable: At least one container has resource requests or limits
5458
- Guaranteed: All containers have identical resource requests and limits
5559

5660
Scheduling uses resource requests to find suitable node
5761

58-
Notes:
59-
Check pods for QoS
60-
`kubectl get pods -A -o json | jq -r '.items[] | "\(.metadata.name): \(.status.qosClass)"'`
62+
---
63+
64+
## Pod Quality-of-Service
65+
66+
### Check QoS
67+
68+
```bash
69+
kubectl get pods --all-namespaces --output=json \
70+
| jq --raw-output '.items[] | "\(.metadata.name): \(.status.qosClass)"'
71+
```
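
The QoS class follows from the requests and limits of each container. The following is a minimal sketch of that rule, not the kubelet's implementation: `qos_class` is a hypothetical helper, quantities are compared as plain strings (so `100m` and `0.1` would count as different), and a missing request is assumed to default to the limit.

```shell
# Hypothetical helper: derive the QoS class of a pod.
# Each argument describes one container as "cpu_req:cpu_lim:mem_req:mem_lim";
# an empty field means "not set". The body runs in a subshell to keep variables local.
qos_class() (
  any_set=0
  all_guaranteed=1
  for spec in "$@"; do
    IFS=: read -r cpu_req cpu_lim mem_req mem_lim <<EOF
$spec
EOF
    # A request that is not set defaults to the limit.
    cpu_req=${cpu_req:-$cpu_lim}
    mem_req=${mem_req:-$mem_lim}
    if [ -n "$cpu_req$cpu_lim$mem_req$mem_lim" ]; then
      any_set=1
    fi
    # Guaranteed needs both limits, each equal to its request.
    if [ -z "$cpu_lim" ] || [ -z "$mem_lim" ] ||
       [ "$cpu_req" != "$cpu_lim" ] || [ "$mem_req" != "$mem_lim" ]; then
      all_guaranteed=0
    fi
  done
  if [ "$any_set" -eq 0 ]; then
    echo BestEffort
  elif [ "$all_guaranteed" -eq 1 ]; then
    echo Guaranteed
  else
    echo Burstable
  fi
)

qos_class ':::'                    # nothing set
qos_class '100m:100m:64Mi:64Mi'    # requests equal limits
qos_class '100m:::' ':::'          # only one request set
```

The result can be compared with `.status.qosClass` as reported by the `kubectl` command above.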

100_monitoring/prometheus/cadvisor/compose.yaml

-3
Original file line numberDiff line numberDiff line change
@@ -22,14 +22,11 @@ services:
2222
container_name: cadvisor
2323
command:
2424
- --docker="unix:///var/run/docker.sock"
25-
#- --containerd="unix:///var/run/docker/containerd/containerd.sock"
26-
#- --containerd-namespace=docker
2725
ports:
2826
- 8080:8080
2927
volumes:
3028
- /:/rootfs:ro
3129
- /var/run:/var/run:rw
32-
#- /var/run/docker/containerd/containerd.sock:/var/run/docker/containerd/containerd.sock
3330
- /sys:/sys:ro
3431
- /var/lib/docker/:/var/lib/docker:ro
3532

100_monitoring/prometheus/cluster_scraping.drawio.svg

+166-109

100_monitoring/prometheus/container.md

+23-13
Original file line numberDiff line numberDiff line change
@@ -30,21 +30,21 @@ cat "/sys/fs/cgroup/memory/docker/${ID}/memory.usage_in_bytes"
3030
3131
## Container metrics in Kubernetes
3232
33-
Remember: `kubelet` is responsible for maintaining pods/containers on a node
33+
`kubelet` is responsible for maintaining pods/containers on a node
3434
35-
kubelet offers metrics
35+
### Metrics...
3636
37-
kubelet ships with cadvisor [](https://github.com/google/cadvisor)
37+
...are offered by `kubelet` as well
3838
39-
Published under `/metrics/cadvisor/`
39+
`kubelet` ships with cadvisor [](https://github.com/google/cadvisor)
4040
41-
---
41+
Published under `/metrics/cadvisor/`
4242
43-
## Demo: cadvisor with Docker
43+
### Demo: cadvisor with Docker
4444
45-
XXX
45+
Run `cadvisor` in `compose`
4646
47-
XXX docker-exporter?
47+
XXX docker-exporter https://github.com/0xERR0R/dex
4848
4949
---
5050
@@ -72,6 +72,7 @@ kubeletctl \
7272
--token ${TOKEN} \
7373
metrics cadvisor | less
7474
```
75+
<!-- .element: style="width: 46em;" -->
7576
7677
---
7778
@@ -99,12 +100,15 @@ curl -skH "Authorization: Bearer ${TOKEN}" \
99100
"https://${IP}:10250/metrics/cadvisor" \
100101
| grep container_memory_usage_bytes | grep kube-proxy
101102
```
103+
<!-- .element: style="width: 46em;" -->
102104
103105
---
104106
105107
## OpenMetrics 1/
106108
107-
"...today's de-facto standard for transmitting cloud-native metrics at scale." [](https://openmetrics.io/)
109+
"...today's de-facto standard for transmitting cloud-native metrics at scale."
110+
111+
Specification [](https://openmetrics.io/)
108112
109113
### Types
110114
@@ -114,7 +118,7 @@ curl -skH "Authorization: Bearer ${TOKEN}" \
114118
- <span class="fa-li"><i class="fa-duotone fa-chart-column"></i></span> Histogram
115119
- <span class="fa-li"><i class="fa-duotone fa-ball-pile"></i></span> and more [](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#metric-types)
116120
117-
<!-- .element: class="fa-ul" -->
121+
<!-- .element: class="fa-ul" style="line-height: 1.5em;" -->
118122
119123
### Metadata
120124
@@ -146,12 +150,19 @@ go_goroutines 69
146150
# HELP process_cpu_seconds Total user and system CPU time spent in seconds.
147151
process_cpu_seconds_total 4.20072246e+06
148152
```
153+
<!-- .element: style="width: 47em;" -->
149154
150155
---
151156
152157
## OpenMetrics
153158
154-
Metrics in Kubernetes have labels for:
159+
Format:
160+
161+
```plaintext
162+
name{labels} value [timestamp]
163+
```
164+
165+
Labels provide context for...
155166
156167
- Namespace name
157168
- Pod name
@@ -163,7 +174,6 @@ For example:
163174
container_memory_usage_bytes{
164175
namespace="kube-system",
165176
pod="kube-proxy-68mp4",
166-
container="kube-proxy",
167-
# ...
177+
container="kube-proxy"
168178
} 1.4917632e+07 1669235346213
169179
```
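
A sample line in the `name{labels} value [timestamp]` format can be taken apart with standard tools. The following is a rough sketch using the sample from the slide; the three helper functions are hypothetical, and escaped quotes inside label values are not handled.

```shell
sample='container_memory_usage_bytes{namespace="kube-system",pod="kube-proxy-68mp4",container="kube-proxy"} 1.4917632e+07 1669235346213'

# Metric name: everything before the first "{" or space.
metric_name() {
  printf '%s\n' "${1%%[{ ]*}"
}

# Value: the first field after the optional label set.
metric_value() {
  printf '%s\n' "$1" | sed -E 's/^[^{ ]+(\{.*\})? +([^ ]+).*/\2/'
}

# Value of a single label, selected by its name.
label_value() {
  printf '%s\n' "$1" | sed -n -E "s/.*[{,]$2=\"([^\"]*)\".*/\1/p"
}

metric_name "$sample"
label_value "$sample" pod
metric_value "$sample"
```

The trailing timestamp is optional, which is why the value is read as the first field after the labels and not as the last field of the line.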

100_monitoring/prometheus/grafana.md

+4-4
Original file line numberDiff line numberDiff line change
@@ -4,18 +4,18 @@ Grafana is the most prominent tool to query, visualize and alert on metrics
44

55
Supports many datasources including Prometheus
66

7-
Support datasource-specific query language
7+
Supports datasource-specific query languages
88

99
Prometheus community offers pre-created dashboards [](https://github.com/kubernetes-monitoring/kubernetes-mixin)
1010

1111
### Demo
1212

1313
Quick intro to UI [](http://grafana.inmylab.de)
1414

15-
Graph for pod memory
16-
17-
Graph for pod CPU (usage)
15+
Graph for pod memory and CPU (usage)
1816

1917
Graph for node memory
2018

2119
Count running pods
20+
21+
Add variable for namespace and pod name

100_monitoring/prometheus/host.md

+16-36
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,8 @@ Can containers use all resources? Yes, but they should not!
44

55
Some reservations are necessary [](https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/)
66

7+
Capacity must be divided between system, cluster and containers
8+
79
![](100_monitoring/prometheus/reservations.drawio.svg) <!-- .element: style="float: right; width: 40%;" -->
810

911
### Operating system
@@ -14,44 +16,25 @@ Reserved for system services
1416

1517
Reserved for cluster components
1618

17-
### Further resources
18-
19-
Instance calculator for cloud providers [](https://learnk8s.io/kubernetes-instance-calculator)
19+
### Allocatable resources
2020

21-
Read reservations from managed cluster [](https://github.com/learnk8s/kubernetes-resource-inspector)
21+
`Allocatable = Capacity - System - Kubernetes`
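
The formula can be worked through with concrete numbers. This is a small sketch with a hypothetical node (8192 MiB capacity, 256 MiB reserved for the system, 768 MiB reserved for Kubernetes); the function names are made up for illustration. The kubelet additionally subtracts an eviction threshold, which is left out here.

```shell
# Node allocatable arithmetic; all quantities in MiB.
allocatable() {
  echo $(( $1 - $2 - $3 ))   # capacity - system reservation - Kubernetes reservation
}

# Share of the capacity that remains for pods, in percent.
efficiency() {
  awk -v a="$1" -v c="$2" 'BEGIN { printf "%.1f\n", 100 * a / c }'
}

# Hypothetical node: 8192 MiB capacity, 256 MiB system, 768 MiB Kubernetes.
alloc=$(allocatable 8192 256 768)
echo "allocatable: ${alloc} MiB"
echo "efficiency: $(efficiency "$alloc" 8192) %"
```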
2222

2323
---
2424

25-
## CPU Reservations in Managed Kubernetes
25+
## Reservations in Managed Kubernetes
2626

27-
Major cloud providers agree
27+
Overview of AWS, Azure and Google Cloud [](https://learnk8s.io/allocatable-resources)
2828

29-
XXX link to docs and rules
30-
31-
| Cores | Reservation | Cumulative | Efficiency |
32-
|-------|---------------:|-----------:|-----------:|
33-
| 1 | 60m | 60m | 94.0% |
34-
| 2 | + 10m | 70m | 96.5% |
35-
| 4 | + 10m | 80m | 98.0% |
36-
| 8 | + 10m | 90m | 99.0% |
37-
38-
---
29+
Larger VMs have less overhead
3930

40-
## Memory reservations in Managed Kubernetes
31+
More VMs provide more availability
4132

42-
Most major cloud providers agree
33+
### Further reading
4334

44-
AWS uses: 255MiB + 11MiB * MAX_PODS
45-
46-
XXX link to docs and rules
35+
Instance calculator for cloud providers [](https://learnk8s.io/kubernetes-instance-calculator)
4736

48-
| Memory | Reservation | Cumulative | Efficiency |
49-
|--------|------------:|-----------:|-----------:|
50-
| 0 | 255MiB | 255MiB | |
51-
| 4GiB | + 800MiB | 1055MiB | 73.7% |
52-
| 8GiB | + 800GiB | 1855MiB | 76,8% |
53-
| 112GiB | + 672MiB | 2527MiB | 97.7% |
54-
| 128GiB | + 256MiB | 2783MiB | 97.8% |
37+
Read reservations from managed cluster [](https://github.com/learnk8s/kubernetes-resource-inspector)
5538

5639
---
5740

@@ -86,27 +69,24 @@ nodes:
8669
8770
## Host metrics collection
8871
89-
node-exporter [](https://github.com/prometheus/node_exporter) collects host metrics...
72+
`node-exporter` [](https://github.com/prometheus/node_exporter) collects host metrics...
9073

9174
...and exports them for scraping
9275

9376
Metrics [](https://github.com/prometheus/node_exporter#collectors) include CPU, memory, disk, network and a lot more!
9477

95-
Some are disabled by default [](https://github.com/prometheus/node_exporter#disabled-by-default)
78+
Some collectors are disabled, but the defaults are reasonable [](https://github.com/prometheus/node_exporter#disabled-by-default)
9679

9780
### Demo
9881

99-
Start Kubernetes API proxy:
82+
Start Kubernetes API proxy and read metrics endpoint:
10083

10184
```bash
10285
kubectl proxy
103-
```
104-
105-
Read metrics endpoint:
10686
107-
```bash
87+
H=localhost:8001
10888
NS=kube-system
10989
SVC=node-exporter-prometheus-node-exporter
110-
curl -s localhost:8001/api/v1/namespaces/${NS}/services/${SVC}:metrics/proxy/metrics \
90+
curl -s $H/api/v1/namespaces/$NS/services/$SVC:metrics/proxy/metrics \
11191
| grep node_cpu_seconds_total
11292
```
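
`kubectl proxy` stays in the foreground, so the remaining commands need a second terminal or a background job. The following sketch only builds the proxy URL from its parts; `service_proxy_url` is a hypothetical helper and the cluster commands are shown as comments because they need a running cluster.

```shell
# Build the API server proxy URL for a service port.
# Arguments: proxy address, namespace, service, port name, path.
service_proxy_url() {
  printf '%s/api/v1/namespaces/%s/services/%s:%s/proxy/%s\n' "$1" "$2" "$3" "$4" "$5"
}

H=localhost:8001
NS=kube-system
SVC=node-exporter-prometheus-node-exporter
url=$(service_proxy_url "$H" "$NS" "$SVC" metrics metrics)
echo "$url"

# With a cluster at hand (not executed here):
#   kubectl proxy &
#   curl -s "$url" | grep node_cpu_seconds_total
#   kill %1
```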
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,30 @@
1-
### `kube-state-metrics` [](https://github.com/kubernetes/kube-state-metrics)
1+
## `kube-state-metrics`
22

3-
New metrics about cluster
3+
Metrics derived from cluster and resources
44

5-
https://www.datadoghq.com/blog/monitoring-kubernetes-performance-metrics/
5+
Project page [](https://github.com/kubernetes/kube-state-metrics)
66

7-
`kubectl proxy`
7+
### Exposed Metrics (excerpt)
88

9-
`curl localhost:8001/api/v1/namespaces/kube-system/services/kube-state-metrics:http/proxy/metrics`
9+
For every resource:
10+
11+
- *_info
12+
- *_labels
13+
- *_annotations
14+
15+
Full list of metrics [](https://github.com/kubernetes/kube-state-metrics/tree/main/docs#exposed-metrics)
16+
17+
Very useful for joins against other metrics [](https://github.com/kubernetes/kube-state-metrics/tree/main/docs#join-metrics)
18+
19+
---
20+
21+
## Demo: `kube-state-metrics`
22+
23+
```bash
24+
kubectl proxy
25+
26+
H=localhost:8001
27+
NS=kube-system
28+
S=kube-state-metrics
29+
curl -s $H/api/v1/namespaces/$NS/services/$S:http/proxy/metrics
30+
```
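
The output of the metrics endpoint is long, so it helps to filter for one of the metric families named above. The following sketch works on a hypothetical excerpt with invented label values; on a live cluster the output of `curl` would be piped in instead of the `scrape` variable.

```shell
# Hypothetical excerpt of a kube-state-metrics scrape (label values invented).
scrape='kube_pod_info{namespace="kube-system",pod="kube-proxy-68mp4",node="node-1"} 1
kube_pod_labels{namespace="kube-system",pod="kube-proxy-68mp4"} 1
kube_node_info{node="node-1"} 1
kube_pod_status_ready{namespace="kube-system",pod="kube-proxy-68mp4",condition="true"} 1'

# List the distinct metric names that end in the given suffix.
metrics_with_suffix() {
  printf '%s\n' "$scrape" | sed -E 's/[{ ].*//' | grep -E "$1\$" | sort -u
}

metrics_with_suffix _info
```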

100_monitoring/prometheus/metrics-server.md

+8-10
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,10 @@
1-
## metrics-server [](https://github.com/kubernetes-sigs/metrics-server/)
1+
<!-- .slide: data-transition="fade" -->
22

3-
Provides an API for metrics collected by kubelet
3+
## metrics-server [](https://github.com/kubernetes-sigs/metrics-server/)
44

5-
Required for `kubectl top`
5+
Provides an API for metrics collected by `kubelet`/`cadvisor`
66

7-
Required for Horizontal/Vertical Pod AutoScaler
7+
Required for `kubectl top` and Horizontal/Vertical Pod AutoScaler
88

99
### Demo 1/
1010

@@ -22,15 +22,13 @@ kubectl top pod
2222

2323
---
2424

25-
## metrics-server [](https://github.com/kubernetes-sigs/metrics-server/)
25+
<!-- .slide: data-transition="fade" -->
2626

27-
Provides an API for metrics collected by kubelet
28-
29-
Builds on cadvisor (XXX link?)
27+
## metrics-server [](https://github.com/kubernetes-sigs/metrics-server/)
3028

31-
Required for `kubectl top`
29+
Provides an API for metrics collected by `kubelet`/`cadvisor`
3230

33-
Required for Horizontal Pod AutoScaler
31+
Required for `kubectl top` and Horizontal/Vertical Pod AutoScaler
3432

3533
### Demo 2/2
3634

100_monitoring/prometheus/metrics.md

+2-2
Original file line numberDiff line numberDiff line change
@@ -12,15 +12,15 @@ How metrics can be collected...
1212

1313
![](100_monitoring/prometheus/push.drawio.svg) <!-- .element: style="width: 45%; float: right;" -->
1414

15-
### Push <i class="fa-duotone fa-truck"></i>
15+
### Push <i class="fa-duotone fa-person-dolly"></i>
1616

1717
Metrics are delivered to database
1818

1919
Usually involves an agent
2020

2121
Example: Telegraf agent and InfluxDB
2222

23-
### Pull <i class="fa-duotone fa-hand-holding-heart"></i>
23+
### Pull <i class="fa-duotone fa-cart-shopping"></i>
2424

2525
![](100_monitoring/prometheus/pull.drawio.svg) <!-- .element: style="width:45%; float: right;" -->
2626
