Commit 2538977

Merge pull request #186 from chetanyb/check-http-metrics
feat: add check_http_metrics task for Prometheus metrics assertions
2 parents 3c3eb56 + ae254d4 commit 2538977

6 files changed

Lines changed: 2900 additions & 4 deletions

go.mod

Lines changed: 4 additions & 4 deletions
@@ -4,6 +4,7 @@ go 1.25.7
 
 require (
 	github.com/donovanhide/eventsource v0.0.0-20210830082556-c59027999da0
+	github.com/dustin/go-humanize v1.0.1
 	github.com/ethereum/go-ethereum v1.17.3
 	github.com/ethpandaops/ethwallclock v0.4.0
 	github.com/ethpandaops/go-eth2-client v0.1.2
@@ -23,6 +24,8 @@ require (
 	github.com/mashingan/smapping v0.1.19
 	github.com/pressly/goose/v3 v3.27.1
 	github.com/prometheus/client_golang v1.23.2
+	github.com/prometheus/client_model v0.6.2
+	github.com/prometheus/common v0.66.1
 	github.com/protolambda/zrnt v0.34.1
 	github.com/protolambda/ztyp v0.2.2
 	github.com/prysmaticlabs/go-bitfield v0.0.0-20240618144021-706c95b2dd15
@@ -38,6 +41,7 @@ require (
 	github.com/wealdtech/go-eth2-types/v2 v2.8.2
 	github.com/wealdtech/go-eth2-util v1.8.2
 	golang.org/x/text v0.37.0
+	google.golang.org/protobuf v1.36.11
 	gopkg.in/yaml.v2 v2.4.0
 	gopkg.in/yaml.v3 v3.0.1
 )
@@ -58,7 +62,6 @@ require (
 	github.com/crate-crypto/go-eth-kzg v1.5.0 // indirect
 	github.com/deckarep/golang-set/v2 v2.6.0 // indirect
 	github.com/decred/dcrd/dcrec/secp256k1/v4 v4.3.0 // indirect
-	github.com/dustin/go-humanize v1.0.1 // indirect
 	github.com/emicklei/dot v1.6.4 // indirect
 	github.com/ethereum/c-kzg-4844/v2 v2.1.6 // indirect
 	github.com/ferranbt/fastssz v0.1.4 // indirect
@@ -90,8 +93,6 @@ require (
 	github.com/pk910/dynamic-ssz v1.3.2-0.20260505131440-111bcb265c8f // indirect
 	github.com/pk910/hashtree-bindings v0.1.0 // indirect
 	github.com/pkg/errors v0.9.1 // indirect
-	github.com/prometheus/client_model v0.6.2 // indirect
-	github.com/prometheus/common v0.66.1 // indirect
 	github.com/prometheus/procfs v0.20.1 // indirect
 	github.com/protolambda/bls12-381-util v0.1.0 // indirect
 	github.com/r3labs/sse/v2 v2.10.0 // indirect
@@ -118,7 +119,6 @@ require (
 	golang.org/x/sys v0.43.0 // indirect
 	golang.org/x/time v0.15.0 // indirect
 	golang.org/x/tools v0.44.0 // indirect
-	google.golang.org/protobuf v1.36.11 // indirect
 	gopkg.in/cenkalti/backoff.v1 v1.1.0 // indirect
 	modernc.org/libc v1.72.1 // indirect
 	modernc.org/mathutil v1.7.1 // indirect
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
## `check_http_metrics` Task

### Description
The `check_http_metrics` task fetches metrics from an HTTP Prometheus endpoint and evaluates assertions against metric values.

#### Task Behavior
- The task polls the metrics endpoint at regular intervals.
- By default, the task returns immediately when all assertions pass.
- Use `continueOnPass: true` to keep monitoring even after success.
- Use `failOnCheckMiss: true` to fail immediately when assertions are not met.

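The interaction between the two flags can be sketched as a small polling loop. This is an illustration only, not the task's actual code: `pollOptions`, `poll`, `runUntil`, and `errCheckMiss` are hypothetical names, the check is reduced to a function returning a boolean, and the sketch assumes the first check runs immediately.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errCheckMiss is returned when failOnCheckMiss is set and a scrape
// does not satisfy all assertions.
var errCheckMiss = errors.New("assertions not met")

// pollOptions mirrors the documented flags; the struct itself is hypothetical.
type pollOptions struct {
	PollInterval    time.Duration // must be positive
	FailOnCheckMiss bool
	ContinueOnPass  bool
}

// poll calls check once per interval until the flags allow a decision or ctx ends.
// Assumption: the first check happens immediately, not after one interval.
func poll(ctx context.Context, opts pollOptions, check func() bool) error {
	ticker := time.NewTicker(opts.PollInterval)
	defer ticker.Stop()

	for {
		passed := check()
		if passed && !opts.ContinueOnPass {
			return nil // default: stop as soon as all assertions pass
		}
		if !passed && opts.FailOnCheckMiss {
			return errCheckMiss // fail fast instead of waiting for a later scrape
		}
		select {
		case <-ctx.Done():
			// How the real task reports a continueOnPass run that ends here
			// is not covered by this sketch.
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// runUntil simulates an endpoint whose assertions start passing on scrape
// number passOn and returns how many scrapes were made.
func runUntil(passOn int, failOnCheckMiss bool) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	scrapes := 0
	opts := pollOptions{PollInterval: time.Millisecond, FailOnCheckMiss: failOnCheckMiss}
	err := poll(ctx, opts, func() bool {
		scrapes++
		return scrapes >= passOn
	})
	return scrapes, err
}

func main() {
	n, err := runUntil(3, false)
	fmt.Println(n, err)

	n, err = runUntil(3, true)
	fmt.Println(n, err)
}
```

With `failOnCheckMiss` left at `false`, the loop keeps scraping until the third check passes; with `failOnCheckMiss: true`, the first unmet check ends the run.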
### Configuration Parameters

- **`url`**:\
  HTTP URL of the Prometheus metrics endpoint. Required.

- **`headers`**:\
  Optional HTTP request headers (e.g., for authentication). Default: `{}`.

- **`pollInterval`**:\
  Interval between metric scrapes. Default: `10s`.

- **`requestTimeout`**:\
  Timeout for a single HTTP request. Default: `5s`.

- **`maxResponseSize`**:\
  Maximum response body size. Must be positive. Default: `10MB`.

- **`failOnCheckMiss`**:\
  If `true`, fail immediately when assertions are not met. If `false`, keep polling until timeout or success. Default: `false`.

- **`continueOnPass`**:\
  If `true`, continue checking after all assertions pass. Default: `false`.

- **`missingMetric`**:\
  Behavior when a metric family is missing: `wait`, `fail`, or `pass`. Default: `wait`.

- **`missingSeries`**:\
  Behavior when no time series matches the label selector: `wait`, `fail`, or `pass`. Default: `wait`.

- **`resetBehavior`**:\
  Behavior when a COUNTER metric's value drops below baseline (indicating restart): `fail`, `rebaseline`, or `ignore`. Only applies to COUNTER type metrics. Default: `fail`.

- **`assertions`**:\
  List of metric assertions. At least one required.

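The constraints stated in the parameter list (required `url`, positive `maxResponseSize`, enumerated policy values, at least one assertion) could be checked roughly as follows. This is a sketch under assumptions, not the task's real validation code: the `config` type, `defaultConfig`, `checkEnum`, and `validate` are hypothetical, the size is assumed to be parsed into bytes already, and `10MB` is assumed to mean decimal megabytes.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// config is a hypothetical, trimmed-down mirror of the documented parameters.
type config struct {
	URL             string
	PollInterval    time.Duration
	RequestTimeout  time.Duration
	MaxResponseSize int64 // bytes, assumed to be parsed already from a string such as "10MB"
	MissingMetric   string
	MissingSeries   string
	ResetBehavior   string
	Assertions      []string // assertion names only, to keep the sketch short
}

// defaultConfig returns the documented defaults.
func defaultConfig() config {
	return config{
		PollInterval:    10 * time.Second,
		RequestTimeout:  5 * time.Second,
		MaxResponseSize: 10_000_000, // assumption: "10MB" means decimal megabytes
		MissingMetric:   "wait",
		MissingSeries:   "wait",
		ResetBehavior:   "fail",
	}
}

// checkEnum returns an error when value is not one of the allowed options.
func checkEnum(field, value string, allowed ...string) error {
	for _, a := range allowed {
		if value == a {
			return nil
		}
	}
	return fmt.Errorf("%s: %q is not one of %v", field, value, allowed)
}

// validate enforces the constraints stated in the parameter list.
func (c config) validate() error {
	if c.URL == "" {
		return errors.New("url is required")
	}
	if c.MaxResponseSize <= 0 {
		return errors.New("maxResponseSize must be positive")
	}
	if len(c.Assertions) == 0 {
		return errors.New("at least one assertion is required")
	}
	if err := checkEnum("missingMetric", c.MissingMetric, "wait", "fail", "pass"); err != nil {
		return err
	}
	if err := checkEnum("missingSeries", c.MissingSeries, "wait", "fail", "pass"); err != nil {
		return err
	}
	return checkEnum("resetBehavior", c.ResetBehavior, "fail", "rebaseline", "ignore")
}

func main() {
	cfg := defaultConfig()
	cfg.URL = "http://localhost:9090/metrics" // placeholder endpoint
	cfg.Assertions = []string{"counter_increased"}
	fmt.Println(cfg.validate())
}
```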
#### Assertion Configuration

- **`name`**: Unique assertion name. Required.
- **`metric`**: Prometheus metric name. Required.
- **`labels`**: Label selector (subset matching). Must match exactly one series.
- **`mode`**: `value` (current value) or `delta` (change since baseline). Default: `value`.
- **`operator`**: Comparison operator: `eq`, `neq`, `gt`, `gte`, `lt`, `lte`. Required.
- **`value`**: Expected numeric value. Required.
- **`missingMetric`**: Per-assertion override for global `missingMetric`.
- **`missingSeries`**: Per-assertion override for global `missingSeries`.

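To make "subset matching" and the operator list concrete, here is a minimal sketch. The `series` type and the functions `matchesSubset`, `selectOne`, and `compare` are hypothetical and only illustrate the documented semantics. In particular, the sketch treats zero matches and multiple matches alike, whereas the task applies the `missingSeries` policy when nothing matches, and it assumes `eq`/`neq` use exact floating-point comparison.

```go
package main

import "fmt"

// series is a hypothetical, simplified view of one scraped time series.
type series struct {
	Labels map[string]string
	Value  float64
}

// matchesSubset reports whether every selector label is present on the
// series with the same value. Extra labels on the series are allowed.
func matchesSubset(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// selectOne returns the single series matching selector, or an error when
// zero or several series match.
func selectOne(all []series, selector map[string]string) (series, error) {
	var found []series
	for _, s := range all {
		if matchesSubset(selector, s.Labels) {
			found = append(found, s)
		}
	}
	if len(found) != 1 {
		return series{}, fmt.Errorf("selector matched %d series, want exactly 1", len(found))
	}
	return found[0], nil
}

// compare applies one of the documented operators to actual and expected.
func compare(op string, actual, expected float64) (bool, error) {
	switch op {
	case "eq":
		return actual == expected, nil
	case "neq":
		return actual != expected, nil
	case "gt":
		return actual > expected, nil
	case "gte":
		return actual >= expected, nil
	case "lt":
		return actual < expected, nil
	case "lte":
		return actual <= expected, nil
	default:
		return false, fmt.Errorf("unknown operator %q", op)
	}
}

func main() {
	all := []series{
		{Labels: map[string]string{"env": "prod", "job": "api"}, Value: 120},
		{Labels: map[string]string{"env": "dev", "job": "api"}, Value: 7},
	}
	s, err := selectOne(all, map[string]string{"env": "prod"})
	if err != nil {
		fmt.Println(err)
		return
	}
	ok, err := compare("gt", s.Value, 100)
	fmt.Println(ok, err)
}
```

A selector such as `job: api` would match both series in this example and therefore be rejected, which is why selectors usually need enough labels to identify one series.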
#### Delta Mode

In `delta` mode, the task tracks changes over time:
1. First scrape: records the current value as baseline (waits, does not evaluate)
2. Subsequent scrapes: computes `delta = current - baseline` and evaluates

Negative deltas are valid for GAUGE and UNTYPED metrics. For COUNTER metrics, a decrease triggers `resetBehavior`.

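The two steps above, together with counter reset handling, can be sketched as a small baseline tracker. The `deltaTracker` type and its `observe` method are hypothetical. The exact effect of `rebaseline` and `ignore` is not spelled out in this document, so the sketch assumes that `rebaseline` stores the new value and waits for the next scrape, and that `ignore` evaluates the negative delta as-is.

```go
package main

import (
	"errors"
	"fmt"
)

var errCounterReset = errors.New("counter value dropped below baseline")

// deltaTracker keeps the baseline for a single assertion (hypothetical type).
type deltaTracker struct {
	baseline    float64
	hasBaseline bool
}

// observe records one scrape. It returns the delta, whether the delta is
// ready to be evaluated, and an error when a counter reset must fail the task.
func (t *deltaTracker) observe(current float64, isCounter bool, resetBehavior string) (float64, bool, error) {
	if !t.hasBaseline {
		// First scrape: store the baseline and wait, nothing to evaluate yet.
		t.baseline, t.hasBaseline = current, true
		return 0, false, nil
	}
	if isCounter && current < t.baseline {
		switch resetBehavior {
		case "rebaseline":
			// Assumption: treat the new value as a fresh baseline and wait.
			t.baseline = current
			return 0, false, nil
		case "ignore":
			// Assumption: no special handling, the negative delta is evaluated.
		default: // "fail"
			return 0, false, errCounterReset
		}
	}
	return current - t.baseline, true, nil
}

func main() {
	t := &deltaTracker{}
	for _, v := range []float64{10, 12, 15} {
		delta, ready, err := t.observe(v, true, "fail")
		fmt.Println(delta, ready, err)
	}
}
```

Because the baseline is only taken on the first scrape, a `delta` assertion can never pass on that scrape, which matters when choosing `pollInterval` and the task timeout.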
#### Examples

```yaml
assertions:
  # Check counter increased by at least 1
  - name: counter_increased
    metric: my_counter
    labels:
      env: prod
    mode: delta
    operator: gte
    value: 1

  # Check gauge decreased (negative delta)
  - name: gauge_dropped
    metric: my_gauge
    mode: delta
    operator: lte
    value: -1

  # Check current value is above threshold
  - name: value_above_threshold
    metric: my_metric
    operator: gt
    value: 100
```

#### Metric Type Handling

| Type | Value Extracted |
|------|-----------------|
| COUNTER | Counter value |
| GAUGE | Gauge value |
| UNTYPED | Untyped value |
| SUMMARY | Sample sum |
| HISTOGRAM | Sample sum |

Counter reset detection only applies to COUNTER type. SUMMARY and HISTOGRAM use sample sum; bucket/quantile helpers are not supported.

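The table can be read as a single switch over the metric type. The sketch below uses a hypothetical flattened `sample` type rather than the Prometheus client model types, so the field names are assumptions made for illustration.

```go
package main

import "fmt"

// sample is a hypothetical flattened view of one parsed metric.
type sample struct {
	Type      string  // "COUNTER", "GAUGE", "UNTYPED", "SUMMARY" or "HISTOGRAM"
	Value     float64 // used for COUNTER, GAUGE and UNTYPED
	SampleSum float64 // used for SUMMARY and HISTOGRAM
}

// extractValue returns the number an assertion is evaluated against.
func extractValue(s sample) (float64, error) {
	switch s.Type {
	case "COUNTER", "GAUGE", "UNTYPED":
		return s.Value, nil
	case "SUMMARY", "HISTOGRAM":
		// Buckets and quantiles are not inspected, only the sum.
		return s.SampleSum, nil
	default:
		return 0, fmt.Errorf("unsupported metric type %q", s.Type)
	}
}

// supportsResetDetection reports whether resetBehavior applies to the sample.
func supportsResetDetection(s sample) bool {
	return s.Type == "COUNTER"
}

func main() {
	h := sample{Type: "HISTOGRAM", SampleSum: 42.5}
	v, err := extractValue(h)
	fmt.Println(v, err, supportsResetDetection(h))
}
```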
### Outputs

- **`passedAssertions`**: Array of assertion names that passed.
- **`failedAssertions`**: Array of assertion names that failed.
- **`values`**: Map of assertion name to latest observed value.
- **`deltas`**: Map of assertion name to computed delta (for `delta` mode).
- **`baselines`**: Map of assertion name to baseline value (for `delta` mode).
- **`scrapeErrors`**: Number of HTTP/parsing errors.
- **`assertionErrors`**: Number of assertion evaluation errors.

### Defaults

```yaml
- name: check_http_metrics
  config:
    url: ""
    headers: {}
    pollInterval: 10s
    requestTimeout: 5s
    maxResponseSize: 10MB
    failOnCheckMiss: false
    continueOnPass: false
    missingMetric: wait
    missingSeries: wait
    resetBehavior: fail
    assertions: []
```