The predictive controller ingests coefficients from `config/controller_coeffs.json` (or the path provided via `--coeff-path`). The file is a JSON object with the following fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `bias` | number | Yes | Constant term applied to the forecast (millicelsius). |
| `ar_temperature` | `array<number>` | Yes | Auto-regressive coefficients applied to historical package temperatures. The array length determines the minimum history window. |
| `ratio` | `array<number>` | No | Coefficients applied to historical SIMD ratio measurements (milli-units). |
| `trimmed_ratio` | `array<number>` | No | Coefficients applied to the trimmed ratio (if available). |
| `severity` | `array<number>` | No | Coefficients applied to the severity metric reported in telemetry (milli-units). |
| `ma` | number | No | Moving-average gain applied to the most recent residual (`actual - forecast`). |
| `staleness_window_ms` | number | No | Maximum age (in milliseconds) of telemetry used for prediction. Defaults to 500 ms. |

Example:

```json
{
  "bias": 1200.0,
  "ar_temperature": [0.85, 0.05],
  "ratio": [-0.30],
  "severity": [0.04],
  "ma": 0.25,
  "staleness_window_ms": 750
}
```
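
To make the field table concrete, here is a minimal sketch of how the file might be mirrored in memory and checked after parsing. The `ControllerCoeffs` struct, `validate`, and `example_coeffs` are hypothetical names, not taken from the repository. The rules that an empty `ar_temperature` array counts as missing and that `staleness_window_ms` must be positive are assumptions; only the required fields and the 500 ms default come from the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory mirror of config/controller_coeffs.json.
struct ControllerCoeffs {
    std::optional<double> bias;           // required, millicelsius
    std::vector<double> ar_temperature;   // required
    std::vector<double> ratio;            // optional, milli-units
    std::vector<double> trimmed_ratio;    // optional
    std::vector<double> severity;         // optional, milli-units
    double ma = 0.0;                      // optional; assumed to be 0 when absent
    double staleness_window_ms = 500.0;   // documented default

    // The AR array length determines the minimum history window.
    std::size_t history_window() const { return ar_temperature.size(); }
};

// Returns an error message, or std::nullopt when the coefficients look usable.
std::optional<std::string> validate(const ControllerCoeffs& c) {
    if (!c.bias) return std::string("missing required field: bias");
    if (c.ar_temperature.empty())
        return std::string("missing required field: ar_temperature");
    if (c.staleness_window_ms <= 0.0)  // assumption: a non-positive window is rejected
        return std::string("staleness_window_ms must be positive");
    return std::nullopt;
}

// Builds the values from the JSON example above.
ControllerCoeffs example_coeffs() {
    ControllerCoeffs c;
    c.bias = 1200.0;
    c.ar_temperature = {0.85, 0.05};
    c.ratio = {-0.30};
    c.severity = {0.04};
    c.ma = 0.25;
    c.staleness_window_ms = 750.0;
    assert(!validate(c));  // the documented example should pass these checks
    return c;
}
```

With the example values, `history_window()` is 2 because `ar_temperature` has two entries.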

## Hot Reload Workflow

1. Update the JSON file on disk (e.g., write a new revision into the ConfigMap or local path).
2. Send `SIGHUP` to the dispatcher process. The controller flags a reload for the next control tick.
3. On the following recommendation cycle, the controller attempts to parse the file:
   - On success, it increments `predictive_coeff_reload_total` and logs an INFO entry with the new history window and staleness guard.
   - On failure, it increments `predictive_coeff_reload_errors_total`, logs an ERROR entry, and falls back to the previous coefficients or the averaging forecast.
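
The workflow above can be sketched as a signal handler that only sets a flag, plus a per-tick function that consumes it. This is an illustration under assumptions, not the dispatcher's code: `Coeffs`, `ReloadCounters`, `install_reload_handler`, `maybe_reload`, and the `load` callback are hypothetical, and a production dispatcher may well use `sigaction` rather than `std::signal`. The counter fields simply echo the metric names above.

```cpp
#include <cassert>
#include <csignal>
#include <functional>
#include <optional>

// Hypothetical coefficient type, reduced to one field for brevity.
struct Coeffs { double bias = 0.0; };

// Stand-ins for the two reload metrics named in the workflow.
struct ReloadCounters {
    unsigned long reload_total = 0;
    unsigned long reload_errors_total = 0;
};

// Set by the signal handler, consumed on the next control tick.
volatile std::sig_atomic_t g_reload_requested = 0;

// The handler does nothing except raise the flag.
extern "C" void on_sighup(int) { g_reload_requested = 1; }

void install_reload_handler() {
    const bool ok = std::signal(SIGHUP, on_sighup) != SIG_ERR;
    assert(ok);
    (void)ok;
}

// Called once per control tick. `load` is a hypothetical parser that
// returns std::nullopt when the file cannot be read or parsed.
void maybe_reload(Coeffs& active, ReloadCounters& counters,
                  const std::function<std::optional<Coeffs>()>& load) {
    if (!g_reload_requested) return;
    g_reload_requested = 0;
    if (auto fresh = load()) {
        active = *fresh;                 // swap in the new coefficients
        ++counters.reload_total;
    } else {
        ++counters.reload_errors_total;  // keep the previous coefficients
    }
}
```

Keeping the handler to a single flag write defers all parsing to the control loop, which matches the "reload on the next control tick" behaviour described in step 2.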

## Validation Tips

- Use `tests/policy/test_arx_model.cpp` as a reference for crafting deterministic coefficients during development.
- Monitor `predictive_abs_error_millic_total` to evaluate how well the updated coefficients track observed temperatures.
- Pair coefficient adjustments with updates to the [Predictive Controller](predictive-controller.md) documentation to keep operational guidance in sync.
`docs/predictive-controller.md` (19 additions, 11 deletions)
@@ -18,17 +18,24 @@ The predictive controller combines reactive thermal throttling with a short-hori
 Each signal is tagged with a monotonic timestamp. Stale signals (>2 intervals) are discarded and treated as unavailable.
 
 ## Forecast Model
-The controller uses a single-step ARX model:
+The controller uses a single-step ARX/ARMAX model implemented in `src/policy/arx_model.cpp` and driven by coefficients stored in `config/controller_coeffs.json` (see [Controller Coefficients](controller_coeffs.md)). The model consumes a sliding window of recent telemetry samples and projects the next package temperature in millicelsius:
-- Coefficients `a1..a4` are calibrated offline using lab traces and stored in `config/controller_coeffs.json`.
-- The bias `a0` compensates for ambient temperature.
-- Missing inputs zero out their coefficients and raise the `predictive_input_gaps` metric.
+- `φᵢ`, `θᵢ`, and `γᵢ` are configurable auto-regressive and exogenous coefficients.
+- `ψ` is an optional moving-average gain applied to the most recent residual `ε[t] = T[t] - T̂[t]`.
+- Missing temperature samples disable the prediction path and fall back to a simple moving average.
+- Coefficient files support hot-reload: the controller listens for `SIGHUP` and re-reads `config/controller_coeffs.json` on the next control tick. Successful reloads and failures are logged and exported via metrics.
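
As a rough illustration of the added bullets, the sketch below computes a single-step forecast as a bias, plus weighted sums over recent temperature, ratio, and severity samples, plus the moving-average gain times the last residual. It is not the code in `src/policy/arx_model.cpp`: the `Coeffs` and `History` structs, the newest-first ordering, and the choice to skip an exogenous term when its history is too short are all assumptions. Field names follow `controller_coeffs.json`, and `trimmed_ratio` is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Field names follow controller_coeffs.json; the struct itself is hypothetical.
struct Coeffs {
    double bias = 0.0;                   // millicelsius
    std::vector<double> ar_temperature;  // auto-regressive terms
    std::vector<double> ratio;           // exogenous: SIMD ratio
    std::vector<double> severity;        // exogenous: severity metric
    double ma = 0.0;                     // moving-average gain
};

// Hypothetical sliding window; every vector is ordered newest sample first.
struct History {
    std::vector<double> temperature_mc;
    std::vector<double> ratio_milli;
    std::vector<double> severity_milli;
    double last_residual_mc = 0.0;       // actual - forecast from the previous tick
};

// Weighted sum over the newest coeffs.size() samples.
double weighted_sum(const std::vector<double>& coeffs,
                    const std::vector<double>& newest_first) {
    assert(newest_first.size() >= coeffs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < coeffs.size(); ++i) {
        sum += coeffs[i] * newest_first[i];
    }
    return sum;
}

// Returns the projected next temperature in millicelsius, or std::nullopt
// when temperature history is too short and the caller should fall back
// to a simple moving average.
std::optional<double> forecast_next_mc(const Coeffs& c, const History& h) {
    if (h.temperature_mc.size() < c.ar_temperature.size()) return std::nullopt;
    double forecast = c.bias + weighted_sum(c.ar_temperature, h.temperature_mc);
    // Assumption: an exogenous term is skipped when its history is too short.
    if (h.ratio_milli.size() >= c.ratio.size()) {
        forecast += weighted_sum(c.ratio, h.ratio_milli);
    }
    if (h.severity_milli.size() >= c.severity.size()) {
        forecast += weighted_sum(c.severity, h.severity_milli);
    }
    return forecast + c.ma * h.last_residual_mc;
}
```

With the example coefficients from `controller_coeffs.json`, temperatures of 60000 and 59000 millicelsius (newest first), a ratio of 1000, a severity of 500, and a last residual of 100, the terms are 1200 + 51000 + 2950 - 300 + 20 + 25, for a forecast of about 54895 millicelsius.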
 
-The forecast produces a projected temperature and CPI value under the current SIMD width. The controller evaluates transitions (`SSE4.1`, `AVX2`, `AVX-512`) and selects the highest width whose projected temperature remains below `temp_ceiling_c - safety_margin_c` and whose CPI ratio is under `up_ratio`.
+Telemetry freshness is enforced prior to forecasting. If the latest sample exceeds the configured `staleness_window_ms`, the controller skips predictive evaluation, logs a warning, and records `predictive_stale_samples_total`.
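
The freshness check described in the added paragraph could look roughly like the following. `StalenessGuard` and `admit` are hypothetical names, and the use of `std::chrono::steady_clock` is an assumption based on the monotonic timestamps mentioned earlier in the document. The counter stands in for `predictive_stale_samples_total`.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;  // monotonic, as the telemetry timestamps are

// Hypothetical guard around the predictive path.
struct StalenessGuard {
    std::chrono::milliseconds window{500};  // documented default for staleness_window_ms
    unsigned long stale_samples_total = 0;  // stand-in for predictive_stale_samples_total

    // Returns true when the sample is fresh enough to forecast from.
    // A sample whose age exceeds the window is counted and rejected.
    bool admit(Clock::time_point sample_time, Clock::time_point now) {
        assert(sample_time <= now);
        if (now - sample_time > window) {
            ++stale_samples_total;
            return false;
        }
        return true;
    }
};
```

A sample exactly as old as the window is still admitted here, since the text says the sample must exceed the window to be skipped.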
+
+The forecast produces a projected temperature under the current SIMD width. The controller evaluates transitions (`SSE4.1`, `AVX2`, `AVX-512`) and selects the highest width whose projected temperature remains below `temp_ceiling_c - safety_margin_c` and whose CPI ratio is under `up_ratio`.
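
The selection rule in the paragraph above can be sketched as a scan from the widest candidate down to the narrowest. Everything here is hypothetical scaffolding: the `Candidate` and `Limits` structs, the idea that a projected temperature and CPI ratio are available per width, and the `std::nullopt` result when no width qualifies (the text does not say what the controller does in that case).

```cpp
#include <array>
#include <cassert>
#include <optional>

enum class SimdWidth { SSE41, AVX2, AVX512 };  // ordered narrowest to widest

// Hypothetical per-width inputs to the decision.
struct Candidate {
    SimdWidth width;
    double projected_temp_c;
    double cpi_ratio;
};

// Thresholds named in the text.
struct Limits {
    double temp_ceiling_c;
    double safety_margin_c;
    double up_ratio;
};

// Returns the widest candidate that stays under both thresholds, or
// std::nullopt when none does. Candidates must be ordered narrow to wide.
std::optional<SimdWidth> select_width(const std::array<Candidate, 3>& narrow_to_wide,
                                      const Limits& lim) {
    assert(lim.safety_margin_c >= 0.0);
    const double temp_limit = lim.temp_ceiling_c - lim.safety_margin_c;
    for (auto it = narrow_to_wide.rbegin(); it != narrow_to_wide.rend(); ++it) {
        if (it->projected_temp_c < temp_limit && it->cpi_ratio < lim.up_ratio) {
            return it->width;
        }
    }
    return std::nullopt;
}
```

Scanning from the wide end means the first candidate that passes both checks is also the highest admissible width.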
 
 ## Decision Pipeline
 1. **Acquire Inputs:** Pull the latest telemetry fusion snapshot (all `TelemetrySnapshot` values share a generation number).
@@ -52,11 +59,12 @@ The forecast produces a projected temperature and CPI value under the current SI
 | `--predictive-alpha` | EWMA alpha applied to CPI history. | 0.25 |
 
 ## Telemetry & Metrics
-- `predictive_forecasts_total`: incremented each control tick.
-- `predictive_downgrades_total`: decision to reduce SIMD width due to forecast.
-- `predictive_input_gaps_total`: missing telemetry inputs for a tick.