15 changes: 10 additions & 5 deletions README.md
@@ -92,6 +92,8 @@ See dedicated docs for subsystem details:
- [Sandbox Workflow](docs/sandbox-workflow.md)

## Tests
Refer to the [Validation Matrix](docs/testing-matrix.md) for a subsystem → coverage breakdown.

Run smoke tests (build + basic run):
```bash
tests/compile.sh && tests/smoke.sh
```

@@ -105,13 +107,16 @@ ci/hw-smoke.sh

CI expectations:
- `.github/workflows/ci.yml` runs the public GitHub Actions pipeline (configure, build, unit and integration tests).
- `ci/pipeline.yml` orchestrates build, `hardware-smoke`, `stress-suite`, and `thermal-soak` hardware stages described in the [Validation Matrix](docs/testing-matrix.md).
- `ci/hw-smoke.sh` executes on bare metal to verify MSR/perf integration and metrics TLS (see [`docs/ci-hil.md`](docs/ci-hil.md) for provisioning guidance).
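
Because the hardware stages only schedule on AVX-512 hosts, a local preflight can save a wasted bare-metal run. The sketch below is a hypothetical `has_avx512` helper (not part of this repository) that looks for the `avx512f` (AVX-512 Foundation) flag in `/proc/cpuinfo`. It assumes a Linux host and deliberately does not check perf or MSR access, which `ci/hw-smoke.sh` still has to verify itself.

```bash
# Hypothetical preflight helper (not part of this repository).
# has_avx512 [CPUINFO_PATH]
# Succeeds when a "flags" line in the given cpuinfo file (default: Linux's
# /proc/cpuinfo) lists avx512f, the AVX-512 Foundation flag.
has_avx512() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  # Keep only "flags :" lines; a missing file yields empty input and failure.
  # -w matches avx512f as a whole word, so avx512fp16 alone does not count.
  grep -E '^flags[[:space:]]*:' "$cpuinfo" 2>/dev/null | grep -qw 'avx512f'
}
```

A runner script could then gate the smoke run with `has_avx512 || { echo 'host lacks AVX-512' >&2; exit 1; }` before calling `ci/hw-smoke.sh`.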

> **Infrastructure requirement**
> Hardware-in-the-loop stages are pinned to runners tagged `hil` and `avx512`. Ensure this fleet is online before expecting counter/MSR regressions to surface automatically.
>
> **Note**
> Security attestation and sandbox fuzzing now run via `ci/security.yml` and `ci/sandbox.yml`. These jobs require
> dedicated credentials/runners and currently fail open when those are unavailable, so release reviews must still confirm
> the checklists documented in [`docs/testing-matrix.md`](docs/testing-matrix.md#security--sandbox-coverage) before promotion.

## Packaging

4 changes: 2 additions & 2 deletions docs/predictive-controller.md
@@ -68,6 +68,6 @@ Metrics are exposed through the metrics subsystem documented in [Metrics Endpoin
| Forecast divergence | `forecast_temp` deviates > `forecast_residual_threshold` for N intervals | Auto-revert to reactive controller and set `controller_state=reactive` until manual intervention. |

## Testing Strategy
- `tests/policy/test_policy_controller.c` validates coefficient application, dwell logic, and emergency fallbacks.
- Integration tests in `tests/smoke.sh` run with synthetic telemetry via `--health-check` to verify downgrades.
- The CI pipeline (see README) runs these tests on every merge to `main`; coverage is summarized in the [Validation Matrix](testing-matrix.md).

5 changes: 3 additions & 2 deletions docs/telemetry-fusion.md
@@ -65,6 +65,7 @@ Metrics include:
See [Metrics Endpoints](metrics-endpoints.md) for export details.

## Testing
- Unit tests in `tests/telemetry/test_telemetry.cpp` mock sensor inputs and verify normalization.
- Hardware-in-the-loop CI job (see [`docs/ci-hil.md`](ci-hil.md)) validates sensor integration on nightly runs when runners tagged `hil` and `avx512` are available.
- `tests/smoke.sh` exercises the telemetry pipeline using the sandbox workflow.
- Coverage ownership for this subsystem is tracked in the [Validation Matrix](testing-matrix.md).

67 changes: 67 additions & 0 deletions docs/testing-matrix.md
@@ -0,0 +1,67 @@
# Validation Matrix

This document maps the dispatcher subsystems to automated coverage and operator checklists so production rollouts can rely on deterministic guardrails instead of tribal knowledge.

## Unit Tests

| Subsystem | Test Binary | Path | Notes |
| --- | --- | --- | --- |
| Dispatcher core (trampoline wiring, downgrade paths) | `test_thermal_simd` | [`tests/test_thermal_simd.c`](../tests/test_thermal_simd.c) | Exercises scalar/SIMD transitions, W^X enforcement, and failure escalation. |
| Predictive controller | `test_policy_controller` | [`tests/policy/test_policy_controller.c`](../tests/policy/test_policy_controller.c) | Validates dwell timers, emergency fallbacks, and coefficient reload behavior. |
| Telemetry fusion | `test_telemetry` | [`tests/telemetry/test_telemetry.cpp`](../tests/telemetry/test_telemetry.cpp) | Mocks perf/MSR inputs to ensure normalization, staleness guards, and degraded flags. |
| Config parsing | `test_config_parser` | [`tests/test_config_parser.c`](../tests/test_config_parser.c) | Confirms CLI/env precedence and rejects malformed telemetry/controller overrides. |
| Statistics helpers | `test_statistics` | [`tests/test_statistics.c`](../tests/test_statistics.c) | Guards percentile and EWMA helpers used by controller heuristics. |

All unit binaries build via `cmake --build build --target test_thermal_simd test_policy_controller test_telemetry ...` and run under `ctest` when the build is configured with `BUILD_TESTING=ON`.
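
A minimal sketch of that configure/build/test flow follows. It assumes CMake 3.20 or newer (for `ctest --test-dir`), a build directory passed as the first argument, and target names passed explicitly rather than inferred. The `run_unit_tests` and `run` helpers and the `DRY_RUN` switch are hypothetical, added so the commands can be printed and reviewed before anything executes.

```bash
# Hypothetical helpers (not part of this repository).
# run: executes its arguments, or only prints them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# run_unit_tests BUILD_DIR TARGET...
# Configures with BUILD_TESTING=ON, builds only the named unit-test targets,
# then runs ctest. ctest runs every test registered in BUILD_DIR, not just
# the targets built here.
run_unit_tests() {
  if (( $# < 2 )); then
    echo "usage: run_unit_tests BUILD_DIR TARGET..." >&2
    return 2
  fi
  local build_dir="$1"
  shift
  run cmake -S . -B "$build_dir" -DBUILD_TESTING=ON || return
  # Passing several targets to one --target requires CMake 3.15+.
  run cmake --build "$build_dir" --target "$@" || return
  # --test-dir requires CMake 3.20+.
  run ctest --test-dir "$build_dir" --output-on-failure
}
```

For example, `DRY_RUN=1 run_unit_tests build test_thermal_simd test_policy_controller test_telemetry` prints the three commands for the targets named above without running them. Because ctest runs everything registered in the build tree, tests whose targets were not built are likely to be reported as not run.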

## Integration & Smoke

| Scenario | Script | Notes |
| --- | --- | --- |
| Build + basic health check | [`tests/compile.sh`](../tests/compile.sh), [`tests/smoke.sh`](../tests/smoke.sh) | Runs on public CI and pre-merge branches. |
| Capability/self-test gate | [`ci/hw-smoke.sh`](../ci/hw-smoke.sh) | Requires AVX-512 + perf access; validates `--health-check`. |

The smoke suite compiles the dispatcher, runs `--health-check`, and checks the metrics and log assertions expected in staging.

## Stress & Fault Injection

| Harness | Binary | Description |
| --- | --- | --- |
| Patch churn | `stress_patch_request` | Validates double-buffer trampolines under sustained AVX width flips. |
| Signal storm | `stress_signal_storm` | Ensures signal handling remains re-entrant while patching occurs. |
| Telemetry faults | `stress_telemetry_faults` | Feeds malformed snapshots to verify safe downgrade and alerting. |

Targets live in [`tests/stress/`](../tests/stress) and run automatically in the `stress-suite` stage of `ci/pipeline.yml` on `hil`/`avx512` runners.
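
To reproduce a `stress-suite` failure locally, a hypothetical runner (not part of this repository) could execute each harness in turn and count failures. This sketch assumes the three binaries sit in one directory and run with their default arguments; it does not reproduce the production-grade parameters the CI stage uses. It counts a missing binary as a failure rather than a skip.

```bash
# Hypothetical local runner (not part of this repository).
# run_stress_suite BIN_DIR
# Runs each stress harness from BIN_DIR with default arguments, prints one
# PASS/FAIL/MISSING line per harness, and returns the number of failures.
STRESS_HARNESSES=(stress_patch_request stress_signal_storm stress_telemetry_faults)

run_stress_suite() {
  local bin_dir="$1" failed=0 name
  for name in "${STRESS_HARNESSES[@]}"; do
    if [[ ! -x "$bin_dir/$name" ]]; then
      # A harness that was never built counts as a failure, not a skip.
      echo "MISSING $name" >&2
      failed=$((failed + 1))
    elif "$bin_dir/$name"; then
      echo "PASS $name"
    else
      # $? still holds the harness's exit status from the elif test.
      echo "FAIL $name (exit $?)"
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}
```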

## Hardware-in-the-Loop

| Stage | Definition | Purpose |
| --- | --- | --- |
| `hardware-smoke` | [`ci/pipeline.yml`](../ci/pipeline.yml), `hardware-smoke` stage | Rebuilds and executes `ci/hw-smoke.sh` on bare metal. |
| `stress-suite` | [`ci/pipeline.yml`](../ci/pipeline.yml), `stress-suite` stage | Runs all stress harnesses with production-grade parameters. |
| `thermal-soak` | [`ci/pipeline.yml`](../ci/pipeline.yml), `thermal-soak` stage | Long-running thermal regression check; see [`ci/thermal-soak.sh`](../ci/thermal-soak.sh). |

Provisioning and operations guidance live in [`docs/ci-hil.md`](ci-hil.md); the fleet must expose the `hil` and `avx512` tags for the stages above to schedule.

## Security & Sandbox Coverage

| Workflow | Definition | Coverage Focus | Automation Notes | Release Review Requirement |
| --- | --- | --- | --- | --- |
| Supply-chain attestation | [`ci/security.yml`](../ci/security.yml) | Verifies attestation bundle signatures, SBOM drift, and binary provenance against `docs/security/threat-model.md`. | Runs on nightly + release-candidate branches with private signing materials. Fails open when credentials are unavailable. | Release manager must confirm artifacts uploaded and checklist in [`docs/security/threat-model.md`](security/threat-model.md) is satisfied before promoting. |
| Sandbox fuzzing | [`ci/sandbox.yml`](../ci/sandbox.yml) | Exercises `ci/sandbox/run.sh` dropout/spike scenarios to surface telemetry, patching, and metrics regressions. | Nightly on hosts labeled `sandbox`; artifacts archived under `artifacts/YYYYmmdd-HHMMSS/`. | QA lead reviews sandbox artifacts per [`docs/sandbox-workflow.md`](sandbox-workflow.md) exit criteria during release sign-off. |

The workflows above surface defects early, but their results are not yet wired into automated release gates. Treat failures and missing runs as blocking items during release reviews until that integration ships.
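
Because a missing run is blocking, a reviewer first needs to confirm that a sandbox run exists at all. A minimal sketch, assuming the artifacts are available under a local `artifacts/` directory using the `YYYYmmdd-HHMMSS` naming from the Sandbox fuzzing row: the hypothetical `latest_sandbox_run` helper prints the newest run directory and exits non-zero when there is none. It does not judge the artifacts against the exit criteria in `docs/sandbox-workflow.md`.

```bash
# Hypothetical release-review helper (not part of this repository).
# latest_sandbox_run [ARTIFACTS_DIR]
# Prints the newest ARTIFACTS_DIR/YYYYmmdd-HHMMSS directory; fails if none.
latest_sandbox_run() {
  local root="${1:-artifacts}" latest="" dir name
  for dir in "$root"/*/; do
    name=$(basename "$dir")
    # Ignore anything that does not follow the timestamp naming scheme
    # (including the unexpanded glob when the directory is empty).
    [[ "$name" =~ ^[0-9]{8}-[0-9]{6}$ ]] || continue
    # Zero-padded timestamps sort lexicographically in chronological order.
    if [[ "$name" > "$latest" ]]; then
      latest="$name"
    fi
  done
  if [[ -z "$latest" ]]; then
    echo "no sandbox run found under $root" >&2
    return 1
  fi
  printf '%s\n' "$root/$latest"
}
```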

## Manual Runbooks

The following runbooks plug the remaining operational gaps and guide the manual checks noted above:

- [`docs/runbooks/sensor-failure.md`](runbooks/sensor-failure.md) covers telemetry dropouts and degraded mode remediation.
- [`docs/runbooks/policy-divergence.md`](runbooks/policy-divergence.md) walks through controller forecast drift triage.
- [`docs/runbooks/patcher-attestation-alert.md`](runbooks/patcher-attestation-alert.md) describes the manual attestation validation performed when `ci/security.yml` cannot complete.
- [`docs/security/threat-model.md`](security/threat-model.md) enumerates attestation requirements validated during release reviews.

## Roadmap Items

- Promote `ci/security.yml` verdicts to a mandatory release gate once signing infrastructure is accessible to CI.
- Expand `ci/sandbox.yml` to cover long-haul fuzzing and feed results into automated regression triage.