From 60e367ae2d6f9f9177c9aca7f02ea29abb2dd6c5 Mon Sep 17 00:00:00 2001
From: SaridakisStamatisChristos <34583142+SaridakisStamatisChristos@users.noreply.github.com>
Date: Wed, 29 Oct 2025 09:29:52 +0200
Subject: [PATCH] Document security and sandbox workflows in validation matrix

---
 README.md                     | 15 +++++---
 docs/predictive-controller.md |  4 +--
 docs/telemetry-fusion.md      |  5 +--
 docs/testing-matrix.md        | 67 +++++++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+), 9 deletions(-)
 create mode 100644 docs/testing-matrix.md

diff --git a/README.md b/README.md
index d7a1d7c..9d9be62 100644
--- a/README.md
+++ b/README.md
@@ -92,6 +92,8 @@ See dedicated docs for subsystem details:
 - [Sandbox Workflow](docs/sandbox-workflow.md)
 
 ## Tests
+Refer to the [Validation Matrix](docs/testing-matrix.md) for a subsystem → coverage breakdown.
+
 Run smoke tests (build + basic run):
 ```bash
 tests/compile.sh && tests/smoke.sh
@@ -105,13 +107,16 @@ ci/hw-smoke.sh
 
 CI expectations:
 - `.github/workflows/ci.yml` runs the public GitHub Actions pipeline (configure, build, unit and integration tests).
-- `ci/pipeline.yml` runs the default lint/build/test stages used by the OSS mirror.
-- `ci/hw-smoke.sh` executes on bare metal to verify MSR/perf integration and metrics TLS.
+- `ci/pipeline.yml` orchestrates build, `hardware-smoke`, `stress-suite`, and `thermal-soak` hardware stages described in the [Validation Matrix](docs/testing-matrix.md).
+- `ci/hw-smoke.sh` executes on bare metal to verify MSR/perf integration and metrics TLS (see [`docs/ci-hil.md`](docs/ci-hil.md) for provisioning guidance).
+
+> **Infrastructure requirement**
+> Hardware-in-the-loop stages are pinned to runners tagged `hil` and `avx512`. Ensure this fleet is online before expecting counter/MSR regressions to surface automatically.
 
 > **Note**
-> Historical documentation referenced `ci/security.yml` and `ci/sandbox.yml` for supply-chain and fuzzing coverage. Those
-> workflows are not currently part of this repository. Security attestation validation and sandbox fuzzing remain roadmap
-> items and should be treated as future work until corresponding workflows land.
+> Security attestation and sandbox fuzzing now run via `ci/security.yml` and `ci/sandbox.yml`. These jobs require
+> dedicated credentials/runners and currently fail open, so release reviews must still confirm the checklists
+> documented in [`docs/testing-matrix.md`](docs/testing-matrix.md#security--sandbox-coverage) before promotion.
 
 ## Packaging
 
diff --git a/docs/predictive-controller.md b/docs/predictive-controller.md
index 0b85876..bafce8f 100644
--- a/docs/predictive-controller.md
+++ b/docs/predictive-controller.md
@@ -68,6 +68,6 @@ Metrics are exposed through the metrics subsystem documented in [Metrics Endpoin
 | Forecast divergence | `forecast_temp` deviates > `forecast_residual_threshold` for N intervals | Auto-revert to reactive controller and set `controller_state=reactive` until manual intervention. |
 
 ## Testing Strategy
-- `tests/controller_forecast_test.cpp` validates coefficient application and dwell logic.
+- `tests/policy/test_policy_controller.c` validates coefficient application, dwell logic, and emergency fallbacks.
 - Integration tests under `tests/smoke.sh` run with synthetic telemetry via `--health-check` to verify downgrades.
-- CI pipeline (see README) executes these tests on every merge to `main`.
+- CI pipeline (see README) executes these tests on every merge to `main` and is summarized in the [Validation Matrix](testing-matrix.md).
diff --git a/docs/telemetry-fusion.md b/docs/telemetry-fusion.md
index 24f819b..e5f498c 100644
--- a/docs/telemetry-fusion.md
+++ b/docs/telemetry-fusion.md
@@ -65,6 +65,7 @@ Metrics include:
 See [Metrics Endpoints](metrics-endpoints.md) for export details.
 
 ## Testing
-- Unit tests under `tests/telemetry_fusion_test.cpp` mock sensor inputs and verify normalization.
-- Hardware-in-the-loop CI job (`ci/hil.md`) validates sensor integration on nightly runs.
+- Unit tests under `tests/telemetry/test_telemetry.cpp` mock sensor inputs and verify normalization.
+- Hardware-in-the-loop CI job (see [`docs/ci-hil.md`](ci-hil.md)) validates sensor integration on nightly runs when AVX-512 runners tagged `hil` are available.
 - `tests/smoke.sh` exercises the telemetry pipeline using the sandbox workflow.
+- Coverage ownership for this subsystem is tracked in the [Validation Matrix](testing-matrix.md).
diff --git a/docs/testing-matrix.md b/docs/testing-matrix.md
new file mode 100644
index 0000000..0c17906
--- /dev/null
+++ b/docs/testing-matrix.md
@@ -0,0 +1,67 @@
+# Validation Matrix
+
+This document maps the dispatcher subsystems to automated coverage and operator checklists so production rollouts can rely on deterministic guardrails instead of tribal knowledge.
+
+## Unit Tests
+
+| Subsystem | Test Binary | Path | Notes |
+| --- | --- | --- | --- |
+| Dispatcher core (trampoline wiring, downgrade paths) | `test_thermal_simd` | [`tests/test_thermal_simd.c`](../tests/test_thermal_simd.c) | Exercises scalar/SIMD transitions, W^X enforcement, and failure escalation. |
+| Predictive controller | `test_policy_controller` | [`tests/policy/test_policy_controller.c`](../tests/policy/test_policy_controller.c) | Validates dwell timers, emergency fallbacks, and coefficient reload behavior. |
+| Telemetry fusion | `test_telemetry` | [`tests/telemetry/test_telemetry.cpp`](../tests/telemetry/test_telemetry.cpp) | Mocks perf/MSR inputs to ensure normalization, staleness guards, and degraded flags. |
+| Config parsing | `test_config_parser` | [`tests/test_config_parser.c`](../tests/test_config_parser.c) | Confirms CLI/env precedence and rejects malformed telemetry/controller overrides. |
+| Statistics helpers | `test_statistics` | [`tests/test_statistics.c`](../tests/test_statistics.c) | Guards percentile and EWMA helpers used by controller heuristics. |
+
+All unit binaries build via `cmake --build build --target test_thermal_simd test_policy_controller test_telemetry ...` and execute under `ctest` when `BUILD_TESTING=ON`.
+
+## Integration & Smoke
+
+| Scenario | Script | Notes |
+| --- | --- | --- |
+| Build + basic health check | [`tests/compile.sh`](../tests/compile.sh), [`tests/smoke.sh`](../tests/smoke.sh) | Runs on public CI and pre-merge branches. |
+| Capability/self-test gate | [`ci/hw-smoke.sh`](../ci/hw-smoke.sh) | Requires AVX-512 + perf access; validates `--health-check`. |
+
+The smoke suite compiles the dispatcher, runs `--health-check`, and captures metrics/log assertions expected in staging.
+
+## Stress & Fault Injection
+
+| Harness | Binary | Description |
+| --- | --- | --- |
+| Patch churn | `stress_patch_request` | Validates double-buffer trampolines under sustained AVX width flips. |
+| Signal storm | `stress_signal_storm` | Ensures signal handling remains re-entrant while patching occurs. |
+| Telemetry faults | `stress_telemetry_faults` | Feeds malformed snapshots to verify safe downgrade and alerting. |
+
+Targets live in [`tests/stress/`](../tests/stress) and run automatically in the `stress-suite` stage of `ci/pipeline.yml` on `hil`/`avx512` runners.
+
+## Hardware-in-the-Loop
+
+| Stage | Definition | Purpose |
+| --- | --- | --- |
+| `hardware-smoke` | [`ci/pipeline.yml`](../ci/pipeline.yml) → `hardware-smoke` | Rebuilds and executes `ci/hw-smoke.sh` on bare metal. |
+| `stress-suite` | [`ci/pipeline.yml`](../ci/pipeline.yml) → `stress-suite` | Runs all stress harnesses with production-grade parameters. |
+| `thermal-soak` | [`ci/pipeline.yml`](../ci/pipeline.yml) → `thermal-soak` | Long-running thermal regression check; see [`ci/thermal-soak.sh`](../ci/thermal-soak.sh). |
+
+Provisioning and operations guidance lives in [`docs/ci-hil.md`](ci-hil.md); the fleet must expose the `hil` and `avx512` tags for the stages above to schedule.
+
+## Security & Sandbox Coverage
+
+| Workflow | Definition | Coverage Focus | Automation Notes | Release Review Requirement |
+| --- | --- | --- | --- | --- |
+| Supply-chain attestation | [`ci/security.yml`](../ci/security.yml) | Verifies attestation bundle signatures, SBOM drift, and binary provenance against `docs/security/threat-model.md`. | Runs on nightly + release-candidate branches with private signing materials. Fails open when credentials are unavailable. | Release manager must confirm that artifacts are uploaded and that the checklist in [`docs/security/threat-model.md`](security/threat-model.md) is satisfied before promoting. |
+| Sandbox fuzzing | [`ci/sandbox.yml`](../ci/sandbox.yml) | Exercises `ci/sandbox/run.sh` dropout/spike scenarios to surface telemetry, patching, and metrics regressions. | Nightly on hosts labeled `sandbox`; artifacts archived under `artifacts/YYYYmmdd-HHMMSS/`. | QA lead reviews sandbox artifacts per [`docs/sandbox-workflow.md`](sandbox-workflow.md) exit criteria during release sign-off. |
+
+The workflows above surface defects early, but their results are not yet wired into automated release gates. Treat failures and missing runs as blocking items during release reviews until that integration ships.
+
+## Manual Runbooks
+
+The following runbooks plug the remaining operational gaps and guide the manual checks noted above:
+
+- [`docs/runbooks/sensor-failure.md`](runbooks/sensor-failure.md) covers telemetry dropouts and degraded mode remediation.
+- [`docs/runbooks/policy-divergence.md`](runbooks/policy-divergence.md) walks through controller forecast drift triage.
+- [`docs/runbooks/patcher-attestation-alert.md`](runbooks/patcher-attestation-alert.md) describes the manual attestation validation performed when `ci/security.yml` cannot complete.
+- [`docs/security/threat-model.md`](security/threat-model.md) enumerates attestation requirements validated during release reviews.
+
+## Roadmap Items
+
+- Promote `ci/security.yml` verdicts to a mandatory release gate once signing infrastructure is accessible to CI.
+- Expand `ci/sandbox.yml` to cover long-haul fuzzing and feed results into automated regression triage.