Production‑grade, Linux x86‑64 only. Runtime chooses between SSE4.1 / AVX2 / AVX‑512 (XMM‑only) and self‑patches a tiny trampoline under strict W^X with a double buffer. Thermal adaptation uses time‑scaled CPI from perf_event_open with hysteresis, cooldown and a minimum dwell time. A small shim handles scalar↔SIMD and avoids AVX/SSE transition penalties.
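The runtime choice between SSE4.1, AVX2 and AVX‑512 can be pictured with the sketch below. It is illustrative only: `detect_simd_level` and the `simd_level` enum are hypothetical names, not taken from the dispatcher source, and it assumes that the XMM‑only AVX‑512 path needs both `avx512f` and `avx512vl`. It relies on the GCC/Clang builtin `__builtin_cpu_supports`.

```c
#include <stdbool.h>

/* Hypothetical ordering of SIMD tiers, lowest to highest. */
enum simd_level { SIMD_NONE = -1, SIMD_SSE41 = 0, SIMD_AVX2 = 1, SIMD_AVX512 = 2 };

/* Pick the widest tier the CPU reports; allow_avx512 = false models --no-avx512. */
enum simd_level detect_simd_level(bool allow_avx512)
{
    __builtin_cpu_init(); /* harmless if the CPU model was already initialized */

    /* Assumption: 128-bit EVEX payloads require AVX-512F plus AVX-512VL. */
    if (allow_avx512 && __builtin_cpu_supports("avx512f")
                     && __builtin_cpu_supports("avx512vl"))
        return SIMD_AVX512;
    if (__builtin_cpu_supports("avx2"))
        return SIMD_AVX2;
    if (__builtin_cpu_supports("sse4.1"))
        return SIMD_SSE41;
    return SIMD_NONE; /* the caller would fail fast here */
}
```

A caller might treat `SIMD_NONE` as a fatal startup error, since SSE4.1 is the minimum requirement.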
- Linux 5.9+ with `CAP_PERFMON` available to the dispatcher user.
- `/dev/cpu/*/msr` readable by the runtime (systemd unit grants `CAP_SYS_ADMIN`).
- Optional metrics TLS materials (certificate + key) if exposing `/metrics` off-host.
- Attestation bundle (`patcher_measurement.json`, `attestor_pub.pem`) staged under `/etc/tsd/`.
- Config overrides for the telemetry and predictive controllers can be provided through `TSD_TELEMETRY_*` and `TSD_PREDICTIVE_*` env vars.
    make

    cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
    cmake --build build --config Release -j

Requires `CAP_PERFMON` or `sudo sysctl kernel.perf_event_paranoid=0`.
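The permission requirement comes from `perf_event_open`, which the dispatcher uses for its time-scaled CPI. As a rough sketch of what that involves (not the project's actual code), the block below opens a hardware counter with the enabled/running times attached and scales the raw count to compensate for counter multiplexing. The helper names `open_hw_counter`, `scaled_count` and `cycles_per_instruction` are hypothetical; the `perf_event_attr` fields and `PERF_FORMAT_*` flags are the standard kernel ones.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() under strict -std= modes */
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout returned by read() for the read_format requested below. */
struct counter_sample {
    uint64_t value;
    uint64_t time_enabled;
    uint64_t time_running;
};

/* Open a user-space-only hardware counter for the calling thread.
 * config is e.g. PERF_COUNT_HW_CPU_CYCLES or PERF_COUNT_HW_INSTRUCTIONS.
 * Returns a file descriptor, or -1 (errno set) when access is denied. */
int open_hw_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
                       PERF_FORMAT_TOTAL_TIME_RUNNING;
    /* glibc has no wrapper for perf_event_open, so use syscall(2). */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Scale a raw count by enabled/running time; 0 if the counter never ran. */
double scaled_count(const struct counter_sample *s)
{
    if (s->time_running == 0)
        return 0.0;
    return (double)s->value *
           ((double)s->time_enabled / (double)s->time_running);
}

/* Cycles per instruction from two scaled counts; -1 when undefined. */
double cycles_per_instruction(double cycles, double instructions)
{
    return instructions > 0.0 ? cycles / instructions : -1.0;
}
```

A sample would be obtained with `read(fd, &sample, sizeof sample)` on each descriptor; a return of -1 from `open_hw_counter` corresponds to the degraded mode described elsewhere in this README.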
    ./thermal_simd --help
    ./thermal_simd --no-avx512 --interval=100 --down-ratio=1.3 --duration-sec=5

- `--config=FILE` load overrides from a JSON file (see configuration docs).
- `--interval=MS` check interval (default 50).
- `--down-count=N` throttles before downgrade (default 3).
- `--up-count=N` stable intervals before upgrade (default 5).
- `--down-ratio=R` throttle threshold as CPI multiple (default 1.5).
- `--cooldown-down=MS` cooldown after downgrade (default 1000).
- `--cooldown-up=MS` cooldown after upgrade (default 2000).
- `--min-dwell=MS` minimum time per SIMD width (default 200).
- `--no-avx512` disable AVX‑512 usage.
- `--duration-sec=S` runtime duration for demo (default 10).
- `--work-iters=N` inner work iterations per tick (default 10,000,000).
- `--degraded-timeout-sec=S` fail closed if hardware counters remain unavailable for S seconds (default 120).
- `--log-level=LEVEL` set log verbosity (`error`, `warn`, `info`, `debug`; default `info`).
- `--health-check` run diagnostics (perf counters, telemetry, trampolines) and exit with status.
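To make the interaction between `--down-count`, `--up-count`, `--down-ratio`, the two cooldowns and `--min-dwell` concrete, here is a minimal sketch of one way such a hysteresis loop could be written. It is an assumption-laden illustration, not the dispatcher's implementation: the struct and function names are hypothetical, a "throttle" is assumed to mean an interval whose CPI exceeds `down_ratio` times a baseline CPI, and the counts are assumed to be consecutive intervals.

```c
#include <stdint.h>

/* Hypothetical mirror of the CLI tuning flags. */
struct dispatch_policy {
    unsigned down_count;        /* --down-count    */
    unsigned up_count;          /* --up-count      */
    double   down_ratio;        /* --down-ratio    */
    uint64_t cooldown_down_ms;  /* --cooldown-down */
    uint64_t cooldown_up_ms;    /* --cooldown-up   */
    uint64_t min_dwell_ms;      /* --min-dwell     */
};

struct dispatch_state {
    int      level;             /* 0 = narrowest tier, max_level = widest */
    int      max_level;
    unsigned hot_streak;        /* consecutive throttled intervals */
    unsigned stable_streak;     /* consecutive stable intervals    */
    uint64_t last_change_ms;
    uint64_t cooldown_until_ms;
};

/* Feed one CPI sample; returns the tier to use for the next interval. */
int dispatch_step(struct dispatch_state *s, const struct dispatch_policy *p,
                  double cpi, double baseline_cpi, uint64_t now_ms)
{
    if (cpi > p->down_ratio * baseline_cpi) {
        s->hot_streak++;
        s->stable_streak = 0;
    } else {
        s->stable_streak++;
        s->hot_streak = 0;
    }

    /* Minimum dwell and cooldown both block any transition. */
    if (now_ms < s->last_change_ms + p->min_dwell_ms ||
        now_ms < s->cooldown_until_ms)
        return s->level;

    if (s->hot_streak >= p->down_count && s->level > 0) {
        s->level--;
        s->cooldown_until_ms = now_ms + p->cooldown_down_ms;
    } else if (s->stable_streak >= p->up_count && s->level < s->max_level) {
        s->level++;
        s->cooldown_until_ms = now_ms + p->cooldown_up_ms;
    } else {
        return s->level;
    }
    s->last_change_ms = now_ms;
    s->hot_streak = 0;
    s->stable_streak = 0;
    return s->level;
}
```

With the default values above and a 50 ms interval, this sketch would step down one tier once three hot samples have accumulated and the 200 ms dwell has elapsed, then refuse to step back up until the 1000 ms downgrade cooldown has passed.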
Predictive controller
- `--temp-ceiling=°C` predictive controller ceiling (default 92).
- `--safety-margin=°C` guard band below the ceiling for upgrades (default 4).
- `--emergency-margin=°C` additional buffer that triggers scalar fallback (default 10).
- `--predictive-alpha=A` CPI EWMA alpha in the predictive path (default 0.25).
- `--coeff-path=PATH` ARX coefficient bundle (default `config/controller_coeffs.json`).
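As a rough illustration of two of these knobs, the sketch below shows a textbook exponentially weighted moving average and an upgrade gate built from the ceiling and safety margin. Both function names are hypothetical, the EWMA form is the standard one rather than something confirmed from the controller source, and the emergency margin is deliberately left out because its exact semantics are not spelled out here.

```c
#include <stdbool.h>

/* Standard EWMA: alpha weights the new sample, (1 - alpha) the history. */
double ewma_update(double previous, double sample, double alpha)
{
    return alpha * sample + (1.0 - alpha) * previous;
}

/* Assumption: upgrades are allowed only strictly below ceiling - margin. */
bool upgrade_allowed(double predicted_temp_c, double ceiling_c,
                     double safety_margin_c)
{
    return predicted_temp_c < ceiling_c - safety_margin_c;
}
```

With the defaults (ceiling 92, safety margin 4), this gate would permit upgrades only while the predicted temperature stays below 88 °C.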
Telemetry fusion
- `--telemetry-interval=MS` collector interval (default 50).
- `--telemetry-max-skew=MS` allowable skew between collectors (default 150).
- `--telemetry-ewma=A` telemetry CPI EWMA alpha (default 0.25).
- `--telemetry-profile=PATH` optional telemetry profile manifest.
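One plausible reading of `--telemetry-max-skew` is that samples from different collectors are fused only when their timestamps lie within the allowed window. The sketch below illustrates that reading; `samples_within_skew` is a hypothetical helper, and the real fusion logic may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when the newest and oldest collector timestamps differ by at most
 * max_skew_ms. An empty set is treated as not fusable. */
bool samples_within_skew(const uint64_t *timestamps_ms, size_t n,
                         uint64_t max_skew_ms)
{
    if (n == 0)
        return false;

    uint64_t oldest = timestamps_ms[0];
    uint64_t newest = timestamps_ms[0];
    for (size_t i = 1; i < n; i++) {
        if (timestamps_ms[i] < oldest) oldest = timestamps_ms[i];
        if (timestamps_ms[i] > newest) newest = timestamps_ms[i];
    }
    return newest - oldest <= max_skew_ms;
}
```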
Metrics & observability
- `--metrics-port=PORT` Prometheus endpoint port (default 9464, `0` disables).
- `--metrics-bind=ADDR` bind address (default `127.0.0.1`).
- `--metrics-cert=PATH` / `--metrics-key=PATH` enable TLS for the metrics endpoint.
- `--metrics-ca=PATH` optional client CA bundle when using mutual TLS.
- `--metrics-require-client-auth` enforce mutual TLS for `/metrics` and `/healthz`.
- `--metrics-basic-auth=user:pass` enable HTTP basic authentication.
- `--statsd-host=HOST` emit StatsD metrics to the given host (disabled by default).
- `--statsd-port=PORT` StatsD UDP port (default 8125).
Environment override:
- `TSD_LOG_LEVEL` mirrors `--log-level` for non-interactive deployments.
- `TSD_TELEMETRY_*`, `TSD_PREDICTIVE_*`, and `TSD_METRICS_*` mirror their respective CLI flags.
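The sketch below shows one way a flag and its environment mirror could be resolved. The precedence used here (an explicit CLI value wins, then `TSD_LOG_LEVEL`, then the documented default `info`) is an assumption, and the enum and function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

enum tsd_log_level {
    TSD_LOG_INVALID = -1,
    TSD_LOG_ERROR,
    TSD_LOG_WARN,
    TSD_LOG_INFO,
    TSD_LOG_DEBUG
};

/* Map the documented level names to an enum; anything else is invalid. */
enum tsd_log_level parse_log_level(const char *text)
{
    if (strcmp(text, "error") == 0) return TSD_LOG_ERROR;
    if (strcmp(text, "warn") == 0)  return TSD_LOG_WARN;
    if (strcmp(text, "info") == 0)  return TSD_LOG_INFO;
    if (strcmp(text, "debug") == 0) return TSD_LOG_DEBUG;
    return TSD_LOG_INVALID;
}

/* cli_value is the --log-level argument, or NULL when the flag is absent. */
enum tsd_log_level resolve_log_level(const char *cli_value)
{
    const char *text = cli_value ? cli_value : getenv("TSD_LOG_LEVEL");
    if (text == NULL)
        return TSD_LOG_INFO; /* documented default */
    return parse_log_level(text);
}
```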
The dispatcher exposes a one-shot diagnostic mode that validates hardware counters, telemetry probes, and trampoline integrity before workloads start:
    ./thermal_simd --health-check

The command exits non-zero when the dispatcher would operate in degraded mode (e.g. missing `perf_event_open` permissions or inaccessible MSRs) and increments the `health_check_failures` metric.
Structured log lines (key=value) and in-process counters provide hooks for Prometheus/StatsD scraping. The following counters are tracked in runtime_metrics.c and exposed via log snapshots:
- `perf_fallbacks` / `perf_recoveries`
- `telemetry_temp_*`, `telemetry_freq_*`, `telemetry_msr_*`
- `patch_transitions` / `patch_failures`
- `software_timeout_escalations`
- `health_check_failures`
- `attestation_verifications`
- `attestation_failure`
- `metrics_flush_duration_ms`
Sensor dropouts automatically trigger exponential back-off retries and emit logs such as event=telemetry_sensor state=degraded sensor=temp to simplify alert wiring.
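Exponential back-off for sensor retries can be sketched as a delay that doubles per failed attempt up to a cap. The function name and the base and cap parameters below are hypothetical; the README does not state the actual retry schedule.

```c
#include <stdint.h>

/* Delay before retry number `attempt` (0-based): base, 2*base, 4*base, ...
 * saturating at cap_ms. The guard avoids overflow for large attempts. */
uint64_t backoff_delay_ms(unsigned attempt, uint64_t base_ms, uint64_t cap_ms)
{
    uint64_t delay = base_ms;
    for (unsigned i = 0; i < attempt; i++) {
        if (delay > cap_ms / 2)
            return cap_ms; /* doubling would exceed the cap */
        delay *= 2;
    }
    return delay < cap_ms ? delay : cap_ms;
}
```

For example, with a 100 ms base and a 5000 ms cap, the delays would run 100, 200, 400, 800, 1600, 3200 and then stay at 5000.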
See dedicated docs for subsystem details:
- Predictive Controller
- Controller Coefficient Format
- Telemetry Fusion
- Metrics Endpoints
- Sandbox Workflow
Refer to the Validation Matrix for a subsystem → coverage breakdown.
Run smoke tests (build + basic run):
    tests/compile.sh && tests/smoke.sh

A hardware-backed nightly can re-use the new helper script:

    ci/hw-smoke.sh

CI expectations:

- `.github/workflows/ci.yml` runs the public GitHub Actions pipeline (configure, build, unit and integration tests).
- `ci/pipeline.yml` orchestrates build, `hardware-smoke`, `stress-suite`, and `thermal-soak` hardware stages described in the Validation Matrix.
- `ci/hw-smoke.sh` executes on bare metal to verify MSR/perf integration and metrics TLS (see `docs/ci-hil.md` for provisioning guidance).
Infrastructure requirement: Hardware-in-the-loop stages are pinned to runners tagged `hil` and `avx512`. Ensure this fleet is online before expecting counter/MSR regressions to surface automatically.
Note: Security attestation and sandbox fuzzing now run via `ci/security.yml` and `ci/sandbox.yml`. These jobs require dedicated credentials/runners and currently fail open, so release reviews must still confirm the checklists documented in `docs/testing-matrix.md` before promotion.
- `packaging/Dockerfile` builds a minimal container with the dispatcher defaulting to health checks on startup.
- `packaging/systemd/thermal-simd.service` is a hardened unit file that runs the binary with the required capabilities.
- `packaging/kubernetes/daemonset.yaml` demonstrates a daemonset with MSR/perf mounts and capability grants.
- Requires SSE4.1 (fails fast otherwise)
- Uses `perf_event_open`; in containers, add `--cap-add=SYS_ADMIN` or run privileged
- XMM‑only payloads to minimize downclocks and power
- Patch failures restore trampoline page protections before retrying so the runtime fails closed
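The W^X discipline behind the last point can be sketched with plain `mprotect` calls: a page is writable or executable, never both, and a failed transition leaves it non-writable. This is a simplified, single-page illustration under stated assumptions; `patch_page_wx` is a hypothetical name, and the real patcher additionally uses a double buffer, which in this picture would mean patching the inactive copy and switching over only after it is executable again.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Copy `code` into a page-aligned region without ever mapping it W+X.
 * Returns 0 on success, -1 on failure. */
int patch_page_wx(void *page, size_t page_len,
                  const unsigned char *code, size_t code_len)
{
    if (code_len > page_len)
        return -1;

    /* Step 1: writable, not executable. */
    if (mprotect(page, page_len, PROT_READ | PROT_WRITE) != 0)
        return -1;

    memcpy(page, code, code_len);

    /* Step 2: executable, not writable. */
    if (mprotect(page, page_len, PROT_READ | PROT_EXEC) != 0) {
        /* Fail closed: drop write access before reporting the error. */
        mprotect(page, page_len, PROT_READ);
        return -1;
    }
    return 0;
}
```

The region passed in is expected to come from `mmap` and to be a multiple of the page size, since `mprotect` operates on whole pages.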
This project is distributed under a proprietary commercial license. See LICENSE for full terms.