This guide documents command outcome telemetry exposed by C&C and a fast triage workflow for terminal-state spikes.
- `GET /api/health` includes runtime metrics in the JSON payload.
- `GET /api/metrics` exposes Prometheus text format metrics.
Primary terminal outcome metric:
- `woly_cnc_command_outcomes_total{type,state}`
  - `type`: `wake`, `scan`, `update-host`, `delete-host` (or `unknown` only when attribution is unavailable).
  - `state`: `acknowledged`, `failed`, `timed_out`.
  - Value: cumulative process-local count for the `(type, state)` label pair.
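To make the label scheme concrete, the following sketch reads Prometheus text-format output and collects `woly_cnc_command_outcomes_total` samples into a mapping keyed by `(type, state)`. It is a minimal illustration rather than project tooling: `parse_outcomes` is a hypothetical helper, the sample payload (including its HELP text and values) is invented, and the regular expressions only handle simple label values without escaped quotes or braces.

```python
import re

# Matches e.g. woly_cnc_command_outcomes_total{type="wake",state="failed"} 3
SAMPLE_RE = re.compile(r'^woly_cnc_command_outcomes_total\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_outcomes(metrics_text: str) -> dict:
    """Return {(type, state): count} from Prometheus text-format output."""
    counts = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        labels = dict(LABEL_RE.findall(match.group(1)))
        # Ignore samples that lack either label instead of guessing a value.
        if "type" not in labels or "state" not in labels:
            continue
        key = (labels["type"], labels["state"])
        counts[key] = counts.get(key, 0.0) + float(match.group(2))
    return counts


# Invented payload, for illustration only.
SAMPLE = """\
# HELP woly_cnc_command_outcomes_total Terminal command outcomes.
# TYPE woly_cnc_command_outcomes_total counter
woly_cnc_command_outcomes_total{type="wake",state="acknowledged"} 42
woly_cnc_command_outcomes_total{type="wake",state="timed_out"} 3
woly_cnc_command_outcomes_total{type="scan",state="failed"} 1
"""

outcomes = parse_outcomes(SAMPLE)
for (cmd_type, state), count in sorted(outcomes.items()):
    print(f"{cmd_type:12s} {state:13s} {count:g}")
```

Because the counts are cumulative and process-local, a single reading is mostly useful as a baseline; comparing two readings taken some time apart shows what changed.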
Related command metrics:
- `woly_cnc_commands_by_type{type,state}`
  - Includes `dispatched` plus terminal states.
- `woly_cnc_command_timeout_rate`
  - Fraction of completed commands that ended in `timed_out`.
- `woly_cnc_command_avg_latency_ms`
  - Average command latency in milliseconds.
- `woly_cnc_command_last_latency_ms`
  - Last observed command latency in milliseconds.
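As a cross-check on the timeout rate, the same fraction can be recomputed from terminal outcome counts. The sketch below assumes that "completed" means any of the three terminal states (`acknowledged`, `failed`, `timed_out`); whether the exporter derives `woly_cnc_command_timeout_rate` in exactly this way is not confirmed here. `timeout_fraction` is a hypothetical helper and the counts are invented.

```python
TERMINAL_STATES = ("acknowledged", "failed", "timed_out")


def timeout_fraction(outcomes: dict) -> float:
    """Share of terminal outcomes that were timed_out (0.0 when nothing completed)."""
    completed = sum(
        count for (_type, state), count in outcomes.items()
        if state in TERMINAL_STATES
    )
    if completed == 0:
        return 0.0
    timed_out = sum(
        count for (_type, state), count in outcomes.items()
        if state == "timed_out"
    )
    return timed_out / completed


# Invented counts keyed by (type, state): 3 timeouts out of 50 terminal outcomes.
example = {
    ("wake", "acknowledged"): 45,
    ("wake", "timed_out"): 3,
    ("scan", "failed"): 2,
}
print(timeout_fraction(example))
```

If the recomputed value diverges noticeably from the exported gauge, the two are probably measured over different windows or with a different definition of "completed".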
Use this sequence when failures or timeouts increase:
- Confirm scope:
  - Compare `woly_cnc_command_outcomes_total` for `state="failed"` and `state="timed_out"`.
  - Identify which `type` label has the largest increase.
- Check dispatch vs terminal behavior:
  - Compare `woly_cnc_commands_by_type{state="dispatched"}` to terminal outcome counts for the same `type`.
  - A widening gap usually indicates in-flight backlog or delayed results.
- Check timeout pressure:
  - Review `woly_cnc_command_timeout_rate` trend.
  - If timeout rate rises with stable dispatch volume, investigate node reachability and command timeout settings.
- Correlate with control-plane health:
  - Check `GET /api/nodes` and `GET /api/nodes/:id/health` for offline/stale nodes.
  - Check recent command outcomes in admin APIs (`GET /api/admin/commands`).
- Decide immediate action:
  - If isolated to one site/node: canary pause and targeted node-agent diagnostics.
  - If broad across command types: pause rollout and execute rollback checklist in `docs/PRODUCTION_DEPLOYMENT_GUIDE.md`.
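The first two triage steps can be sketched in code, assuming two snapshots of outcome counts have already been collected as `{(type, state): count}` mappings and dispatched counts as `{type: count}`. The helper names and all numbers below are hypothetical. Since the counters are process-local, a value that drops between snapshots is treated as a process restart.

```python
BAD_STATES = ("failed", "timed_out")


def bad_outcome_deltas(before: dict, after: dict) -> dict:
    """Per-type increase in failed + timed_out between two snapshots."""
    deltas = {}
    for (cmd_type, state), count in after.items():
        if state not in BAD_STATES:
            continue
        increase = count - before.get((cmd_type, state), 0)
        # A negative delta means the counter restarted from zero, so the
        # post-restart value is the best available estimate of the increase.
        if increase < 0:
            increase = count
        deltas[cmd_type] = deltas.get(cmd_type, 0) + increase
    return deltas


def dispatch_gap(dispatched: dict, outcomes: dict, cmd_type: str) -> float:
    """Dispatched count minus terminal outcomes for one command type."""
    terminal = sum(c for (t, _state), c in outcomes.items() if t == cmd_type)
    return dispatched.get(cmd_type, 0) - terminal


# Invented snapshots taken some minutes apart.
before = {("wake", "failed"): 2, ("wake", "timed_out"): 1, ("scan", "failed"): 0}
after = {
    ("wake", "acknowledged"): 40,
    ("wake", "failed"): 3,
    ("wake", "timed_out"): 9,
    ("scan", "failed"): 1,
}
dispatched = {"wake": 60}

deltas = bad_outcome_deltas(before, after)
worst_type = max(deltas, key=deltas.get)
print(worst_type, deltas[worst_type])            # wake 9
print(dispatch_gap(dispatched, after, "wake"))   # 8
```

A single gap value says little on its own; it is the trend across successive snapshots that distinguishes a growing backlog from normal in-flight work.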
Minimal sanity checks after deploy:
- `GET /api/metrics` returns `woly_cnc_command_outcomes_total`.
- At least one tracked `type` label is present for each terminal `state`.
- `state="unknown"` remains absent or near-zero under normal operation.
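These checks could be automated along the following lines, assuming the outcome samples have been parsed into a `{(type, state): count}` mapping by some means. `post_deploy_problems` and its `unknown_tolerance` parameter are hypothetical, and the counts are invented. The sketch flags `unknown` in either label, because the metric definition lists `unknown` as a `type` value while the checklist names `state`.

```python
TRACKED_TYPES = {"wake", "scan", "update-host", "delete-host"}
TERMINAL_STATES = ("acknowledged", "failed", "timed_out")


def post_deploy_problems(outcomes: dict, unknown_tolerance: float = 0) -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    # Check 1: the metric is exported at all.
    if not outcomes:
        return ["woly_cnc_command_outcomes_total is missing"]
    # Check 2: every terminal state has at least one tracked type.
    for state in TERMINAL_STATES:
        if not any(t in TRACKED_TYPES and s == state for (t, s) in outcomes):
            problems.append(f"no tracked type reported for state={state}")
    # Check 3: unknown-labelled samples stay at or below the tolerance.
    unknown = sum(c for (t, s), c in outcomes.items() if "unknown" in (t, s))
    if unknown > unknown_tolerance:
        problems.append(f"unknown-labelled count is {unknown}")
    return problems


# Invented examples.
healthy = {
    ("wake", "acknowledged"): 10,
    ("scan", "failed"): 1,
    ("wake", "timed_out"): 1,
}
degraded = {("wake", "acknowledged"): 10, ("unknown", "failed"): 4}

print(post_deploy_problems(healthy))
for problem in post_deploy_problems(degraded):
    print(problem)
```

What counts as "near-zero" depends on traffic volume, so the tolerance is left as a parameter rather than a fixed threshold.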