This guide is the production baseline for deploying:

- `apps/cnc` (C&C backend)
- `apps/node-agent` (one or more node agents, per LAN/site)

It consolidates deployment topology, secure defaults, secrets strategy, TLS/proxy expectations, backup/restore, and rollout/rollback operations.
Use a hub-and-spoke model:

- Internet-facing/API tier:
  - Reverse proxy with TLS termination (Nginx, Traefik, or managed LB).
  - `cnc` service behind the proxy.
  - PostgreSQL for C&C state (nodes, hosts, command history).
- LAN tier (per site):
  - `node-agent` running close to target devices.
  - Prefer host networking for ARP/WoL behavior in containerized setups.
  - Outbound connection from the node agent to the C&C WebSocket endpoint.

High-level flow:
- Mobile app calls C&C REST API.
- C&C routes commands to the target node agent over WebSocket.
- Node agent performs LAN actions (scan/WoL) and reports results back.
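
Here is a minimal sketch of the routing step. It assumes an in-memory registry keyed by node ID. The `NodeRegistry` class, the `CommandMessage` shape and the `SendFn` callback are hypothetical names used for illustration. `SendFn` stands in for writing to the agent's open WebSocket. None of these are the C&C's actual API.

```typescript
// Hypothetical command shape sent from C&C to a node agent.
interface CommandMessage {
  commandId: string;
  type: "scan" | "wake";
  payload: Record<string, unknown>;
}

// Stand-in for writing to an agent's open WebSocket; returns false if the write fails.
type SendFn = (msg: CommandMessage) => boolean;

type RouteResult =
  | { ok: true; nodeId: string }
  | { ok: false; nodeId: string; reason: "node-not-connected" | "send-failed" };

class NodeRegistry {
  // nodeId -> send function for the outbound connection that agent opened.
  private readonly connections = new Map<string, SendFn>();

  register(nodeId: string, send: SendFn): void {
    this.connections.set(nodeId, send);
  }

  unregister(nodeId: string): void {
    this.connections.delete(nodeId);
  }

  // Route a command to the agent responsible for the target LAN.
  route(nodeId: string, msg: CommandMessage): RouteResult {
    const send = this.connections.get(nodeId);
    if (!send) return { ok: false, nodeId, reason: "node-not-connected" };
    return send(msg) ? { ok: true, nodeId } : { ok: false, nodeId, reason: "send-failed" };
  }
}
```

The registry is keyed by node ID to match the outbound-connection design. C&C never dials into a LAN; it only writes to sockets that agents opened.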
Use the existing compose files as baseline templates:

- `apps/cnc/docker-compose.yml`
- `apps/node-agent/docker-compose.yml`

Recommended production pattern:
- Deploy C&C and PostgreSQL in a protected network segment.
- Deploy each node agent in its own LAN/site environment.
- Expose C&C only through the TLS reverse proxy.
- Keep node agents non-public whenever possible (private network/VPN).
Do not commit production secrets to git. Inject them through a secret manager, orchestrator secrets, or encrypted env files.
Minimum required secrets/config:

- C&C:
  - `NODE_AUTH_TOKENS`
  - `OPERATOR_TOKENS`
  - `ADMIN_TOKENS` (if admin JWT issuance is enabled)
  - `JWT_SECRET`
  - `WS_SESSION_TOKEN_SECRETS`
  - `DATABASE_URL` (PostgreSQL in production)
- Node agent:
  - `NODE_MODE=agent`
  - `CNC_URL` (prefer `wss://...` in production)
  - `NODE_ID`
  - `NODE_LOCATION`
  - `NODE_AUTH_TOKEN`
  - optional tunnel mode:
    - `TUNNEL_MODE=cloudflare`
    - `CLOUDFLARE_TUNNEL_URL=https://...`
    - `CLOUDFLARE_TUNNEL_TOKEN=...`
  - optional hardening:
    - `NODE_API_KEY`

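A startup check can fail fast when any of these values are missing. The sketch below hard-codes the variable names listed above. The `checkRequiredEnv` function is hypothetical. It skips `ADMIN_TOKENS` because that variable is conditional. It treats a non-`wss://` `CNC_URL` as a problem because this guide prefers `wss://` in production.

```typescript
type Env = Record<string, string | undefined>;

// Required variables per component, taken from the list in this guide.
// ADMIN_TOKENS is omitted: it is only needed when admin JWT issuance is enabled.
const REQUIRED: Record<"cnc" | "node-agent", string[]> = {
  cnc: ["NODE_AUTH_TOKENS", "OPERATOR_TOKENS", "JWT_SECRET", "WS_SESSION_TOKEN_SECRETS", "DATABASE_URL"],
  "node-agent": ["NODE_MODE", "CNC_URL", "NODE_ID", "NODE_LOCATION", "NODE_AUTH_TOKEN"],
};

// Both tunnel variables are needed once TUNNEL_MODE=cloudflare is set.
const TUNNEL_REQUIRED = ["CLOUDFLARE_TUNNEL_URL", "CLOUDFLARE_TUNNEL_TOKEN"];

// Returns human-readable problems; an empty list means the config looks complete.
function checkRequiredEnv(component: "cnc" | "node-agent", env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED[component]) {
    if (!env[name]?.trim()) problems.push(`missing ${name}`);
  }
  if (component === "node-agent") {
    if (env.NODE_MODE && env.NODE_MODE !== "agent") problems.push("NODE_MODE must be 'agent'");
    if (env.CNC_URL && !env.CNC_URL.startsWith("wss://")) {
      problems.push("CNC_URL should use wss:// in production");
    }
    if (env.TUNNEL_MODE === "cloudflare") {
      for (const name of TUNNEL_REQUIRED) {
        if (!env[name]?.trim()) problems.push(`missing ${name} (TUNNEL_MODE=cloudflare)`);
      }
    }
  }
  return problems;
}
```

At startup, a service could call `checkRequiredEnv("cnc", process.env)` and exit with a non-zero status if the returned list is non-empty. A misconfigured container then never registers or accepts traffic.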
Rotation guidance:
- Rotate JWT and WS session-token secrets using overlap windows (old and new secrets accepted together) before removing the old secret.
- Rotate node/operator/admin bootstrap tokens on a defined cadence.
- Validate reconnect/auth behavior after each rotation.
Related runbook: `apps/cnc/docs/runbooks/ws-session-token-rotation.md`
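
The overlap-window idea does not depend on token format. New tokens are signed with the newest secret, and verification accepts any secret still in the configured list. The sketch below shows this with a plain HMAC-SHA256 signature from `node:crypto`. The `payload.signature` token layout and the `signToken`/`verifyToken` helpers are hypothetical. They are not the project's JWT or WS session-token code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical token layout: "<payload>.<hex HMAC-SHA256 signature>".
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// New tokens are always signed with the first (newest) secret in the list.
function signToken(payload: string, secrets: string[]): string {
  const [current] = secrets;
  if (!current) throw new Error("at least one secret is required");
  return `${payload}.${sign(payload, current)}`;
}

// Verification accepts a signature from ANY configured secret (the old+new overlap).
// Returns the payload on success, or null if no secret matches.
function verifyToken(token: string, secrets: string[]): string | null {
  const dot = token.lastIndexOf(".");
  if (dot <= 0) return null;
  const payload = token.slice(0, dot);
  const given = Buffer.from(token.slice(dot + 1), "utf8");
  for (const secret of secrets) {
    const expected = Buffer.from(sign(payload, secret), "utf8");
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    if (expected.length === given.length && timingSafeEqual(expected, given)) {
      return payload;
    }
  }
  return null;
}
```

A rotation then runs in three steps:

1. Deploy the secret list `[new, old]`.
2. Wait until tokens signed with the old secret have expired or been reissued.
3. Deploy `[new]` alone.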
Use Cloudflare tunnel mode for remote node-agent access without router port-forwarding:

- Deploy `cloudflared` near the node agent.
- Route the tunnel hostname to the node agent (`http://localhost:8082`).
- Set:
  - `TUNNEL_MODE=cloudflare`
  - `CLOUDFLARE_TUNNEL_URL=https://<tunnel-hostname>`
  - `CLOUDFLARE_TUNNEL_TOKEN=<token>`
- Keep the node agent's `NODE_AUTH_TOKEN` synchronized with the C&C `NODE_AUTH_TOKENS`.
- Validate in C&C:
  - node registration includes `publicUrl`;
  - command routing succeeds via the tunnel endpoint;
  - if the tunnel is unavailable, command routing falls back to direct WebSocket transport.

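The fallback rule can be written as a small decision function. The sketch assumes a hypothetical `NodeInfo` record with three fields: the registered `publicUrl`, a `tunnelHealthy` flag, and a `wsConnected` flag. This guide does not cover how the real C&C tracks tunnel health.

```typescript
// Hypothetical view of a registered node as seen by C&C.
interface NodeInfo {
  id: string;
  publicUrl?: string; // reported at registration when TUNNEL_MODE=cloudflare
  tunnelHealthy: boolean; // hypothetical: result of the latest tunnel probe
  wsConnected: boolean; // the agent's outbound WebSocket is open
}

type Transport =
  | { kind: "tunnel"; url: string }
  | { kind: "websocket" }
  | { kind: "unavailable" };

// Prefer the tunnel endpoint when it is registered and healthy;
// otherwise fall back to the direct WebSocket connection.
function chooseTransport(node: NodeInfo): Transport {
  if (node.publicUrl && node.tunnelHealthy) {
    return { kind: "tunnel", url: node.publicUrl };
  }
  if (node.wsConnected) return { kind: "websocket" };
  return { kind: "unavailable" };
}
```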
Apply these defaults before go-live:

- TLS and proxy:
  - Terminate TLS at a trusted proxy/load balancer.
  - Enforce HTTPS externally.
  - Set `WS_REQUIRE_TLS=true` in C&C production environments.
- Authentication and auth transport:
  - Keep `WS_ALLOW_QUERY_TOKEN_AUTH=false` in production.
  - Use strong, unique token material.
  - Restrict token distribution by role/use.
- Network exposure:
  - C&C reachable only via required ports (typically 443 from the proxy; DB internal only).
  - Node agent inbound access restricted to trusted management paths.
- Browser/API policy:
  - Set explicit `CORS_ORIGINS` (no wildcard in production).
- Logging and observability:
  - Use centralized log collection.
  - Avoid logging secrets/tokens.
  - Monitor health and command timeout/error rates.
  - Use `docs/COMMAND_OUTCOME_METRICS.md` for the terminal-state triage workflow.
- Runtime hardening:
  - Run as non-root where possible.
  - Keep dependencies updated and pinned to reviewed versions.

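Several of these defaults are plain environment flags, so CI or startup code can check them mechanically. The sketch below covers only `WS_REQUIRE_TLS`, `WS_ALLOW_QUERY_TOKEN_AUTH` and `CORS_ORIGINS`. The `auditCncHardening` helper is hypothetical. It also assumes `CORS_ORIGINS` is a comma-separated list, which is a guess about the format rather than documented behavior.

```typescript
type Env = Record<string, string | undefined>;

// Returns violations of the go-live defaults for the C&C service.
function auditCncHardening(env: Env): string[] {
  const violations: string[] = [];
  if (env.WS_REQUIRE_TLS !== "true") {
    violations.push("WS_REQUIRE_TLS must be 'true'");
  }
  // An unset value also counts as a violation, so the safe setting is always explicit.
  if (env.WS_ALLOW_QUERY_TOKEN_AUTH !== "false") {
    violations.push("WS_ALLOW_QUERY_TOKEN_AUTH must be 'false'");
  }
  // Assumption: CORS_ORIGINS is a comma-separated list of origins.
  const origins = (env.CORS_ORIGINS ?? "")
    .split(",")
    .map((o) => o.trim())
    .filter((o) => o.length > 0);
  if (origins.length === 0) {
    violations.push("CORS_ORIGINS must list explicit origins");
  } else if (origins.includes("*")) {
    violations.push("CORS_ORIGINS must not contain a wildcard");
  }
  return violations;
}
```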
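C&C (PostgreSQL) backup:

```bash
pg_dump --format=custom --file "woly-cnc-$(date +%F-%H%M).dump" "$DATABASE_URL"
```

C&C (PostgreSQL) restore:

```bash
pg_restore --clean --if-exists --no-owner --dbname "$DATABASE_URL" woly-cnc-YYYY-MM-DD-HHMM.dump
```

Node agent (`woly.db`) backup and restore: stop the node agent before copying the database file in either direction. This keeps the copy consistent and prevents the running agent from overwriting a restored file.

```bash
# Backup
cp /path/to/node-agent/db/woly.db "/path/to/backups/woly-node-$(date +%F-%H%M).db"

# Restore
cp /path/to/backups/woly-node-YYYY-MM-DD-HHMM.db /path/to/node-agent/db/woly.db
```

Operational notes:
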
- Test restore procedures in staging on a schedule.
- Version backups with timestamp and environment labels.
- Keep retention policy explicit (for example 7/30/90 day tiers).
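
The 7/30/90 example can be read in more than one way. The sketch below assumes one common reading:

- keep every backup younger than 7 days;
- keep the newest backup per UTC day up to 30 days;
- keep the newest backup per 7-day age bucket up to 90 days;
- keep nothing older.

The `selectBackupsToKeep` helper and the `Backup` record are hypothetical.

```typescript
interface Backup {
  name: string;
  createdAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical tiering: all backups < 7 days, newest per UTC day < 30 days,
// newest per 7-day age bucket < 90 days, nothing older.
function selectBackupsToKeep(backups: Backup[], now: Date): Set<string> {
  const keep = new Set<string>();
  const seenBuckets = new Set<string>();
  // Newest first, so the first backup seen in a bucket is the one kept.
  const sorted = [...backups].sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime());
  for (const b of sorted) {
    const ageDays = (now.getTime() - b.createdAt.getTime()) / DAY_MS;
    let bucket: string | null;
    if (ageDays < 7) bucket = `all:${b.name}`;
    else if (ageDays < 30) bucket = `day:${b.createdAt.toISOString().slice(0, 10)}`;
    else if (ageDays < 90) bucket = `week:${Math.floor(ageDays / 7)}`;
    else bucket = null; // older than the longest tier: eligible for deletion
    if (bucket !== null && !seenBuckets.has(bucket)) {
      seenBuckets.add(bucket);
      keep.add(b.name);
    }
  }
  return keep;
}
```

Backup timestamps could be parsed from the `date +%F-%H%M` suffix in the file names. Note that `date` prints local time, while this sketch groups days in UTC.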
Use this sequence for production deploys:

- Pre-deploy:
  - Confirm secrets are present and rotated as needed.
  - Run the local gate: `npm run validate:standard`
  - Run the smoke gate: `npm run test:e2e:smoke`
  - Confirm the DB migration plan and that a backup has been created.
- Deploy C&C:
  - Apply migrations.
  - Deploy C&C with a rolling strategy.
  - Validate `/health` and `/api/health`.
- Deploy node agents (staged/canary):
  - Canary subset first.
  - Verify WebSocket registration and heartbeat stability.
  - Verify host propagation and wake command routing.
- Full rollout:
  - Expand to remaining nodes/sites.
  - Monitor errors, timeouts, and reconnection rates.

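The canary-then-expand order can be planned ahead of time from the node inventory. The sketch below splits node IDs into a canary wave followed by fixed-size waves. The `planRolloutWaves` helper and both size parameters are hypothetical; this guide does not prescribe them.

```typescript
// Split nodes into a canary wave followed by fixed-size expansion waves.
function planRolloutWaves(nodeIds: string[], canarySize: number, waveSize: number): string[][] {
  if (canarySize < 1 || waveSize < 1) throw new Error("wave sizes must be >= 1");
  // Sort so the same inventory always produces the same plan.
  const ordered = [...nodeIds].sort();
  const waves: string[][] = [ordered.slice(0, canarySize)];
  for (let i = canarySize; i < ordered.length; i += waveSize) {
    waves.push(ordered.slice(i, i + waveSize));
  }
  // Drop an empty canary wave when the inventory is empty.
  return waves.filter((w) => w.length > 0);
}
```

Between waves, repeat the canary checks before starting the next wave: registration, heartbeat, host propagation and wake routing.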
Rollback immediately if any of the following occurs:
- Persistent auth failures (node reconnect loops, widespread 401/403).
- Command routing failures/timeouts above acceptable threshold.
- Data integrity regressions after migration/deploy.
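
Making the first two triggers measurable keeps the rollback call less subjective. The sketch below evaluates a metrics snapshot against thresholds. The `RolloutSnapshot` fields, the `rollbackReasons` helper and all threshold values are hypothetical; replace them with the service's real metrics and agreed limits. Data integrity regressions are not covered here and need their own checks.

```typescript
// Hypothetical counters collected over one observation window.
interface RolloutSnapshot {
  authFailures: number; // 401/403 responses
  requests: number; // total authenticated requests
  commandsTotal: number; // commands routed
  commandsFailedOrTimedOut: number;
  reconnectsPerNode: number; // average reconnects per node
}

interface Thresholds {
  maxAuthFailureRate: number;
  maxCommandFailureRate: number;
  maxReconnectsPerNode: number;
}

// Returns the triggered rollback reasons; an empty list means the rollout may continue.
function rollbackReasons(s: RolloutSnapshot, t: Thresholds): string[] {
  const rate = (num: number, den: number): number => (den === 0 ? 0 : num / den);
  const reasons: string[] = [];
  if (rate(s.authFailures, s.requests) > t.maxAuthFailureRate) {
    reasons.push("auth failures above threshold");
  }
  if (rate(s.commandsFailedOrTimedOut, s.commandsTotal) > t.maxCommandFailureRate) {
    reasons.push("command failures/timeouts above threshold");
  }
  if (s.reconnectsPerNode > t.maxReconnectsPerNode) {
    reasons.push("node reconnect loops suspected");
  }
  return reasons;
}
```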
Rollback steps:
- Freeze further rollout.
- Re-deploy last known-good C&C and/or node-agent artifacts.
- Restore DB from backup only if required for data consistency.
- Verify node reconnection, host sync, and wake routing path.
- Document incident timeline, root cause, and corrective actions.
Run these checks after each production rollout:

- C&C health:
  - `GET /health`
  - `GET /api/health`
- Node health:
  - `GET /api/nodes`
  - `GET /api/nodes/:id/health`
- Host aggregation:
  - `GET /api/hosts`
- Command path:
  - Trigger a controlled wake command to a test host.
- Repo smoke suite:
  - `npm run test:e2e:smoke`
- Command outcome observability:
  - `GET /api/metrics` includes `woly_cnc_command_outcomes_total`
  - Review per-type terminal states using `docs/COMMAND_OUTCOME_METRICS.md`

If any check fails, pause rollout and execute rollback checklist.
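
The HTTP checks above can be scripted. The sketch below takes a base URL, a node ID, and an injectable fetch-style function; Node 18+ global `fetch` fits this shape. It treats any 2xx status as healthy and checks that the metrics body mentions `woly_cnc_command_outcomes_total`.

The `runPostDeployChecks` helper is hypothetical. It sends no credentials. If the real endpoints require an operator token, wrap `fetch` to add the header before passing it in. The wake command and the smoke suite still need to be run separately.

```typescript
// Minimal subset of the fetch Response used here.
interface HttpResponse {
  status: number;
  text(): Promise<string>;
}
type Fetcher = (url: string) => Promise<HttpResponse>;

interface CheckResult {
  path: string;
  ok: boolean;
  detail: string;
}

async function runPostDeployChecks(baseUrl: string, nodeId: string, fetcher: Fetcher): Promise<CheckResult[]> {
  const base = baseUrl.replace(/\/+$/, ""); // keep any path prefix, drop trailing slashes
  const paths = [
    "/health",
    "/api/health",
    "/api/nodes",
    `/api/nodes/${encodeURIComponent(nodeId)}/health`,
    "/api/hosts",
    "/api/metrics",
  ];
  const results: CheckResult[] = [];
  for (const path of paths) {
    try {
      const res = await fetcher(base + path);
      let ok = res.status >= 200 && res.status < 300;
      let detail = `HTTP ${res.status}`;
      if (ok && path === "/api/metrics") {
        // The metrics endpoint must also expose the command outcome counter.
        ok = (await res.text()).includes("woly_cnc_command_outcomes_total");
        if (!ok) detail += ", woly_cnc_command_outcomes_total missing";
      }
      results.push({ path, ok, detail });
    } catch (err) {
      // Network errors count as failed checks rather than aborting the run.
      results.push({ path, ok: false, detail: String(err) });
    }
  }
  return results;
}
```

A CI job could exit non-zero when any result has `ok: false`. That matches the rule that any failed check pauses the rollout.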