Skip to content

AuroraBoot fleet-server hardening (non-Redfish review findings) #4117

@wrkode

Description

@wrkode

AuroraBoot fleet-server hardening — non-Redfish review findings

A whole-codebase review (5 parallel reviewers) ran alongside the Redfish work (#4109) to confirm the rest of AuroraBoot behaves as intended. It surfaced production blockers independent of Redfish that touch the same fleet server. Tracked here so they aren't lost between tracks.

Code in kairos-io/AuroraBoot. Several items were independently confirmed by ≥2 reviewers.

Production blockers (minimum bar before exposing the fleet server)

  • BOLA / node-impersonation (Critical). FIXED on branch fix/fleet-server-hardening (commit 1640efc): identity bound at route/handler/store layers; also fixed agent REST route-shadowing. Security-reviewed.
    • ORIGINAL: Agent endpoints authenticate the node API key but then trust the path :nodeID/:commandID instead of the authenticated identity → any registered node can heartbeat as, read/consume commands for, and set command status on any other node. pkg/handlers/nodes.go (Heartbeat/GetCommands), commands.go (UpdateStatus), pkg/ws/handler.go. Fix: bind path params to auth.ContextKeyNodeID; scope command lookups by node.
  • losetup -D global detach (Critical). FIXED on fix/fleet-server-hardening (commit d8e9c03): removed the global detach; -f --show allocates one device, deferred cleanup detaches exactly it.
    • ORIGINAL: pkg/ops/rawDiskGeneration.go detaches all host loop devices before attaching the build image → corrupts concurrent builds / host mounts. Fix: use only the device from losetup -f --show and detach exactly that one.
  • No TLS on the server (High). FIXED on fix/fleet-server-hardening (commit 9ce77d2): --tls-cert/--tls-key enable HTTPS; plaintext warns loudly.
  • Credentials logged via ?token= (High). FIXED on fix/fleet-server-hardening (commit 9ce77d2): RequestLoggerWithConfig + redactToken redacts the token query param (fail-closed).
    • ORIGINAL: All three auth schemes accept ?token= and the default middleware.Logger() writes the full URI → admin password / API keys land in access logs. Strip/redact query strings; prefer headers.
  • Command injection in build (High). FIXED on fix/fleet-server-hardening (commit 1a784f3): allowlist-validate Model/KairosVersion/KubernetesVersion before the Dockerfile RUN.
    • ORIGINAL: internal/builder/auroraboot/builder.go splices Model/KairosVersion/KubernetesVersion into a Dockerfile RUN shell line. Validate against tight patterns / pass via ARG.
  • SecureBoot key-name path traversal (High). FIXED on fix/fleet-server-hardening (commit d99e5ea): validate key-set name ^[A-Za-z0-9_-]{1,64}$ in GenerateKeys/ImportKeys.
    • ORIGINAL: pkg/handlers/secureboot.go joins an unvalidated Name/?name= into a filesystem path (GenerateKeys/ImportKeys). Allowlist ^[A-Za-z0-9_-]+$.
  • UploadOverlay unsafe tar extract (Med). FIXED on fix/fleet-server-hardening (commit 27fe702): in-process tar extraction with member containment, symlink rejection, and size caps.
    • ORIGINAL: pkg/handlers/artifacts.go pipes upload to tar xzf -C with no member containment / size cap (the ImportKeys path shows the correct bar).

Concurrency / correctness

  • WS hub: concurrent writes to one *websocket.Conn (pkg/ws/hub.go + handler.go) — single-writer per conn.
  • Dual command delivery (REST poll + WS push) can double-deliver — make delivery idempotent.
  • Process-global os.Chdir in pkg/ops/iso.go InjectISO and pkg/uki/uki.go Build while the builder runs them in goroutines — corrupts cwd of concurrent builds.
  • ServeArtifacts/ServeUkiPXE register on the global http.DefaultServeMux (panic if invoked twice); also plaintext, no auth, browsable directory listing on 0.0.0.0.
  • GCE/VHD convert steps mutate the same raw file in parallel (no mutual DAG dependency) — deployer/steps.go.

Smaller but real

  • Non-constant-time token comparisons (pkg/auth/middleware.go) — use crypto/subtle.
  • SDK SecureBoot.Import sends raw gzip but the server requires multipart (broken).
  • GenerateKeys silently drops secureBootEnroll.
  • Ignored yaml.Unmarshal errors in internal/config/config.go.
  • No rate limiting on register/auth; downloadFile has no timeout/size cap.

Recommend triaging the Critical/High items onto their own branch(es), separate from the Redfish track. The BOLA + TLS + cred-logging trio is the minimum bar for exposing the fleet server.

Follow-ups discovered during hardening

  • Cross-node artifact download (read surface). GET /api/v1/artifacts/:id/download/* and /image (DownloadMiddleware) accept admin OR any node API key, but the artifact lookup is keyed by build id with no node scoping — any valid node key can download any artifact (image/cloud-config) intended for another node. Pre-existing design (artifacts keyed by build, not node), surfaced during the BOLA review. If the threat model requires mutually-distrusting nodes, add a node→artifact authorization check. (staff-engineer to decide intent.)

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

Status
No status

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions