AuroraBoot fleet-server hardening — non-Redfish review findings
A whole-codebase review (5 parallel reviewers) ran alongside the Redfish work (#4109) to confirm the rest of AuroraBoot behaves as intended. It surfaced production blockers independent of Redfish that touch the same fleet server. Tracked here so they aren't lost between tracks.
Code in kairos-io/AuroraBoot. Several items were independently confirmed by ≥2 reviewers.
Production blockers (minimum bar before exposing the fleet server)
- BOLA on node endpoints. FIXED on `fix/fleet-server-hardening` (commit 1640efc): identity is bound at the route, handler, and store layers; agent REST route-shadowing was also fixed. Security-reviewed. Handlers act on the `:nodeID`/`:commandID` path params instead of the authenticated identity, so any registered node can heartbeat as, read/consume commands for, and set command status on any other node. Affected: `pkg/handlers/nodes.go` (Heartbeat/GetCommands), `commands.go` (UpdateStatus), `pkg/ws/handler.go`. Fix: bind path params to `auth.ContextKeyNodeID`; scope command lookups by node.
- `losetup -D` global detach (Critical). FIXED on `fix/fleet-server-hardening` (commit d8e9c03): removed the global detach; `losetup -f --show` allocates one device, and deferred cleanup detaches exactly that one. `pkg/ops/rawDiskGeneration.go` detaches all host loop devices before attaching the build image, which corrupts concurrent builds and host mounts. Fix: use only the device from `losetup -f --show` and detach exactly that one.
- No TLS on the fleet server. FIXED on `fix/fleet-server-hardening` (commit 9ce77d2): `--tls-cert`/`--tls-key` enable HTTPS; plaintext mode warns loudly. In `internal/cmd/web.go`, `e.Start` is plaintext only, so all three bearer credentials and the cloud-configs (which carry the registration token) cross the wire in the clear. Add `--tls-cert`/`--tls-key` or require a TLS-terminating proxy. This gates the Redfish server path (Redfish Phase 1b: integration layer (tokenized ISO-serve, rewire call sites, creds-at-rest, context/reconciler) #4111 T10).
- Credentials logged via `?token=` (High). FIXED on `fix/fleet-server-hardening` (commit 9ce77d2): `RequestLoggerWithConfig` + `redactToken` redacts the `token` query param (fail-closed). Credentials can be passed as `?token=`, and the default `middleware.Logger()` writes the full URI, so the admin password and API keys land in access logs. Strip or redact query strings; prefer headers.
- Shell injection in the image builder. FIXED on `fix/fleet-server-hardening` (commit 1a784f3): `Model`/`KairosVersion`/`KubernetesVersion` are allowlist-validated before the Dockerfile `RUN`. `internal/builder/auroraboot/builder.go` splices `Model`/`KairosVersion`/`KubernetesVersion` into a Dockerfile `RUN` shell line. Validate against tight patterns or pass them via `ARG`.
- Path traversal via the secure-boot key-set name. FIXED on `fix/fleet-server-hardening` (commit d99e5ea): the key-set name is validated against `^[A-Za-z0-9_-]{1,64}$` in GenerateKeys/ImportKeys. `pkg/handlers/secureboot.go` joins an unvalidated `Name`/`?name=` into a filesystem path (GenerateKeys/ImportKeys). Allowlist `^[A-Za-z0-9_-]+$`.
- `UploadOverlay` unsafe tar extract (Med). FIXED on `fix/fleet-server-hardening` (commit 27fe702): in-process tar extraction with member containment, symlink rejection, and size caps. `pkg/handlers/artifacts.go` pipes the upload to `tar xzf -C` with no member containment or size cap (the `ImportKeys` path shows the correct bar).
Concurrency / correctness
- Concurrent writes to a shared `*websocket.Conn` (`pkg/ws/hub.go` + `handler.go`). Fix: a single writer per conn.
- `os.Chdir` in `pkg/ops/iso.go` `InjectISO` and `pkg/uki/uki.go` `Build`, while the builder runs them in goroutines. The working directory is process-wide, so this corrupts the cwd of concurrent builds.
- `ServeArtifacts`/`ServeUkiPXE` register on the global `http.DefaultServeMux` (panics if invoked twice); they are also plaintext, unauthenticated, and serve a browsable directory listing on `0.0.0.0`.
- `deployer/steps.go`.
Smaller but real
- Credential comparison is not constant-time (`pkg/auth/middleware.go`): use `crypto/subtle`.
- `SecureBoot.Import` sends raw gzip, but the server requires multipart (broken).
- `GenerateKeys` silently drops `secureBootEnroll`.
- `yaml.Unmarshal` errors in `internal/config/config.go`.
- `downloadFile` has no timeout or size cap.
Recommend triaging the Critical/High items onto their own branch(es), separate from the Redfish track. The BOLA + TLS + cred-logging trio is the minimum bar for exposing the fleet server.
Follow-ups discovered during hardening
`GET /api/v1/artifacts/:id/download/*` and `/image` (`DownloadMiddleware`) accept admin OR any node API key, but the artifact lookup is keyed by build ID with no node scoping, so any valid node key can download any artifact (image/cloud-config) intended for another node. This is a pre-existing design (artifacts keyed by build, not node), surfaced during the BOLA review. If the threat model requires mutually-distrusting nodes, add a node→artifact authorization check (staff-engineer to decide intent).
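If the node→artifact check is adopted, it could take roughly the shape of the minimal sketch below. It assumes the server can record, per build ID, which node IDs the artifacts are intended for; that mapping does not exist today (artifacts are keyed by build only). `Principal`, `ArtifactACL`, `Grant`, and `CanDownload` are hypothetical names, not existing AuroraBoot types.

```go
package main

import (
	"fmt"
	"sync"
)

// Principal is a hypothetical view of what DownloadMiddleware authenticated:
// either the admin, or a single node identified by its node ID.
type Principal struct {
	Admin  bool
	NodeID string
}

// ArtifactACL records which nodes each build's artifacts are intended for.
// This buildID -> node set mapping is the extra state the check would need.
type ArtifactACL struct {
	mu     sync.RWMutex
	owners map[string]map[string]bool // buildID -> set of node IDs
}

func NewArtifactACL() *ArtifactACL {
	return &ArtifactACL{owners: make(map[string]map[string]bool)}
}

// Grant marks buildID's artifacts as downloadable by nodeID.
func (a *ArtifactACL) Grant(buildID, nodeID string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.owners[buildID] == nil {
		a.owners[buildID] = make(map[string]bool)
	}
	a.owners[buildID][nodeID] = true
}

// CanDownload is deny-by-default: the admin may fetch any artifact, a node
// only builds explicitly granted to it; unknown builds are refused.
func (a *ArtifactACL) CanDownload(p Principal, buildID string) bool {
	if p.Admin {
		return true
	}
	if p.NodeID == "" {
		return false
	}
	a.mu.RLock()
	defer a.mu.RUnlock()
	// Indexing a missing (nil) inner map yields false.
	return a.owners[buildID][p.NodeID]
}

func main() {
	acl := NewArtifactACL()
	acl.Grant("build-1", "node-a")
	fmt.Println(acl.CanDownload(Principal{NodeID: "node-a"}, "build-1"))
	fmt.Println(acl.CanDownload(Principal{NodeID: "node-b"}, "build-1"))
	fmt.Println(acl.CanDownload(Principal{Admin: true}, "build-1"))
}
```

In this sketch `DownloadMiddleware` would call `CanDownload` after authenticating the key and before resolving the artifact path, so a valid key for one node no longer implies access to every build.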