Findings
1. shouldFleetEnroll compares full Fleet URL strings without canonical normalization
Priority: P0 (unrecoverable re-enrollment loop risk)
Location
- internal/pkg/agent/cmd/container.go:1212-1229
- internal/pkg/agent/cmd/container.go:1237-1241
- Re-enroll trigger site: internal/pkg/agent/cmd/container.go:304-309
Evidence
At re-enroll decision time, the code checks:
matchedFullURL := slices.Contains(storedFleetHosts, setupCfg.Fleet.URL)
matchedHostOnly := slices.Contains(storedFleetHosts, setupFleetHost)
if !matchedFullURL && !matchedHostOnly { return true, nil }
This is a raw string comparison for full URLs. In the post-policy-update layout, stored hosts are full URLs (covered by the test case in internal/pkg/agent/cmd/container_test.go:509-534), so the host-only fallback does not apply.
A concrete mismatch case that currently re-enrolls:
- stored host: (host1/redacted)
- setup FLEET_URL: (host1/redacted)
These are semantically the same endpoint, but matchedFullURL is false and matchedHostOnly is also false (the stored value is a full URL, not a bare host), so the agent decides that enrollment is required.
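A minimal, self-contained reproduction of the check quoted above, under stated assumptions: `matchesRaw` is a hypothetical stand-in for the matching logic in shouldFleetEnroll, and the URLs are made-up placeholders for the redacted hosts that differ only by a trailing slash.

```go
package main

import (
	"fmt"
	"slices"
)

// matchesRaw is a hypothetical stand-in that mirrors the shape of the quoted
// check: an exact string match against the full setup URL or the host-only form.
func matchesRaw(storedFleetHosts []string, setupURL, setupHost string) bool {
	matchedFullURL := slices.Contains(storedFleetHosts, setupURL)
	matchedHostOnly := slices.Contains(storedFleetHosts, setupHost)
	return matchedFullURL || matchedHostOnly
}

func main() {
	// Post-policy-update layout: the stored value is a full URL (placeholder host).
	stored := []string{"https://fleet.example.com:443/"}
	// The same endpoint, written without the trailing slash.
	setupURL := "https://fleet.example.com:443"
	setupHost := "fleet.example.com:443"

	needsEnroll := !matchesRaw(stored, setupURL, setupHost)
	fmt.Println("re-enroll:", needsEnroll) // prints "re-enroll: true"
}
```

Neither lookup can succeed: the full-URL lookup fails on one character, and the host-only lookup is comparing a bare host against a list that only holds full URLs.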
What is wrong
Equivalent Fleet endpoints can be treated as different because the URLs are compared without canonicalization (for example, a trailing slash, or other canonicalization-sensitive forms such as an explicit default port).
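To see which URL components such formatting differences land in, this sketch parses a few hypothetical spellings of one endpoint with Go's standard net/url package. `urlParts` is an illustrative helper, not part of elastic-agent, and the hosts are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// urlParts is a hypothetical helper exposing the parsed fields in which
// equivalent Fleet URL spellings can still differ.
func urlParts(raw string) (scheme, host, port, path string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", "", err
	}
	return u.Scheme, u.Hostname(), u.Port(), u.Path, nil
}

func main() {
	// Hypothetical spellings of one endpoint; the real hosts are redacted.
	forms := []string{
		"https://fleet.example.com",
		"https://fleet.example.com/",
		"https://fleet.example.com:443",
		"HTTPS://Fleet.Example.com:443/",
	}
	for _, raw := range forms {
		scheme, host, port, path, err := urlParts(raw)
		if err != nil {
			fmt.Println(raw, "parse error:", err)
			continue
		}
		// url.Parse lowercases the scheme, but host case, port and path are
		// kept as written, so these four raw strings never compare equal.
		fmt.Printf("%q: scheme=%q host=%q port=%q path=%q\n", raw, scheme, host, port, path)
	}
}
```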
Why it matters
In container mode, this decision is made at startup (runContainerCmd), so equivalent but differently formatted Fleet URLs can repeatedly trigger re-enrollment on restarts. This causes credential churn and enrollment instability during normal operation (for example, when the formatting of a user-provided environment variable drifts), which matches the re-enrollment loop bug class.
Suggested fix direction
Normalize both the stored and the setup Fleet URLs before comparison (parse, then canonicalize scheme, host, port, and trailing path slash) and compare canonical endpoint identity instead of raw strings. Keep the protocol-change checks for the pre-policy-update host-only layout.
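A minimal sketch of that normalization, assuming Fleet URLs are absolute http or https URLs and that query strings and fragments carry no meaning. `canonicalFleetURL` and `fleetURLMatches` are hypothetical names, not existing elastic-agent functions. Entries that are not absolute URLs (the pre-policy-update host-only layout) are skipped, so the existing host-only and protocol-change checks remain responsible for them.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// canonicalFleetURL (hypothetical) reduces an absolute Fleet URL to a comparable
// identity: lowercase scheme and host, explicit default port, no trailing slash.
// Assumption: query strings and fragments are not meaningful and are dropped.
func canonicalFleetURL(raw string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return "", err
	}
	scheme := strings.ToLower(u.Scheme)
	if (scheme != "http" && scheme != "https") || u.Host == "" {
		// Host-only values such as "fleet.example.com:8220" end up here.
		return "", fmt.Errorf("not an absolute http(s) URL: %q", raw)
	}
	port := u.Port()
	if port == "" {
		// Make the default port explicit so ":443" and "" compare equal.
		port = "443"
		if scheme == "http" {
			port = "80"
		}
	}
	host := strings.ToLower(u.Hostname())
	path := strings.TrimRight(u.EscapedPath(), "/")
	return scheme + "://" + net.JoinHostPort(host, port) + path, nil
}

// fleetURLMatches (hypothetical) reports whether any stored full-URL entry is
// the same endpoint as setupURL. Unparseable or host-only entries are skipped.
func fleetURLMatches(stored []string, setupURL string) bool {
	want, err := canonicalFleetURL(setupURL)
	if err != nil {
		return false
	}
	for _, s := range stored {
		if got, err := canonicalFleetURL(s); err == nil && got == want {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical stored value in the post-policy-update full-URL layout.
	stored := []string{"https://fleet.example.com:443/"}

	fmt.Println(fleetURLMatches(stored, "https://fleet.example.com"))     // true
	fmt.Println(fleetURLMatches(stored, "HTTPS://Fleet.Example.com:443")) // true
	fmt.Println(fleetURLMatches(stored, "http://fleet.example.com"))      // false: scheme changed
}
```

Making the default port explicit, rather than stripping it, is an arbitrary but consistent choice; what matters is that both sides pass through the same function. A scheme change still produces a mismatch, so a genuine http-to-https switch continues to trigger enrollment.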
Suggested Actions
- … shouldFleetEnroll before host matching.
- … (host1/redacted) vs setup (host1/redacted), expecting shouldFleetEnroll == false.
- … (host1/redacted) vs (host1/redacted) and (host1/redacted) vs (host1/redacted) in the post-policy-update full-URL layout.
Communication paths audited and found resilient
- Check-in elapsed-time handling uses the monotonic-safe time.Since in internal/pkg/fleetapi/checkin_cmd.go.
- Unauthorized scheduler switch/reset behavior is covered by internal/pkg/agent/application/gateway/fleet/fleet_gateway_test.go:704-785.
- Liveness failon=degraded|failed semantics are covered in internal/pkg/agent/application/monitoring/liveness.go and liveness_test.go.
- The enrollment retry/backoff path in internal/pkg/agent/application/enroll/enroll.go handles transient network and server error classes with backoff.
From workflow: Sweeper: Fleet Enrollment and Communication Resilience