Skip to content

[9.3] (backport #14836) Use the checkin on state change mode in integration tests#14841

Open
mergify[bot] wants to merge 3 commits into
9.3from
mergify/bp/9.3/pr-14836
Open

[9.3] (backport #14836) Use the checkin on state change mode in integration tests#14841
mergify[bot] wants to merge 3 commits into
9.3from
mergify/bp/9.3/pr-14836

Conversation

@mergify

@mergify mergify Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Uses the checkin on state change mode in integration tests, to speed up tests that poll for the agent status to change in Fleet. This makes a few test significantly faster, like the log level related tests.

// assert Fleet eventually receives the new log level from agent through checkin
assert.Eventuallyf(t, func() bool {
fleetMetadataLogLevel, err := getLogLevelFromFleetMetadata(ctx, info.KibanaClient, agentID)
if err != nil {
t.Logf("error getting log level for agent %q from Fleet metadata: %v", agentID, err)
return false
}
t.Logf("Fleet metadata log level for agent %q: %q policy log level: %q", agentID, fleetMetadataLogLevel, policyLogLevel)
return fleetMetadataLogLevel == policyLogLevel.String()
}, 6*time.Minute, 30*time.Second, "agent never communicated policy log level %q to Fleet", policyLogLevel)

e
The impact on test execution overall is not that big because these tests weren't on the critical path. It's also bypassed in several of the upgrade tests until I backport this and come back and change the version guard.


This is an automatic backport of pull request #14836 done by Mergify.

* Integration tests now use on_state_change checkin by default.

* Speed up polling interval in log level test.

* Add test that persisted config sets the correct checkin mode.

* Document why the checkin-on-state-change is hidden

* Remove unnecessary comment

* Include checkin-on-state-change only when enrolling

* Only use new flag in versions that support it.

(cherry picked from commit cb4ff55)

# Conflicts:
#	internal/pkg/agent/configuration/fleet.go
@mergify mergify Bot requested a review from a team as a code owner June 8, 2026 15:56
@mergify mergify Bot added backport conflicts There is a conflict in the backported pull request labels Jun 8, 2026
@mergify mergify Bot requested review from blakerouse and michel-laterman and removed request for a team June 8, 2026 15:56
@mergify mergify Bot assigned cmacknz Jun 8, 2026
@mergify

mergify Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Cherry-pick of cb4ff55 has failed:

On branch mergify/bp/9.3/pr-14836
Your branch is up to date with 'origin/9.3'.

You are currently cherry-picking commit cb4ff5574.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   internal/pkg/agent/application/enroll/enroll.go
	modified:   internal/pkg/agent/application/enroll/options.go
	modified:   internal/pkg/agent/cmd/enroll.go
	modified:   internal/pkg/agent/cmd/enroll_cmd_test.go
	modified:   pkg/testing/fixture.go
	modified:   pkg/testing/fixture_install.go
	modified:   testing/integration/ess/log_level_test.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   internal/pkg/agent/configuration/fleet.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@github-actions

This comment has been minimized.

cmacknz
cmacknz previously approved these changes Jun 8, 2026
@cmacknz cmacknz enabled auto-merge (squash) June 8, 2026 20:28
@github-actions

This comment has been minimized.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

TL;DR

golangci-lint failed in lint (ubuntu-latest) on 5 new lint violations (gosec, noctx, nolintlint, and two unused constants). Fix those specific findings and re-run the workflow.

Remediation

  • Update pkg/testing/fixture_install.go to avoid opening callback-derived walk paths in a way that triggers gosec G122 (use a root-scoped/symlink-safe access pattern in that walk path handling).
  • In internal/pkg/agent/cmd/enroll_cmd_test.go, replace net.Listen(...) with (&net.ListenConfig{}).Listen(context.Background(), ...), and remove the now-unused //nolint:gosec directive on the TLS config line.
  • In internal/pkg/agent/configuration/fleet.go, either remove fleetCheckinModeStandard and fleetCheckinModeOnStateChanged or wire them into live code so they are no longer unused.
Investigation details

Root Cause

The run failed in the golangci-lint step due to five lint violations introduced in the PR patch. This is materially different from the prior detective report on this PR (which flagged missing checkin-compression symbols).

Evidence

pkg/testing/fixture_install.go:872:23: G122 ... race-prone path ... (gosec)
internal/pkg/agent/cmd/enroll_cmd_test.go:925:30: net.Listen must not be called ... (noctx)
internal/pkg/agent/cmd/enroll_cmd_test.go:995:22: directive `//nolint:gosec ...` is unused ... (nolintlint)
internal/pkg/agent/configuration/fleet.go:100:2: const fleetCheckinModeStandard is unused (unused)
internal/pkg/agent/configuration/fleet.go:101:2: const fleetCheckinModeOnStateChanged is unused (unused)
issues found

Validation

  • Not run locally (detective workflow is read-only); diagnosis is from GitHub Actions job logs.

Follow-up

  • After applying the above lint fixes, re-run golangci-lint/PR checks to confirm no additional --new-from-patch issues remain.

What is this? | From workflow: PR Actions Detective

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

TL;DR

Buildkite unit tests are failing due to a nil-pointer panic introduced in the new checkin-on-state-change enroll path. Initialize cfg.Checkin before writing cfg.Checkin.Mode in createFleetConfigFromEnroll.

Remediation

  • In internal/pkg/agent/application/enroll/enroll.go, guard/initialize cfg.Checkin before cfg.Checkin.Mode = configuration.FleetCheckinModeOnStateChange (for example: if cfg.Checkin == nil { cfg.Checkin = &configuration.FleetCheckin{} }).
  • Re-run the failing Unit tests steps (especially Unit tests - Ubuntu 22.04 with requirefips build tag) to confirm the enroll tests no longer panic.
Investigation details

Root Cause

This is a code bug. The PR adds checkin-mode wiring in enroll:

  • internal/pkg/agent/application/enroll/enroll.go:500 writes cfg.Checkin.Mode = ... when checkinOnStateChange is true.

But on this backport branch, the default fleet config sets checkin to nil:

  • internal/pkg/agent/configuration/fleet.go:62 has Checkin: nil in DefaultFleetAgentConfig().

So cfg.Checkin.Mode dereferences a nil pointer and panics.

Evidence

=== FAIL: internal/pkg/agent/cmd TestEnroll/checkin-on-state-change_flag_configures_on_state_change_checkin_mode
panic: runtime error: invalid memory address or nil pointer dereference
...
internal/pkg/agent/application/enroll/enroll.go:500 +0x5c9
  • Diff context also shows the newly introduced dereference:
    • internal/pkg/agent/application/enroll/enroll.go hunk @@ -490,12 +490,15 @@ adds:
if checkinOnStateChange {
    cfg.Checkin.Mode = configuration.FleetCheckinModeOnStateChange
}

Verification

  • Not run locally (detective workflow is read-only); diagnosis is based on Buildkite logs plus PR branch source at commit 6bbb3b4d5fc9952ae4cef1029f224ceed925d4c0.

Follow-up

  • Checked for a matching flaky-test issue for TestEnroll/checkin-on-state-change_flag_configures_on_state_change_checkin_mode; none found.

What is this? | From workflow: PR Buildkite Detective

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport conflicts There is a conflict in the backported pull request skip-changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants