Fix missing beat receiver trace logs from diagnostic bundle#14716

Draft
macdewee wants to merge 5 commits into main from drosiek-fix-missing-logs-in-diagnostics

macdewee commented Jun 2, 2026

What does this PR do?

In 9.x, beats run as in-process OTel receivers inside elastic-otel-collector. Their path.home is set to paths.Components() (i.e. {home}/components/), so their path.logs defaults to {home}/components/logs/. Input trace logs (e.g. from httpjson, cel, http_endpoint, entity-analytics) are written to {home}/components/logs/{input}/.

The old zipLogsWithPath walked only {pathsHome}/logs/ and missed {pathsHome}/components/logs/ entirely.

Changes:

  • zipLogsWithPath now walks both {pathsHome}/logs/ and {pathsHome}/components/logs/, placing trace logs under logs/{version}/{input}/ in the zip alongside the agent's own logs.
  • collectServiceComponentsLogs and the logs/ root header moved out of zipLogsWithPath into zipLogs so they run once, not once per versioned home.
  • Walk logic extracted into walkLogPath + zipLogWalkFunc helpers so both roots share the same walk callback.
  • Three new unit tests: single versioned home, multi versioned home, and container (unversioned) layout.
  • Changelog fragment added.

Why is it important?

Without this fix, elastic-agent diagnostics bundles are silently missing the HTTP request trace logs that are essential for debugging httpjson, cel, and entity-analytics integrations.

The root cause is that beats resolve trace log paths via ResolvePathInLogsFor, which places files under {home}/components/logs/{input}/. The diagnostics code only walked {pathsHome}/logs/ and never walked {pathsHome}/components/logs/.

The regression was introduced by elastic/beats#48909 (merged 2026-02-18), which added ResolvePathInLogsFor to fix tracer path validation under managed agents. Before that change, relative tracer paths were resolved against the process CWD, placing files under {ROOT}/logs/{input}/ which the old diagnostics walk did cover. After it, paths resolve against path.logs, placing files under {home}/components/logs/{input}/ which was never walked. This affects 9.x and later 8.19.x patch releases.

Fixes #14359.

Checklist

  • [x] I have read and understood the pull request guidelines of this project.
  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

None. The zip layout for existing log files is unchanged. Trace log files are now present in the bundle where they were previously missing.

How to test this PR locally

  1. Run Elastic Agent 9.x with an httpjson input that has request.tracer.filename configured.
  2. Confirm the file appears on disk under {home}/components/logs/httpjson/.
  3. Run elastic-agent diagnostics.
  4. Unzip the bundle and confirm the trace file appears under logs/{version-hash}/httpjson/.

Unit tests can be run directly:

go test ./internal/pkg/diagnostics/... -run TestZipLogs -v

Related issues

macdewee requested a review from a team as a code owner June 2, 2026 08:33

mergify Bot commented Jun 2, 2026

This pull request does not have a backport label. Could you fix it @macdewee? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-\d.\d is the label that automatically backports to the 8.\d branch. \d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

swiatekm commented Jun 2, 2026

Can you add an integration test showing that we do, in fact, include these in the bundle? Modifying the existing filebeat diagnostic tests should work.

pierrehilbert added the Team:Elastic-Agent-Control-Plane label Jun 2, 2026

infra-vault-gh-plugin-prod commented

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Signed-off-by: Dominik Rosiek <dominik.rosiek@elastic.co>
macdewee force-pushed the drosiek-fix-missing-logs-in-diagnostics branch from 77cbcf8 to 415485c June 2, 2026 13:06

Comment thread on internal/pkg/diagnostics/diagnostics.go (outdated)
}

if err := collectServiceComponentsLogs(zw); err != nil {
	return fmt.Errorf("failed to collect endpoint-security logs: %w", err)

Should we just log a warning and continue on error here? Otherwise we wouldn't get any logs at all.

Comment thread on internal/pkg/diagnostics/diagnostics.go (outdated)
// Beat receivers write trace logs under {pathsHome}/components/logs.
// Collect them under logs/<commitName>/ (same as agent logs) so the zip layout matches
// 8.19.x where trace files lived under {pathsHome}/logs directly.
return walkLogPath(filepath.Join(pathsHome, "components", "logs"), commitName, excludeEvents, zw, ts)

samuelvl commented Jun 2, 2026

The problem with this approach is that if the same file exists in both {pathsHome}/logs/ and {pathsHome}/components/logs/ it is going to produce two zip entries with the same path logs/<commitName>/{file}.

unzip, for example, will replace the first entry with the second, which can be confusing when analyzing the diagnostics.

I would keep the original structure for component logs (inside components/logs) instead of putting everything in logs/.
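One way to apply this suggestion, matching the layout named in the updated doc comment and in the CI analysis further down (logs/<commitName>/components/...), is to choose the zip prefix from the root a file came from. zipEntryName is a hypothetical helper used only for illustration.

```go
package main

import "path"

// zipEntryName maps a file's path relative to its log root to a zip entry.
// Files from {home}/logs go under logs/<commitName>/, files from
// {home}/components/logs under logs/<commitName>/components/, so the same
// relative path in both roots yields two distinct entry names.
func zipEntryName(commitName string, fromComponents bool, rel string) string {
	if fromComponents {
		return path.Join("logs", commitName, "components", rel)
	}
	return path.Join("logs", commitName, rel)
}
```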

// zipLogs walks paths.Logs() and copies the file structure into zw in "logs/"
func zipLogsWithPath(pathsHome, commitName string, collectServices, excludeEvents bool, zw *zip.Writer, ts time.Time) error {
// zipLogsWithPath walks {pathsHome}/logs and {pathsHome}/components/logs and copies them into zw
// under "logs/<commitName>/" and "logs/<commitName>/components/" respectively.

This is only true if https://github.com/elastic/elastic-agent/pull/14716/changes#r3342415022 is implemented. Currently, all logs are stored under logs/<commitName>/, not logs/<commitName>/components/.

Signed-off-by: Dominik Rosiek <dominik.rosiek@elastic.co>
macdewee force-pushed the drosiek-fix-missing-logs-in-diagnostics branch from 415485c to d5ee5ac June 2, 2026 15:45
macdewee added 2 commits June 3, 2026 10:45
…crit errors

Signed-off-by: Dominik Rosiek <dominik.rosiek@elastic.co>
Signed-off-by: Dominik Rosiek <dominik.rosiek@elastic.co>

Signed-off-by: Dominik Rosiek <dominik.rosiek@elastic.co>
macdewee marked this pull request as draft June 3, 2026 12:00
github-actions Bot commented Jun 3, 2026

TL;DR

Both failing steps (windows:amd64:tier1:non-sudo:default and linux:amd64:tier1:non-sudo:default) are failing the same integration test: TestBeatDiagnostics/httpjson_trace_logs_in_bundle. The failing expectation is the trace-log glob path: it expects logs/*/httpjson/..., but diagnostics now writes trace logs under logs/*/components/httpjson/....

Remediation

  • Update the expected glob in testing/integration/ess/diagnostics_test.go from:
    • path.Join("logs", "*", "httpjson", "http-request-trace-*.ndjson")
      to:
    • path.Join("logs", "*", "components", "httpjson", "http-request-trace-*.ndjson")
  • Re-run the two failing tier1 default jobs (Windows + Linux) to confirm TestBeatDiagnostics/httpjson_trace_logs_in_bundle passes.
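The one-segment mismatch can be checked directly with path.Match, assuming the integration test's expectation behaves like a path.Match or filepath.Glob pattern, where "*" matches within a single path segment only. The two globs are the ones quoted in the remediation; matches is a hypothetical helper and the entry name is an illustrative example.

```go
package main

import "path"

// The globs quoted in the remediation: the integration test's current
// expectation and the proposed fix.
var (
	oldTraceGlob = path.Join("logs", "*", "httpjson", "http-request-trace-*.ndjson")
	newTraceGlob = path.Join("logs", "*", "components", "httpjson", "http-request-trace-*.ndjson")
)

// matches reports whether name matches glob. In path.Match syntax "*" never
// crosses "/", so the old glob cannot absorb the extra "components" segment.
func matches(glob, name string) bool {
	ok, err := path.Match(glob, name)
	return err == nil && ok
}
```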
Investigation details

Root Cause

The new integration test case defines an expected trace pattern that does not match the archive layout produced by this PR:

  • testing/integration/ess/diagnostics_test.go:529 expects logs/*/httpjson/http-request-trace-*.ndjson

But this PR’s diagnostics logic and unit tests clearly place beat receiver trace files under components/httpjson inside the versioned logs directory:

  • internal/pkg/diagnostics/diagnostics_test.go:490 expects logs/elastic-agent-unknow/components/httpjson/http-request-trace-test.ndjson
  • internal/pkg/diagnostics/diagnostics_test.go:534 checks logs/<version>/components/httpjson/http-request-trace-test.ndjson

So the integration test expectation is currently off by one path segment (components).

Evidence

  • Build: https://buildkite.com/elastic/elastic-agent/builds/40942
  • Failed jobs:
    • windows:amd64:tier1:non-sudo:default
    • linux:amd64:tier1:non-sudo:default
  • Both logs fail at the same test:
    • .../elastic-agent-windowsamd64tier1non-sudodefault.txt:109: --- FAIL: TestBeatDiagnostics/httpjson_trace_logs_in_bundle
    • .../elastic-agent-linuxamd64tier1non-sudodefault.txt:90: --- FAIL: TestBeatDiagnostics/httpjson_trace_logs_in_bundle
  • Windows log shows the trace file is present under ...\logs\elastic-agent-...\components\httpjson\http-request-trace-...ndjson (line 2), while expected/actual list comparison differs by one file (listA len 31 vs listB len 32).

Verification

  • Not run locally (read-only workflow); conclusion based on Buildkite logs + PR source inspection at commit 2347aecd17ca518a5112c5745cb4c930c64f334e.

Follow-up

  • I checked for a matching flaky-test issue for this test name and did not find one.

Note

🔒 Integrity filter blocked 1 item

The following item was blocked because it does not meet the GitHub integrity level.

  • #9167 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

From workflow: PR Buildkite Detective

elasticmachine commented Jun 3, 2026

Labels

Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Development

Successfully merging this pull request may close these issues.

[elastic-agent] Beat receiver trace logs missing from diagnostic bundle in 9.x

5 participants