[extension/opampextension] Fix self-reported status events dropped by non-Reporter hosts #49412

Open
CodeBlackwell wants to merge 3 commits into open-telemetry:main from CodeBlackwell:crg-farm/opamp-agent-fix
@CodeBlackwell

Root cause

opampAgent.Start wires the extension's internal reportFunc (used to report the extension's own status, e.g. from monitorPPID) directly to componentstatus.ReportStatus(host, event).

componentstatus.ReportStatus is a no-op when the component.Host passed to Start does not implement componentstatus.Reporter. In that case, self-reported events from the opamp extension — most notably the fatal error monitorPPID emits when the collector's parent process disappears — were silently discarded instead of reaching the extension's own status aggregator. The collector could keep reporting healthy even though it had been orphaned.

The fix

  • Added an extensionID component.ID field to opampAgent, populated from extension.Settings.ID in newOpampAgent.
  • In Start, build selfInstanceID := componentstatus.NewInstanceID(o.extensionID, component.KindExtension) and route reportFunc through the extension's own ComponentStatusChanged handler (o.ComponentStatusChanged(selfInstanceID, event)) instead of componentstatus.ReportStatus(host, event).

ComponentStatusChanged always forwards events into the extension's internal status aggregator, so self-reported events now reach it regardless of whether the host implements componentstatus.Reporter.

Before / after

  • Before: self-reported status events (fatal error on parent-process disappearance, etc.) were dropped whenever host didn't implement componentstatus.Reporter (e.g. componenttest.NewNopHost()), so the collector's health state never reflected them.
  • After: self-reported events are delivered to the extension's own status aggregator unconditionally, independent of the host's capabilities.

Test added

TestReportFuncReachesStatusAggregatorRegardlessOfHostReporter (extension/opampextension/opamp_agent_test.go) starts the agent against componenttest.NewNopHost() — which intentionally does not implement componentstatus.Reporter — reports a fatal error via o.reportFunc, and asserts the mock status aggregator receives it with StatusFatalError and the original error.

Status

Fix was produced and verified by an automated CRG (code-review-graph) debug pass: discovery → adversarial verification → TDD fix (RED before the fix, GREEN after) → final gate. Final gate: clean.

Draft PR — opening for early feedback; not requesting review yet.

@linux-foundation-easycla

linux-foundation-easycla Bot commented Jul 1, 2026

CLA Missing ID

One or more co-authors of this pull request were not found. You must specify co-authors in commit message trailer via:

Co-authored-by: name <email>

Supported Co-authored-by: formats include:

  1. Anything <id+login@users.noreply.github.com> - it will locate your GitHub user by id part.
  2. Anything <login@users.noreply.github.com> - it will locate your GitHub user by login part.
  3. Anything <public-email> - it will locate your GitHub user by public-email part. Note that this email must be made public on GitHub.
  4. Anything <other-email> - it will locate your GitHub user by other-email part but only if that email was used before for any other CLA as a main commit author.
  5. login <any-valid-email> - it will locate your GitHub user by login part. Note that the login part must be at least 3 characters long.

Alternatively, if the co-author should not be included, remove the Co-authored-by: line from the commit message.

Please update your commit message(s) with git commit --amend, then git push [--force], and then request a re-run of the CLA check by commenting on this pull request:

/easycla

@github-actions github-actions Bot added the "first-time contributor" (PRs made by new contributors) label Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Contributor

Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib.

Important reminders:

  • Read our Contributing Guidelines.
  • Sign the CLA if you haven't already.
  • First-time contributors should have at most one PR not marked as draft until their first PR is merged.
  • If your change isn't one of our priority components, reviews may take more time.
  • Give reviewers at least a few days before pinging them for feedback.
  • If you need help or struggle to move your PR forward:

@CodeBlackwell CodeBlackwell marked this pull request as ready for review July 1, 2026 03:14

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb2668a22f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

+ selfInstanceID := componentstatus.NewInstanceID(o.extensionID, component.KindExtension)
  o.reportFunc = func(event *componentstatus.Event) {
-     componentstatus.ReportStatus(host, event)
+     o.ComponentStatusChanged(selfInstanceID, event)

P1: Route fatal self-reports through the host reporter

With the normal collector service host, host implements componentstatus.Reporter. Before this change the self-report went through that reporter, whose status notification path also sends StatusFatalError to the service AsyncErrorChannel. Calling the OpAMP watcher directly only updates this extension's internal aggregator, so when ppid is configured and the parent disappears, the OpAMP health may flip but the collector no longer receives the fatal event and keeps running orphaned. In addition, with reports_health: false this send has no initialized componentStatusCh to receive it. Please preserve the host reporter path when available and add the internal fallback for non-reporter hosts.

Comment thread extension/opampextension/opamp_agent.go
…ped by non-Reporter hosts

opampAgent.Start wired its internal reportFunc to call
componentstatus.ReportStatus(host, event) directly. That helper silently
no-ops when the component.Host passed to Start does not implement
componentstatus.Reporter, so self-reported status changes -- most notably
the fatal error monitorPPID emits when the collector's parent process
disappears -- never reached the extension's own status aggregator. The
collector could keep reporting healthy despite being orphaned.

Track the extension's own component.ID (added extensionID field, set from
extension.Settings.ID in newOpampAgent) and route self-reported events
through the extension's own ComponentStatusChanged handler instead, which
always forwards to the status aggregator regardless of what the host
implements.

Adds TestReportFuncReachesStatusAggregatorRegardlessOfHostReporter, which
starts the agent against componenttest.NewNopHost() (which intentionally
does not implement componentstatus.Reporter) and asserts a fatal error
event reported via o.reportFunc still reaches the mock status aggregator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@CodeBlackwell CodeBlackwell force-pushed the crg-farm/opamp-agent-fix branch from eb2668a to 42a171e Compare July 1, 2026 16:31
Per reviewer feedback (@avy252004, chatgpt-codex-connector): the initial fix
replaced componentstatus.ReportStatus(host, event) outright with the internal
ComponentStatusChanged handler. That fixed the non-Reporter (NopHost) case but
regressed production: with the real collector service host, ReportStatus routes
through the status notification path, which also forwards a StatusFatalError to
the service's AsyncErrorChannel so an orphaned collector actually shuts down.
Bypassing it left the collector running orphaned.

reportFunc now prefers the host reporter when the host implements
componentstatus.Reporter, and only falls back to the extension's own
ComponentStatusChanged handler for non-Reporter hosts, where ReportStatus would
otherwise be a silent no-op.

Adds TestReportFuncPrefersHostReporter covering the reporter path; the existing
TestReportFuncReachesStatusAggregatorRegardlessOfHostReporter still covers the
non-Reporter fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@CodeBlackwell

Author

Thanks @avy252004 and @chatgpt-codex-connector — you're both right. The first version replaced componentstatus.ReportStatus(host, event) outright, which fixed the non-Reporter (NopHost) case but regressed production: with the real service host, ReportStatus also forwards a StatusFatalError to the service's AsyncErrorChannel, so bypassing it would leave an orphaned collector running instead of shutting down.

Fixed in 706ec46 with the if/else you suggested:

o.reportFunc = func(event *componentstatus.Event) {
    if _, ok := host.(componentstatus.Reporter); ok {
        componentstatus.ReportStatus(host, event)
        return
    }
    o.ComponentStatusChanged(selfInstanceID, event)
}

So the production path is preserved when the host implements componentstatus.Reporter, and the internal aggregator fallback only kicks in for non-Reporter hosts (where ReportStatus is a silent no-op). Added TestReportFuncPrefersHostReporter for the reporter path; the existing NopHost test still covers the fallback. Changelog subtext updated to match.
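The dispatch can also be exercised in isolation. The following is a minimal sketch assuming simplified stand-in types: `newReportFunc`, `Reporter`, `reporterHost`, and `nopHost` are hypothetical names for illustration, not the extension's real code. It checks that each kind of host takes exactly one of the two paths.

```go
package main

import "fmt"

// Event is a simplified stand-in for a status event.
type Event struct{ Msg string }

// Reporter is a stand-in for the optional reporting capability of a host.
type Reporter interface{ Report(*Event) }

// newReportFunc builds a report function that prefers the host's reporter
// and uses the fallback only when the host cannot report status itself.
func newReportFunc(host any, fallback func(*Event)) func(*Event) {
	return func(ev *Event) {
		if r, ok := host.(Reporter); ok {
			r.Report(ev)
			return
		}
		fallback(ev)
	}
}

// reporterHost counts the events it receives.
type reporterHost struct{ n int }

func (h *reporterHost) Report(*Event) { h.n++ }

// nopHost has no Report method, so it does not satisfy Reporter.
type nopHost struct{}

func main() {
	fallbackCalls := 0
	fallback := func(*Event) { fallbackCalls++ }

	rh := &reporterHost{}
	newReportFunc(rh, fallback)(&Event{Msg: "fatal"})        // host path
	newReportFunc(nopHost{}, fallback)(&Event{Msg: "fatal"}) // fallback path

	fmt.Println(rh.n, fallbackCalls) // 1 1
}
```

Doing the type assertion inside the closure keeps the choice of path next to the call it guards; hoisting it out of the closure would behave the same for a host that does not change after start.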

@chatgpt-codex-connector

To use Codex here, create an environment for this repo.

Labels

extension/opamp first-time contributor PRs made by new contributors
