[extension/opampextension] Fix self-reported status events dropped by non-Reporter hosts#49412
[extension/opampextension] Fix self-reported status events dropped by non-Reporter hosts#49412CodeBlackwell wants to merge 3 commits into
Conversation
One or more co-authors of this pull request were not found. You must specify co-authors in commit message trailer via: Supported
Alternatively, if the co-author should not be included, remove the Please update your commit message(s) by doing |
|
Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib. Important reminders:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eb2668a22f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| selfInstanceID := componentstatus.NewInstanceID(o.extensionID, component.KindExtension) | ||
| o.reportFunc = func(event *componentstatus.Event) { | ||
| componentstatus.ReportStatus(host, event) | ||
| o.ComponentStatusChanged(selfInstanceID, event) |
There was a problem hiding this comment.
Route fatal self-reports through the host reporter
With the normal collector service host, host implements componentstatus.Reporter; before this change the self-report went through that reporter, whose status notification path also sends StatusFatalError to the service AsyncErrorChannel. Calling the OpAMP watcher directly only updates this extension's internal aggregator, so when ppid is configured and the parent disappears the OpAMP health may flip but the collector no longer receives the fatal event and keeps running orphaned (and with reports_health: false this send has no initialized componentStatusCh to receive it). Please preserve the host reporter path when available and add the internal fallback for non-reporter hosts.
Useful? React with 👍 / 👎.
…ped by non-Reporter hosts opampAgent.Start wired its internal reportFunc to call componentstatus.ReportStatus(host, event) directly. That helper silently no-ops when the component.Host passed to Start does not implement componentstatus.Reporter, so self-reported status changes -- most notably the fatal error monitorPPID emits when the collector's parent process disappears -- never reached the extension's own status aggregator. The collector could keep reporting healthy despite being orphaned. Track the extension's own component.ID (added extensionID field, set from extension.Settings.ID in newOpampAgent) and route self-reported events through the extension's own ComponentStatusChanged handler instead, which always forwards to the status aggregator regardless of what the host implements. Adds TestReportFuncReachesStatusAggregatorRegardlessOfHostReporter, which starts the agent against componenttest.NewNopHost() (which intentionally does not implement componentstatus.Reporter) and asserts a fatal error event reported via o.reportFunc still reaches the mock status aggregator. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
eb2668a to
42a171e
Compare
Per reviewer feedback (@avy252004, chatgpt-codex-connector): the initial fix replaced componentstatus.ReportStatus(host, event) outright with the internal ComponentStatusChanged handler. That fixed the non-Reporter (NopHost) case but regressed production: with the real collector service host, ReportStatus routes through the status notification path, which also forwards a StatusFatalError to the service's AsyncErrorChannel so an orphaned collector actually shuts down. Bypassing it left the collector running orphaned. reportFunc now prefers the host reporter when the host implements componentstatus.Reporter, and only falls back to the extension's own ComponentStatusChanged handler for non-Reporter hosts, where ReportStatus would otherwise be a silent no-op. Adds TestReportFuncPrefersHostReporter covering the reporter path; the existing TestReportFuncReachesStatusAggregatorRegardlessOfHostReporter still covers the non-Reporter fallback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks @avy252004 and @chatgpt-codex-connector — you're both right. The first version replaced Fixed in 706ec46 with the if/else you suggested: o.reportFunc = func(event *componentstatus.Event) {
if _, ok := host.(componentstatus.Reporter); ok {
componentstatus.ReportStatus(host, event)
return
}
o.ComponentStatusChanged(selfInstanceID, event)
}So the production path is preserved when the host implements |
|
To use Codex here, create an environment for this repo. |
Root cause
opampAgent.Startwires the extension's internalreportFunc(used to report the extension's own status, e.g. frommonitorPPID) directly tocomponentstatus.ReportStatus(host, event).componentstatus.ReportStatusis a no-op when thecomponent.Hostpassed toStartdoes not implementcomponentstatus.Reporter. In that case, self-reported events from the opamp extension — most notably the fatal errormonitorPPIDemits when the collector's parent process disappears — were silently discarded instead of reaching the extension's own status aggregator. The collector could keep reporting healthy even though it had been orphaned.The fix
extensionID component.IDfield toopampAgent, populated fromextension.Settings.IDinnewOpampAgent.Start, buildselfInstanceID := componentstatus.NewInstanceID(o.extensionID, component.KindExtension)and routereportFuncthrough the extension's ownComponentStatusChangedhandler (o.ComponentStatusChanged(selfInstanceID, event)) instead ofcomponentstatus.ReportStatus(host, event).ComponentStatusChangedalways forwards events into the extension's internal status aggregator, so self-reported events now reach it regardless of whether the host implementscomponentstatus.Reporter.Before / after
hostdidn't implementcomponentstatus.Reporter(e.g.componenttest.NewNopHost()), so the collector's health state never reflected them.Test added
TestReportFuncReachesStatusAggregatorRegardlessOfHostReporter(extension/opampextension/opamp_agent_test.go) starts the agent againstcomponenttest.NewNopHost()— which intentionally does not implementcomponentstatus.Reporter— reports a fatal error viao.reportFunc, and asserts the mock status aggregator receives it withStatusFatalErrorand the original error.Status
Fix was produced and verified by an automated CRG (code-review-graph) debug pass: discovery → adversarial verification → TDD fix (RED before the fix, GREEN after) → final gate. Final gate: clean.
Draft PR — opening for early feedback; not requesting review yet.