Report subcomponent status for beats receivers #48015
This pull request is now in conflicts. Could you fix it? 🙏
This pull request does not have a backport label.
To fix up this pull request, you need to add the backport labels for the needed
Force-pushed from d7a630c to b5b3a30
# Conflicts: # x-pack/filebeat/fbreceiver/receiver_test.go
Force-pushed from b5b3a30 to ac56fe5
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

leehinman left a comment
LGTM
"unique-system-metrics-input-2-process": {
"error": "Error fetching data for metricset system.process: error fetching process list: non fatal error; reporting partial metrics: error fetching PID metrics for 377 processes, most likely a \"permission denied\" error. Enable debug logging to determine the exact cause.",
"status": "Degraded"
},

Instead of keeping the control protocol statuses, can we use the healthcheck extension statuses? e.g. … This would almost certainly be a required change if sub-component status gets standardized upstream anyway.
Sure, we can. It's the more idiomatic choice, even if it creates more work for Elastic Agent, which has to convert the statuses back. I figured that since we'll be changing this again once the upstream convention is in place, I'd just do the most convenient thing for us right now. I don't mind changing it if you think we should be more idiomatic, though.
Done: b31ea4c.
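Switching to OTel statuses means Elastic Agent has to translate them back into its own vocabulary, as noted in the reply above. The sketch below shows one possible two-way mapping. The `BeatStatus` type, `toOtelStatus`, `fromOtelStatus`, and the exact pairings (for example, folding `Configuring` into `StatusStarting`) are hypothetical and are not taken from b31ea4c; only the `Status*` strings follow the names that OTel collector `componentstatus.Status` values print as, such as the `StatusRecoverableError` seen in the healthcheckv2 output.

```go
package main

import "fmt"

// BeatStatus is a hypothetical stand-in for the beats/control-protocol
// status names ("Running", "Degraded", ...).
type BeatStatus string

const (
	Starting    BeatStatus = "Starting"
	Configuring BeatStatus = "Configuring"
	Running     BeatStatus = "Running"
	Degraded    BeatStatus = "Degraded"
	Failed      BeatStatus = "Failed"
	Stopping    BeatStatus = "Stopping"
	Stopped     BeatStatus = "Stopped"
	Unknown     BeatStatus = "Unknown"
)

// toOtelStatus maps a beat status to an OTel collector status name.
// The pairing is an assumption made for illustration.
func toOtelStatus(s BeatStatus) string {
	switch s {
	case Starting, Configuring:
		return "StatusStarting"
	case Running:
		return "StatusOK"
	case Degraded:
		return "StatusRecoverableError"
	case Failed:
		return "StatusPermanentError"
	case Stopping:
		return "StatusStopping"
	case Stopped:
		return "StatusStopped"
	default:
		return "StatusNone"
	}
}

// fromOtelStatus is the reverse conversion a consumer such as Elastic
// Agent would need. It is lossy (see the note below).
func fromOtelStatus(s string) BeatStatus {
	switch s {
	case "StatusStarting":
		return Starting
	case "StatusOK":
		return Running
	case "StatusRecoverableError":
		return Degraded
	case "StatusPermanentError", "StatusFatalError":
		return Failed
	case "StatusStopping":
		return Stopping
	case "StatusStopped":
		return Stopped
	default:
		return Unknown
	}
}

func main() {
	for _, s := range []BeatStatus{Running, Degraded, Failed} {
		o := toOtelStatus(s)
		fmt.Printf("%s -> %s -> %s\n", s, o, fromOtelStatus(o))
	}
	// Running -> StatusOK -> Running
	// Degraded -> StatusRecoverableError -> Degraded
	// Failed -> StatusPermanentError -> Failed
}
```

The round trip is lossy in this sketch: `Configuring` comes back as `Starting`, and both permanent and fatal errors collapse to `Failed`. That loss is part of the extra work the reply refers to.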
* Add input statuses to beat receiver status # Conflicts: # x-pack/filebeat/fbreceiver/receiver_test.go
* Emit dummy status to force otel core to process it
* Add unit tests
* Add changelog entry
* Switch to otel statuses for inputs

(cherry picked from commit 6ba7b47)
#48056)

* Report subcomponent status for beats receivers (#48015)
* Add input statuses to beat receiver status # Conflicts: # x-pack/filebeat/fbreceiver/receiver_test.go
* Emit dummy status to force otel core to process it
* Add unit tests
* Add changelog entry
* Switch to otel statuses for inputs (cherry picked from commit 6ba7b47)
* Fix linter warnings

---------

Co-authored-by: Mikołaj Świątek
Proposed commit message
Report subcomponent status for beats receivers
Make beats receivers report otel status for their subcomponents - inputs for filebeat and modules for metricbeat.
This is done via the `Attributes` field of the otel status `Event`. Under the `inputs` key, we add a map to the attributes, where input IDs are keys and statuses are values. The status structure is the same as the one used for streams in the control protocol. This is a temporary measure until there's a standard convention for doing this in upstream otel; then we'll switch to that. The purpose of this change is to let elastic-agent report per-stream and per-input status for inputs running in a beat receiver.

We currently need a hacky workaround to ensure status events are delivered even when the component status itself has not changed. This is due to open-telemetry/opentelemetry-collector#14282.
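The delivery problem described above can be modeled in a few lines. This sketch assumes, based only on the description above, that the collector forwards a status event only when its status differs from the previously forwarded one, so an update that changes just the attributes is lost. `Event`, `InputStatus`, `dedupReporter`, and `reportWithInputs` are hypothetical stand-ins, not the collector's `componentstatus` API or the beats code. The sketch shows the `inputs` attribute map keyed by input ID, plus one way to force delivery by emitting a dummy status first; using `StatusStarting` as the dummy is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical, simplified stand-in for an OTel component status
// event: a status name, an error message, and free-form attributes.
type Event struct {
	Status     string
	Err        string
	Attributes map[string]any
}

// InputStatus mirrors one entry under the "inputs" attribute key.
type InputStatus struct {
	Error  string `json:"error"`
	Status string `json:"status"`
}

// dedupReporter models the assumed collector behaviour: an event is only
// delivered when its status differs from the last delivered status.
type dedupReporter struct {
	last      string
	delivered []Event
}

func (r *dedupReporter) Report(ev Event) {
	if ev.Status == r.last {
		return // unchanged status: dropped even if Attributes differ
	}
	r.last = ev.Status
	r.delivered = append(r.delivered, ev)
}

// reportWithInputs attaches per-input statuses under the "inputs" key and
// first emits a dummy event with a different status, so the real event is
// never treated as unchanged. The dummy status choice is an assumption.
func reportWithInputs(r *dedupReporter, status, errMsg string, inputs map[string]InputStatus) {
	dummy := "StatusStarting"
	if status == dummy {
		dummy = "StatusOK"
	}
	r.Report(Event{Status: dummy})
	r.Report(Event{
		Status:     status,
		Err:        errMsg,
		Attributes: map[string]any{"inputs": inputs},
	})
}

func main() {
	r := &dedupReporter{}
	inputs := map[string]InputStatus{
		"unique-system-metrics-input-cpu": {Status: "Running"},
	}

	r.Report(Event{Status: "StatusOK"})
	// Same status, new attributes: this update never reaches consumers.
	r.Report(Event{Status: "StatusOK", Attributes: map[string]any{"inputs": inputs}})
	fmt.Println(len(r.delivered)) // 1

	// Dummy event plus real event: both are delivered.
	reportWithInputs(r, "StatusOK", "", inputs)
	fmt.Println(len(r.delivered)) // 3

	out, err := json.Marshal(r.delivered[len(r.delivered)-1].Attributes)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// {"inputs":{"unique-system-metrics-input-cpu":{"error":"","status":"Running"}}}
}
```

The cost of this kind of workaround shows up in the model: consumers briefly observe the dummy status before the real one. The real collector also enforces its own status transition rules, which this model ignores.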
The output of the healthcheckv2 extension with this change looks like the following:
```json
{
  "components": {
    "receiver:metricbeatreceiver/_agent-component/system/metrics-default": {
      "healthy": true,
      "status": "StatusRecoverableError",
      "error": "Error fetching data for metricset system.process: error fetching process list: non fatal error; reporting partial metrics: error fetching PID metrics for 377 processes, most likely a \"permission denied\" error. Enable debug logging to determine the exact cause.",
      "status_time": "2025-12-10T18:19:53.552220344+01:00",
      "attributes": {
        "inputs": {
          "unique-system-metrics-input-2-cpu": {
            "error": "",
            "status": "Running"
          },
          "unique-system-metrics-input-2-process": {
            "error": "Error fetching data for metricset system.process: error fetching process list: non fatal error; reporting partial metrics: error fetching PID metrics for 377 processes, most likely a \"permission denied\" error. Enable debug logging to determine the exact cause.",
            "status": "Degraded"
          },
          "unique-system-metrics-input-cpu": {
            "error": "",
            "status": "Running"
          },
          "unique-system-metrics-input-process": {
            "error": "Error fetching data for metricset system.process: non fatal error; reporting partial metrics: error fetching PID metrics for 377 processes, most likely a \"permission denied\" error. Enable debug logging to determine the exact cause.",
            "status": "Running"
          }
        }
      }
    }
  }
}
```

Checklist
- [ ] I have made corresponding changes to the documentation
- [ ] I have made corresponding change to the default configuration files
- … `stresstest.sh` script to run them under stress conditions and race detector to verify their stability.
- … `./changelog/fragments` using the changelog tool.

Related issues