Allow using beats receivers for self-monitoring #8031

Merged
swiatekm merged 9 commits into main from feat/self-monitoring-otel-runtime
May 7, 2025

Conversation

@swiatekm swiatekm commented Apr 29, 2025

What does this PR do?

Adds the ability to use beats receivers for agent self-monitoring. To do so, we add a new configuration key to agent.monitoring named _runtime_experimental - identical to how you can currently switch inputs to the Otel runtime.

In terms of implementation, the changes are very straightforward. In the monitoring injection manager, we set the runtime manager for inputs we add, if it's set in the monitoring configuration.
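As a rough illustration of the idea described above, the following minimal Go sketch copies a runtime setting onto each injected monitoring input only when the monitoring configuration sets it. The `monitoringConfig` type, the `applyRuntime` function, and the map-based input representation are hypothetical and are not taken from the agent's code; only the `_runtime_experimental` key and the `otel` value come from this PR description.

```go
package main

import "fmt"

// runtimeKey is the configuration key named in this PR description.
const runtimeKey = "_runtime_experimental"

// monitoringConfig is a hypothetical, trimmed-down stand-in for the agent's
// monitoring settings. The real configuration has many more fields.
type monitoringConfig struct {
	RuntimeExperimental string // for example "otel"; empty means "not set"
}

// applyRuntime copies the runtime setting onto every injected monitoring
// input, but only when the setting is present in the monitoring configuration.
func applyRuntime(cfg monitoringConfig, inputs []map[string]any) {
	if cfg.RuntimeExperimental == "" {
		return // leave the inputs untouched so they keep the default runtime
	}
	for _, input := range inputs {
		input[runtimeKey] = cfg.RuntimeExperimental
	}
}

func main() {
	// Illustrative inputs; the ids and types are placeholders for this sketch.
	inputs := []map[string]any{
		{"id": "filestream-monitoring", "type": "filestream"},
		{"id": "http/metrics-monitoring", "type": "http/metrics"},
	}
	applyRuntime(monitoringConfig{RuntimeExperimental: "otel"}, inputs)
	for _, input := range inputs {
		fmt.Println(input["id"], input[runtimeKey])
	}
}
```

Leaving the inputs unmodified when the key is unset keeps the default behaviour unchanged, which matches the opt-in nature of an experimental setting.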

Most of this PR's code changes lie in tests, and more specifically in the TestAgentMonitoring E2E test. This test compares the data collected by agent self-monitoring using beats processes to an equivalent Otel configuration of beats receivers in Hybrid mode. Instead of doing that, we can now just change agent.monitoring._runtime_experimental, so the test becomes much simpler conceptually.
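To make the comparison idea concrete, here is a minimal Go sketch of how two collected documents could be compared while ignoring fields that are expected to differ between runs. The helper names (`withoutFields`, `equivalent`), the flat dotted-key document shape, and the list of ignored fields are assumptions made for illustration; the actual TestAgentMonitoring test may compare data quite differently.

```go
package main

import (
	"fmt"
	"reflect"
)

// withoutFields returns a shallow copy of doc with the ignored keys removed.
// Documents are modelled as flat maps with dotted keys, which is a
// simplification for this sketch.
func withoutFields(doc map[string]any, ignored []string) map[string]any {
	out := make(map[string]any, len(doc))
	for k, v := range doc {
		out[k] = v
	}
	for _, k := range ignored {
		delete(out, k)
	}
	return out
}

// equivalent reports whether two documents match once the ignored fields
// have been dropped from both.
func equivalent(a, b map[string]any, ignored []string) bool {
	return reflect.DeepEqual(withoutFields(a, ignored), withoutFields(b, ignored))
}

func main() {
	// Hypothetical documents: one from a beats process, one from a receiver.
	fromProcess := map[string]any{
		"@timestamp":            "2025-04-29T10:00:00Z",
		"data_stream.namespace": "process",
		"message":               "agent started",
	}
	fromReceiver := map[string]any{
		"@timestamp":            "2025-04-29T10:00:05Z",
		"data_stream.namespace": "receiver",
		"message":               "agent started",
	}
	ignored := []string{"@timestamp", "data_stream.namespace"}
	fmt.Println(equivalent(fromProcess, fromReceiver, ignored))
}
```

Copying before deleting keeps the original documents intact, so a failing test can still print them in full.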

I have simplified some of the test logic, but I haven't yet made it compare metrics. This should be doable now, but we have another PR (#8009 ) in-flight doing it, so I held off.

Why is it important?

We want to be able to use beats receivers for agent self-monitoring.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
    - [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

Build the agent locally and use the following configuration:

outputs:
  default:
    type: elasticsearch
    hosts: [...]
    username: elastic
    password: "..."

inputs: []

agent:
  monitoring:
    metrics: true
    logs: true
    _runtime_experimental: otel

Looking at Kibana dashboards for the agent integration can prove the data is actually being ingested. You can verify that beats receivers are being used for self-monitoring by looking at their CPU usage - it should be 0.

Related issues

mergify bot commented Apr 29, 2025

This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@swiatekm added the skip-changelog, backport-active-9, backport-8.19, and enhancement labels Apr 29, 2025
@swiatekm force-pushed the feat/self-monitoring-otel-runtime branch from 4363785 to 4b94fd4 April 29, 2025 17:15
@swiatekm

The test failures are due to the beats update. I'm going to do that in a separate PR for clarity: #8041.

mergify bot commented Apr 30, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feat/self-monitoring-otel-runtime upstream/feat/self-monitoring-otel-runtime
git merge upstream/main
git push upstream feat/self-monitoring-otel-runtime

@swiatekm force-pushed the feat/self-monitoring-otel-runtime branch 3 times, most recently from 4ccd165 to 2d9c182 April 30, 2025 13:13
@swiatekm marked this pull request as ready for review April 30, 2025 14:36
@swiatekm requested a review from a team as a code owner April 30, 2025 14:36
@swiatekm requested a review from ycombinator April 30, 2025 14:36
@swiatekm changed the title from "Allow using the otel runtime for self-monitoring" to "Allow using beats receivers for self-monitoring" Apr 30, 2025
@pierrehilbert added the Team:Elastic-Agent-Control-Plane label Apr 30, 2025
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

swiatekm added 2 commits May 5, 2025 16:17
# Conflicts:
#	internal/pkg/agent/application/monitoring/v1_monitor.go

# Conflicts:
#	internal/pkg/otel/configtranslate/otelconfig.go
@swiatekm requested review from ycombinator and cmacknz May 5, 2025 17:05

cmacknz commented May 5, 2025

This LGTM, and testing locally I see the monitoring receivers running. It would be nice if we could make sure that only the beat receivers are used for monitoring; right now the tests verify only that the beat receivers are among the components used for monitoring.

sudo elastic-development-agent status --output=full
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   ├─ status: (HEALTHY) Running
   ├─ info
   │  ├─ id: 9293b312-f874-4866-bb1c-ebc21244c75c
   │  ├─ version: 9.1.0
   │  └─ commit: 3b2fe0010f4075f5dd47fad96a4b2c1dc5a97f52
   ├─ filestream-default
   │  ├─ status: (HEALTHY) Healthy: communicating with pid '82070'
   │  ├─ filestream-default
   │  │  ├─ status: (HEALTHY) Healthy
   │  │  └─ type: OUTPUT
   │  └─ filestream-default-your-input-id
   │     ├─ status: (HEALTHY) Healthy
   │     └─ type: INPUT
   ├─ system/metrics-default
   │  ├─ status: (HEALTHY) Healthy: communicating with pid '82069'
   │  ├─ system/metrics-default
   │  │  ├─ status: (HEALTHY) Healthy
   │  │  └─ type: OUTPUT
   │  └─ system/metrics-default-unique-system-metrics-input
   │     ├─ status: (HEALTHY) Healthy
   │     └─ type: INPUT
   ├─ pipeline:logs/_agent-component/beat/metrics-monitoring
   │  ├─ status: StatusOK
   │  ├─ exporter:elasticsearch/_agent-component/monitoring
   │  │  └─ status: StatusOK
   │  └─ receiver:metricbeatreceiver/_agent-component/beat/metrics-monitoring
   │     └─ status: StatusOK
   ├─ pipeline:logs/_agent-component/filestream-monitoring
   │  ├─ status: StatusOK
   │  ├─ exporter:elasticsearch/_agent-component/monitoring
   │  │  └─ status: StatusOK
   │  └─ receiver:filebeatreceiver/_agent-component/filestream-monitoring
   │     └─ status: StatusOK
   └─ pipeline:logs/_agent-component/http/metrics-monitoring
      ├─ status: StatusOK
      ├─ exporter:elasticsearch/_agent-component/monitoring
      │  └─ status: StatusOK
      └─ receiver:metricbeatreceiver/_agent-component/http/metrics-monitoring
         └─ status: StatusOK
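One way to check that only beat receivers are used for monitoring is to scan the top-level component names in a status listing like the one above. The Go sketch below assumes, based solely on this output, that receiver-backed monitoring components are named with a `pipeline:` prefix and that monitoring component names end in `-monitoring`. The `component` type and the `processBackedMonitoring` function are hypothetical and do not correspond to the agent's status API.

```go
package main

import (
	"fmt"
	"strings"
)

// component is a hypothetical, simplified view of one top-level entry in the
// status tree: its name and its status message.
type component struct {
	Name   string
	Status string
}

// processBackedMonitoring returns the names of monitoring components that do
// not look like collector pipelines. Both naming conventions used here are
// assumptions inferred from the status output, not a documented contract.
func processBackedMonitoring(components []component) []string {
	var offenders []string
	for _, c := range components {
		if !strings.HasSuffix(c.Name, "-monitoring") {
			continue // not a self-monitoring component
		}
		if !strings.HasPrefix(c.Name, "pipeline:") {
			offenders = append(offenders, c.Name)
		}
	}
	return offenders
}

func main() {
	// Component names copied from the status output above.
	components := []component{
		{Name: "filestream-default", Status: "HEALTHY"},
		{Name: "system/metrics-default", Status: "HEALTHY"},
		{Name: "pipeline:logs/_agent-component/beat/metrics-monitoring", Status: "StatusOK"},
		{Name: "pipeline:logs/_agent-component/filestream-monitoring", Status: "StatusOK"},
		{Name: "pipeline:logs/_agent-component/http/metrics-monitoring", Status: "StatusOK"},
	}
	offenders := processBackedMonitoring(components)
	fmt.Println("process-backed monitoring components:", len(offenders))
}
```

A test could then fail whenever the returned slice is non-empty, which is stricter than only asserting that the receiver pipelines exist.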

@pierrehilbert added the Team:Elastic-Agent-Data-Plane label May 6, 2025
@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@swiatekm requested a review from cmacknz May 6, 2025 09:42
Co-authored-by: Khushi Jain <[email protected]>
@swiatekm requested a review from khushijain21 May 6, 2025 11:50
@elasticmachine

💛 Build succeeded, but was flaky

cc @swiatekm

@swiatekm merged commit a31d56f into main May 7, 2025
12 checks passed
@swiatekm deleted the feat/self-monitoring-otel-runtime branch May 7, 2025 09:24

github-actions bot commented May 7, 2025

@Mergifyio backport 9.0


mergify bot commented May 7, 2025

backport 9.0

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request May 7, 2025
* Handle nil case in monitoring config parsing

* Allow using otel runtime for self-monitoring

# Conflicts:
#	internal/pkg/agent/application/monitoring/v1_monitor.go

# Conflicts:
#	internal/pkg/otel/configtranslate/otelconfig.go

* Modify e2e test

* Make the monitoring e2e test more restrictive

* Revert "Handle nil case in monitoring config parsing"

This reverts commit bb11a0f.

* Check receiver statuses in e2e test

* Send data from beats processes and receivers to different namespaces

* Check all component statuses in E2E test

* Fix typo

Co-authored-by: Khushi Jain <[email protected]>

---------

Co-authored-by: Khushi Jain <[email protected]>
(cherry picked from commit a31d56f)
swiatekm added a commit that referenced this pull request May 7, 2025
* Handle nil case in monitoring config parsing

* Allow using otel runtime for self-monitoring

# Conflicts:
#	internal/pkg/agent/application/monitoring/v1_monitor.go

# Conflicts:
#	internal/pkg/otel/configtranslate/otelconfig.go

* Modify e2e test

* Make the monitoring e2e test more restrictive

* Revert "Handle nil case in monitoring config parsing"

This reverts commit bb11a0f.

* Check receiver statuses in e2e test

* Send data from beats processes and receivers to different namespaces

* Check all component statuses in E2E test

* Fix typo



---------


(cherry picked from commit a31d56f)

Co-authored-by: Mikołaj Świątek <[email protected]>
Co-authored-by: Khushi Jain <[email protected]>
ycombinator pushed a commit that referenced this pull request May 8, 2025
* Handle nil case in monitoring config parsing

* Allow using otel runtime for self-monitoring

# Conflicts:
#	internal/pkg/agent/application/monitoring/v1_monitor.go

# Conflicts:
#	internal/pkg/otel/configtranslate/otelconfig.go

* Modify e2e test

* Make the monitoring e2e test more restrictive

* Revert "Handle nil case in monitoring config parsing"

This reverts commit bb11a0f.

* Check receiver statuses in e2e test

* Send data from beats processes and receivers to different namespaces

* Check all component statuses in E2E test

* Fix typo



---------


(cherry picked from commit a31d56f)

Co-authored-by: Mikołaj Świątek <[email protected]>
Co-authored-by: Khushi Jain <[email protected]>
v1v added a commit to v1v/elastic-agent that referenced this pull request May 8, 2025
* upstream/main:
  Guard against `nil` pointer dereference (elastic#8107)
  Generate NOTICE.txt with only modules used by binaries (elastic#8053)
  Retry enrollment requests when an error is returned, add enrollment timeout (elastic#8056)
  Changelog for 8.17.6 version (elastic#8062) (elastic#8106)
  [main][Automation] Update versions (elastic#8098)
  Allow using beats receivers for self-monitoring (elastic#8031)
  Adding new configuration setting: `agent.upgrade.rollback.window` (elastic#8065)
  [Integration Testing] Allow tests to declare themselves as needing a FIPS environment (elastic#8083)
  fix(agentless): overcome SIGPIPE in agentless promotion pipeline (elastic#8094)
  ksm autosharing integration configuration update (elastic#8086)
Labels
backport-8.19: Automated backport to the 8.19 branch
backport-active-9: Automated backport with mergify to all the active 9.[0-9]+ branches
enhancement: New feature or request
skip-changelog
Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team
Team:Elastic-Agent-Data-Plane: Label for the Agent Data Plane team

6 participants