Conversation

@swiatekm
Contributor

What does this PR do?

Why is it important?

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@mergify
Contributor

mergify bot commented Sep 29, 2025

This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-\d.\d is the label that automatically backports to the 8.\d branch (\d is a digit).
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@elastic-sonarqube

@swiatekm swiatekm force-pushed the feat/self-monitoring-otel-default branch from ca48a67 to 91a3eee Compare September 30, 2025 13:19
@mergify
Contributor

mergify bot commented Oct 1, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feat/self-monitoring-otel-default upstream/feat/self-monitoring-otel-default
git merge upstream/main
git push upstream feat/self-monitoring-otel-default

@swiatekm swiatekm force-pushed the feat/self-monitoring-otel-default branch 2 times, most recently from 0b48117 to 58c6ae1 Compare October 1, 2025 12:03
@cmacknz cmacknz added the backport-9.2 Automated backport to the 9.2 branch label Oct 1, 2025
@swiatekm swiatekm force-pushed the feat/self-monitoring-otel-default branch 2 times, most recently from cf0a4a9 to 3cf99b9 Compare October 7, 2025 11:54
@swiatekm swiatekm force-pushed the feat/self-monitoring-otel-default branch 4 times, most recently from 4bbb647 to b70dc80 Compare October 14, 2025 19:03
@swiatekm swiatekm force-pushed the feat/self-monitoring-otel-default branch from 2e1e851 to c26dee6 Compare October 15, 2025 18:41
@cmacknz
Member

cmacknz commented Oct 15, 2025

I can reproduce the endpoint test failure if I do it manually. The reproduction is exactly what the test does:

  1. Enroll the agent in a policy with Defend that uses beats receivers
  2. Unenroll the agent via the Fleet UI
  3. Wait for Defend to be removed
  4. Observe that the state of the beats receivers for monitoring continues to be reported
  5. Also observe that the collector sub-process is not running. This seems to be a bug where we are not clearing the state on unenroll: unenrolling through Fleet sends the agent an empty agent policy to execute, which removes all inputs. We seem to stop the collector but not clear its status.
ubuntu@ubuntu24:~/elastic-agent-9.3.0-SNAPSHOT-linux-arm64$ systemctl status elastic-agent
● elastic-agent.service - Elastic Agent is a unified agent to observe, monitor and protect your system.
     Loaded: loaded (/etc/systemd/system/elastic-agent.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-10-15 16:15:04 EDT; 5min ago
   Main PID: 7497 (elastic-agent)
      Tasks: 12 (limit: 1059)
     Memory: 135.7M (peak: 637.1M)
        CPU: 5.586s
     CGroup: /system.slice/elastic-agent.service
             └─7497 elastic-agent

ubuntu@ubuntu24:~/elastic-agent-9.3.0-SNAPSHOT-linux-arm64$ sudo elastic-agent status --output=full
┌─ fleet
│  └─ status: (HEALTHY) Connected
└─ elastic-agent
   ├─ status: (HEALTHY) Running
   ├─ info
   │  ├─ id: 5bcd2335-398b-4a39-aa67-9185ccb0538f
   │  ├─ version: 9.3.0
   │  └─ commit: c26dee602037f4808433a7a40d9fcb7097555267
   ├─ beat/metrics-monitoring
   │  ├─ status: (HEALTHY) HEALTHY
   │  ├─ beat/metrics-monitoring
   │  │  ├─ status: (HEALTHY) Healthy
   │  │  └─ type: OUTPUT
   │  └─ beat/metrics-monitoring-metrics-monitoring-beats
   │     ├─ status: (HEALTHY) Healthy
   │     └─ type: INPUT
   ├─ filestream-monitoring
   │  ├─ status: (HEALTHY) HEALTHY
   │  ├─ filestream-monitoring
   │  │  ├─ status: (HEALTHY) Healthy
   │  │  └─ type: OUTPUT
   │  └─ filestream-monitoring-filestream-monitoring-agent
   │     ├─ status: (HEALTHY) Healthy
   │     └─ type: INPUT
   ├─ http/metrics-monitoring
   │  ├─ status: (HEALTHY) HEALTHY
   │  ├─ http/metrics-monitoring
   │  │  ├─ status: (HEALTHY) Healthy
   │  │  └─ type: OUTPUT
   │  └─ http/metrics-monitoring-metrics-monitoring-agent
   │     ├─ status: (HEALTHY) Healthy
   │     └─ type: INPUT
   └─ prometheus/metrics-monitoring
      ├─ status: (HEALTHY) HEALTHY
      ├─ prometheus/metrics-monitoring
      │  ├─ status: (HEALTHY) Healthy
      │  └─ type: OUTPUT
      └─ prometheus/metrics-monitoring-metrics-monitoring-collector
         ├─ status: (HEALTHY) Healthy
         └─ type: INPUT

@elasticmachine
Contributor

elasticmachine commented Oct 15, 2025

@cmacknz
Member

cmacknz commented Oct 15, 2025

Doing the above with debug logs gives me the following relevant logs:

{"log.level":"info","@timestamp":"2025-10-15T20:45:35.223Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).refreshComponentModel","file.name":"coordinator/coordinator.go","file.line":1769},"message":"Updating running component model","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2025-10-15T20:45:35.223Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).refreshComponentModel","file.name":"coordinator/coordinator.go","file.line":1770},"message":"Updating running component model","log":{"source":"elastic-agent"},"components":[],"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2025-10-15T20:45:35.223Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).updateManagersWithConfig","file.name":"coordinator/coordinator.go","file.line":1779},"message":"Updating runtime manager model","log":{"source":"elastic-agent"},"components":null,"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2025-10-15T20:45:35.224Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).updateManagersWithConfig","file.name":"coordinator/coordinator.go","file.line":1781},"message":"Updating otel manager model","log":{"source":"elastic-agent"},"components":null,"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2025-10-15T20:45:35.227Z","log.logger":"otel_manager","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/otel/manager.(*OTelManager).Run","file.name":"manager/manager.go","file.line":327},"message":"new config hash ([]) is different than the old config hash ([167 69 55 143 124 32 164 165 106 69 11 110 206 119 75 169]), applying update","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}

@cmacknz
Member

cmacknz commented Oct 15, 2025

What I think is happening is that we are sending an empty configuration to the otel manager, which stops the collector but doesn't clear the previous status.
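A minimal sketch of that hypothesis: on an empty config the manager stops the collector but never resets the status it reports upward. All names here (`otelManager`, `Update`, `lastStatus`) are hypothetical, not the actual elastic-agent API.

```go
package main

import "fmt"

type otelManager struct {
	collectorRunning bool
	lastStatus       string // last status forwarded to the coordinator
}

// Update applies a new component config. An empty config is what Fleet
// effectively sends on unenroll to remove all inputs.
func (m *otelManager) Update(cfg map[string]any) {
	if len(cfg) == 0 {
		m.collectorRunning = false
		// The suspected missing step: clear the stale status along with
		// stopping the collector. Without this, the old HEALTHY entries
		// would keep showing up in `elastic-agent status` after unenroll.
		m.lastStatus = ""
		return
	}
	m.collectorRunning = true
	m.lastStatus = "HEALTHY"
}

func main() {
	m := &otelManager{}
	m.Update(map[string]any{"receivers": "prometheus/metrics-monitoring"})
	fmt.Println("running:", m.collectorRunning, "status:", m.lastStatus)
	m.Update(nil) // unenroll: empty policy
	fmt.Println("running:", m.collectorRunning, "status:", m.lastStatus)
}
```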

@cmacknz
Member

cmacknz commented Oct 16, 2025

Created a separate bug for the endpoint test failure on unenroll: #10634

@swiatekm
Contributor Author

Closing in favor of #10594

@swiatekm swiatekm closed this Oct 28, 2025
@swiatekm swiatekm deleted the feat/self-monitoring-otel-default branch October 28, 2025 19:16

Labels

backport-9.2 Automated backport to the 9.2 branch


Development

Successfully merging this pull request may close these issues.

[beats receivers] Enable beats receivers for internal monitoring data collection by default

4 participants