[otel]: Add e2e test for monitoring metrics in otel mode #8009

Open: 22 commits to merge into base: main

khushijain21
Contributor

@khushijain21 khushijain21 commented Apr 28, 2025

What does this PR do?

This PR adds e2e tests for the self-monitoring metrics exposed using beatreceivers. It also asserts that the metrics documents produced in normal mode and in otel mode are equivalent.

Why is it important?

Required to safely transition to running elastic-agent in otel mode.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Related issues

@khushijain21 khushijain21 requested a review from a team as a code owner April 28, 2025 16:31
@khushijain21 khushijain21 marked this pull request as draft April 28, 2025 16:31
Contributor

mergify bot commented Apr 28, 2025

This pull request does not have a backport label. Could you fix it @khushijain21? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@cmacknz
Member

cmacknz commented Apr 28, 2025

Does this overlap with what #7622 intends to do?

@khushijain21
Contributor Author

khushijain21 commented Apr 29, 2025

Does this overlap with what #7622 intends to do?

#7622 adds tests for metrics inputs, whereas this PR covers the monitoring metrics collected by elastic-agent itself.

@khushijain21 khushijain21 added skip-changelog backport-8.x Automated backport to the 8.x branch with mergify backport-8.19 Automated backport to the 8.19 branch backport-9.0 Automated backport to the 9.0 branch and removed backport-8.x Automated backport to the 8.x branch with mergify labels Apr 29, 2025
@khushijain21 khushijain21 changed the title [Draft]: Add e2e test for monitoring metrics in otel mode [Draft][otel]: Add e2e test for monitoring metrics in otel mode Apr 29, 2025
Contributor

mergify bot commented Apr 30, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b e2emetrics upstream/e2emetrics
git merge upstream/main
git push upstream e2emetrics

@cmacknz
Member

cmacknz commented May 1, 2025

Thanks for clarifying!

@khushijain21 khushijain21 marked this pull request as ready for review May 5, 2025 11:26
@khushijain21 khushijain21 changed the title [Draft][otel]: Add e2e test for monitoring metrics in otel mode [otel]: Add e2e test for monitoring metrics in otel mode May 5, 2025
})
}

var configTemplateOTel = `
Member

Once #8031 exists, instead of manually reproducing the configurations you could just install agent twice in the tests.

You can see an example of agent running twice in the same test in

out, err := fixture.Install(ctx, &opts)
if err != nil {
	t.Logf("install output: %s", out)
	require.NoError(t, err)
}
// Check that Agent was installed in the custom base path
topPath := filepath.Join(basePath, "Elastic", "Agent")
require.NoError(t, installtest.CheckSuccess(ctx, fixture, topPath, &installtest.CheckOpts{Privileged: opts.Privileged}))
t.Run("check agent package version", testAgentPackageVersion(ctx, fixture, true))
t.Run("check second agent installs with --namespace", testSecondAgentCanInstall(ctx, fixture, basePath, false, opts))

Then you will have a true side by side comparison with agent generating the configs itself. CC @mauri870 I think this idea probably applies to the test you are working on as well once the same thing is possible for non-monitoring metrics.

Reproducing the configs manually just decoupled you from having to wait for #8031, but once the feature flag to switch to beat receivers for monitoring exists, installing agent twice will make sure we test with the latest config that agent generates itself.

Member

The --namespace feature is what is used to implement the --develop support in https://github.com/elastic/elastic-agent?tab=readme-ov-file#development-installations.

Contributor

Looks like #8031 is merged now.

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label May 6, 2025
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Contributor

@leehinman leehinman left a comment

Take a look at:

main...leehinman:elastic-agent:4876_agent_monitoring_tests

I think it has some ideas you can use. Specifically: the changes to the rawQuery (including the sort), the use of match_phrase over match, and the initial criteria used to find the events. With those changes the ignoredFields list became much smaller and the results were consistent.
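
The ignoredFields idea mentioned above can be illustrated with a minimal sketch. It is not the code from this PR or from the linked branch: `flatten` and `equalIgnoring` are hypothetical helper names, and the sample documents are made up. The sketch flattens two documents to dotted keys, drops the fields that are expected to differ between runs, and compares what is left. Volatile metric values would also need to be listed as ignored (or only the key sets compared) for a real comparison to pass.

```go
package main

import (
	"fmt"
	"reflect"
)

// flatten copies a nested document into out using dotted keys,
// e.g. {"agent": {"id": "x"}} becomes {"agent.id": "x"}.
func flatten(prefix string, in map[string]any, out map[string]any) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if nested, ok := v.(map[string]any); ok {
			flatten(key, nested, out)
			continue
		}
		out[key] = v
	}
}

// equalIgnoring reports whether two documents are equal once the
// dotted field names listed in ignored are removed from both.
func equalIgnoring(a, b map[string]any, ignored []string) bool {
	fa, fb := map[string]any{}, map[string]any{}
	flatten("", a, fa)
	flatten("", b, fb)
	for _, f := range ignored {
		delete(fa, f)
		delete(fb, f)
	}
	return reflect.DeepEqual(fa, fb)
}

func main() {
	// Made-up documents standing in for a normal-mode and an otel-mode event.
	normal := map[string]any{
		"@timestamp": "2025-05-07T10:00:00Z",
		"component":  map[string]any{"id": "elastic-agent"},
		"agent":      map[string]any{"id": "agent-a"},
	}
	otel := map[string]any{
		"@timestamp": "2025-05-07T10:00:05Z",
		"component":  map[string]any{"id": "elastic-agent"},
		"agent":      map[string]any{"id": "agent-b"},
	}
	ignored := []string{"@timestamp", "agent.id"}
	fmt.Println("equivalent:", equalIgnoring(normal, otel, ignored))
}
```

The shorter the ignored list, the more the comparison actually verifies, which is why narrowing the query first is attractive.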

query: map[string]any{
	// metrics-elastic_agent.elastic_agent-* stores cpu metrics emitted by EA AND all running beats
	// here, we only compare elastic-agent self metrics for simplicity
	"component.id": "elastic-agent",
Contributor

I found it was easier to pick a field name that leads to a unique kind of metric. For example beat.stats.memstats.rss
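
As a rough sketch of the review suggestions (match_phrase instead of match, an explicit sort, and selecting events by a field that only one kind of metric carries), a search body could be built like this. `buildQuery` is a hypothetical helper, not the rawQuery from the PR or the linked branch, and the sort direction is an assumption; only the Elasticsearch query DSL keys (`bool`, `match_phrase`, `exists`, `sort`) are standard.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildQuery returns a search body that matches phraseField exactly
// against phraseValue, keeps only documents that contain uniqueField,
// and sorts the hits by @timestamp so results come back in a stable order.
func buildQuery(phraseField, phraseValue, uniqueField string) map[string]any {
	return map[string]any{
		"query": map[string]any{
			"bool": map[string]any{
				"must": []map[string]any{
					{"match_phrase": map[string]any{phraseField: phraseValue}},
				},
				"filter": []map[string]any{
					{"exists": map[string]any{"field": uniqueField}},
				},
			},
		},
		// Assumption: newest first; the direction used in the branch is not shown here.
		"sort": []map[string]any{
			{"@timestamp": map[string]any{"order": "desc"}},
		},
	}
}

func main() {
	q := buildQuery("component.id", "elastic-agent", "beat.stats.memstats.rss")
	body, err := json.MarshalIndent(q, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Filtering on a field such as beat.stats.memstats.rss means every hit is the same kind of metric document, so the two modes can be compared like for like.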

Contributor

mergify bot commented May 7, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b e2emetrics upstream/e2emetrics
git merge upstream/main
git push upstream e2emetrics

@khushijain21 khushijain21 marked this pull request as ready for review May 23, 2025 05:22
@khushijain21 khushijain21 marked this pull request as draft May 23, 2025 08:57
@khushijain21
Contributor Author

khushijain21 commented May 23, 2025

Investigating why memory-related stats (such as beat.stats.memory.rss) are not available with runtime_experimental:otel mode.

@khushijain21 khushijain21 marked this pull request as ready for review May 23, 2025 12:29
if failureThreshold != nil {
httpStream[failureThresholdKey] = *failureThreshold
// Do not create http streams if runtime-manager is otel and binary is of beat type
if compInfo.RuntimeManager != component.OtelRuntimeManager || !strings.HasSuffix(binaryName, "beat") {
Contributor Author

The http/metrics input pulls process-related metrics from beats and sends them to the metrics-elastic_agent.elastic_agent.* index. These metrics are not applicable to beatreceivers, so these streams are dropped for this special case.
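
The condition in the diff above can be read as a small truth table. The sketch below isolates it in a standalone function so the three cases are easy to see. `shouldCreateHTTPStreams` and the constant `otelRuntimeManager` are hypothetical stand-ins for the surrounding code and for `component.OtelRuntimeManager`, and the string values "otel" and "process" are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// otelRuntimeManager is a stand-in for component.OtelRuntimeManager;
// its actual value in elastic-agent is assumed here, not confirmed.
const otelRuntimeManager = "otel"

// shouldCreateHTTPStreams mirrors the boolean condition from the diff:
// streams are skipped only when the component runs under the otel
// runtime manager AND its binary name ends in "beat".
func shouldCreateHTTPStreams(runtimeManager, binaryName string) bool {
	return runtimeManager != otelRuntimeManager || !strings.HasSuffix(binaryName, "beat")
}

func main() {
	cases := []struct{ runtime, binary string }{
		{"process", "metricbeat"},     // beat outside the otel runtime
		{"otel", "metricbeat"},        // beat running as a beat receiver
		{"otel", "endpoint-security"}, // non-beat binary
	}
	for _, c := range cases {
		fmt.Printf("runtime=%s binary=%s createHTTPStreams=%t\n",
			c.runtime, c.binary, shouldCreateHTTPStreams(c.runtime, c.binary))
	}
}
```

Only the middle case skips the http streams, which matches the intent described in the comment: beat receivers do not expose the per-process metrics that those streams would collect.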

@pierrehilbert pierrehilbert requested review from swiatekm and removed request for pchila May 23, 2025 14:07
@elasticmachine
Contributor

💛 Build succeeded, but was flaky

cc @khushijain21

Labels
backport-8.19 Automated backport to the 8.19 branch backport-9.0 Automated backport to the 9.0 branch skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
6 participants