[agent_metrics] add metrics for num_metrics and num_events #2899

Open: wants to merge 2 commits into base: master

cberry777
Contributor

What does this PR do?

Adds additional metrics for num_metrics and num_events to agent_metrics

Motivation

It is very important to monitor the number of metrics and events emitted from each agent. It allows us to 1) keep track of the total number of metrics sent to Datadog (to monitor billing), and 2) locate rogue agents emitting above some threshold.

Testing Guidelines

A test is provided: /tests/checks/mock/test_agent_metrics.py (# test_num_metrics)

Additional Notes

An optional switch is provided (in the init_config) that allows one to log the number of metrics and events for each collection run.
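
As a rough illustration of the switch described above (not the code in this PR), a per-run count with optional logging could look like the sketch below. The function name `summarize_collection_run`, the `log_counts` flag, and the `metric_count` / `event_count` keys are assumptions made for this example; they are not taken from agent_metrics or from the actual init_config option.

```python
import logging

log = logging.getLogger("agent_metrics_sketch")


def summarize_collection_run(check_statuses, log_counts=False):
    """Sum per-check metric and event counts for one collection run.

    check_statuses is an iterable of dicts using hypothetical keys:
    'name', 'metric_count' and 'event_count'. Missing counts default to 0.
    """
    statuses = list(check_statuses)  # allow generators to be read twice
    num_metrics = sum(s.get("metric_count", 0) for s in statuses)
    num_events = sum(s.get("event_count", 0) for s in statuses)

    # Hypothetical switch: only log the totals when explicitly enabled.
    if log_counts:
        log.info("collection run: num_metrics=%d num_events=%d",
                 num_metrics, num_events)

    return {"num_metrics": num_metrics, "num_events": num_events}
```

The totals are returned rather than only logged so that a caller could submit them as gauges and alert on them, which is the use case given under Motivation.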

@remh remh added this to the 5.11.0 milestone Oct 27, 2016
@remh

remh commented Oct 27, 2016

Thanks @cberry777.
FYI, I don't think it will be really useful billing-wise, as:

  • It only counts metrics coming from checks.d, not old-style checks (most system metrics)
  • It doesn't count dogstatsd metrics
  • It doesn't differentiate between integration metrics and custom metrics

However, it might be useful to track that, so we'll get it merged for our 5.11 release.

Can you have a look at the failing tests, please?

@cberry777
Contributor Author

Again, tests are failing that have nothing to do with this code.
I think the test suite is unstable?

FAIL: Support SNMP scalar objects
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/DataDog/dd-agent/tests/checks/integration/test_snmp.py", line 267, in test_scalar
    self.assertMetric(metric_name, tags=self.CHECK_TAGS, count=1)
  File "/home/travis/build/DataDog/dd-agent/tests/checks/common.py", line 350, in assertMetric
    self._candidates_size_assert(candidates, count=count, at_least=at_least)
  File "/home/travis/build/DataDog/dd-agent/tests/checks/common.py", line 320, in _candidates_size_assert
    "Needed exactly %d candidates, got %d" % (count, len(candidates))

All tests pass when I run "rake":


Ran 176 tests in 20.422s

OK (SKIP=1)
Cleaning up

@cberry777
Contributor Author

Is there a way to "re-fire" the test suite (without forcing a bogus commit)?

@masci
Contributor

masci commented Dec 3, 2016

It was a flaky test; all green now.

@gmmeyer
Contributor

gmmeyer commented Jul 6, 2017

Hey @cberry777! Thanks a lot for your contribution.

I think I missed this one when we went through our SDK move. This should be moved to our Integrations Core repo and closed here. I looked it over and don't see anything that stands out as needing to be changed. If you move it, I see no reason it couldn't be merged easily!
