-
Notifications
You must be signed in to change notification settings - Fork 815
[dns] Allow expected address to be checked in dns_check. #2799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
assert(answer.rrset.items[0].to_text()) | ||
end_time = time.time() | ||
if len(expected_results) > 0: | ||
if resolved_value not in expected_results: | ||
raise Exception('DNS resolution of %s resulted in unexpected address %s.' % (hostname, resolved_value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be a better way than raising an exception, since this verification can only take place if the DNS resolution was done correctly: put this block after the else
, before the if end_time - start_time > 0
, and log an error + service critical. What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed, this is also a way to fix https://travis-ci.org/DataDog/dd-agent/jobs/156839184#L608
# Specify expected results if you want to check that the returned IP | ||
# matches. If you supply multiple records then a match to any one will | ||
# be considered success | ||
# addressses: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only two s
needed 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @gphat !
I added a few comments, and we have to decide what happens when there are multiple addresses. Let me know what you think !
|
||
status = AgentCheck.CRITICAL | ||
start_time = time.time() | ||
try: | ||
self.log.debug('Querying "%s" record for hostname "%s"...' % (record_type, hostname)) | ||
answer = resolver.query(hostname, rdtype=record_type) | ||
resolved_value = answer.rrset.items[0].to_text() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about multiple DNS records ? We should probably takes all the addresses and try to see if one of them matches (all of them maybe makes more sense). What do you think ?
Looks like this might be right now? I need to verify this on my end as well… |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @gphat, a few new comments. 😃
self.log.exception('DNS resolution of %s has failed.' % hostname) | ||
self.service_check(self.SERVICE_CHECK_NAME, status, tags=self._get_tags(instance)) | ||
except Exception as err: | ||
self.log.exception(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing the exception is not needed, log.exception
prints the exception anyway. :)
self.service_check(self.SERVICE_CHECK_NAME, status, tags=self._get_tags(instance)) | ||
except Exception as err: | ||
self.log.exception(err) | ||
self.service_check(self.SERVICE_CHECK_NAME, status, tags=self._get_tags(instance), message=err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what kind of exception is going to be raised ? I'm wondering if seeing the python exception is useful.
raise | ||
else: | ||
if len(expected_results) > 0: | ||
missing_values = list(set(expected_results, resolved_results)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing_values = [r for r in expected_results if r not in resolved_results]
maybe ? Unless you have a better way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d prefer list(set(expected_results) - set(resolved_results))
for efficiency unless there’s a reason to prefer the comprehension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your method is also way easier to read/understand. 👍
missing_values = list(set(expected_results, resolved_results)) | ||
if len(missing_values) > 0: | ||
self.log.error('DNS resolution of %s did not contain expected address(es) %s.' % (hostname, ", ".join(missing_values))) | ||
self.service_check(self.SERVICE_CHECK_NAME, status, tags=self._get_tags(instance)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A OK
service check might be sent L74 even when this CRITICAL is sent.
One way to solve this would be to handle the case end_time < start_time
first (and maybe send a service check with WARNING ?), then the check for the missing values and finally the OK service check.
What do you think @hkaj ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably remove the if end_time - start_time > 0:
condition and just do:
if len(missing_values) > 0:
self.log.error('DNS resolution of %s did not contain expected address(es) %s.' % (hostname, ", ".join(missing_values)))
self.service_check(self.SERVICE_CHECK_NAME, status, tags=self._get_tags(instance))
else:
self.service_check(self.SERVICE_CHECK_NAME, AgentCheck.OK, tags=self._get_tags(instance))
self.gauge('dns.response_time', end_time - start_time, tags=tags)
|
||
status = AgentCheck.CRITICAL | ||
start_time = time.time() | ||
try: | ||
self.log.debug('Querying "%s" record for hostname "%s"...' % (record_type, hostname)) | ||
answer = resolver.query(hostname, rdtype=record_type) | ||
assert(answer.rrset.items[0].to_text()) | ||
resolved_results = map(lambda x: x.to_text(), answer.rrset.items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't fail when the first item is the empty string, whereas it was failing previously. Maybe filter this resolved_results to remove empty strings and later check that resolved_results != [] ?
Oops, kinda forgot about this. :) |
Hey @cory-stripe, my most recent PR #2924 impacts your work here. I'm moving the DNS check over to the integrations-core repo. The changes here are great, we hope you'll move this PR to the new repo! |
Ok, I'll check in to this soon. Thanks! |
What does this PR do?
Allows the DNS check to verify the resolved address against a list of expected values.
Motivation
We have a number of DNS sanity checks that we check for known IPs. Think of public-facing ingestion points for which IP addresses are published.
Testing Guidelines
Includes unit tests within existing dns_check tests.
Additional Notes
I like your new PR format. 😸
cc @rhwlo