Skip to content

[dns] Allow expected address to be checked in dns_check. #2799

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

[dns] Allow expected address to be checked in dns_check. #2799

wants to merge 5 commits into from

Conversation

gphat
Copy link
Contributor

@gphat gphat commented Aug 31, 2016

What does this PR do?

Allows the DNS check to verify the resolved address against a list of expected values.

Motivation

We have a number of DNS sanity checks that we check for known IPs. Think of public-facing ingestion points for which IP addresses are published.

Testing Guidelines

Includes unit tests within existing dns_check tests.

Additional Notes

I like your new PR format. 😸

cc @rhwlo

@gphat gphat changed the title Allow expected address to be checked. Allow expected address to be checked in dns_check. Aug 31, 2016
@gphat gphat changed the title Allow expected address to be checked in dns_check. [dns] Allow expected address to be checked in dns_check. Aug 31, 2016
assert(answer.rrset.items[0].to_text())
end_time = time.time()
if len(expected_results) > 0:
if resolved_value not in expected_results:
raise Exception('DNS resolution of %s resulted in unexpected address %s.' % (hostname, resolved_value))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a better way than raising an exception, since this verification can only take place if the DNS resolution was done correctly: put this block after the else, before the if end_time - start_time > 0, and log an error + service critical. What do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed, this is also a way to fix https://travis-ci.org/DataDog/dd-agent/jobs/156839184#L608

# Specify expected results if you want to check that the returned IP
# matches. If you supply multiple records then a match to any one will
# be considered success
# addressses:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only two s needed 😃

Copy link
Member

@degemer degemer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @gphat !

I added a few comments, and we have to decide what happens when there are multiple addresses. Let me know what you think !


status = AgentCheck.CRITICAL
start_time = time.time()
try:
self.log.debug('Querying "%s" record for hostname "%s"...' % (record_type, hostname))
answer = resolver.query(hostname, rdtype=record_type)
resolved_value = answer.rrset.items[0].to_text()
Copy link
Member

@degemer degemer Sep 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about multiple DNS records ? We should probably takes all the addresses and try to see if one of them matches (all of them maybe makes more sense). What do you think ?

@gphat
Copy link
Contributor Author

gphat commented Sep 23, 2016

Looks like this might be right now? I need to verify this on my end as well…

@degemer degemer self-assigned this Sep 27, 2016
@degemer degemer modified the milestones: 5.10.0, Triage Oct 3, 2016
Copy link
Member

@degemer degemer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @gphat, a few new comments. 😃

self.log.exception('DNS resolution of %s has failed.' % hostname)
self.service_check(self.SERVICE_CHECK_NAME, status, tags=self._get_tags(instance))
except Exception as err:
self.log.exception(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the exception is not needed, log.exception prints the exception anyway. :)

self.service_check(self.SERVICE_CHECK_NAME, status, tags=self._get_tags(instance))
except Exception as err:
self.log.exception(err)
self.service_check(self.SERVICE_CHECK_NAME, status, tags=self._get_tags(instance), message=err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what kind of exception is going to be raised ? I'm wondering if seeing the python exception is useful.

raise
else:
if len(expected_results) > 0:
missing_values = list(set(expected_results, resolved_results))
Copy link
Member

@degemer degemer Oct 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing_values = [r for r in expected_results if r not in resolved_results]

maybe ? Unless you have a better way to do it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d prefer list(set(expected_results) - set(resolved_results)) for efficiency unless there’s a reason to prefer the comprehension

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your method is also way easier to read/understand. 👍

missing_values = list(set(expected_results, resolved_results))
if len(missing_values) > 0:
self.log.error('DNS resolution of %s did not contain expected address(es) %s.' % (hostname, ", ".join(missing_values)))
self.service_check(self.SERVICE_CHECK_NAME, status, tags=self._get_tags(instance))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A OK service check might be sent L74 even when this CRITICAL is sent.
One way to solve this would be to handle the case end_time < start_time first (and maybe send a service check with WARNING ?), then the check for the missing values and finally the OK service check.
What do you think @hkaj ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably remove the if end_time - start_time > 0: condition and just do:

if len(missing_values) > 0:
    self.log.error('DNS resolution of %s did not contain expected address(es) %s.' % (hostname, ", ".join(missing_values)))
    self.service_check(self.SERVICE_CHECK_NAME, status, tags=self._get_tags(instance))
else:
    self.service_check(self.SERVICE_CHECK_NAME, AgentCheck.OK, tags=self._get_tags(instance))
self.gauge('dns.response_time', end_time - start_time, tags=tags)


status = AgentCheck.CRITICAL
start_time = time.time()
try:
self.log.debug('Querying "%s" record for hostname "%s"...' % (record_type, hostname))
answer = resolver.query(hostname, rdtype=record_type)
assert(answer.rrset.items[0].to_text())
resolved_results = map(lambda x: x.to_text(), answer.rrset.items)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't fail when the first item is the empty string, whereas it was failing previously. Maybe filter this resolved_results to remove empty strings and later check that resolved_results != [] ?

@cory-stripe
Copy link
Contributor

Oops, kinda forgot about this. :)

@sjenriquez
Copy link
Contributor

Hey @cory-stripe, my most recent PR #2924 impacts your work here. I'm moving the DNS check over to the integrations-core repo. The changes here are great, we hope you'll move this PR to the new repo!

@sjenriquez sjenriquez assigned irabinovitch and unassigned degemer and sjenriquez Jan 31, 2017
@cory-stripe
Copy link
Contributor

Ok, I'll check in to this soon. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants