Add an option to copy FoundationDB cluster files to a writable temporary file #19684
Conversation
cluster_file = self.instance.get('cluster_file')

if self.instance.get('copy_cluster_file'):
    _, cluster_file = tempfile.mkstemp(suffix=".cluster")
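(For context: the diff shows only the mkstemp call. Below is a minimal sketch of what a full copy step might look like, assuming the original file's contents are copied into the new temporary file; the helper name is hypothetical.)

```python
import os
import shutil
import tempfile


def _copy_cluster_file(original_path):
    # Hypothetical helper: create a writable temporary file and copy the
    # (possibly read-only) cluster file's contents into it, returning the
    # path that should be handed to the FoundationDB client.
    fd, temp_path = tempfile.mkstemp(suffix=".cluster")
    os.close(fd)  # we only need the path, not the open descriptor
    shutil.copyfile(original_path, temp_path)
    return temp_path
```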
Is there a "right place" to clean up this temporary file (or other long-lived resources)? I notice that this check doesn't have a mechanism for closing the FoundationDB client, so perhaps just leaving the temporary file is the right thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd ideally clean this up at the end of every run of the check method, as we are creating a new file every time the check executes.
I'm sorry to turn this around into a request for basic information, but I've actually really been struggling to find a clear articulation of the lifecycle of cluster checks in particular. Can you point me to the relevant docs or, failing that, tell me a little more about the lifecycle of a cluster check/its parent Python environment?
Hi! Sorry about the delay. Here is some information, though we don't have very detailed docs about this: https://github.com/DataDog/datadog-agent/blob/44790aa282753af0d8a40a42aaaa38b26b9e916b/docs/dev/checks/README.md. What information in particular are you looking for?
Hm. Well, I'm trying to reconcile some observed behavior with your concern about creating a new file every time the check executes.
What we saw in testing was that (from the perspective of the FoundationDB cluster) there was a single, long-lived FoundationDB client associated with the Datadog agent. To me, that looked like the cluster check got assigned to a single agent instance which then had a continually-running VM, and the FoundationDB client only got initialized once. If that was actually the case (and we didn't misinterpret something, which is very possible since this wasn't our focus at the time!), then I think the time to remove the temporary file would be at VM exit.
…but now that I've actually written that out, the time to remove the file is probably at VM exit regardless. The check will only try to create a new client (and a new temporary file) if it doesn't already have a client instance in memory:
def construct_database(self):
    if self._db is not None:
        return self._db
…and self._db would only get cleared at VM exit. So I think (with apologies for the added detour) this is all moot, and we should just clean up the temporary file when the VM exits. Whew 😅
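(A minimal sketch of what that VM-exit cleanup might look like, assuming the check records the temporary file's path; the use of atexit is an assumption, not necessarily what the PR does.)

```python
import atexit
import os


def _schedule_cleanup(temp_path):
    # Hypothetical sketch: remove the temporary cluster file when the
    # Python interpreter (the "VM" discussed above) shuts down.
    def _remove():
        if os.path.exists(temp_path):
            os.remove(temp_path)

    atexit.register(_remove)
```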
Ah, thank you, I see now; that makes a lot of sense. I was not familiar enough with the check, but if the connection is persisted then we will not need to clean up the file on every check run.
@pytest.fixture
def copy_cluster_file_instance():
    return COPY_CLUSTER_FILE_INSTANCE
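(For reference, a hypothetical sketch of the constant this fixture returns; the key names mirror the option discussed in this PR, and the path is a placeholder.)

```python
# Hypothetical instance configuration; the actual constant is presumably
# defined alongside the other test instances.
COPY_CLUSTER_FILE_INSTANCE = {
    'cluster_file': '/etc/foundationdb/fdb.cluster',
    'copy_cluster_file': True,
}
```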
I am very new to Python unit tests and recognize that this may not be the right way to do things. I'm very open to feedback!
This is not creating a new test; it is creating a test fixture that can then be used in tests. If you have a test in mind for this change, let us know and we can help you write it!
Acknowledged! That was actually my intent; my thought was that it made sense to reuse the existing integration tests with the new option to copy the cluster file (i.e. "can we still talk to FoundationDB when we initialize a client in this manner?"), but I'm very open to feedback that this isn't the ideal way to test things. What's your preference?
Ah, I see. To do this, could you create a new test in the test_foundationdb.py file, similar to test_integ, except using the new copy_cluster_file_instance fixture to configure the check?
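(A hedged sketch of what that suggested test might look like; the check class name, the dd_environment and aggregator fixtures, and the asserted metric name are assumptions modeled on typical integrations-core tests, not copied from test_integ.)

```python
import pytest

from datadog_checks.foundationdb import FoundationdbCheck


@pytest.mark.integration
@pytest.mark.usefixtures('dd_environment')
def test_copy_cluster_file(aggregator, copy_cluster_file_instance):
    # Run the check against an instance that enables copy_cluster_file and
    # verify that metrics are still collected, mirroring test_integ.
    check = FoundationdbCheck('foundationdb', {}, [copy_cluster_file_instance])
    check.check(copy_cluster_file_instance)
    aggregator.assert_metric('foundationdb.processes')  # assumed metric name
```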
Hello from the docs team 👋
Hey, @steveny91, I really appreciate all the review on the bazillions of other FoundationDB pull requests over the past few weeks! I recognize that those were more metric-y and that this is a little bit more of a plumbing thing; are you still the right person to take a look at this one? Thanks kindly!

Friends, with respect, it sounds like there might not be much interest in this contribution. That's okay if so! But it'd be great to get an opinion one way or the other!

I'm very sorry @jon-signal, I started looking at this PR but it must have slipped through. Taking a look now!

Thank you, and no worries! I really appreciate all the review you(se) have already done!
What does this PR do?
This pull request adds an option to FoundationDB integration instances to allow the check to make a writable copy of the cluster file before passing it to the FoundationDB client. This closes #19677.
Motivation
Please see #19677 for a detailed description of the problem, but in short, FoundationDB clients want a writable copy of the cluster file. It can be hard to provide a writable copy when running in a Kubernetes environment (the most common way to get the cluster file in that case is by mounting a ConfigMap as a file, but that will always be read-only), and so this option provides a mechanism to satisfy the FoundationDB client without jumping through Terrible Ops Hoops™.

Review checklist (to be filled by reviewers)
- Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
- Add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged.