Add an option to copy FoundationDB cluster files to a writable temporary file #19684


Open

jon-signal wants to merge 1 commit into master from copy_foundationdb_cluster_file_option

Conversation

jon-signal
Contributor

What does this PR do?

This pull request adds an option to FoundationDB integration instances to allow the check to make a writable copy of the cluster file before passing it to the FoundationDB client. This closes #19677.

Motivation

Please see #19677 for a detailed description of the problem, but in short, FoundationDB clients want a writable copy of the cluster file. It can be hard to provide a writable copy when running in a Kubernetes environment (the most common way to get the cluster file in that case is by mounting a ConfigMap as a file, but that will always be read-only), and so this option provides a mechanism to satisfy the FoundationDB client without jumping through Terrible Ops Hoops™.
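
In rough terms, the new option has the check copy the configured cluster file into a writable temporary file before handing it to the client. A minimal sketch of that approach (the helper name and the use of shutil.copyfileobj here are illustrative, not the exact code in this diff):

    import os
    import shutil
    import tempfile

    def make_writable_cluster_file(cluster_file):
        # Create a writable temporary file and copy the (possibly read-only)
        # cluster file's contents into it; the FoundationDB client may rewrite
        # this file if the cluster's coordinators change.
        fd, writable_path = tempfile.mkstemp(suffix=".cluster")
        with os.fdopen(fd, "w") as temp_file, open(cluster_file) as source:
            shutil.copyfileobj(source, temp_file)
        return writable_path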

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

    cluster_file = self.instance.get('cluster_file')

    if self.instance.get('copy_cluster_file'):
        _, cluster_file = tempfile.mkstemp(suffix=".cluster")
Contributor Author

Is there a "right place" to clean up this temporary file (or other long-lived resources)? I notice that this check doesn't have a mechanism for closing the FoundationDB client, so perhaps just leaving the temporary file is the right thing to do.

Contributor

I think we'd ideally clean this up at the end of every run of the check method, as we are creating a new file every time the check executes.

Contributor Author

I'm sorry to turn this around into a request for basic information, but I've actually really been struggling to find a clear articulation of the lifecycle of cluster checks in particular. Can you point me to the relevant docs or, failing that, tell me a little more about the lifecycle of a cluster check/its parent Python environment?

Contributor

Hi! Sorry about the delay. Here is some information, though we don't have docs that go into much detail about this: https://github.com/DataDog/datadog-agent/blob/44790aa282753af0d8a40a42aaaa38b26b9e916b/docs/dev/checks/README.md. What information in particular are you looking for?

Contributor Author

Hm. Well, I'm trying to reconcile some observed behavior with your concern about creating a new file every time the check executes.

What we saw in testing was that (from the perspective of the FoundationDB cluster) there was a single, long-lived FoundationDB client associated with the Datadog agent. To me, that looked like the cluster check got assigned to a single agent instance which then had a continually-running VM, and the FoundationDB client only got initialized once. If that was actually the case (and we didn't misinterpret something, which is very possible since this wasn't our focus at the time!), then I think the time to remove the temporary file would be at VM exit.

…but now that I've actually written that out, the time to remove the file is probably at VM exit regardless. The check will only try to create a new client (and a new temporary file) if it doesn't already have a client instance in memory:

    def construct_database(self):
        if self._db is not None:
            return self._db

…and self._db would only get cleared at VM exit. So I think (with apologies for the added detour) this is all moot, and we should just clean up the temporary file when the VM exits. Whew 😅
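
(If exit-time cleanup is the route, the standard library's atexit hook would be one way to do it. A sketch under that assumption, not code from this PR:)

    import atexit
    import os
    import tempfile

    _, cluster_file = tempfile.mkstemp(suffix=".cluster")
    # Hypothetical: remove the temporary cluster file at interpreter shutdown,
    # since the FoundationDB client (and the file) live for the lifetime of
    # the agent's Python VM.
    atexit.register(os.remove, cluster_file)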

Contributor

Ah, thank you, I see now; that makes a lot of sense. I was not familiar enough with the check, but if the connection is persisted then we will not need to clean up the file on every check run.

Comment on lines +41 to +43
    @pytest.fixture
    def copy_cluster_file_instance():
        return COPY_CLUSTER_FILE_INSTANCE
Contributor Author

I am very new to Python unit tests and recognize that this may not be the right way to do things. I'm very open to feedback!

Contributor

This is not creating a new test; it is creating a test fixture that can then be used in tests. If you have a test in mind for this change, let us know and we can help you write it!

Contributor Author

Acknowledged! That was actually my intent; my thought was that it made sense to reuse the existing integration tests with the new option to copy the cluster file (i.e. "can we still talk to FoundationDB when we initialize a client in this manner?"), but I'm very open to feedback that this isn't the ideal way to test things. What's your preference?

Contributor

Ah, I see. To do that, could you create a new test in the test_foundationdb.py file, similar to test_integ, except using the new copy_cluster_file_instance to configure the check?
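
For illustration, such a test might look roughly like this (a sketch only; the check class import, the aggregator and dd_environment fixtures, and the service check name are assumptions about the test layout rather than verbatim code):

    import pytest

    from datadog_checks.foundationdb import FoundationdbCheck

    @pytest.mark.integration
    @pytest.mark.usefixtures('dd_environment')
    def test_integ_copy_cluster_file(aggregator, copy_cluster_file_instance):
        # Same flow as test_integ, but with copy_cluster_file enabled,
        # exercising the writable-temp-file code path.
        check = FoundationdbCheck('foundationdb', {}, [copy_cluster_file_instance])
        check.check(copy_cluster_file_instance)
        aggregator.assert_service_check('foundationdb.can_connect', FoundationdbCheck.OK)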

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 4 lines in your changes missing coverage. Please review.

Project coverage is 88.63%. Comparing base (f5532e4) to head (de59b5b).

Additional details and impacted files

Flag           Coverage Δ
activemq       ?
cassandra      ?
foundationdb   81.88% <66.66%> (-0.58%) ⬇️
hive           ?
hivemq         ?
hudi           ?
ignite         ?
jboss_wildfly  ?
kafka          ?
presto         ?
solr           ?

Flags with carried forward coverage won't be shown.


@jon-signal force-pushed the copy_foundationdb_cluster_file_option branch 2 times, most recently from 5509e1f to 91a010b on February 21, 2025 at 20:26
@hestonhoffman added the "editorial review" label (Waiting on a more in-depth review from a docs team editor) on Feb 21, 2025
@hestonhoffman
Contributor

Hello from the docs team 👋
I'm going to hold off on reviewing the stuff in the spec file until the Agent team has taken a look, in case their review necessitates changes to that file. I'll check back in, but feel free to ping me after the Agent team's review if I haven't reviewed yet.

@jon-signal force-pushed the copy_foundationdb_cluster_file_option branch from 91a010b to de59b5b on March 18, 2025 at 15:59
@jon-signal
Contributor Author

Hey, @steveny91, I really appreciate all the review on the bazillions of other FoundationDB pull requests over the past few weeks! I recognize that those were more metric-y and that this is a little bit more of a plumbing thing; are you still the right person to take a look at this one? Thanks kindly!

@jon-signal
Contributor Author

Friends, with respect, it sounds like there might not be much interest in this contribution. That's okay if so! But it'd be great to get an opinion one way or the other!

@sarah-witt
Contributor

I'm very sorry, @jon-signal, I started looking at this PR but it must have slipped through. Taking a look now!

@jon-signal
Contributor Author

Thank you, and no worries! I really appreciate all the review you(se) have already done!

Development

Successfully merging this pull request may close these issues:

  • Difficult to provide FoundationDB cluster files in Kubernetes environments