Add an option to copy FoundationDB cluster files to a writable temporary file #19684


Open

jon-signal wants to merge 1 commit into master from copy_foundationdb_cluster_file_option

Conversation

jon-signal
Contributor

What does this PR do?

This pull request adds an option to FoundationDB integration instances to allow the check to make a writable copy of the cluster file before passing it to the FoundationDB client. This closes #19677.

Motivation

Please see #19677 for a detailed description of the problem, but in short, FoundationDB clients want a writable copy of the cluster file. It can be hard to provide a writable copy when running in a Kubernetes environment (the most common way to get the cluster file in that case is by mounting a ConfigMap as a file, but that will always be read-only), and so this option provides a mechanism to satisfy the FoundationDB client without jumping through Terrible Ops Hoops™.
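
In rough terms, the new option has the check copy the configured cluster file into a writable temporary file before handing it to the client. A minimal sketch of that approach (the helper name and the use of shutil.copyfileobj here are illustrative, not the exact code in this diff):

    import os
    import shutil
    import tempfile

    def make_writable_cluster_file(cluster_file):
        # Create a writable temporary file and copy the (possibly read-only)
        # cluster file's contents into it; the FoundationDB client may rewrite
        # this file if the cluster's coordinators change.
        fd, writable_path = tempfile.mkstemp(suffix=".cluster")
        with os.fdopen(fd, "w") as temp_file, open(cluster_file) as source:
            shutil.copyfileobj(source, temp_file)
        return writable_path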

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

    cluster_file = self.instance.get('cluster_file')

    if self.instance.get('copy_cluster_file'):
        _, cluster_file = tempfile.mkstemp(suffix=".cluster")
Contributor Author

Is there a "right place" to clean up this temporary file (or other long-lived resources)? I notice that this check doesn't have a mechanism for closing the FoundationDB client, so perhaps just leaving the temporary file is the right thing to do.

Contributor

I think we'd ideally clean this up at the end of every run of the check method, as we are creating a new file every time the check executes.

Contributor Author

I'm sorry to turn this around into a request for basic information, but I've actually really been struggling to find a clear articulation of the lifecycle of cluster checks in particular. Can you point me to the relevant docs or, failing that, tell me a little more about the lifecycle of a cluster check/its parent Python environment?

Contributor

Hi! Sorry about the delay. Here is some information, though we don't have docs that go into much detail about this: https://github.com/DataDog/datadog-agent/blob/44790aa282753af0d8a40a42aaaa38b26b9e916b/docs/dev/checks/README.md. What information in particular are you looking for?

Contributor Author

Hm. Well, I'm trying to reconcile some observed behavior with your concern about creating a new file every time the check executes.

What we saw in testing was that (from the perspective of the FoundationDB cluster) there was a single, long-lived FoundationDB client associated with the Datadog agent. To me, that looked like the cluster check got assigned to a single agent instance which then had a continually-running VM, and the FoundationDB client only got initialized once. If that was actually the case (and we didn't misinterpret something, which is very possible since this wasn't our focus at the time!), then I think the time to remove the temporary file would be at VM exit.

…but now that I've actually written that out, the time to remove the file is probably at VM exit regardless. The check will only try to create a new client (and a new temporary file) if it doesn't already have a client instance in memory:

    def construct_database(self):
        if self._db is not None:
            return self._db

…and self._db would only get cleared at VM exit. So I think (with apologies for the added detour) this is all moot, and we should just clean up the temporary file when the VM exits. Whew 😅
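
(If exit-time cleanup is the route, the standard library's atexit hook would be one way to do it. A sketch under that assumption, not code from this PR:)

    import atexit
    import os
    import tempfile

    _, cluster_file = tempfile.mkstemp(suffix=".cluster")
    # Hypothetical: remove the temporary cluster file at interpreter shutdown,
    # since the FoundationDB client (and the file) live for the lifetime of
    # the agent's Python VM.
    atexit.register(os.remove, cluster_file)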

Contributor

Ah, thank you, I see now; that makes a lot of sense. I was not familiar enough with the check, but if the connection is persisted then we will not need to clean up the file on every check run.

Comment on lines +41 to +43
    @pytest.fixture
    def copy_cluster_file_instance():
        return COPY_CLUSTER_FILE_INSTANCE
Contributor Author

I am very new to Python unit tests and recognize that this may not be the right way to do things. I'm very open to feedback!

Contributor

This is not creating a new test; it is creating a test fixture that can then be used in tests. If you have a test in mind for this change, let us know and we can help you write it!

Contributor Author

Acknowledged! That was actually my intent; my thought was that it made sense to reuse the existing integration tests with the new option to copy the cluster file (i.e. "can we still talk to FoundationDB when we initialize a client in this manner?"), but I'm very open to feedback that this isn't the ideal way to test things. What's your preference?

Contributor

Ah, I see. To do that, could you create a new test in the test_foundationdb.py file, similar to test_integ, except using the new copy_cluster_file_instance to configure the check?
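
For illustration, such a test might look roughly like this (a sketch only; the check class import, the aggregator and dd_environment fixtures, and the service check name are assumptions about the test layout rather than verbatim code):

    import pytest

    from datadog_checks.foundationdb import FoundationdbCheck

    @pytest.mark.integration
    @pytest.mark.usefixtures('dd_environment')
    def test_integ_copy_cluster_file(aggregator, copy_cluster_file_instance):
        # Same flow as test_integ, but with copy_cluster_file enabled,
        # exercising the writable-temp-file code path.
        check = FoundationdbCheck('foundationdb', {}, [copy_cluster_file_instance])
        check.check(copy_cluster_file_instance)
        aggregator.assert_service_check('foundationdb.can_connect', FoundationdbCheck.OK)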

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 4 lines in your changes missing coverage. Please review.

Project coverage is 88.63%. Comparing base (f5532e4) to head (de59b5b).

Additional details and impacted files

Flag           Coverage Δ
activemq       ?
cassandra      ?
foundationdb   81.88% <66.66%> (-0.58%) ⬇️
hive           ?
hivemq         ?
hudi           ?
ignite         ?
jboss_wildfly  ?
kafka          ?
presto         ?
solr           ?

Flags with carried forward coverage won't be shown.


@jon-signal force-pushed the copy_foundationdb_cluster_file_option branch 2 times, most recently from 5509e1f to 91a010b on February 21, 2025 at 20:26
@hestonhoffman added the "editorial review" label (Waiting on a more in-depth review from a docs team editor) on Feb 21, 2025
@hestonhoffman
Contributor

Hello from the docs team 👋
I'm going to hold off on reviewing the stuff in the spec file until the Agent team has taken a look, in case their review necessitates changes to that file. I'll check back in, but feel free to ping me after the Agent team's review if I haven't reviewed yet.

@jon-signal force-pushed the copy_foundationdb_cluster_file_option branch from 91a010b to de59b5b on March 18, 2025 at 15:59
@jon-signal
Contributor Author

Hey, @steveny91, I really appreciate all the review on the bazillions of other FoundationDB pull requests over the past few weeks! I recognize that those were more metric-y and that this is a little bit more of a plumbing thing; are you still the right person to take a look at this one? Thanks kindly!

@jon-signal
Contributor Author

Friends, with respect, it sounds like there might not be much interest in this contribution. That's okay if so! But it'd be great to get an opinion one way or the other!

@sarah-witt
Contributor

I'm very sorry, @jon-signal, I started looking at this PR but it must have slipped through. Taking a look now!

@jon-signal
Contributor Author

Thank you, and no worries! I really appreciate all the review you(se) have already done!

Development

Successfully merging this pull request may close these issues:

  • Difficult to provide FoundationDB cluster files in Kubernetes environments