Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions foundationdb/assets/configuration/spec.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,18 @@ files:
value:
example: /var/lib/foundationdb/fdb.cluster
type: string
- name: copy_cluster_file
description: |
Whether to make a writable copy of the given `cluster_file` before
passing it to the FoundationDB client. FoundationDB clients generally
need to be able to write to cluster files in the event of cluster
topology changes. This option can be useful in environments where
making a writable file available to the FoundationDB integration is
difficult (i.e. mounting a `ConfigMap` in a Kubernetes deployment).
This option has no effect if `cluster_file` is not specified.
value:
example: false
type: boolean
- name: tls_certificate_file
description: Path to the file from which the local certificates can be loaded
value:
Expand Down
1 change: 1 addition & 0 deletions foundationdb/changelog.d/19684.added
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add an option to copy FoundationDB cluster files to a writable temporary file
10 changes: 9 additions & 1 deletion foundationdb/datadog_checks/foundationdb/check.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@
# Licensed under a 3-clause BSD style license (see LICENSE)

import json
import shutil
import tempfile

import fdb

Expand Down Expand Up @@ -31,7 +33,13 @@ def construct_database(self):
fdb.options.set_tls_verify_peers(self.instance.get('tls_verify_peers').encode('latin-1'))

if 'cluster_file' in self.instance:
self._db = fdb.open(cluster_file=self.instance.get('cluster_file'))
cluster_file = self.instance.get('cluster_file')

if self.instance.get('copy_cluster_file'):
_, cluster_file = tempfile.mkstemp(suffix=".cluster")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a "right place" to clean up this temporary file (or other long-lived resources)? I notice that this check doesn't have a mechanism for closing the FoundationDB client, so perhaps just leaving the temporary file is the right thing to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd ideally clean this up at the end of every run of the check method as we are a creating a new file every time the check executes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry to turn this around into a request for basic information, but I've actually really been struggling to find a clear articulation of the lifecycle of cluster checks in particular. Can you point me to the relevant docs or, failing that, tell me a little more about the lifecycle of a cluster check/its parent Python environment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Sorry about the delay- here is some information, however we have no docs that are very detailed about this: https://github.com/DataDog/datadog-agent/blob/44790aa282753af0d8a40a42aaaa38b26b9e916b/docs/dev/checks/README.md. What information in particular are you looking for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Well, I'm trying to reconcile some observed behavior with your concern about creating a new file every time the check executes.

What we saw in testing was that (from the perspective of the FoundationDB cluster) there was a single, long-lived FoundationDB client associated with the Datadog agent. To me, that looked like the cluster check got assigned to a single agent instance which then had a continually-running VM, and the FoundationDB client only got initialized once. If that was actually the case (and we didn't misinterpret something, which is very possible since this wasn't our focus at the time!), then I think the time to remove the temporary file would be at VM exit.

…but now that I've actually written that out, the time to remove the file is probably at VM exit regardless. The check will only try to create a new client (and a new temporary file) if it doesn't already have a client instance in memory:

    def construct_database(self):
        if self._db is not None:
            return self._db

…and self._db would only get cleared at VM exit. So I think (with apologies for the added detour) this is all moot, and we should just clean up the temporary file when the VM exists. Whew 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thank you, I see now, that makes a lot of sense. I was not familiar enough with the check but if the connection is persisted then we will not need to clean up the file on every check run.

shutil.copyfile(self.instance.get('cluster_file'), cluster_file)

self._db = fdb.open(cluster_file=cluster_file)
else:
self._db = fdb.open()

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,10 @@ def instance_cluster_file():
return '/var/lib/foundationdb/fdb.cluster'


def instance_copy_cluster_file():
return False


def instance_disable_generic_tags():
return False

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ class InstanceConfig(BaseModel):
frozen=True,
)
cluster_file: Optional[str] = None
copy_cluster_file: Optional[bool] = None
custom_queries: Optional[tuple[MappingProxyType[str, Any], ...]] = None
disable_generic_tags: Optional[bool] = None
empty_default_hostname: Optional[bool] = None
Expand Down
11 changes: 11 additions & 0 deletions foundationdb/datadog_checks/foundationdb/data/conf.yaml.example
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,17 @@ instances:
#
# cluster_file: /var/lib/foundationdb/fdb.cluster

## @param copy_cluster_file - boolean - optional - default: false
## Whether to make a writable copy of the given `cluster_file` before
## passing it to the FoundationDB client. FoundationDB clients generally
## need to be able to write to cluster files in the event of cluster
## topology changes. This option can be useful in environments where
## making a writable file available to the FoundationDB integration is
## difficult (i.e. mounting a `ConfigMap` in a Kubernetes deployment).
## This option has no effect if `cluster_file` is not specified.
#
# copy_cluster_file: false

## @param tls_certificate_file - string - optional
## Path to the file from which the local certificates can be loaded
#
Expand Down
8 changes: 8 additions & 0 deletions foundationdb/tests/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,14 @@
'tls_verify_peers': 'Check.Valid=0',
}

COPY_CLUSTER_FILE_INSTANCE = {
'cluster_file': CLUSTER_FILE,
'copy_cluster_file': True,
'tls_certificate_file': TLS_CERT_FILE,
'tls_key_file': TLS_KEY_FILE,
'tls_verify_peers': 'Check.Valid=0',
}

TLS_INSTANCE = {
'cluster_file': TLS_CLUSTER_FILE,
'tls_certificate_file': TLS_CERT_FILE,
Expand Down
16 changes: 15 additions & 1 deletion foundationdb/tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,16 @@

from datadog_checks.dev import WaitFor, docker_run, run_command

from .common import E2E_CONFIG, E2E_METADATA, E2E_TLS_CONFIG, HERE, INSTANCE, PROTOCOL, TLS_INSTANCE
from .common import (
COPY_CLUSTER_FILE_INSTANCE,
E2E_CONFIG,
E2E_METADATA,
E2E_TLS_CONFIG,
HERE,
INSTANCE,
PROTOCOL,
TLS_INSTANCE,
)


@pytest.fixture(scope='session')
Expand All @@ -29,6 +38,11 @@ def instance():
return INSTANCE


@pytest.fixture
def copy_cluster_file_instance():
return COPY_CLUSTER_FILE_INSTANCE
Comment on lines +41 to +43
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very new to Python unit tests and recognize that this may not be the right way to do things. I'm very open to feedback!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not creating a new test- it is creating a test fixture that can then be used in tests. If you have a test in mind for this change let us know and we can help you write it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged! That was actually my intent; my thought was that it made sense to reuse the existing integration tests with the new option to copy the cluster file (i.e. "can we still talk to FoundationDB when we initialize a client in this manner?"), but I'm very open to feedback that this isn't the ideal way to test things. What's your preference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, to do this could you create a new test in the test_foundationdb.py file similar to test_integ, except using the new copy_cluster_file_instance to configure the check?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! I just wanted to check in to see if there's anything I can do to help out with this test!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, and it's my turn to apologize for the delay. My priorities got pulled elsewhere in a big hurry, but I'm returning my focus to FoundationDB and expect to be able to resolve this in the next week or two. Thank you for your patience!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries at all, let me know if there's anything I can do!



@pytest.fixture
def tls_instance():
return TLS_INSTANCE
Expand Down