
@jhiemstrawisc
Member

There is an entire class of bugs that revolves around having the
Director host a copy of federation metadata when it isn't also the
root discovery URL for the federation. We keep finding these bugs
in a trickle, but we haven't been able to squash them all because
of how entangled these concepts are.

Adding this knob doesn't solve those bugs, but it should help us
detect them more easily by letting us turn off Director metadata
hosting when we don't want people using the Director for that purpose.

@jhiemstrawisc jhiemstrawisc added this to the v7.23 milestone Dec 17, 2025
@jhiemstrawisc jhiemstrawisc added enhancement New feature or request director Issue relating to the director component configuration labels Dec 17, 2025
@jhiemstrawisc
Member Author

@williamnswanson This new param also feels relevant to you, since we'll probably want to come up with a plan for turning it on in the OSDF (hopefully without exposing too many bugs at once...).

@williamnswanson
Contributor

If our goal is to find bugs related to Director-based federation discovery (or components/clients that still use the Director for discovery), could we first use a less destructive way to find out what currently depends on it, like logging access? Then we could reach out out-of-band about a transition plan instead of shutting it off and breaking people's workflows without warning.

Collaborator

@turetske turetske left a comment


One minor issue regarding a println that was left in.

This was a subtle bug I uncovered while trying to write a unit test
that distinguished between the fed metadata hosted by the discovery
server and the test's Director.

It turned out that the duplicate metadata hosted by the test Director
wasn't matching the discovery server's. Setting the Discovery URL as a
param for fed tests wasn't enough, because by that point the Director
had already run global fed discovery (it thought it was the fed root
when it started, which is a bootstrapping issue).

To achieve the desired behavior, I also needed to overwrite the global
fed info to reflect what the Director would have discovered had it
used this as its source of truth.

I don't think this is actually a bug in the code, just a consequence
of the awkward way we set up federations for tests.
@jhiemstrawisc
Member Author

After an in-person discussion with @williamnswanson, we decided it'd also be nice to add some monitoring in the Director to detect when servers reach out to it for federation metadata. I plan to add that in this PR.

This was requested by the Ops team to smooth the transition toward
turning off metadata hosting at the Director in OSDF.

This commit adds a metric we can use to watch who is still hitting
the Director for fed metadata. I tried to give the metric enough
contextual info to determine "who" is asking for the data by recording
a masked address and a service type (derived from the request's
user agent).
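A rough sketch of how those two labels could be computed, using only the Go standard library. The /24 (IPv4) and /64 (IPv6) masks and the `pelican-<service>/<version>` user-agent format are assumptions for illustration, and `maskAddr` / `serviceFromUserAgent` are hypothetical helper names, not the functions in this PR.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// maskAddr zeroes the host bits of a client address so the metric label
// identifies a network rather than an individual host. It accepts either
// "host:port" (like http.Request.RemoteAddr) or a bare IP.
func maskAddr(remoteAddr string) string {
	var addr netip.Addr
	if ap, err := netip.ParseAddrPort(remoteAddr); err == nil {
		addr = ap.Addr()
	} else if a, err := netip.ParseAddr(remoteAddr); err == nil {
		addr = a
	} else {
		return "unknown"
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	bits := 64          // assumed IPv6 mask
	if addr.Is4() {
		bits = 24 // assumed IPv4 mask
	}
	prefix, err := addr.Prefix(bits)
	if err != nil {
		return "unknown"
	}
	return prefix.String()
}

// serviceFromUserAgent guesses the requesting service from a user agent
// such as "pelican-origin/7.22.0"; anything else is labeled "unknown".
func serviceFromUserAgent(ua string) string {
	product, _, _ := strings.Cut(ua, "/")
	product = strings.ToLower(strings.TrimSpace(product))
	if svc, ok := strings.CutPrefix(product, "pelican-"); ok && svc != "" {
		return svc
	}
	return "unknown"
}

func main() {
	fmt.Println(maskAddr("192.0.2.57:51234"), serviceFromUserAgent("pelican-origin/7.22.0"))
	// 192.0.2.0/24 origin
	fmt.Println(maskAddr("[2001:db8:1:2::5]:443"), serviceFromUserAgent("curl/8.5.0"))
	// 2001:db8:1:2::/64 unknown
}
```

Masking keeps label cardinality bounded and avoids storing full client IPs in Prometheus, while still pointing Ops at a site to contact.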

Importantly, this changed my original feature design slightly. Rather
than completely turning off the metadata hosting endpoint by de-registering
it with gin, this keeps it registered so we can collect metrics. If
metadata hosting is disabled, we return a `410 Gone` as soon as
we've recorded who tried to access the info.
@jhiemstrawisc
Member Author

Added @patrickbrophy as a second reviewer because of the new Prometheus metric this adds.


Labels

configuration director Issue relating to the director component enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

We should have a config knob that lets us turn off pelican-configuration hosting at the Director

3 participants