
Conversation

@sfleen (Contributor) commented Nov 4, 2025

This is a roll-forward of #14578, albeit with a slightly different implementation. The original PR would fall back to direct metadata API calls in retries for all resources; this version limits direct API calls to just Jobs and ReplicaSets, matching the logic the existing owner retriever uses.

Original PR description:

Currently, we add a label that identifies the parent resource that a proxy is attached to, usually a deployment, statefulset, daemonset, etc. These are populated via `linkerd.io/proxy-<resource>` labels, with a different label for each parent resource kind.

There are a few deficiencies in the current implementation. It is specialized to built-in k8s resources only, so things like Argo CD Rollouts will not appear. Additionally, it does not recurse beyond two levels, so deeper ownership chains are not captured.

This adds a set of new labels, `linkerd.io/proxy-root-parent`, `linkerd.io/proxy-root-parent-kind`, and `linkerd.io/proxy-root-parent-group`, that identify the name, kind, and group of the root parent of a proxy workload. It also correctly recurses to the true root resource, at least as far as the proxy injector's cluster role permissions permit.

Note that the proxy already consumes all of the pod's labels via the downward API, so no changes to the proxy injector templates are required.
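
For illustration only (the workload names here are hypothetical), a pod created by a CronJob through a Job would carry labels along these lines:

linkerd.io/proxy-job: my-report-29381742
linkerd.io/proxy-root-parent: my-report
linkerd.io/proxy-root-parent-kind: CronJob
linkerd.io/proxy-root-parent-group: batch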

Signed-off-by: Scott Fleener <[email protected]>
@sfleen sfleen force-pushed the sfleen/root-owner-3 branch from da0473f to 7be5385 on November 4, 2025 at 15:57
sfleen added a commit that referenced this pull request Nov 6, 2025
Companion to #14682

The previous retry logic relied on direct metadata API calls in cases where the informer cache hadn't updated when injecting a pod.

This updates the retry logic to force a refresh of the informer cache instead of going to the metadata API.

Signed-off-by: Scott Fleener <[email protected]>
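
Not the actual implementation, but a rough sketch of the retry shape this describes, assuming a hypothetical informer-cache-backed getter and client-go's wait helpers:

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// getWithRetry polls an informer-backed getter until the object appears in
// the cache, rather than falling back to a direct metadata API call.
// Illustrative only; the interval and timeout values are made up.
func getWithRetry(ctx context.Context, getFromCache func() (*metav1.PartialObjectMetadata, error)) (*metav1.PartialObjectMetadata, error) {
	var obj *metav1.PartialObjectMetadata
	err := wait.PollUntilContextTimeout(ctx, 100*time.Millisecond, 5*time.Second, true,
		func(ctx context.Context) (bool, error) {
			o, err := getFromCache()
			if apierrors.IsNotFound(err) {
				return false, nil // cache not caught up yet; try again
			}
			if err != nil {
				return false, err
			}
			obj = o
			return true, nil
		})
	return obj, err
}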
@sfleen sfleen marked this pull request as ready for review on November 6, 2025 at 14:26
@sfleen sfleen requested a review from a team as a code owner on November 6, 2025 at 14:26
@sfleen sfleen force-pushed the sfleen/root-owner-3 branch from 24689d8 to 82e0b9e on November 6, 2025 at 16:06
// GetRootOwnerKindAndName returns the type and metadata of the resource's root owner, using owner
// references from the Kubernetes API. Parent refs are recursively traversed to find the
// root parent resource, at least as far as the controller has permissions to query.
func (api *MetadataAPI) GetRootOwnerKindAndName(ctx context.Context, tm *metav1.TypeMeta, om *metav1.ObjectMeta, retry bool) (*metav1.TypeMeta, *metav1.ObjectMeta) {
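
Not part of the diff: a minimal sketch of the recursion this signature implies, with a hypothetical `lookup` helper standing in for the informer-cache and metadata-API access the real implementation performs:

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// rootOwner is illustrative only. It follows controller ownerReferences
// upward until it reaches a resource with no controlling owner, or until
// lookup fails (e.g. RBAC forbids reading that kind), and returns the
// deepest resource it could resolve.
func rootOwner(
	ctx context.Context,
	tm *metav1.TypeMeta,
	om *metav1.ObjectMeta,
	lookup func(ctx context.Context, ref metav1.OwnerReference, ns string) (*metav1.TypeMeta, *metav1.ObjectMeta, error),
) (*metav1.TypeMeta, *metav1.ObjectMeta) {
	for _, ref := range om.OwnerReferences {
		if ref.Controller == nil || !*ref.Controller {
			continue // only the controlling owner defines the parent chain
		}
		parentTM, parentOM, err := lookup(ctx, ref, om.Namespace)
		if err != nil {
			return tm, om // can't see further up; report the deepest visible resource
		}
		return rootOwner(ctx, parentTM, parentOM, lookup)
	}
	return tm, om // no controlling owner: this is the true root
}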
Member
The destination controller `endpoint_watcher` and `workload_watcher` still use the old `GetOwnerKindAndName` function for getting owner metadata. Does it make sense to continue to maintain both of these, or should `GetRootOwnerKindAndName` completely replace `GetOwnerKindAndName` and have the watchers use it instead?

Contributor Author
In the long term, I'd absolutely like to remove `GetOwnerKindAndName`. But I'd like this to bake in a relatively low-risk area like trace labels for a bit, to make sure it's solid before including it in the watchers.

TL;DR: Yes, in a follow-up PR

Member
In that case, let's make sure we have a documented plan for when and how we'll validate this change for trace labels and port it over to the watchers. Without being deliberate here and committing to the follow-through, this duplication will languish and confuse us in the future.

Comment on lines +282 to +285
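// Only these kinds are expected to have a parent of their own (a Job from a
// CronJob, a ReplicaSet from a Deployment), so only they trigger a further
// owner lookup.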
var indexedParents = map[string]indexedParent{
"Job": {Job, batchv1.SchemeGroupVersion.WithResource("jobs")},
"ReplicaSet": {RS, appsv1.SchemeGroupVersion.WithResource("replicasets")},
}
Member
If the idea is to leave open the possibility of ownership types and structures we're not aware of, this prevents any structure of three or more levels from working, other than ones with a Job or ReplicaSet as the middle level.

Would it make sense to invert this and have a known list of root types that we know a priori won't have owners in order to save ourselves a lookup on those types?
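
To make that concrete, a hypothetical inversion might look like:

// Kinds known a priori to have no owner of their own; a lookup could be
// skipped for these, and every other kind would be queried. Illustrative only.
var knownRootKinds = map[string]struct{}{
	"Deployment":  {},
	"DaemonSet":   {},
	"StatefulSet": {},
	"CronJob":     {},
}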

In order for this to work, there would also need to be a way to make the destination controller's RBAC extensible so that it can be granted watch/get permissions on whatever intermediate workload types a user needs. I'm not sure whether this extensibility would need to be built into the existing destination controller RBAC templates, or whether it could be added with purely additive new ClusterRoleBindings, for example. Regardless, we should have a story and documentation for how to do this.
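
If the purely additive route works, it might look something like the following; the role name, API group, and ServiceAccount are illustrative assumptions, not settled design:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: linkerd-destination-rollouts-read
rules:
  - apiGroups: ["argoproj.io"]
    resources: ["rollouts"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: linkerd-destination-rollouts-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: linkerd-destination-rollouts-read
subjects:
  - kind: ServiceAccount
    name: linkerd-destination
    namespace: linkerd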
