-
Notifications
You must be signed in to change notification settings - Fork 1.3k
feat(tracing): Add root owner labels to injected pods, V2 #14682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This is a roll-forward of #14578, albiet with a slightly different implementation. The original PR would fall back to direct metadata API calls when getting resources from the informer cache. This instead resyncs the informer when a resource cannot be found and tries again, skipping the direct metadata API entirely. Original PR description: Currently, we add a label that identifies the parent resource that a proxy is attached to, usually a deployment, statefulset, daemonset, etc. These are populated via `linkerd.io/proxy-<resource>`` annotations, with a different label for each parent resource. There are a few deficiencies with the current implementation. Currently, the implementation is specialized to only built-in k8s resources, so things like argocd rollouts will not appear. Additionally it does not recurse beyond two levels, so any more levels of parenting will not be captured. This adds a set of new labels, `linkerd.io/proxy-root-parent` and `linkerd.io/proxy-root-parent-kind`, and `linkerd.io/proxy-root-parent-group`, that identify the name and kind of the root parent of a proxy workload. It also correctly recurses to the true root resource, at least as far as cluster role permissions for the proxy injector permit. Note that the proxy already consumes all of the pod labels via the downward API, so there's no changes required to the proxy injector templates. Signed-off-by: Scott Fleener <[email protected]>
da0473f to
7be5385
Compare
Signed-off-by: Scott Fleener <[email protected]>
Companion to #14682 The previous retry logic relied on direct metadata API calls in cases where the informer cache hadn't updated when injecting a pod. This updates the retry logic to force a refresh of the informer cache instead of going to the metadata API. Signed-off-by: Scott Fleener <[email protected]>
Signed-off-by: Scott Fleener <[email protected]>
Signed-off-by: Scott Fleener <[email protected]>
24689d8 to
82e0b9e
Compare
Companion to #14682 The previous retry logic relied on direct metadata API calls in cases where the informer cache hadn't updated when injecting a pod. This updates the retry logic to force a refresh of the informer cache instead of going to the metadata API. Signed-off-by: Scott Fleener <[email protected]>
| // GetRootOwnerKindAndName returns the resource's owner's type and metadata, using owner | ||
| // references from the Kubernetes API. Parent refs are recursively traversed to find the | ||
| // root parent resource, at least as far as the controller has permissions to query. | ||
| func (api *MetadataAPI) GetRootOwnerKindAndName(ctx context.Context, tm *metav1.TypeMeta, om *metav1.ObjectMeta, retry bool) (*metav1.TypeMeta, *metav1.ObjectMeta) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The destination controller endpoint_watcher and workload_watcher still use the old GetOwnerKindAndName function for getting owner metadata. Does it make sense to continue to maintain both of these are should GetRootOwnerKindAndName completely replace GetOwnerKindAndName and have the watchers use this instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the long term, I'd absolutely like to remove GetOwnerKindAndName. But I'd like this to bake in a relatively low-risk area like trace labels for a bit to make sure this is solid before including it in the watchers.
TL;DR: Yes, in a follow-up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, let's make sure we have a documented plan for when and how we'll validate this change for trace labels and port it over to the watchers. Without being deliberate here and committing to the follow-through, this duplication will languish and confuse us in the future.
| var indexedParents = map[string]indexedParent{ | ||
| "Job": {Job, batchv1.SchemeGroupVersion.WithResource("jobs")}, | ||
| "ReplicaSet": {RS, appsv1.SchemeGroupVersion.WithResource("replicasets")}, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the idea is to leave open the possibility of other ownership types and structure we're not aware of, this prevents any 3 (or greater) level structure from working other than ones with Job and ReplicaSet as the middle level.
Would it make sense to invert this and have a known list of root types that we know a priori won't have owners in order to save ourselves a lookup on those types?
In order for this to work, there would also need to be a way to make the destination controller's RBAC extensible so that it can be granted watch/get permissions on whatever intermediate workload types a user needs. Not sure if this extensibility would need to be built into the existing destination controller rbac templates, or if it's possible to add this with purely additive new ClusterRoleBindings, for example. Regardless, we should have a story and documentation for how to do this.
Signed-off-by: Scott Fleener <[email protected]>
Signed-off-by: Scott Fleener <[email protected]>
This is a roll-forward of #14578, albiet with a slightly different implementation. The original PR would fall back to direct metadata API calls in retries for all resources. This instead limits direct API calls to just
JobsandReplicaSets, which matches logic that the existing owner retriever uses.Original PR description: