Reduce memory usage when watching resources #425
Open
LAMRobinson wants to merge 2 commits into crossplane-contrib:main from
Description:
- Set DefaultTransform to strip managed fields in controller-runtime caches.
- Switch informer object type to PartialObjectMetadata for efficiency.
- Update imports to use metav1 instead of unstructured for informers.

Signed-off-by: Laurence Robinson <laurence_robinson@live.co.uk>
Description of your changes
Reduces memory usage of the watch feature (`--enable-watches`) by switching resource informer caches from full-object (`Unstructured`) informers to metadata-only (`PartialObjectMetadata`) informers, and by stripping `managedFields` from all cached objects (both the watch caches and the main manager cache) via `cache.TransformStripManagedFields()`.

Note that this PR also contains a few other changes that came up while running `make reviewable test`; these are in a dedicated commit.

Problem
When the watch feature is enabled, the provider creates a `cache.Cache` per `(providerConfig, GVK)` pair to detect changes to referenced or managed resources. These caches use `Unstructured` informers, which means the Kubernetes API server sends the full object (spec, status, annotations, `managedFields`, etc.) for every resource of each watched GVK, and the provider stores all of it in memory.

The provider never reads data from these caches. The reconciler always fetches resources directly from the API server via `c.client.Get()`. The watch caches exist purely as event sources: the event handler (`enqueueObjectsForReferences`) only uses `GetName()`, `GetNamespace()`, and `GetObjectKind().GroupVersionKind()` from the event object to look up which Object resources need reconciliation.

This means that for each watched GVK, the provider keeps a complete in-memory copy of every object of that type across all namespaces in the target cluster, even though the cached data is never consumed. For clusters with common resource types (ConfigMaps, Secrets, Deployments, etc.) this leads to extreme memory usage: 80GB+ has been observed for a few thousand managed resources.
Additionally, the main manager cache (for Object CRs, ProviderConfigs, etc.) retains `managedFields` on all cached objects, which adds unnecessary memory overhead, especially when SSA is enabled.

Fix
Three complementary changes:
- Switch to `PartialObjectMetadata` informers (watch caches): Replace `Unstructured` with `*metav1.PartialObjectMetadata` when calling `cache.GetInformer()` in both the cluster-scoped and namespaced controllers. This causes controller-runtime to use metadata-only List/Watch requests, so the API server sends only object metadata (name, namespace, UID, labels, annotations, etc.) rather than the full spec/status. This reduces per-object size from 5-50KB to ~200-500 bytes and also reduces network bandwidth.
- Strip `managedFields` from watch caches: Apply `cache.TransformStripManagedFields()` to the watch cache options. Even in metadata-only responses, `managedFields` can be 2-10KB per object. Stripping them provides an additional 30-60% reduction on the remaining metadata.
- Strip `managedFields` from the main manager cache: Apply `cache.TransformStripManagedFields()` to the manager's cache options in `main.go`. This strips `managedFields` from Object CRs, ProviderConfigs, and all other control plane resources cached by the manager. The provider never reads `managedFields` from these resources: the only `GetManagedFields()` calls in the codebase (`syncer.go:149`) operate on managed resources fetched directly from the target cluster API, not from the manager cache.

Why this is safe
- `Observe()`, `Create()`, `Update()`, and `Delete()` all fetch resources directly from the API server (`c.client.Get()`), never from the watch cache.
- `enqueueObjectsForReferences()` (in `indexes.go`) only accesses `GetObjectKind().GroupVersionKind()`, `GetNamespace()`, and `GetName()` on event objects, all of which are available on `PartialObjectMetadata`.
- `PartialObjectMetadata` implements `client.Object`, so the existing `obj.(client.Object)` type assertions in event handlers continue to work.
- `cache.GetInformer()` in controller-runtime v0.19.0 has explicit support for `PartialObjectMetadata` via a dedicated metadata-only informer factory.
- `cache.TransformStripManagedFields()` is a built-in, tested utility in controller-runtime designed for exactly this purpose.
- The `needSSAFieldManagerUpgrade()` function in `syncer.go` reads `managedFields`, but from managed resources fetched via direct API calls to the target cluster, not from the manager cache. Stripping `managedFields` from the manager cache does not affect this code path.

Files changed
- `internal/controller/cluster/object/informers.go`: metadata-only informers + strip `managedFields`
- `internal/controller/namespaced/object/informers.go`: metadata-only informers + strip `managedFields`
- `cmd/provider/main.go`: strip `managedFields` from the manager cache

I have:
- Run `make reviewable test` to ensure this PR is ready for review.

How has this code been tested
- `PartialObjectMetadata` satisfies the `client.Object` interface required by event handler type assertions.
- `enqueueObjectsForReferences` only accesses metadata fields available on `PartialObjectMetadata`.
- `GetManagedFields()` is only called on objects from direct API calls, not from the manager cache.