Description
I'm observing non-deterministic memory growth in external-dns v0.20.0 on linux/arm64. The external-dns container memory increases from ~14Mi to ~90Mi (6x increase) during initialization and stays elevated until pod restart.
I observed this issue several times initially, but have been unable to reproduce it since. The non-deterministic nature suggests a possible race condition or timing-dependent issue.
Environment
External-DNS:
- Version: v0.20.0
- Platform: linux/arm64
- Image: registry.k8s.io/external-dns/external-dns:v0.20.0
Configuration (actual from running pod):
Sources: [gateway-httproute service]
Interval: 1m0s
MinEventSyncInterval: 5s
Policy: sync
Registry: txt
TXTOwnerID: unifi
Provider: webhook
ProviderCacheTime: 0s
WebhookProviderURL: http://localhost:8888
WebhookProviderReadTimeout: 5s
WebhookProviderWriteTimeout: 10s
AnnotationPrefix: internal-dns/
LogLevel: info
LogFormat: json
MetricsAddress: :7979
DomainFilter: []
ManagedDNSRecordTypes: [A AAAA CNAME]
Kubernetes:
- Deployment with 2 containers (external-dns + webhook provider)
- Webhook provider memory: stable 33-34Mi (NOT affected)
- DNS records managed: ~10 A records
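The configuration above reflects the running pod rather than the manifests. For reference, the image and container args can be re-checked directly from the Deployment; the namespace and Deployment name below are assumptions about my setup, not anything standardized:
  kubectl -n external-dns get deploy external-dns -o jsonpath='{.spec.template.spec.containers[*].image}'
  kubectl -n external-dns get deploy external-dns -o jsonpath='{.spec.template.spec.containers[0].args}'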
Expected Behavior
External-DNS memory should remain stable at around 14-18Mi, just as the webhook provider container in the same pod stays consistently at 33-34Mi.
Actual Behavior
Normal state (most of the time):
- external-dns: 14-18Mi
- Total pod: 48-52Mi
Problem state (observed several times, cannot reproduce now):
- external-dns: 90Mi (6x increase!)
- Total pod: 124Mi
- Memory stayed elevated until manual pod restart
Important details:
- All DNS records were already "up to date" - no changes were being made
- No record manipulations occurred during the high memory state
- Logs showed only normal operation messages (see below)
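The per-container figures above come from metrics-server via kubectl top, roughly as follows (the label selector is specific to my install):
  kubectl -n external-dns top pod -l app.kubernetes.io/name=external-dns --containers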
Reproducibility
Cannot reliably reproduce:
- Observed the issue several times on fresh pod starts
- After manual restarts, the issue sometimes reappeared and sometimes did not
- Ran 10 consecutive pod restarts as a test (loop sketched below); all showed normal memory (14-18Mi)
- Problem has not recurred since initial observations
This non-deterministic behavior suggests a race condition or state-dependent issue.
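The 10-restart test was driven by a simple loop along these lines; the namespace, names, and the 2-minute settle delay are my own choices:
  for i in $(seq 1 10); do
    kubectl -n external-dns rollout restart deploy/external-dns
    kubectl -n external-dns rollout status deploy/external-dns
    sleep 120
    kubectl -n external-dns top pod -l app.kubernetes.io/name=external-dns --containers
  done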
Logs
Logs were completely clean during both normal and high-memory states. No errors, warnings, or unusual messages:
{"level":"info","msg":"All records are already up to date"}Repeated every minute. No webhook errors, API errors, retries, or any indication of problems.
The clean logs are particularly notable because:
- No record changes were happening
- No errors to trigger retries or buffering
- External-DNS reported normal operation while using 6x memory
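For what it's worth, "clean" here means a grep for anything above info level in the JSON logs comes back empty, e.g. (log level strings assumed to be the usual logrus ones):
  kubectl -n external-dns logs deploy/external-dns -c external-dns --since=24h | grep -Ei '"level":"(warn|warning|error|fatal)"'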
Investigation Performed
- Webhook provider: Memory stable at 33-34Mi in all cases, logs clean
- Configuration: ProviderCacheTime: 0s means no webhook response caching
- Go memstats (when operating normally at 14Mi):
  go_memstats_alloc_bytes: 6.2MB
  go_memstats_heap_inuse_bytes: 9.6MB
  go_memstats_stack_inuse_bytes: 1.1MB
- Restart behavior: Problem cleared immediately on pod restart
- Logs: Clean in both states - no errors or warnings at any point
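The go_memstats values listed above were scraped from the Prometheus endpoint already exposed on :7979; something like this (port-forward target is an assumption about my setup):
  kubectl -n external-dns port-forward deploy/external-dns 7979:7979 &
  curl -s http://localhost:7979/metrics | grep -E '^go_memstats_(alloc|heap_inuse|stack_inuse)_bytes'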
Hypothesis
Since the webhook provider memory remains stable and there's no caching (ProviderCacheTime: 0s), the issue appears to be in external-dns internal components, possibly:
- Kubernetes informers (gateway-httproute, service, pods, nodes, namespaces, endpointslices)
- Platform-specific issue (linux/arm64)
- Race condition during initialization
- Regression from v0.19.0 (which had significant memory improvements)
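If the problem recurs, the same metrics endpoint should help narrow the hypotheses above: a leak in informers or goroutines should show up as growing go_goroutines and heap_inuse, while a high RSS with mostly idle/released heap would point at the Go runtime holding on to freed memory rather than live objects. A sketch of what I would capture:
  curl -s http://localhost:7979/metrics | grep -E '^(go_goroutines|go_memstats_heap_(inuse|idle|released)_bytes|process_resident_memory_bytes)'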
Questions
- Are there known issues with v0.20.0 on arm64?
- Have others reported similar memory behavior with v0.20.0?
- Any known race conditions in informer initialization that could cause this?
Additional Context
- I can provide heap dumps and goroutine dumps if the problem reproduces
- Willing to test patches or provide additional diagnostics
- Problem is not critical (pod still functional, restart resolves it)
- Unable to reproduce on demand, so cannot test downgrade scenarios
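On the heap and goroutine dumps: I am assuming pprof handlers are reachable on the metrics listener, which I have not verified for external-dns, so treat the paths below as hypothetical:
  kubectl -n external-dns port-forward deploy/external-dns 7979:7979 &
  # /debug/pprof is an assumption; only works if pprof is enabled on this listener
  curl -s http://localhost:7979/debug/pprof/heap > heap.pprof
  curl -s http://localhost:7979/debug/pprof/goroutine?debug=2 > goroutines.txt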
Related
- v0.19.0 release notes mention memory improvements: https://github.com/kubernetes-sigs/external-dns/releases/tag/v0.19.0
- v0.20.0 is very recent (5 days old): https://github.com/kubernetes-sigs/external-dns/releases/tag/v0.20.0
Filing this issue despite inability to reproduce, in case others encounter the same behavior.