feat: resolve endpoint target CNAME to A/AAAA records with external-dns.alpha.kubernetes.io/resolve-target#6329
Conversation
|
Hi @Apoorva64. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
Coverage Report for CI Build 26357600034Coverage increased (+0.1%) to 80.726%Details
Uncovered ChangesNo uncovered changes found. Coverage Regressions10 previously-covered lines in 1 file lost coverage.
Coverage Stats
💛 - Coveralls |
|
/retitle feat(wrappers): resolve endpoint target CNAME to A/AAAA records |
|
/retitle feat: resolve endpoint target CNAME to A/AAAA records with external-dns.alpha.kubernetes.io/resolve-target |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
201e186 to
a7ad2ad
Compare
6b08e75 to
a7ad2ad
Compare
a7ad2ad to
9492f95
Compare
0a15c3d to
6d3f5e5
Compare
6d3f5e5 to
8f5fc03
Compare
| func NewResolveTarget(src source.Source, opts ...resolveTargetOption) source.Source { | ||
| rs := &resolveTarget{ | ||
| source: src, | ||
| lookupIP: net.LookupIP, |
There was a problem hiding this comment.
net.LookupIP is synchronous, system-resolver default timeout, one call per target serially inside RunOnce. Many resolve-target endpoints + a slow resolver stalls the whole reconcile loop.
Wdyt about a context-aware resolver (net.Resolver.LookupIP(ctx, ...)) with bounded timeout ?
| if len(ipTargets) == 0 { | ||
| // All resolutions failed; skip this endpoint entirely. | ||
| continue | ||
| } |
There was a problem hiding this comment.
If all targets transiently fail to resolve, the endpoint vanishes from desired state → plan computes a delete → DNS record removed → outage, then recreated next sync when DNS recovers.
This is record-flapping driven by resolver health.
Do you think that, on total resolution failure, it can keep the previous/CNAME endpoint (skip the change) rather than emitting nothing ?
There was a problem hiding this comment.
A test like this would confirm that behavior:
func TestTotalFailureKeepsCNAME(t *testing.T) {
ep := endpoint.NewEndpoint("app.example.internal", endpoint.RecordTypeCNAME, "lb.example.com")
ep.WithProviderSpecific(resolveTargetPropertyName, "true")
ms := new(testutils.MockSource)
ms.On("Endpoints").Return([]*endpoint.Endpoint{ep}, nil)
src := NewResolveTarget(ms, WithResolveTargetLookupIP(
func(string) ([]net.IP, error) { return nil, errors.New("i/o timeout") }, // transient
))
got, err := src.Endpoints(t.Context())
require.NoError(t, err)
// Expected: original CNAME preserved, not dropped.
require.Len(t, got, 1, "endpoint must be kept when resolution totally fails")
require.Equal(t, endpoint.RecordTypeCNAME, got[0].RecordType, "should fall back to the original CNAME")
require.Equal(t, endpoint.Targets{"lb.example.com"}, got[0].Targets)
// Property still consumed so it does not leak downstream.
_, ok := got[0].GetProviderSpecificProperty(resolveTargetPropertyName)
require.False(t, ok, "resolve-target property should be consumed even on the fallback path")
}There was a problem hiding this comment.
If the host returns no IPs, why would you want to preserve the record? That's a signal something is wrong - and we don't know whether the host is being reprovisioned, permanently removed, or just experiencing a transient network issue.
This comes down to reconciliation interval tuning. The current service lookup is straightforward - no special logic, just binary: resolved or not. Currnet service lookup
external-dns/source/service.go
Line 585 in 9bb899c
There's no clean answer here. Either you keep a record that points nowhere - which is going to be painful to debug - or you remove it when it stops resolving. Both options have real downsides.
There was a problem hiding this comment.
@ivankatliarchuk Yes, that's true. Maybe we can improve UserXP, though.
Wdyt of documenting this limitation and log a warning or a SoftError ?
| // Skip early if not a CNAME record, as only those can have hostname targets that need resolution. | ||
| if ep.RecordType != endpoint.RecordTypeCNAME { | ||
| result = append(result, ep) | ||
| continue | ||
| } |
There was a problem hiding this comment.
The resolveTargetProperty should be cleaned here, otherwise it won't converge with a non-empty UpdateNew in the plan.
func TestLeakedPropertyShouldNotUpdate(t *testing.T) {
current := endpoint.NewEndpoint("foo.example.com", endpoint.RecordTypeA, "1.2.3.4")
desired := endpoint.NewEndpoint("foo.example.com", endpoint.RecordTypeA, "1.2.3.4")
desired.WithProviderSpecific("resolve-target", "true")
changes := (&Plan{
Current: []*endpoint.Endpoint{current},
Desired: []*endpoint.Endpoint{desired},
ManagedRecords: []string{endpoint.RecordTypeA},
}).Calculate().Changes
assert.Empty(t, changes.Create, "no create expected")
assert.Empty(t, changes.Delete, "no delete expected")
// Correct: unchanged record must NOT be updated. Fails today due to the leak.
assert.Empty(t, changes.UpdateNew, "unchanged record should not be updated")
}There was a problem hiding this comment.
fixed in 862a87d
i had to change the test to run the resolve target wrapper before the "Plan" as the Plan happens after the wrappers
func TestLeakedPropertyShouldNotUpdate(t *testing.T) {
current := endpoint.NewEndpoint("foo.example.com", endpoint.RecordTypeA, "1.2.3.4")
desired := endpoint.NewEndpoint("foo.example.com", endpoint.RecordTypeA, "1.2.3.4")
desired.WithProviderSpecific("resolve-target", "true")
ms := new(testutils.MockSource)
ms.On("Endpoints").Return([]*endpoint.Endpoint{desired}, nil)
wrapped := NewResolveTarget(ms)
desiredEndpoints, err := wrapped.Endpoints(t.Context())
require.NoError(t, err)
changes := (&plan.Plan{
Current: []*endpoint.Endpoint{current},
Desired: desiredEndpoints,
ManagedRecords: []string{endpoint.RecordTypeA},
}).Calculate().Changes
assert.Empty(t, changes.Create, "no create expected")
assert.Empty(t, changes.Delete, "no delete expected")
// Correct: unchanged record must NOT be updated. Fails today due to the leak.
assert.Empty(t, changes.UpdateNew, "unchanged record should not be updated")
}Co-authored-by: Michel Loiseleur <97035654+mloiseleur@users.noreply.github.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
139596f to
862a87d
Compare
|
@Apoorva64: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What does this PR do?
This Pr adds per-resource annotation:
external-dns.alpha.kubernetes.io/resolve-target. Setting this annotation totruecauses ExternalDNS to resolve any CNAME target (i.e. a hostname returned by a cloud load balancer) to its A and AAAA records at sync time, and emit those IP addresses as A/AAAA endpoints instead. DNS resolution uses net.LookupIP and honours the system resolver.Implementation
Resolution is implemented as a Source Wrapper in resolvetarget.go, wrapping any source. Each source propagates the annotation value as an endpoint label using provider_specific.go; the resolvesource wrapper reads the label and resolves the hostname
If resolution fails for a target (e.g. the hostname is temporarily unresolvable), that target is silently skipped and a debug log entry is emitted.
Motivation
Some DNS providers or configurations do not support
CNAMErecords (e.g., internal Dns), requiring IP-based records.For example
In a public cloud provider, when we create a Gateway and an Http Route or an Ingress we usually end up with something like this:

When we create a Gateway we end up with a hostname in the status which points to the loadbalancer's ips.
The loadbalancer IPs are then routed in an infrastructure where we can't resolve external Records like
example-gateway.region.mycloud.com.If we create a CNAME
my-app.internaltoexample-gateway.region.mycloud.comit won't work as the appmy-second-appwon't be able to resolve theexample-gateway.region.mycloud.comRecordWe must then create an A Record in the air-gapped infrastructure instead of a CNAME.

More
The discussion over this PR is also in #6289. The old PR was closed due to a bug in the CI workflow