feat: resolve endpoint target CNAME to A/AAAA records with external-dns.alpha.kubernetes.io/resolve-target #6329

Open
Apoorva64 wants to merge 6 commits into kubernetes-sigs:master from Orange-OpenSource:feat-resolve-load-balancer-hostname-to-A-AAAA-records

Conversation

@Apoorva64

@Apoorva64 Apoorva64 commented Mar 30, 2026


What does this PR do?

This PR adds a per-resource annotation, external-dns.alpha.kubernetes.io/resolve-target. Setting this annotation to true causes ExternalDNS to resolve any CNAME target (e.g. a hostname returned by a cloud load balancer) to its A and AAAA records at sync time and to emit those IP addresses as A/AAAA endpoints instead. DNS resolution uses net.LookupIP and honours the system resolver.

Implementation

Resolution is implemented as a source wrapper in resolvetarget.go that can wrap any source. Each source propagates the annotation value as an endpoint label using provider_specific.go; the wrapper reads the label and resolves the hostname.

If resolution fails for a target (e.g. the hostname is temporarily unresolvable), that target is skipped and only a debug log entry is emitted.
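A minimal sketch of the resolution step described above, not the code from this PR: a hostname is looked up and the returned addresses are split into A (IPv4) and AAAA (IPv6) targets. The names `lookupFunc`, `staticLookup` and `resolveTarget` are hypothetical and exist only for this illustration; only `net.LookupIP` and `net.ParseIP` come from the Go standard library.

```go
package main

import (
	"fmt"
	"net"
)

// lookupFunc mirrors the shape of net.LookupIP so a fake can be injected in tests.
type lookupFunc func(host string) ([]net.IP, error)

// staticLookup is a hypothetical helper: it returns a lookupFunc that always
// answers with the given addresses, whatever the host.
func staticLookup(addrs ...string) lookupFunc {
	return func(string) ([]net.IP, error) {
		ips := make([]net.IP, 0, len(addrs))
		for _, a := range addrs {
			if ip := net.ParseIP(a); ip != nil {
				ips = append(ips, ip)
			}
		}
		return ips, nil
	}
}

// resolveTarget resolves host and splits the result by address family:
// IPv4 addresses become A targets, all others become AAAA targets.
func resolveTarget(host string, lookup lookupFunc) (a, aaaa []string, err error) {
	ips, err := lookup(host)
	if err != nil {
		return nil, nil, fmt.Errorf("resolving %q: %w", host, err)
	}
	for _, ip := range ips {
		if v4 := ip.To4(); v4 != nil {
			a = append(a, v4.String())
		} else {
			aaaa = append(aaaa, ip.String())
		}
	}
	return a, aaaa, nil
}

func main() {
	// net.LookupIP goes through the system resolver, as the description states.
	a, aaaa, err := resolveTarget("localhost", net.LookupIP)
	fmt.Println(a, aaaa, err)
}
```

Injecting the lookup function keeps the sketch testable without network access, which is the same idea as the `WithResolveTargetLookupIP` option used in the review tests further down.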

Motivation

Some DNS providers or configurations (e.g. internal DNS) do not support CNAME records and require IP-based records.
For example:

In a public cloud provider, when we create a Gateway and an HTTPRoute, or an Ingress, we usually end up with a hostname in the status that points to the load balancer's IPs:
[Image: untitled draw.io diagram (1)]

The load balancer IPs are then routed into an infrastructure where we can't resolve external records such as example-gateway.region.mycloud.com.

If we create a CNAME from my-app.internal to example-gateway.region.mycloud.com, it won't work, because the app my-second-app won't be able to resolve the example-gateway.region.mycloud.com record.

We must therefore create an A record in the air-gapped infrastructure instead of a CNAME.
image

More

This PR is also discussed in #6289. The old PR was closed due to a bug in the CI workflow.

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 30, 2026
@k8s-ci-robot k8s-ci-robot added docs internal Issues or PRs related to internal code source needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 30, 2026
@k8s-ci-robot

Contributor

Hi @Apoorva64. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 30, 2026
@Apoorva64 Apoorva64 marked this pull request as ready for review March 30, 2026 18:58
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 30, 2026
@k8s-ci-robot k8s-ci-robot requested a review from szuecs March 30, 2026 18:58
@ivankatliarchuk

Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 30, 2026
@coveralls

coveralls commented Mar 30, 2026


Coverage Report for CI Build 26357600034

Coverage increased (+0.1%) to 80.726%

Details

  • Coverage increased (+0.1%) from the base build.
  • Patch coverage: No coverable lines changed in this PR.
  • 10 coverage regressions across 1 file.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

10 previously-covered lines in 1 file lost coverage.

File | Lines Losing Coverage | Coverage
integration/toolkit/toolkit.go | 10 | 90.86%

Coverage Stats

Coverage Status
Relevant Lines: 21516
Covered Lines: 17369
Line Coverage: 80.73%
Coverage Strength: 1444.92 hits per line

💛 - Coveralls

@ivankatliarchuk ivankatliarchuk left a comment

Member

Comment thread docs/annotations/annotations.md Outdated
Comment thread source/wrappers/types.go Outdated
Comment thread docs/annotations/annotations.md Outdated
Comment thread source/wrappers/resolvetarget.go Outdated
Comment thread tests/integration/scenarios/tests.yaml
Comment thread source/wrappers/resolvetarget.go Outdated
Comment thread source/wrappers/resolvetarget.go Outdated
Comment thread source/wrappers/resolvetarget.go Outdated
Comment thread source/wrappers/resolvetarget_test.go Outdated
Comment thread source/wrappers/resolvesource_test.go Outdated
@ivankatliarchuk

Member

/retitle feat(wrappers): resolve endpoint target CNAME to A/AAAA records

@k8s-ci-robot k8s-ci-robot changed the title feat(wrappers): resolve load balancer hostnames to A/AAAA records feat(wrappers): resolve endpoint target CNAME to A/AAAA records Mar 30, 2026
@ivankatliarchuk

Member

/retitle feat: resolve endpoint target CNAME to A/AAAA records with external-dns.alpha.kubernetes.io/resolve-target

@k8s-ci-robot k8s-ci-robot changed the title feat(wrappers): resolve endpoint target CNAME to A/AAAA records feat: resolve endpoint target CNAME to A/AAAA records with external-dns.alpha.kubernetes.io/resolve-target Mar 30, 2026
Comment thread docs/annotations/annotations.md
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ivankatliarchuk. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Apoorva64 Apoorva64 marked this pull request as draft April 4, 2026 15:04
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 4, 2026
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 24, 2026
@Apoorva64 Apoorva64 force-pushed the feat-resolve-load-balancer-hostname-to-A-AAAA-records branch from 201e186 to a7ad2ad Compare May 24, 2026 08:43
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 24, 2026
@Apoorva64 Apoorva64 force-pushed the feat-resolve-load-balancer-hostname-to-A-AAAA-records branch from 6b08e75 to a7ad2ad Compare May 24, 2026 08:47
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 24, 2026
@Apoorva64 Apoorva64 force-pushed the feat-resolve-load-balancer-hostname-to-A-AAAA-records branch from a7ad2ad to 9492f95 Compare May 24, 2026 08:51
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2026
@Apoorva64 Apoorva64 force-pushed the feat-resolve-load-balancer-hostname-to-A-AAAA-records branch from 0a15c3d to 6d3f5e5 Compare May 24, 2026 08:58
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2026
@Apoorva64 Apoorva64 force-pushed the feat-resolve-load-balancer-hostname-to-A-AAAA-records branch from 6d3f5e5 to 8f5fc03 Compare May 24, 2026 09:29
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2026
Comment thread docs/annotations/annotations.md Outdated
func NewResolveTarget(src source.Source, opts ...resolveTargetOption) source.Source {
	rs := &resolveTarget{
		source:   src,
		lookupIP: net.LookupIP,

Collaborator

net.LookupIP is synchronous, uses the system resolver's default timeout, and is called once per target, serially, inside RunOnce. Many resolve-target endpoints plus a slow resolver would stall the whole reconcile loop.
WDYT about a context-aware resolver (net.Resolver.LookupIP(ctx, ...)) with a bounded timeout?
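A rough sketch of the suggestion above, assuming each lookup gets its own deadline derived from the caller's context. `ctxLookupFunc`, `lookupWithTimeout`, `hangingLookup` and `slowResolverIsCutOff` are hypothetical names for this illustration; `(*net.Resolver).LookupIP` and `context.WithTimeout` are standard library calls. The timeout values are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// ctxLookupFunc matches the signature of (*net.Resolver).LookupIP.
type ctxLookupFunc func(ctx context.Context, network, host string) ([]net.IP, error)

// lookupWithTimeout bounds a single lookup by timeout. The "ip" network
// requests both IPv4 and IPv6 addresses.
func lookupWithTimeout(parent context.Context, lookup ctxLookupFunc, host string, timeout time.Duration) ([]net.IP, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return lookup(ctx, "ip", host)
}

// hangingLookup is a stand-in for a resolver that never answers:
// it blocks until the context is cancelled or its deadline passes.
func hangingLookup(ctx context.Context, _, _ string) ([]net.IP, error) {
	<-ctx.Done()
	return nil, ctx.Err()
}

// slowResolverIsCutOff reports whether a hanging lookup is interrupted
// by the deadline instead of blocking forever.
func slowResolverIsCutOff() bool {
	_, err := lookupWithTimeout(context.Background(), hangingLookup, "lb.example.com", 50*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow resolver cut off:", slowResolverIsCutOff())

	// The same helper works with the real resolver.
	ips, err := lookupWithTimeout(context.Background(), net.DefaultResolver.LookupIP, "localhost", 2*time.Second)
	fmt.Println(ips, err)
}
```

Lookups would still run serially in this sketch; the deadline only caps how long any one target can hold up the loop.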

Comment on lines +109 to +112
if len(ipTargets) == 0 {
	// All resolutions failed; skip this endpoint entirely.
	continue
}

Collaborator

If all targets transiently fail to resolve, the endpoint vanishes from the desired state → the plan computes a delete → the DNS record is removed → outage, then the record is recreated on the next sync once DNS recovers.
This is record flapping driven by resolver health.

Do you think that, on total resolution failure, it could keep the previous/CNAME endpoint (skip the change) rather than emitting nothing?

Collaborator

A test like this would confirm that behavior:

  func TestTotalFailureKeepsCNAME(t *testing.T) {
        ep := endpoint.NewEndpoint("app.example.internal", endpoint.RecordTypeCNAME, "lb.example.com")
        ep.WithProviderSpecific(resolveTargetPropertyName, "true")

        ms := new(testutils.MockSource)
        ms.On("Endpoints").Return([]*endpoint.Endpoint{ep}, nil)
        src := NewResolveTarget(ms, WithResolveTargetLookupIP(
                func(string) ([]net.IP, error) { return nil, errors.New("i/o timeout") }, // transient
        ))

        got, err := src.Endpoints(t.Context())
        require.NoError(t, err)

        // Expected: original CNAME preserved, not dropped.
        require.Len(t, got, 1, "endpoint must be kept when resolution totally fails")
        require.Equal(t, endpoint.RecordTypeCNAME, got[0].RecordType, "should fall back to the original CNAME")
        require.Equal(t, endpoint.Targets{"lb.example.com"}, got[0].Targets)

        // Property still consumed so it does not leak downstream.
        _, ok := got[0].GetProviderSpecificProperty(resolveTargetPropertyName)
        require.False(t, ok, "resolve-target property should be consumed even on the fallback path")
  }
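For illustration only, here is one way a wrapper could behave so that a test like the one above passes. This is a sketch of the reviewer's proposal, not the behaviour of the PR, and it uses a simplified stand-in type `ep` instead of the real endpoint package; `resolveOrKeep`, `failingLookup` and `okLookup` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// ep is a simplified, hypothetical stand-in for an ExternalDNS endpoint.
type ep struct {
	DNSName    string
	RecordType string
	Targets    []string
}

// resolveOrKeep resolves every target of a CNAME endpoint. If at least one
// target resolves, it returns an A endpoint with the collected IPs (AAAA
// handling is omitted for brevity). If none resolve, the original CNAME is
// returned unchanged instead of being dropped.
func resolveOrKeep(in ep, lookup func(string) ([]string, error)) ep {
	if in.RecordType != "CNAME" {
		return in
	}
	var ips []string
	for _, target := range in.Targets {
		got, err := lookup(target)
		if err != nil {
			continue // skip this target; others may still resolve
		}
		ips = append(ips, got...)
	}
	if len(ips) == 0 {
		return in // total failure: fall back to the CNAME
	}
	return ep{DNSName: in.DNSName, RecordType: "A", Targets: ips}
}

// failingLookup simulates a transient resolver failure.
func failingLookup(string) ([]string, error) { return nil, errors.New("i/o timeout") }

// okLookup simulates a successful resolution to one IPv4 address.
func okLookup(string) ([]string, error) { return []string{"192.0.2.10"}, nil }

func main() {
	in := ep{DNSName: "app.example.internal", RecordType: "CNAME", Targets: []string{"lb.example.com"}}
	fmt.Println(resolveOrKeep(in, failingLookup).RecordType)
	fmt.Println(resolveOrKeep(in, okLookup).RecordType)
}
```

The trade-off raised in the next comment still applies: a kept CNAME may point at a host that is gone for good, not just temporarily unresolvable.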

Member

If the host returns no IPs, why would you want to preserve the record? That's a signal something is wrong, and we don't know whether the host is being reprovisioned, permanently removed, or just experiencing a transient network issue.

This comes down to reconciliation interval tuning. The current service lookup is straightforward, with no special logic:

ips, err := net.LookupIP(lb.Hostname)

No magic, just binary: either resolved or not.

There's no clean answer here. Either you keep a record that points nowhere, which is going to be painful to debug, or you remove it when it stops resolving. Both options have real downsides.

Collaborator

@ivankatliarchuk Yes, that's true. Maybe we can improve the user experience, though.

WDYT of documenting this limitation and logging a warning or returning a SoftError?
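A sketch of what "log a warning or return a soft error" could look like, under stated assumptions: `errSoft` is a hypothetical sentinel defined here for illustration and is not the SoftError type of the project, whose exact API is not shown in this thread. `resolveAll`, `isSoft` and `alwaysFail` are likewise made-up names.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errSoft is a hypothetical sentinel marking failures that the caller may
// retry on the next sync instead of treating them as fatal.
var errSoft = errors.New("soft error")

// resolveAll returns the IPs for all targets. When every lookup fails it logs
// a warning and returns an error that wraps errSoft.
func resolveAll(dnsName string, targets []string, lookup func(string) ([]string, error)) ([]string, error) {
	var ips []string
	var lastErr error
	for _, target := range targets {
		got, err := lookup(target)
		if err != nil {
			lastErr = err
			continue
		}
		ips = append(ips, got...)
	}
	if len(ips) == 0 && lastErr != nil {
		log.Printf("warning: no target of %s could be resolved: %v", dnsName, lastErr)
		return nil, fmt.Errorf("%w: resolving targets of %s: %v", errSoft, dnsName, lastErr)
	}
	return ips, nil
}

// isSoft reports whether err wraps errSoft.
func isSoft(err error) bool { return errors.Is(err, errSoft) }

// alwaysFail simulates a resolver that cannot answer.
func alwaysFail(string) ([]string, error) { return nil, errors.New("i/o timeout") }

func main() {
	_, err := resolveAll("app.example.internal", []string{"lb.example.com"}, alwaysFail)
	fmt.Println(isSoft(err))
}
```

Surfacing the failure this way makes it visible to the operator without deciding, inside the wrapper, whether the record should be kept or removed.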

Comment thread source/wrappers/resolvetarget.go Outdated
Comment on lines +84 to +88
// Skip early if not a CNAME record, as only those can have hostname targets that need resolution.
if ep.RecordType != endpoint.RecordTypeCNAME {
	result = append(result, ep)
	continue
}

Collaborator

The resolveTargetProperty should be removed here; otherwise the property leaks into the desired state and the plan never converges, because UpdateNew stays non-empty on every sync.

func TestLeakedPropertyShouldNotUpdate(t *testing.T) {
	current := endpoint.NewEndpoint("foo.example.com", endpoint.RecordTypeA, "1.2.3.4")

	desired := endpoint.NewEndpoint("foo.example.com", endpoint.RecordTypeA, "1.2.3.4")
	desired.WithProviderSpecific("resolve-target", "true")

	changes := (&Plan{
		Current:        []*endpoint.Endpoint{current},
		Desired:        []*endpoint.Endpoint{desired},
		ManagedRecords: []string{endpoint.RecordTypeA},
	}).Calculate().Changes

	assert.Empty(t, changes.Create, "no create expected")
	assert.Empty(t, changes.Delete, "no delete expected")
	// Correct: unchanged record must NOT be updated. Fails today due to the leak.
	assert.Empty(t, changes.UpdateNew, "unchanged record should not be updated")
}
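To make the leak concrete, the sketch below strips the property from an endpoint before it is passed through, so a non-CNAME endpoint ends up identical to the current record. It uses simplified stand-in types; `ep`, `property`, `consumeProperty` and the `resolveTargetKey` constant are hypothetical names for this illustration and may not match the code in the PR.

```go
package main

import "fmt"

// resolveTargetKey is a hypothetical property name used for this sketch.
const resolveTargetKey = "resolve-target"

// property is a simplified stand-in for a provider-specific property.
type property struct{ Name, Value string }

// ep is a simplified, hypothetical stand-in for an ExternalDNS endpoint.
type ep struct {
	DNSName          string
	RecordType       string
	Targets          []string
	ProviderSpecific []property
}

// consumeProperty removes every property called name from e and reports
// whether it was set to "true". Other properties are left untouched.
func consumeProperty(e *ep, name string) bool {
	enabled := false
	kept := e.ProviderSpecific[:0] // filter in place, reusing the backing array
	for _, p := range e.ProviderSpecific {
		if p.Name == name {
			enabled = p.Value == "true"
			continue
		}
		kept = append(kept, p)
	}
	e.ProviderSpecific = kept
	return enabled
}

func main() {
	e := ep{
		DNSName:          "foo.example.com",
		RecordType:       "A",
		Targets:          []string{"1.2.3.4"},
		ProviderSpecific: []property{{Name: resolveTargetKey, Value: "true"}},
	}
	// Consume the property first, then decide whether resolution applies.
	wanted := consumeProperty(&e, resolveTargetKey)
	fmt.Println(wanted, len(e.ProviderSpecific))
}
```

Consuming the property before the record-type check means the early-return path for non-CNAME endpoints can no longer leak it downstream.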

@Apoorva64 Apoorva64 Jun 20, 2026

Author

Fixed in 862a87d.

I had to change the test to run the resolve-target wrapper before the Plan, as the plan is computed after the wrappers:

func TestLeakedPropertyShouldNotUpdate(t *testing.T) {
	current := endpoint.NewEndpoint("foo.example.com", endpoint.RecordTypeA, "1.2.3.4")

	desired := endpoint.NewEndpoint("foo.example.com", endpoint.RecordTypeA, "1.2.3.4")
	desired.WithProviderSpecific("resolve-target", "true")

	ms := new(testutils.MockSource)
	ms.On("Endpoints").Return([]*endpoint.Endpoint{desired}, nil)
	wrapped := NewResolveTarget(ms)

	desiredEndpoints, err := wrapped.Endpoints(t.Context())
	require.NoError(t, err)

	changes := (&plan.Plan{
		Current:        []*endpoint.Endpoint{current},
		Desired:        desiredEndpoints,
		ManagedRecords: []string{endpoint.RecordTypeA},
	}).Calculate().Changes

	assert.Empty(t, changes.Create, "no create expected")
	assert.Empty(t, changes.Delete, "no delete expected")
	// Correct: unchanged record must NOT be updated. Fails today due to the leak.
	assert.Empty(t, changes.UpdateNew, "unchanged record should not be updated")
}

Co-authored-by: Michel Loiseleur <97035654+mloiseleur@users.noreply.github.com>
@kubernetes-prow


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ivankatliarchuk. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubernetes-prow


PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kubernetes-prow kubernetes-prow Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2026
@Apoorva64 Apoorva64 force-pushed the feat-resolve-load-balancer-hostname-to-A-AAAA-records branch from 139596f to 862a87d Compare June 20, 2026 12:56
@kubernetes-prow


@Apoorva64: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-external-dns-unit-test | 862a87d | link | true | /test pull-external-dns-unit-test
pull-external-dns-licensecheck | 862a87d | link | true | /test pull-external-dns-licensecheck

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

apis Issues or PRs related to API change chart cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. controller Issues or PRs related to the controller docs github_actions Pull requests that update GitHub Actions code internal Issues or PRs related to internal code needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. plan Issues or PRs related to external-dns plan provider Issues or PRs related to a provider registry Issues or PRs related to a registry size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. source

5 participants