fix(awssd): use namespace-aware parsing for dotted service names #6365

Merged
k8s-ci-robot merged 4 commits into kubernetes-sigs:master from am-ltk:fix-awssd-parse-hostname-dotted-service
Apr 30, 2026

Conversation

@am-ltk
Contributor

@am-ltk am-ltk commented Apr 11, 2026

What does it do?

Changes parseHostname in the aws-sd provider to accept known Cloud Map namespaces and use longest-suffix matching instead of naive first-dot splitting. This correctly handles service names containing dots (e.g. my-app.elb in namespace dev.local).

Falls back to the original first-dot split when no namespace suffix matches, preserving full backward compatibility for all existing configurations.
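
For readers who want to see the idea in isolation, here is a minimal sketch of longest-suffix matching with a first-dot fallback. It is not the code from this PR: `parseNamespaceSketch` and its signature are hypothetical, and the known namespaces are passed as plain strings rather than Cloud Map types.

```go
package main

import (
	"fmt"
	"strings"
)

// parseNamespaceSketch is a hypothetical stand-in for the patched parser.
// It returns the longest known namespace that is a dot-bounded suffix of
// hostname, or falls back to everything after the first dot.
func parseNamespaceSketch(hostname string, known []string) string {
	best := ""
	for _, ns := range known {
		if strings.HasSuffix(hostname, "."+ns) && len(ns) > len(best) {
			best = ns
		}
	}
	if best != "" {
		return best
	}
	// Fallback: the original first-dot split.
	if _, rest, found := strings.Cut(hostname, "."); found {
		return rest
	}
	return ""
}

func main() {
	known := []string{"local", "dev.local"}
	fmt.Println(parseNamespaceSketch("my-app.elb.dev.local", known)) // dev.local
	fmt.Println(parseNamespaceSketch("api.example.org", known))      // example.org
}
```

Matching on `"."+ns` rather than `ns` alone keeps a namespace such as `dev.local` from matching an unrelated hostname like `app.mydev.local`.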

Motivation

parseHostname splits hostnames at the first dot with no awareness of actual Cloud Map namespace boundaries. For my-app.elb.dev.local with namespace dev.local, it produces:

| Field | Parsed (wrong) | Correct |
| --- | --- | --- |
| service | my-app | my-app.elb |
| namespace | elb.dev.local | dev.local |

matchingNamespaces then does an exact-match lookup and finds no namespace named elb.dev.local, so the record is silently dropped with:

Skipping record <hostname> because no namespace matching record DNS Name was detected

This affects any hostname where the Cloud Map service name contains a dot, including SRV records (#5714), where hostnames like _backend._tcp.backend.mynet.svc.internal are parsed with namespace _tcp.backend.mynet.svc.internal instead of mynet.svc.internal.

No --domain-filter workaround is possible because the issue is structural in parseHostname.
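
The failure mode can be reproduced outside the provider. The sketch below assumes the pre-fix behaviour is equivalent to a plain split at the first dot; `firstDotSplit` is a hypothetical name used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// firstDotSplit mimics the pre-fix behaviour described above: everything
// before the first dot is the service, everything after is the namespace.
func firstDotSplit(hostname string) (service, namespace string) {
	service, namespace, _ = strings.Cut(hostname, ".")
	return service, namespace
}

func main() {
	for _, h := range []string{
		"my-app.elb.dev.local",
		"_backend._tcp.backend.mynet.svc.internal",
	} {
		svc, ns := firstDotSplit(h)
		fmt.Printf("%s -> service=%q namespace=%q\n", h, svc, ns)
	}
}
```

Neither derived namespace (`elb.dev.local`, `_tcp.backend.mynet.svc.internal`) exists in Cloud Map, which is why the exact-match lookup finds nothing.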

Fixes #6364
Ref #5714

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

No documentation update is needed — this is an internal bug fix to hostname parsing logic with no user-facing configuration changes.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Apr 11, 2026
@linux-foundation-easycla

linux-foundation-easycla Bot commented Apr 11, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Contributor

Welcome @am-ltk!

It looks like this is your first PR to kubernetes-sigs/external-dns 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/external-dns has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 11, 2026
@k8s-ci-robot
Contributor

Hi @am-ltk. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added provider Issues or PRs related to a provider size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 11, 2026
@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Apr 11, 2026
@am-ltk am-ltk force-pushed the fix-awssd-parse-hostname-dotted-service branch from cbb4482 to 96098fa Compare April 11, 2026 22:45
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Apr 11, 2026
@am-ltk am-ltk force-pushed the fix-awssd-parse-hostname-dotted-service branch from 96098fa to 1716ee6 Compare April 11, 2026 22:51
parseHostname splits at the first dot, which breaks when Cloud Map
service names contain dots (e.g. my-app.elb in namespace dev.local).
The naive split produces namespace=elb.dev.local instead of
namespace=dev.local, causing the record to be silently skipped.

Change parseHostname to accept known namespaces and use longest-suffix
matching, falling back to the original first-dot split when no
namespace matches for backward compatibility.

Signed-off-by: Andrew Moes <andrew@moes.dev>
Made-with: Cursor
@am-ltk am-ltk force-pushed the fix-awssd-parse-hostname-dotted-service branch from 1716ee6 to 30e3a19 Compare April 11, 2026 23:13
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 11, 2026
@am-ltk
Contributor Author

am-ltk commented Apr 11, 2026

/easycla

Member

@ivankatliarchuk ivankatliarchuk left a comment

Not sure about the fix. It looks like patchwork, and parseHostname now returns multiple values, which could be a sign of a design decision made when the provider was first implemented.

Each call site uses only one of the two return values, and the work is duplicated:

  ┌──────────────────────┬────────────┐
  │      Call site       │    Uses    │
  ├──────────────────────┼────────────┤
  │ changesByNamespaceID │ nsName, _  │
  ├──────────────────────┼────────────┤
  │ submitCreates        │ _, srvName │
  ├──────────────────────┼────────────┤
  │ submitDeletes        │ _, srvName │
  └──────────────────────┴────────────┘

And submitCreates/submitDeletes re-parse hostnames that changesByNamespaceID already parsed to group them - so the namespace matching runs twice per endpoint.

The optimization: since submitCreates/submitDeletes iterate over changesByNamespaceID's result, they already know the nsID. They can look up the namespace name from it and derive the service name with a plain TrimSuffix - no re-parsing needed:

Something like:

// build once, before the loop
nsIDToName := make(map[string]string, len(namespaces))
for _, ns := range namespaces {
    nsIDToName[aws.ToString(ns.Id)] = aws.ToString(ns.Name)
}

for nsID, changeList := range changesByNamespaceID {
    nsName := nsIDToName[nsID]
    for _, ch := range changeList {
        hostname := strings.TrimSuffix(ch.DNSName, ".")
        srvName := strings.TrimSuffix(hostname, "."+nsName)
        ...
    }
}

This eliminates two of the three parseHostname call sites - only changesByNamespaceID keeps it. At that point parseHostname only ever returns the namespace name, so it could be renamed to parseNamespace and simplified back to return a single string.
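
A self-contained variant of the same idea, runnable without the AWS SDK, might look like the sketch below. The `namespace` and `change` types and the `serviceNames` helper are hypothetical simplifications, not the provider's real types.

```go
package main

import (
	"fmt"
	"strings"
)

// namespace and change are simplified, hypothetical stand-ins for the
// Cloud Map namespace summary and the endpoint types used by the provider.
type namespace struct{ ID, Name string }
type change struct{ DNSName string }

// serviceNames derives service names per namespace ID without re-parsing.
func serviceNames(namespaces []namespace, byNsID map[string][]change) map[string][]string {
	idToName := make(map[string]string, len(namespaces))
	for _, ns := range namespaces {
		idToName[ns.ID] = ns.Name
	}
	out := make(map[string][]string, len(byNsID))
	for nsID, changes := range byNsID {
		nsName := idToName[nsID]
		for _, ch := range changes {
			hostname := strings.TrimSuffix(ch.DNSName, ".")
			out[nsID] = append(out[nsID], strings.TrimSuffix(hostname, "."+nsName))
		}
	}
	return out
}

func main() {
	namespaces := []namespace{{ID: "ns-1", Name: "dev.local"}}
	grouped := map[string][]change{
		"ns-1": {{DNSName: "my-app.elb.dev.local."}, {DNSName: "web.dev.local"}},
	}
	fmt.Println(serviceNames(namespaces, grouped)["ns-1"]) // [my-app.elb web]
}
```

Because the grouping step has already decided which namespace each hostname belongs to, the suffix trim cannot pick a different namespace than the grouping did.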

I played with this PR


Comment thread provider/awssd/aws_sd.go Outdated
Comment thread provider/awssd/aws_sd_test.go
@ivankatliarchuk
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 12, 2026
@coveralls

coveralls commented Apr 12, 2026

Coverage Report for CI Build 24352432864

Coverage increased (+0.02%) to 80.544%

Details

  • Coverage increased (+0.02%) from the base build.
  • Patch coverage: No coverable lines changed in this PR.
  • 32 coverage regressions across 1 file.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

32 previously-covered lines in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
| --- | --- | --- |
| awssd/aws_sd.go | 32 | 84.3% |

Coverage Stats

Relevant Lines: 21330
Covered Lines: 17180
Line Coverage: 80.54%
Coverage Strength: 1469.92 hits per line

💛 - Coveralls

…dundant calls

- Rename parseHostname to parseNamespace returning only the namespace
  name (single string) since that is the only value changesByNamespaceID
  needs.
- Extract namespaceIDToName helper to build namespace ID-to-name map.
- Refactor submitCreates/submitDeletes to derive service names via
  TrimSuffix using the known namespace name from the map, eliminating
  two redundant parseHostname calls per endpoint.
- Add edge-case tests: trailing dot, hostname-equals-namespace, and
  single-label hostname.

Signed-off-by: Andrew Moes <andrew@moes.dev>
Made-with: Cursor
@am-ltk am-ltk requested a review from ivankatliarchuk April 12, 2026 18:45
Member

@ivankatliarchuk ivankatliarchuk left a comment

Could you share test results for this PR similar to those in #5085 (comment)? We need to make sure it works before we merge.

Comment thread provider/awssd/aws_sd.go Outdated
- Remove the AWSSDProvider receiver from parseNamespace since it has no
  dependency on the provider struct.
- Reorder file so all receiver methods precede package-level helpers.
- Update call site and test accordingly.

Signed-off-by: Andrew Moes <andrew@moes.dev>
Made-with: Cursor
@am-ltk
Contributor Author

am-ltk commented Apr 12, 2026

> Could you share test results for this PR similar to those in #5085 (comment)? We need to make sure it works before we merge.

external-dns arguments

--provider=aws-sd --registry=aws-sd --domain-filter=dev.local --source=service --source=ingress

Cloud Map private namespace dev.local already exists in the account.

Create a Service with a dotted hostname annotation

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: external-dns-pr-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy-app
  namespace: external-dns-pr-test
spec:
  replicas: 1
  selector:
    matchLabels: { app: dummy-app }
  template:
    metadata:
      labels: { app: dummy-app }
    spec:
      containers:
        - name: busybox
          image: busybox
          command: ["sh", "-c", "while true; do sleep 3600; done"]
          ports: [{ containerPort: 80 }]
---
apiVersion: v1
kind: Service
metadata:
  name: dummy-svc
  namespace: external-dns-pr-test
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "pr-test.dotfix.dev.local"
spec:
  type: ClusterIP
  clusterIP: None
  selector: { app: dummy-app }
  ports: [{ port: 80 }]
EOF

external-dns logs

Service created with the dotted name pr-test.dotfix — no "Skipping record" warning:

level=info msg="Creating a new service \"pr-test.dotfix\" in \"ns-XXXXXXXX\" namespace"
level=info msg="Registering a new instance \"10.x.x.x\" for service \"pr-test.dotfix\" (srv-XXXXXXXX)"

Cloud Map — service created

aws servicediscovery list-services \
  --filters Name=NAMESPACE_ID,Values=<dev.local-ns-id> \
  --query 'Services[?Name==`pr-test.dotfix`].{Name:Name,RoutingPolicy:DnsConfig.RoutingPolicy,RecordType:DnsConfig.DnsRecords[0].Type,TTL:DnsConfig.DnsRecords[0].TTL}'
[{ "Name": "pr-test.dotfix", "RoutingPolicy": "MULTIVALUE", "RecordType": "A", "TTL": 300 }]

Cloud Map — instance registered

aws servicediscovery discover-instances \
  --namespace-name dev.local --service-name pr-test.dotfix
{
    "Instances": [
        {
            "InstanceId": "10.x.x.x",
            "NamespaceName": "dev.local",
            "ServiceName": "pr-test.dotfix",
            "HealthStatus": "UNKNOWN",
            "Attributes": { "AWS_INSTANCE_IPV4": "10.x.x.x" }
        }
    ]
}

@am-ltk am-ltk requested a review from ivankatliarchuk April 12, 2026 20:48
@ivankatliarchuk
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ivankatliarchuk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2026
Comment thread provider/awssd/aws_sd_test.go Outdated
Callers already trim trailing dots before invoking parseNamespace, but
the function itself silently returns the wrong namespace when given a
FQDN with a trailing dot (strings.HasSuffix misses the match). Add a
defensive TrimSuffix at the top so parseNamespace is self-contained,
and update the test to assert correct behavior.

Made-with: Cursor
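
The trailing-dot pitfall described in this commit message is easy to demonstrate with the standard library alone. The helper below is a hypothetical illustration, not the function from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// hasNamespaceSuffix reports whether hostname ends in "."+ns. When trim is
// true, a single trailing dot is stripped first.
func hasNamespaceSuffix(hostname, ns string, trim bool) bool {
	if trim {
		hostname = strings.TrimSuffix(hostname, ".")
	}
	return strings.HasSuffix(hostname, "."+ns)
}

func main() {
	fqdn := "my-app.elb.dev.local."
	fmt.Println(hasNamespaceSuffix(fqdn, "dev.local", false)) // false
	fmt.Println(hasNamespaceSuffix(fqdn, "dev.local", true))  // true
}
```

`strings.TrimSuffix` is a no-op when the suffix is absent, so trimming defensively costs nothing for callers that have already normalised the hostname.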
@am-ltk am-ltk requested a review from ivankatliarchuk April 13, 2026 15:55
@ivankatliarchuk
Member

ivankatliarchuk commented Apr 30, 2026

3 weeks since approval

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 30, 2026
@k8s-ci-robot k8s-ci-robot merged commit d37a832 into kubernetes-sigs:master Apr 30, 2026
18 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. provider Issues or PRs related to a provider size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

aws-sd provider: parseHostname fails for dotted Cloud Map service names

4 participants