refactor(aws): extract and restructure alias-handling logic to enable safe upcoming fixes #6021

u-kai · 2025-12-12T10:46:03Z

What does it do?

This PR refactors the alias-handling logic inside adjustEndpointAndNewAaaaIfNeeded and prepares the codebase for upcoming fixes in #5997 and #6017.

To ensure safety and correctness, the refactoring was performed in multiple verifiable stages:

1. Extracted the existing logic
   In commit 7fda240, the original implementation was split into
   oldAdjustEndpointAndNewAaaaIfNeeded and a new adjustEndpointAndNewAaaaIfNeeded,
   with the latter simply delegating to the former.
2. Added comprehensive compatibility tests
   These tests verified that both functions returned identical results for all meaningful input patterns.
   This test suite ensured that extraction did not introduce any behavioral changes.
3. Refactored the new implementation
   In commit b50b0a5, the new function was rewritten for clarity and testability.
   The compatibility tests confirmed the refactored logic remained fully identical to the old version.
4. Removed the old logic and updated tests
   After strict behavioral parity was confirmed, the old function and compatibility tests were removed.
   The test suite was rewritten to validate the intended behavior directly.

This staged process allowed the refactoring to be completed safely while maintaining full functional equivalence throughout the transition.
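
The compatibility-test stage described above can be sketched roughly as follows. This is a minimal illustration, not the test suite from this PR: `Endpoint`, `oldAdjust`, `newAdjust`, and `firstMismatch` are hypothetical stand-ins, and the alias logic is reduced to the CNAME case only.

```go
package main

import (
	"fmt"
	"reflect"
)

// Endpoint is a hypothetical, trimmed-down stand-in for endpoint.Endpoint.
type Endpoint struct {
	DNSName    string
	RecordType string
	Alias      string // stand-in for the provider-specific alias property
}

// oldAdjust stands in for the extracted legacy implementation.
func oldAdjust(ep Endpoint) []Endpoint {
	if ep.RecordType == "CNAME" && ep.Alias == "true" {
		a := ep
		a.RecordType = "A"
		aaaa := ep
		aaaa.RecordType = "AAAA"
		return []Endpoint{a, aaaa}
	}
	return []Endpoint{ep}
}

// newAdjust stands in for the refactored implementation under comparison.
func newAdjust(ep Endpoint) []Endpoint {
	if ep.RecordType != "CNAME" || ep.Alias != "true" {
		return []Endpoint{ep}
	}
	out := make([]Endpoint, 0, 2)
	for _, t := range []string{"A", "AAAA"} {
		c := ep
		c.RecordType = t
		out = append(out, c)
	}
	return out
}

// firstMismatch enumerates every combination of record type and alias value
// and returns the first input on which the two implementations disagree.
func firstMismatch() (Endpoint, bool) {
	for _, rt := range []string{"A", "AAAA", "CNAME", "TXT"} {
		for _, alias := range []string{"", "true", "false", "invalid"} {
			in := Endpoint{DNSName: "example.org", RecordType: rt, Alias: alias}
			if !reflect.DeepEqual(oldAdjust(in), newAdjust(in)) {
				return in, true
			}
		}
	}
	return Endpoint{}, false
}

func main() {
	if in, bad := firstMismatch(); bad {
		fmt.Printf("implementations diverge for %+v\n", in)
		return
	}
	fmt.Println("old and new implementations agree on all inputs")
}
```

Enumerating the full cross product keeps the comparison independent of which inputs the author happened to think of, which is the property that makes it safe to delete the old function afterwards.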

Motivation

This work originates from the discussion here:
#5997 (comment)

To make future fixes #5997 and #6017 easier and safer to apply, the logic needed to be isolated, simplified, and thoroughly tested.

More

Yes, this PR title follows Conventional Commits
Yes, I added unit tests
Yes, I updated end user documentation accordingly

Signed-off-by: u-kai <[email protected]>

k8s-ci-robot · 2025-12-12T10:46:16Z

Hi @u-kai. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ivankatliarchuk · 2025-12-13T11:03:48Z

/ok-to-test

coveralls · 2025-12-13T11:09:29Z

Pull Request Test Coverage Report for Build 20164321325

Details

0 of 0 changed or added relevant lines in 0 files are covered.
33 unchanged lines in 2 files lost coverage.
Overall coverage increased (+0.04%) to 78.728%

Files with Coverage Reduction	New Missed Lines	%
testutils/endpoint.go	6	77.42%
aws/aws.go	27	91.47%

Totals
Change from base Build 20163464704:	0.04%
Covered Lines:	16040
Relevant Lines:	20374

💛 - Coveralls

ivankatliarchuk

Probably not a big deal, but there were some lines that were not covered by tests.

[Screenshots: coverage before the change and with the change]

u-kai · 2025-12-13T12:40:06Z

@ivankatliarchuk

Thank you for taking a look.

I don’t think the missing test coverage in the previous tests is a problem, since the behavior of the previous implementation and the refactored one has already been verified to be identical by the comprehensive tests added in commit b50b0a5.

The currently untested case is essentially:

if providerSpecificAlias does not exist, nothing happens

if it does exist, it gets removed

In both cases, the final outcome is the same — providerSpecificAlias is absent.
Because of that, my personal view is that adding a dedicated test for this scenario may not be strictly necessary.

That said, if you think it should be covered explicitly, I’m happy to add a test.
What do you think?
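
If the case were to be covered explicitly, a test along these lines might do. This is only a sketch under stated assumptions: `Property`, `Endpoint`, `deleteProperty`, and `hasProperty` are simplified stand-ins written for this example, and the key `"alias"` is a placeholder for providerSpecificAlias.

```go
package main

import "fmt"

// Property and Endpoint are simplified, hypothetical stand-ins for the
// external-dns endpoint types.
type Property struct{ Name, Value string }

type Endpoint struct{ ProviderSpecific []Property }

// deleteProperty removes the named property if present and is a no-op otherwise.
func (e *Endpoint) deleteProperty(name string) {
	kept := e.ProviderSpecific[:0]
	for _, p := range e.ProviderSpecific {
		if p.Name != name {
			kept = append(kept, p)
		}
	}
	e.ProviderSpecific = kept
}

// hasProperty reports whether the named property is set.
func (e *Endpoint) hasProperty(name string) bool {
	for _, p := range e.ProviderSpecific {
		if p.Name == name {
			return true
		}
	}
	return false
}

func main() {
	cases := map[string]*Endpoint{
		"alias absent":  {},
		"alias present": {ProviderSpecific: []Property{{Name: "alias", Value: "false"}}},
	}
	// Both branches must end in the same state: the property is absent.
	for name, ep := range cases {
		ep.deleteProperty("alias")
		fmt.Printf("%s: alias still set = %v\n", name, ep.hasProperty("alias"))
	}
}
```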

ivankatliarchuk · 2025-12-14T15:04:57Z

/lgtm

ivankatliarchuk · 2025-12-24T09:03:43Z

/lgtm cancel

ivankatliarchuk · 2025-12-24T09:14:30Z

Confusing mutation + unclear return semantics + internal functions in methods + side effects. The current design makes the code hard to reason about because callers must track both what happens to their input parameter AND what comes back from the return value.

u-kai · 2025-12-24T23:54:02Z

@ivankatliarchuk

Based on your feedback, I tried to make the behavior more predictable from the function signatures by splitting responsibilities.

adjustXXXRecord helpers perform in-place normalization only.
AdjustEndpoints is the aggregation layer that decides whether to append additional endpoints (e.g., the extra AAAA for CNAME alias).

Mutation is still required here because the existing Endpoint objects need to be modified, but I’ve confined mutations to the adjustXXXRecord helpers and made the “add an extra AAAA” behavior explicit at the aggregation layer.

func (p *AWSProvider) AdjustEndpoints(endpoints []*endpoint.Endpoint) ([]*endpoint.Endpoint, error) {
	// Holds CNAME targets that we will treat as Alias records. Such records are
	// hard coded to 'A' type aliases but we also need their 'AAAA' counterparts.
	var aliasCnameAaaaEndpoints []*endpoint.Endpoint

	for _, ep := range endpoints {
		switch ep.RecordType {
		case endpoint.RecordTypeA, endpoint.RecordTypeAAAA:
			p.adjustAOrAAAARecord(ep)
		case endpoint.RecordTypeCNAME:
			p.adjustCnameRecord(ep)
			if aliasString, _ := ep.GetProviderSpecificProperty(providerSpecificAlias); aliasString == "true" {
				aliasCnameAaaaEndpoints = append(aliasCnameAaaaEndpoints, &endpoint.Endpoint{
					DNSName:          ep.DNSName,
					Targets:          ep.Targets,
					RecordType:       endpoint.RecordTypeAAAA,
					RecordTTL:        ep.RecordTTL,
					Labels:           ep.Labels,
					ProviderSpecific: ep.ProviderSpecific,
					SetIdentifier:    ep.SetIdentifier,
				})
			}
		default:
			p.adjustOtherRecord(ep)
		}
	}
	return append(endpoints, aliasCnameAaaaEndpoints...), nil
}

If this approach looks reasonable to you, I’m happy to update the code accordingly.

ivankatliarchuk

The aws provider is quite messy. It's not this PR, but the complexity of the original code is just too high in my opinion

ivankatliarchuk · 2025-12-31T08:10:57Z

provider/aws/aws.go

+			log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
+			ep.RecordTTL = defaultTTL
+		}
+		if prop, ok := ep.GetProviderSpecificProperty(providerSpecificEvaluateTargetHealth); ok {


ep.GetProviderSpecificProperty

At first glance, this introduces buggy logic like

if prop != "true" && prop != "false" {

What we could have

// GetBoolProviderSpecificProperty returns a boolean provider-specific property value.
// ok is false when the key is missing or the value is neither "true" nor "false".
func (e *Endpoint) GetBoolProviderSpecificProperty(key string) (value bool, ok bool) {
	prop, exists := e.GetProviderSpecificProperty(key)
	if !exists {
		return false, false
	}
	switch prop {
	case "true":
		return true, true
	case "false":
		return false, true
	default:
		return false, false
	}
}

Thanks — I agree the if prop != "true" && prop != "false" style is harder to read at a glance.

That said, I’m not sure it’s actually buggy by itself. As long as the code treats any non-"true"/"false" value as “not a valid boolean” (i.e., ok=false) and doesn’t silently coerce it, it should behave correctly. So my concern here is more about readability / maintainability than a functional bug.

Also, from a quick scan of the codebase it looks like reading provider-specific boolean values ("true"/"false") is currently only needed for AWS alias-related logic. Given that, I’m not yet convinced it’s worth adding an Endpoint method for this (it might leak provider-specific concerns into the core type).
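
One way to keep the strict parsing without touching the core Endpoint type would be a small function local to the provider package. The sketch below is an assumption for illustration only: `parseBoolProperty` is a hypothetical name, and it takes the raw string and existence flag that a call such as GetProviderSpecificProperty already returns.

```go
package main

import "fmt"

// parseBoolProperty interprets a provider-specific string value strictly:
// only "true" and "false" are valid; anything else reports ok=false.
// The name and signature are hypothetical, not part of external-dns.
func parseBoolProperty(raw string, exists bool) (value, ok bool) {
	if !exists {
		return false, false
	}
	switch raw {
	case "true":
		return true, true
	case "false":
		return false, true
	default:
		return false, false
	}
}

func main() {
	for _, raw := range []string{"true", "false", "yes"} {
		v, ok := parseBoolProperty(raw, true)
		fmt.Printf("%q -> value=%v ok=%v\n", raw, v, ok)
	}
}
```

Note that strconv.ParseBool is not a drop-in replacement here, since it also accepts values such as "1", "t", and "TRUE".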

I'm not aligned with this change and think it needs reconsideration.

Create a PR with boolean abstracted, so no strings. Will see what other reviewers think about it.

I’ve opened a separate PR to explore the boolean abstraction (no string comparisons),
as discussed. Please take a look when you have time.
#6078

Once that PR is merged, I’ll follow up on this PR with commits
that incorporate your suggestions while keeping the approach I proposed here.

ivankatliarchuk · 2025-12-31T08:20:34Z

provider/aws/aws.go

+	setAliasConf := func(ep *endpoint.Endpoint) {
+		if ep.RecordTTL.IsConfigured() {
+			log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
+			ep.RecordTTL = defaultTTL


This does not seem correct. Why are we unsetting the TTL if it's already IsConfigured?

I think this way of writing code will lead to disaster. It's fine to keep something similar in tests, not in the code as it's not idiomatic Go for non-trivial helpers

// COMMENT
func setAliasConfiguration(ep *endpoint.Endpoint) not-sure-what-return-type-is {
	if ep.RecordTTL.IsConfigured() {
		log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
		ep.RecordTTL = defaultTTL
	}
	.....other logic...
}

+ tests

This does not seem correct. Why are we unsetting the TTL if it's already IsConfigured?

According to the Route 53 documentation:

If an alias record points to an AWS resource, you can't set the time to live (TTL); Route 53 uses the default TTL for the resource.

Based on this behavior, the existing code unsets TTL to avoid exposing a value that will not be honored by the provider.
For this PR, my goal is to refactor the code without changing behavior, so I intentionally kept the same semantics.
Given the Route 53 TTL behavior for alias records, I think this is a reasonable and consistent choice.
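
The TTL normalization described above can be illustrated with a small self-contained sketch. The types here are stand-ins, not the external-dns ones: `TTL.IsConfigured` is assumed to mean "greater than zero", the value 300 for `defaultTTL` is an assumed placeholder, and `normalizeAliasTTL` is a hypothetical name.

```go
package main

import "fmt"

// TTL is a stand-in for endpoint.TTL; zero is assumed to mean "not configured".
type TTL int64

// IsConfigured reports whether the user set an explicit TTL (assumed semantics).
func (t TTL) IsConfigured() bool { return t > 0 }

// defaultTTL is an assumed placeholder value for this sketch.
const defaultTTL TTL = 300

// Endpoint is a simplified stand-in carrying only the fields needed here.
type Endpoint struct {
	Alias     bool
	RecordTTL TTL
}

// normalizeAliasTTL overwrites a user-configured TTL on alias records, because
// Route 53 does not honor a custom TTL for aliases. Other records are untouched.
func normalizeAliasTTL(ep *Endpoint) {
	if ep.Alias && ep.RecordTTL.IsConfigured() {
		ep.RecordTTL = defaultTTL
	}
}

func main() {
	alias := &Endpoint{Alias: true, RecordTTL: 60}
	plain := &Endpoint{Alias: false, RecordTTL: 60}
	normalizeAliasTTL(alias)
	normalizeAliasTTL(plain)
	fmt.Println("alias TTL:", alias.RecordTTL, "plain TTL:", plain.RecordTTL)
}
```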

I think this way of writing code will lead to disaster. It's fine to keep something similar in tests, not in the code as it's not idiomatic Go for non-trivial helpers

Thanks for raising this concern. I understand the general caution around helpers that mutate their inputs. In this case, the helper is intentionally scoped to alias records only, and it normalizes fields (like TTL) that are inherently constrained by Route53 alias semantics. I may be missing something, but I’m not seeing a concrete failure mode with this approach yet. If there are Go-specific guidelines or examples where this pattern has caused issues, I’d appreciate a pointer so I can better align with idiomatic expectations here.

ivankatliarchuk · 2025-12-31T08:26:22Z

provider/aws/aws.go

-				log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
-				ep.RecordTTL = defaultTTL
+	}
+	cnameAliasCase := func(ep *endpoint.Endpoint) *endpoint.Endpoint {


Another helper function, that uses another helper function..... What could go wrong?

I’d like to clarify the concern.

Is the issue:

having helper functions at all,

defining helper functions inside another function,

or helper functions calling other helpers?

And more importantly, what concrete downside do you see here?
Readability, hidden side effects, testability, or something else?

Function nesting is just evil, let's say, in my opinion. A flatter structure is easier to read. No closure overhead means methods are defined once on the type, not created per function call, plus more efficient memory usage.

And nested functions are hard to test, especially as there is less isolation.
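
The testability point can be shown with a minimal side-by-side sketch. All names here (`Endpoint`, `adjuster`, `adjustNested`, `toAlias`) are hypothetical and chosen for illustration; the sketch focuses on isolation rather than performance, since whether a closure has a run-time cost depends on the compiler's escape analysis.

```go
package main

import "fmt"

// Endpoint is a simplified stand-in for endpoint.Endpoint.
type Endpoint struct{ RecordType string }

// adjuster holds the configuration that the closure would otherwise capture.
type adjuster struct{ preferCNAME bool }

// Nested style: the helper exists only inside adjustNested, so a test can
// reach it only by going through the enclosing function.
func adjustNested(ep *Endpoint, preferCNAME bool) {
	toAlias := func(e *Endpoint) {
		if !preferCNAME { // captured from the enclosing scope
			e.RecordType = "A"
		}
	}
	toAlias(ep)
}

// Flat style: the same logic as a named method that a test can call directly.
func (a adjuster) toAlias(e *Endpoint) {
	if !a.preferCNAME {
		e.RecordType = "A"
	}
}

func main() {
	nested := &Endpoint{RecordType: "CNAME"}
	adjustNested(nested, false)

	flat := &Endpoint{RecordType: "CNAME"}
	adjuster{preferCNAME: false}.toAlias(flat)

	fmt.Println(nested.RecordType, flat.RecordType)
}
```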

ivankatliarchuk · 2025-12-31T08:37:17Z

provider/aws/aws.go

+			ProviderSpecific: ep.ProviderSpecific,
+			SetIdentifier:    ep.SetIdentifier,
+		}
+		ep.RecordType = endpoint.RecordTypeA


I don't understand the logic behind this function.

Setting alias, creating shallow copy of an endpoint, then changing outerscope/level-up ep endpoint to A record, and returning shallow copy. Is it even a helper function?

Not sure, though; maybe a deep copy is not what we are after.

// Simple, safe, maintainable
func createAAAATwin(ep *endpoint.Endpoint) *endpoint.Endpoint {
	aaaa := ep.DeepCopy()
	aaaa.RecordType = endpoint.RecordTypeAAAA
	return aaaa
}

As currently

cnameAliasCase := func(ep *endpoint.Endpoint) *endpoint.Endpoint {
	setAliasConf(ep)
	result := &endpoint.Endpoint{
		DNSName:          ep.DNSName,
		Targets:          ep.Targets, // Shallow copy - shared!
		RecordType:       endpoint.RecordTypeAAAA,
		RecordTTL:        ep.RecordTTL,
		Labels:           ep.Labels, // Shallow copy - shared!
		ProviderSpecific: ep.ProviderSpecific, // Shallow copy - shared!
		SetIdentifier:    ep.SetIdentifier,
	}
	ep.RecordType = endpoint.RecordTypeA
	return result
}

so every time we apply functions with side effects, it could lead to state change.
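
The sharing concern can be reproduced in isolation. The sketch below uses a simplified stand-in `Endpoint` with one slice and one map field; `shallowTwin` and `deepTwin` are hypothetical helpers written for this example, not functions from the PR.

```go
package main

import "fmt"

// Endpoint is a simplified stand-in with one slice and one map field.
type Endpoint struct {
	RecordType string
	Targets    []string
	Labels     map[string]string
}

// shallowTwin copies field by field; the slice and map still point at the
// original's backing storage.
func shallowTwin(ep *Endpoint) *Endpoint {
	return &Endpoint{RecordType: "AAAA", Targets: ep.Targets, Labels: ep.Labels}
}

// deepTwin clones the slice and map so the twin can change independently.
func deepTwin(ep *Endpoint) *Endpoint {
	labels := make(map[string]string, len(ep.Labels))
	for k, v := range ep.Labels {
		labels[k] = v
	}
	return &Endpoint{
		RecordType: "AAAA",
		Targets:    append([]string(nil), ep.Targets...),
		Labels:     labels,
	}
}

func main() {
	orig := &Endpoint{
		RecordType: "A",
		Targets:    []string{"lb.example.org"},
		Labels:     map[string]string{"owner": "a"},
	}
	shallow, deep := shallowTwin(orig), deepTwin(orig)

	// A later in-place change to the original leaks into the shallow twin only.
	orig.Targets[0] = "changed.example.org"
	orig.Labels["owner"] = "b"

	fmt.Println("shallow:", shallow.Targets[0], shallow.Labels["owner"])
	fmt.Println("deep:   ", deep.Targets[0], deep.Labels["owner"])
}
```

Whether the extra allocations of a deep copy are worth it depends on whether any later step modifies targets, labels, or provider-specific properties in place.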

k8s-ci-robot · 2025-12-31T08:39:10Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ivankatliarchuk. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ivankatliarchuk · 2026-01-02T14:16:25Z

provider/aws/aws.go

+			log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
+			ep.RecordTTL = defaultTTL
+		}
+		if prop, ok := ep.GetProviderSpecificProperty(providerSpecificEvaluateTargetHealth); ok {


I'm not aligned with this change and think it needs reconsideration.

ivankatliarchuk · 2026-01-02T14:58:53Z

Due to the initial complexity of aws.go, I'm struggling with the code. I think it's the size and all the features that AWS supports.

Currently there is an endpoint object with dozens of functional operations happening on it.
Maybe we could try another PR with a more OOP-like change? I'm not sure how it aligns with your follow-up plans though.

something like provider/aws/type.go

// EndpointAdjuster handles endpoint adjustments for AWS Route53
  type EndpointAdjuster struct {
  	evaluateTargetHealth bool
  	preferCNAME          bool
  }

// AdjustmentResult encapsulates the result of adjusting an endpoint
  type AdjustmentResult struct {
  	ModifiedEndpoint *endpoint.Endpoint
  	AdditionalAAAA   *endpoint.Endpoint
  }

// Adjust processes an endpoint and returns the adjustment result
  func (a *EndpointAdjuster) Adjust(ep *endpoint.Endpoint) *AdjustmentResult {
  	result := &AdjustmentResult{
  		ModifiedEndpoint: ep,
  	}

  	handler := a.getHandlerForRecordType(ep.RecordType)
  	result.AdditionalAAAA = handler.Handle(ep, a)

  	adjustGeoProximityLocationEndpoint(ep)
  	return result
  }

// RecordTypeHandler defines the interface for handling different record types
  type RecordTypeHandler interface {
  	Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint
  }

  // ARecordHandler handles A and AAAA records
  type ARecordHandler struct{}

  func (h *ARecordHandler) Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint {
  	aliasString, ok := ep.GetProviderSpecificProperty(providerSpecificAlias)

  	switch aliasString {
  	  .. do magic...
  	}

  	return nil // A/AAAA records don't create additional endpoints
  }

  // CNAMERecordHandler handles CNAME records
  type CNAMERecordHandler struct{}

  func (h *CNAMERecordHandler) Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint {
  	aliasString, _ := ep.GetProviderSpecificProperty(providerSpecificAlias)

  	switch aliasString {
......  	
  	}
  }

// DefaultRecordHandler handles all other record types
  type DefaultRecordHandler struct{}

 func (h *DefaultRecordHandler) Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint {
  	aliasString, _ := ep.GetProviderSpecificProperty(providerSpecificAlias)
       .... 	
  }

  // Helper methods on EndpointAdjuster

  func (a *EndpointAdjuster) getHandlerForRecordType(recordType string) RecordTypeHandler {
  	switch recordType {
  	case endpoint.RecordTypeA, endpoint.RecordTypeAAAA:
  		return &ARecordHandler{}
  	case endpoint.RecordTypeCNAME:
  		return &CNAMERecordHandler{}
  	default:
  		return &DefaultRecordHandler{}
  	}
  }

 func (a *EndpointAdjuster) convertCNAMEToAliasWithAAAA(ep *endpoint.Endpoint) *endpoint.Endpoint {
  	a.applyAliasConfiguration(ep)

  	// Create AAAA copy before modifying the original
  	aaaa := &endpoint.Endpoint{
  		DNSName:          ep.DNSName,
  		Targets:          ep.Targets,
  		RecordType:       endpoint.RecordTypeAAAA,
  		RecordTTL:        ep.RecordTTL,
  		Labels:           ep.Labels,
  		ProviderSpecific: ep.ProviderSpecific,
  		SetIdentifier:    ep.SetIdentifier,
  	}

  	// Convert CNAME to A
  	ep.RecordType = endpoint.RecordTypeA

  	return aaaa
  }

Where each record type has its own handler class (ARecordHandler, CNAMERecordHandler, DefaultRecordHandler). Each handler is independently testable + easy to add new record types or modify behavior. Each handler focuses on one record type + separation between configuration and execution + EndpointAdjuster coordinates but doesn't implement logic.

Explicit state management

EndpointAdjuster struct holds configuration (no closure captures)
AdjustmentResult makes return values explicit
No hidden state in closures
Can add middleware/decorators

and in aws/aws.go

NewAWSProvider() {
  adjuster := NewEndpointAdjuster(p.evaluateTargetHealth, p.preferCNAME)
}

// Updated AWSProvider method
  func (p *AWSProvider) AdjustEndpoints(endpoints []*endpoint.Endpoint) ([]*endpoint.Endpoint, error) {
  	var additionalEndpoints []*endpoint.Endpoint

  	for _, ep := range endpoints {
  		result := p.adjuster.Adjust(ep)
  		if result.AdditionalAAAA != nil {
  			additionalEndpoints = append(additionalEndpoints, result.AdditionalAAAA)
  		}
  	}

  	return append(endpoints, additionalEndpoints...), nil
  }
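
For readers who want to run the idea, the handler-per-record-type design above can be condensed into a self-contained sketch. It is an illustration under assumptions, not the proposed implementation: `Endpoint` is a reduced stand-in, the alias flag is a plain bool, and `handlerFor` and `adjustAll` are hypothetical names.

```go
package main

import "fmt"

// Endpoint is a reduced stand-in for endpoint.Endpoint.
type Endpoint struct {
	DNSName    string
	RecordType string
	Alias      bool
}

// handler adjusts ep in place and optionally returns an extra endpoint.
type handler interface {
	Handle(ep *Endpoint) *Endpoint
}

type addressHandler struct{} // A and AAAA records
type cnameHandler struct{}   // CNAME records
type defaultHandler struct{} // everything else

func (addressHandler) Handle(*Endpoint) *Endpoint { return nil }
func (defaultHandler) Handle(*Endpoint) *Endpoint { return nil }

// Handle converts an alias CNAME to an A record and returns its AAAA twin.
func (cnameHandler) Handle(ep *Endpoint) *Endpoint {
	if !ep.Alias {
		return nil
	}
	twin := *ep // value copy; this stand-in type has no reference fields
	twin.RecordType = "AAAA"
	ep.RecordType = "A"
	return &twin
}

// handlerFor picks the handler for a record type.
func handlerFor(recordType string) handler {
	switch recordType {
	case "A", "AAAA":
		return addressHandler{}
	case "CNAME":
		return cnameHandler{}
	default:
		return defaultHandler{}
	}
}

// adjustAll mirrors the aggregation loop: adjust in place, collect extras.
func adjustAll(eps []*Endpoint) []*Endpoint {
	var extra []*Endpoint
	for _, ep := range eps {
		if aaaa := handlerFor(ep.RecordType).Handle(ep); aaaa != nil {
			extra = append(extra, aaaa)
		}
	}
	return append(eps, extra...)
}

func main() {
	out := adjustAll([]*Endpoint{
		{DNSName: "a.example.org", RecordType: "CNAME", Alias: true},
		{DNSName: "b.example.org", RecordType: "TXT"},
	})
	for _, ep := range out {
		fmt.Println(ep.DNSName, ep.RecordType)
	}
}
```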

Or a simple single EndpointAdjuster struct with methods, replacing all closures with named methods.

// EndpointAdjuster encapsulates endpoint adjustment logic
  type EndpointAdjuster struct {
  	ep                   *endpoint.Endpoint
  	evaluateTargetHealth bool
  	preferCNAME          bool
  }

  // AdjustAndGetAAAA adjusts the endpoint in place and returns an optional AAAA record
  func (a *EndpointAdjuster) AdjustAndGetAAAA() *endpoint.Endpoint {
  	var additionalAAAA *endpoint.Endpoint

  	switch a.ep.RecordType {
  	case endpoint.RecordTypeA, endpoint.RecordTypeAAAA:
  		a.handleARecord()
  	case endpoint.RecordTypeCNAME:
  		additionalAAAA = a.handleCNAMERecord()
  	default:
  		a.handleOtherRecord()
  	}

  	adjustGeoProximityLocationEndpoint(a.ep)
  	return additionalAAAA
  }

  func (a *EndpointAdjuster) handleARecord() {
  	aliasString, ok := a.getAlias()

  	switch aliasString {
  	case "true":
  		setAliasConfiguration
  	case "":
  		if ok {
  			deleteAlias
  		}
  		deleteEvaluateTargetHealth
  	default:
  		deleteAlias
  		deleteEvaluateTargetHealth
  	}
  }

 func (a *EndpointAdjuster) handleOtherRecord() {
  	// TODO: fix For records other than A, AAAA, and CNAME, if an alias record is set,
  	// the alias record processing is not performed. This will be fixed in another PR.
  	aliasString, _ := a.getAlias()

  	switch aliasString {
  	case "true":
  		.. all what is required ...
  	default:
  		... all what is required ...
  	}
  }

// Usage in AWSProvider
  func (p *AWSProvider) AdjustEndpoints(endpoints []*endpoint.Endpoint) ([]*endpoint.Endpoint, error) {
  	var aliasCnameAaaaEndpoints []*endpoint.Endpoint

  	for _, ep := range endpoints {
  		adjuster := NewEndpointAdjuster(ep, p.evaluateTargetHealth, p.preferCNAME)
  		if aaaa := adjuster.AdjustAndGetAAAA(); aaaa != nil {
  			aliasCnameAaaaEndpoints = append(aliasCnameAaaaEndpoints, aaaa)
  		}
  	}

  	return append(endpoints, aliasCnameAaaaEndpoints...), nil
  }

I do agree that the functional approach has less boilerplate and less interface overhead, and keeps all logic in a single function.

ivankatliarchuk · 2026-01-02T15:06:12Z

There are multiple solutions to any problem. Your current approach is probably good enough. But we could have two PRs and wait for other reviewers to decide which approach they prefer.

u-kai added 3 commits December 11, 2025 07:56

test: add regression test to ensure behavior before refactor

7fda240

Signed-off-by: u-kai <[email protected]>

refactor aws adjustEndpointAndAaaaIfNeeded

b50b0a5

Signed-off-by: u-kai <[email protected]>

test(aws): add comprehensive tests and remove old logic

0033044

Signed-off-by: u-kai <[email protected]>

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 12, 2025

k8s-ci-robot requested a review from mloiseleur December 12, 2025 10:46

k8s-ci-robot added the internal Issues or PRs related to internal code label Dec 12, 2025

k8s-ci-robot requested a review from szuecs December 12, 2025 10:46

k8s-ci-robot added provider Issues or PRs related to a provider size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 12, 2025

u-kai mentioned this pull request Dec 12, 2025

fix(aws): incorrect behavior for non-aliasable record types #6017

Open

3 tasks

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 13, 2025

ivankatliarchuk suggested changes Dec 13, 2025


k8s-ci-robot assigned ivankatliarchuk Dec 13, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 14, 2025

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 24, 2025

ivankatliarchuk suggested changes Dec 31, 2025


ivankatliarchuk suggested changes Jan 2, 2026


u-kai mentioned this pull request Jan 4, 2026

refactor(aws): abstract provider-specific boolean parsing #6078

Open

3 tasks
