Skip to content

Conversation

@u-kai
Copy link
Contributor

@u-kai u-kai commented Dec 12, 2025

What does it do?

This PR refactors the alias-handling logic inside adjustEndpointAndNewAaaaIfNeeded and prepares the codebase for upcoming fixes in #5997 and #6017.

To ensure safety and correctness, the refactoring was performed in multiple verifiable stages:

  1. Extracted the existing logic
    In commit 7fda240, the original implementation was split into
    oldAdjustEndpointAndNewAaaaIfNeeded and a new adjustEndpointAndNewAaaaIfNeeded,
    with the latter simply delegating to the former.

  2. Added comprehensive compatibility tests
    These tests verified that both functions returned identical results for all meaningful input patterns.
    This test suite ensured that extraction did not introduce any behavioral changes.

  3. Refactored the new implementation
    In commit b50b0a5, the new function was rewritten for clarity and testability.
    The compatibility tests confirmed the refactored logic remained fully identical to the old version.

  4. Removed the old logic and updated tests
    After strict behavioral parity was confirmed, the old function and compatibility tests were removed.
    The test suite was rewritten to validate the intended behavior directly.

This staged process allowed the refactoring to be completed safely while maintaining full functional equivalence throughout the transition.

Motivation

This work originates from the discussion here:
#5997 (comment)

To make future fixes #5997 and #6017 easier and safer to apply, the logic needed to be isolated, simplified, and thoroughly tested.

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 12, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @u-kai. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the internal Issues or PRs related to internal code label Dec 12, 2025
@k8s-ci-robot k8s-ci-robot requested a review from szuecs December 12, 2025 10:46
@k8s-ci-robot k8s-ci-robot added provider Issues or PRs related to a provider size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 12, 2025
@ivankatliarchuk
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 13, 2025
@coveralls
Copy link

Pull Request Test Coverage Report for Build 20164321325

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 33 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.04%) to 78.728%

Files with Coverage Reduction New Missed Lines %
testutils/endpoint.go 6 77.42%
aws/aws.go 27 91.47%
Totals Coverage Status
Change from base Build 20163464704: 0.04%
Covered Lines: 16040
Relevant Lines: 20374

💛 - Coveralls

Copy link
Member

@ivankatliarchuk ivankatliarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not a big deal, there were some missing lines not covered with tests

before
Screenshot 2025-12-13 at 11 07 39

with the change
Screenshot 2025-12-13 at 11 10 06

@u-kai
Copy link
Contributor Author

u-kai commented Dec 13, 2025

@ivankatliarchuk

Thank you for taking a look.

I don’t think the missing test coverage in the previous tests is a problem, since the behavior between the previous implementation and the refactored one has already been verified to be identical by the comprehensive tests added in commit b50b0a.

The currently untested case is essentially:

if providerSpecificAlias does not exist, nothing happens

if it does exist, it gets removed

In both cases, the final outcome is the same — providerSpecificAlias is absent.
Because of that, my personal view is that adding a dedicated test for this scenario may not be strictly necessary.

That said, if you think it should be covered explicitly, I’m happy to add a test.
What do you think?

@ivankatliarchuk
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 14, 2025
@ivankatliarchuk
Copy link
Member

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 24, 2025
@ivankatliarchuk
Copy link
Member

Confusing mutation + unclear return semantics + internal functions in methods + side effects. The current design makes the code hard to reason about because callers must track both what happens to their input parameter AND what comes back from the return value.

@u-kai
Copy link
Contributor Author

u-kai commented Dec 24, 2025

@ivankatliarchuk

Based on your feedback, I tried to make the behavior more predictable from the function signatures by splitting responsibilities.

  • adjustXXXRecord helpers perform in-place normalization only.
  • AdjustEndpoints is the aggregation layer that decides whether to append additional endpoints (e.g., the extra AAAA for CNAME alias).

Mutation is still required here because the existing Endpoint objects need to be modified, but I’ve confined mutations to the adjustXXXRecord helpers and made the “add an extra AAAA” behavior explicit at the aggregation layer.

func (p *AWSProvider) AdjustEndpoints(endpoints []*endpoint.Endpoint) ([]*endpoint.Endpoint, error) {
	// Holds CNAME targets that we will treat as Alias records. Such records are
	// hard coded to 'A' type aliases but we also need their 'AAAA' counterparts.
	var aliasCnameAaaaEndpoints []*endpoint.Endpoint

	for _, ep := range endpoints {
		switch ep.RecordType {
		case endpoint.RecordTypeA, endpoint.RecordTypeAAAA:
			p.adjustAOrAAAARecord(ep)
		case endpoint.RecordTypeCNAME:
			p.adjustCnameRecord(ep)
			if aliasString, _ := ep.GetProviderSpecificProperty(providerSpecificAlias); aliasString == "true" {
				aliasCnameAaaaEndpoints = append(aliasCnameAaaaEndpoints, &endpoint.Endpoint{
					DNSName:          ep.DNSName,
					Targets:          ep.Targets,
					RecordType:       endpoint.RecordTypeAAAA,
					RecordTTL:        ep.RecordTTL,
					Labels:           ep.Labels,
					ProviderSpecific: ep.ProviderSpecific,
					SetIdentifier:    ep.SetIdentifier,
				})
			}
		default:
			p.adjustOtherRecord(ep)
		}
	}
	return append(endpoints, aliasCnameAaaaEndpoints...), nil
}

If this approach looks reasonable to you, I’m happy to update the code accordingly.

Copy link
Member

@ivankatliarchuk ivankatliarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aws provider is quite messy. It's not this PR, but the complexity of the original code is just too high in my opinion

log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
ep.RecordTTL = defaultTTL
}
if prop, ok := ep.GetProviderSpecificProperty(providerSpecificEvaluateTargetHealth); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ep.GetProviderSpecificProperty

As from first look this introduce buggy logic like

if prop != "true" && prop != "false" {

What we could have

  // GetBoolProperty returns a boolean provider-specific property value.
  func (e *Endpoint) GetBoolProviderSpecificProperty(key string) (value bool, ok bool) {
      prop, exists := e.GetProviderSpecificProperty(key)
      if !exists {
          return false, false
      }
      switch prop {
      case "true":
          return true, true
      case "false":
          return false, true
      default:
          return false, false
      }
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — I agree the if prop != "true" && prop != "false" style is harder to read at a glance.

That said, I’m not sure it’s actually buggy by itself. As long as the code treats any non-"true"/"false" value as “not a valid boolean” (i.e., ok=false) and doesn’t silently coerce it, it should behave correctly. So my concern here is more about readability / maintainability than a functional bug.

Also, from a quick scan of the codebase it looks like reading provider-specific boolean values ("true"/"false") is currently only needed for AWS alias-related logic. Given that, I’m not yet convinced it’s worth adding an Endpoint method for this (it might leak provider-specific concerns into the core type).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aligned with this change and think it needs reconsideration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a PR with boolean abstracted, so no strings. Will see what other reviewers think about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve opened a separate PR to explore the boolean abstraction (no string comparisons),
as discussed. Please take a look when you have time.
#6078

Once that PR is merged, I’ll follow up on this PR with commits
that incorporate your suggestions while keeping the approach I proposed here.

setAliasConf := func(ep *endpoint.Endpoint) {
if ep.RecordTTL.IsConfigured() {
log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
ep.RecordTTL = defaultTTL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seems correct. Why we unsetting TTL if it's already IsConfigured?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this way of writing code will lead to disaster. It's fine to keep something similar in tests, not in the code as it's not idiomatic Go for non-trivial helpers

// COMMENT
func setAliasConfiguration(ep *endpoint.Endpoint)  not-sure-what-return-type-is {
      if ep.RecordTTL.IsConfigured() {
          log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
          ep.RecordTTL = defaultTTL
      }
      .....other logic...
  }

+ tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seems correct. Why we unsetting TTL if it's already IsConfigured?

According to the Route 53 documentation:

If an alias record points to an AWS resource, you can't set the time to live (TTL); Route 53 uses the default TTL for the resource.

Based on this behavior, the existing code unsets TTL to avoid exposing a value that will not be honored by the provider.
For this PR, my goal is to refactor the code without changing behavior, so I intentionally kept the same semantics.
Given the Route 53 TTL behavior for alias records, I think this is a reasonable and consistent choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this way of writing code will lead to disaster. It's fine to keep something similar in tests, not in the code as it's not idiomatic Go for non-trivial helpers

Thanks for raising this concern. I understand the general caution around helpers that mutate their inputs. In this case, the helper is intentionally scoped to alias records only, and it normalizes fields (like TTL) that are inherently constrained by Route53 alias semantics. I may be missing something, but I’m not seeing a concrete failure mode with this approach yet. If there are Go-specific guidelines or examples where this pattern has caused issues, I’d appreciate a pointer so I can better align with idiomatic expectations here.

log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
ep.RecordTTL = defaultTTL
}
cnameAliasCase := func(ep *endpoint.Endpoint) *endpoint.Endpoint {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another helper function, that uses another helper function..... What could go wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d like to clarify the concern.

Is the issue:

  • having helper functions at all,
  • defining helper functions inside another function,
  • or helper functions calling other helpers?

And more importantly, what concrete downside do you see here?
Readability, hidden side effects, testability, or something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function nesting is just evil le's say in my opinion. Flatter structure, easier to read. No closure overhead means, methods are defined once on the type, not created per function call + more efficient memory usage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And nested functions hard to test, especially as there is less isolation.

ProviderSpecific: ep.ProviderSpecific,
SetIdentifier: ep.SetIdentifier,
}
ep.RecordType = endpoint.RecordTypeA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the logic behind this function.

Setting alias, creating shallow copy of an endpoint, then changing outerscope/level-up ep endpoint to A record, and returning shallow copy. Is it even a helper function?

Not sure maybe deepcopy is not what we are after

 // Simple, safe, maintainable
  func createAAAATwin(ep *endpoint.Endpoint) *endpoint.Endpoint {
      aaaa := ep.DeepCopy()
      aaaa.RecordType = endpoint.RecordTypeAAAA
      return aaaa
  }

As currently

  cnameAliasCase := func(ep *endpoint.Endpoint) *endpoint.Endpoint {
      setAliasConf(ep)
      result := &endpoint.Endpoint{
          DNSName:          ep.DNSName,
          Targets:          ep.Targets,          // Shallow copy - shared!
          RecordType:       endpoint.RecordTypeAAAA,
          RecordTTL:        ep.RecordTTL,
          Labels:           ep.Labels,           //  Shallow copy - shared!
          ProviderSpecific: ep.ProviderSpecific, // Shallow copy - shared!
          SetIdentifier:    ep.SetIdentifier,
      }
      ep.RecordType = endpoint.RecordTypeA
      return result
  }

so every time we apply functions with side effects, it could lead to state change.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ivankatliarchuk. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
ep.RecordTTL = defaultTTL
}
if prop, ok := ep.GetProviderSpecificProperty(providerSpecificEvaluateTargetHealth); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aligned with this change and think it needs reconsideration.

@ivankatliarchuk
Copy link
Member

ivankatliarchuk commented Jan 2, 2026

Due to initial complexity of aws.go, I'm struggling with the code. I think it's the size and all this features that aws supports.

Currently there is endpoint object and dozens of functional operations happening.
Maybe we could try another PR with more OOP like change? I'm not sure how it's allign with your follow up plans though.

something like provider/aws/type.go

// EndpointAdjuster handles endpoint adjustments for AWS Route53
  type EndpointAdjuster struct {
  	evaluateTargetHealth bool
  	preferCNAME          bool
  }

// AdjustmentResult encapsulates the result of adjusting an endpoint
  type AdjustmentResult struct {
  	ModifiedEndpoint *endpoint.Endpoint
  	AdditionalAAAA   *endpoint.Endpoint
  }

// Adjust processes an endpoint and returns the adjustment result
  func (a *EndpointAdjuster) Adjust(ep *endpoint.Endpoint) *AdjustmentResult {
  	result := &AdjustmentResult{
  		ModifiedEndpoint: ep,
  	}

  	handler := a.getHandlerForRecordType(ep.RecordType)
  	result.AdditionalAAAA = handler.Handle(ep, a)

  	adjustGeoProximityLocationEndpoint(ep)
  	return result
  }

// RecordTypeHandler defines the interface for handling different record types
  type RecordTypeHandler interface {
  	Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint
  }

func (h *ARecordHandler) Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint {
  	aliasString, ok := ep.GetProviderSpecificProperty(providerSpecificAlias)

  	switch aliasString {
  	  .. do magic...
  	}

  	return nil // A/AAAA records don't create additional endpoints
  }

  // CNAMERecordHandler handles CNAME records
  type CNAMERecordHandler struct{}

  func (h *CNAMERecordHandler) Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint {
  	aliasString, _ := ep.GetProviderSpecificProperty(providerSpecificAlias)

  	switch aliasString {
......  	
  	}
  }

// DefaultRecordHandler handles all other record types
  type DefaultRecordHandler struct{}

 func (h *DefaultRecordHandler) Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint {
  	aliasString, _ := ep.GetProviderSpecificProperty(providerSpecificAlias)
       .... 	
  }

  // Helper methods on EndpointAdjuster

  func (a *EndpointAdjuster) getHandlerForRecordType(recordType string) RecordTypeHandler {
  	switch recordType {
  	case endpoint.RecordTypeA, endpoint.RecordTypeAAAA:
  		return &ARecordHandler{}
  	case endpoint.RecordTypeCNAME:
  		return &CNAMERecordHandler{}
  	default:
  		return &DefaultRecordHandler{}
  	}
  }

 func (a *EndpointAdjuster) convertCNAMEToAliasWithAAAA(ep *endpoint.Endpoint) *endpoint.Endpoint {
  	a.applyAliasConfiguration(ep)

  	// Create AAAA copy before modifying the original
  	aaaa := &endpoint.Endpoint{
  		DNSName:          ep.DNSName,
  		Targets:          ep.Targets,
  		RecordType:       endpoint.RecordTypeAAAA,
  		RecordTTL:        ep.RecordTTL,
  		Labels:           ep.Labels,
  		ProviderSpecific: ep.ProviderSpecific,
  		SetIdentifier:    ep.SetIdentifier,
  	}

  	// Convert CNAME to A
  	ep.RecordType = endpoint.RecordTypeA

  	return aaaa
  }

Where each record type has its own handler class (ARecordHandler, CNAMERecordHandler, DefaultRecordHandler). Each handler is independently testable + easy to add new record types or modify behavior. Each handler focuses on one record type + separation between configuration and execution + EndpointAdjuster coordinates but doesn't implement logic.

Explicit steate management

  • EndpointAdjuster struct holds configuration (no closure captures)
  • AdjustmentResult makes return values explicit
  • No hidden state in closures
  • Can add middleware/decorators

and in aws/aws.go

NewAWSProvider() {
  adjuster := NewEndpointAdjuster(p.evaluateTargetHealth, p.preferCNAME)
}

// Updated AWSProvider method
  func (p *AWSProvider) AdjustEndpoints(endpoints []*endpoint.Endpoint) ([]*endpoint.Endpoint, error) {
  	var additionalEndpoints []*endpoint.Endpoint

  	for _, ep := range endpoints {
  		result := p.adjuster.Adjust(ep)
  		if result.AdditionalAAAA != nil {
  			additionalEndpoints = append(additionalEndpoints, result.AdditionalAAAA)
  		}
  	}

  	return append(endpoints, additionalEndpoints...), nil
  }

Or simple single EndpointAdjuster struct with methods, and replace all clousures with named methods.

// EndpointAdjuster encapsulates endpoint adjustment logic
  type EndpointAdjuster struct {
  	ep                   *endpoint.Endpoint
  	evaluateTargetHealth bool
  	preferCNAME          bool
  }

  // AdjustAndGetAAAA adjusts the endpoint in place and returns an optional AAAA record
  func (a *EndpointAdjuster) AdjustAndGetAAAA() *endpoint.Endpoint {
  	var additionalAAAA *endpoint.Endpoint

  	switch a.ep.RecordType {
  	case endpoint.RecordTypeA, endpoint.RecordTypeAAAA:
  		a.handleARecord()
  	case endpoint.RecordTypeCNAME:
  		additionalAAAA = a.handleCNAMERecord()
  	default:
  		a.handleOtherRecord()
  	}

  	adjustGeoProximityLocationEndpoint(a.ep)
  	return additionalAAAA
  }

  func (a *EndpointAdjuster) handleARecord() {
  	aliasString, ok := a.getAlias()

  	switch aliasString {
  	case "true":
  		setAliasConfiguration
  	case "":
  		if ok {
  			deleteAlias
  		}
  		deleteEvaluateTargetHealth
  	default:
  		deleteAlias
  		deleteEvaluateTargetHealth
  	}
  }

 func (a *EndpointAdjuster) handleOtherRecord() {
  	// TODO: fix For records other than A, AAAA, and CNAME, if an alias record is set,
  	// the alias record processing is not performed. This will be fixed in another PR.
  	aliasString, _ := a.getAlias()

  	switch aliasString {
  	case "true":
  		.. all what is required ...
  	default:
  		... all what is required ...
  	}
  }

// Usage in AWSProvider
  func (p *AWSProvider) AdjustEndpoints(endpoints []*endpoint.Endpoint) ([]*endpoint.Endpoint, error) {
  	var aliasCnameAaaaEndpoints []*endpoint.Endpoint

  	for _, ep := range endpoints {
  		adjuster := NewEndpointAdjuster(ep, p.evaluateTargetHealth, p.preferCNAME)
  		if aaaa := adjuster.AdjustAndGetAAAA(); aaaa != nil {
  			aliasCnameAaaaEndpoints = append(aliasCnameAaaaEndpoints, aaaa)
  		}
  	}

  	return append(endpoints, aliasCnameAaaaEndpoints...), nil
  }

I do agree that functional has less bolierplate and less interface overhead + all logic in single function.

@ivankatliarchuk
Copy link
Member

ivankatliarchuk commented Jan 2, 2026

There are multiple solutions as to any problem. Probably your current approach is just good enough. But if we could have 2 PRs, and w8 for other reviewers to decide what approach they prefere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. internal Issues or PRs related to internal code ok-to-test Indicates a non-member PR verified by an org member that is safe to test. provider Issues or PRs related to a provider size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants