-
Notifications
You must be signed in to change notification settings - Fork 2.8k
refactor(aws): extract and restructure alias-handling logic to enable safe upcoming fixes #6021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: u-kai <[email protected]>
Signed-off-by: u-kai <[email protected]>
Signed-off-by: u-kai <[email protected]>
|
Hi @u-kai. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
Pull Request Test Coverage Report for Build 20164321325Details
💛 - Coveralls |
ivankatliarchuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Thank you for taking a look. I don’t think the missing test coverage in the previous tests is a problem, since the behavior between the previous implementation and the refactored one has already been verified to be identical by the comprehensive tests added in commit b50b0a. The currently untested case is essentially: if providerSpecificAlias does not exist, nothing happens if it does exist, it gets removed In both cases, the final outcome is the same — providerSpecificAlias is absent. That said, if you think it should be covered explicitly, I’m happy to add a test. |
|
/lgtm |
|
/lgtm cancel |
|
Confusing mutation + unclear return semantics + internal functions in methods + side effects. The current design makes the code hard to reason about because callers must track both what happens to their input parameter AND what comes back from the return value. |
|
Based on your feedback, I tried to make the behavior more predictable from the function signatures by splitting responsibilities.
Mutation is still required here because the existing Endpoint objects need to be modified, but I’ve confined mutations to the func (p *AWSProvider) AdjustEndpoints(endpoints []*endpoint.Endpoint) ([]*endpoint.Endpoint, error) {
// Holds CNAME targets that we will treat as Alias records. Such records are
// hard coded to 'A' type aliases but we also need their 'AAAA' counterparts.
var aliasCnameAaaaEndpoints []*endpoint.Endpoint
for _, ep := range endpoints {
switch ep.RecordType {
case endpoint.RecordTypeA, endpoint.RecordTypeAAAA:
p.adjustAOrAAAARecord(ep)
case endpoint.RecordTypeCNAME:
p.adjustCnameRecord(ep)
if aliasString, _ := ep.GetProviderSpecificProperty(providerSpecificAlias); aliasString == "true" {
aliasCnameAaaaEndpoints = append(aliasCnameAaaaEndpoints, &endpoint.Endpoint{
DNSName: ep.DNSName,
Targets: ep.Targets,
RecordType: endpoint.RecordTypeAAAA,
RecordTTL: ep.RecordTTL,
Labels: ep.Labels,
ProviderSpecific: ep.ProviderSpecific,
SetIdentifier: ep.SetIdentifier,
})
}
default:
p.adjustOtherRecord(ep)
}
}
return append(endpoints, aliasCnameAaaaEndpoints...), nil
}If this approach looks reasonable to you, I’m happy to update the code accordingly. |
ivankatliarchuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aws provider is quite messy. It's not this PR, but the complexity of the original code is just too high in my opinion
| log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL) | ||
| ep.RecordTTL = defaultTTL | ||
| } | ||
| if prop, ok := ep.GetProviderSpecificProperty(providerSpecificEvaluateTargetHealth); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ep.GetProviderSpecificProperty
As from first look this introduce buggy logic like
if prop != "true" && prop != "false" {What we could have
// GetBoolProperty returns a boolean provider-specific property value.
func (e *Endpoint) GetBoolProviderSpecificProperty(key string) (value bool, ok bool) {
prop, exists := e.GetProviderSpecificProperty(key)
if !exists {
return false, false
}
switch prop {
case "true":
return true, true
case "false":
return false, true
default:
return false, false
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks — I agree the if prop != "true" && prop != "false" style is harder to read at a glance.
That said, I’m not sure it’s actually buggy by itself. As long as the code treats any non-"true"/"false" value as “not a valid boolean” (i.e., ok=false) and doesn’t silently coerce it, it should behave correctly. So my concern here is more about readability / maintainability than a functional bug.
Also, from a quick scan of the codebase it looks like reading provider-specific boolean values ("true"/"false") is currently only needed for AWS alias-related logic. Given that, I’m not yet convinced it’s worth adding an Endpoint method for this (it might leak provider-specific concerns into the core type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aligned with this change and think it needs reconsideration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a PR with boolean abstracted, so no strings. Will see what other reviewers think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve opened a separate PR to explore the boolean abstraction (no string comparisons),
as discussed. Please take a look when you have time.
#6078
Once that PR is merged, I’ll follow up on this PR with commits
that incorporate your suggestions while keeping the approach I proposed here.
| setAliasConf := func(ep *endpoint.Endpoint) { | ||
| if ep.RecordTTL.IsConfigured() { | ||
| log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL) | ||
| ep.RecordTTL = defaultTTL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not seems correct. Why we unsetting TTL if it's already IsConfigured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this way of writing code will lead to disaster. It's fine to keep something similar in tests, not in the code as it's not idiomatic Go for non-trivial helpers
// COMMENT
func setAliasConfiguration(ep *endpoint.Endpoint) not-sure-what-return-type-is {
if ep.RecordTTL.IsConfigured() {
log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL)
ep.RecordTTL = defaultTTL
}
.....other logic...
}
+ tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not seems correct. Why we unsetting TTL if it's already IsConfigured?
According to the Route 53 documentation:
If an alias record points to an AWS resource, you can't set the time to live (TTL); Route 53 uses the default TTL for the resource.
Based on this behavior, the existing code unsets TTL to avoid exposing a value that will not be honored by the provider.
For this PR, my goal is to refactor the code without changing behavior, so I intentionally kept the same semantics.
Given the Route 53 TTL behavior for alias records, I think this is a reasonable and consistent choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this way of writing code will lead to disaster. It's fine to keep something similar in tests, not in the code as it's not idiomatic Go for non-trivial helpers
Thanks for raising this concern. I understand the general caution around helpers that mutate their inputs. In this case, the helper is intentionally scoped to alias records only, and it normalizes fields (like TTL) that are inherently constrained by Route53 alias semantics. I may be missing something, but I’m not seeing a concrete failure mode with this approach yet. If there are Go-specific guidelines or examples where this pattern has caused issues, I’d appreciate a pointer so I can better align with idiomatic expectations here.
| log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL) | ||
| ep.RecordTTL = defaultTTL | ||
| } | ||
| cnameAliasCase := func(ep *endpoint.Endpoint) *endpoint.Endpoint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another helper function, that uses another helper function..... What could go wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d like to clarify the concern.
Is the issue:
- having helper functions at all,
- defining helper functions inside another function,
- or helper functions calling other helpers?
And more importantly, what concrete downside do you see here?
Readability, hidden side effects, testability, or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function nesting is just evil le's say in my opinion. Flatter structure, easier to read. No closure overhead means, methods are defined once on the type, not created per function call + more efficient memory usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And nested functions hard to test, especially as there is less isolation.
| ProviderSpecific: ep.ProviderSpecific, | ||
| SetIdentifier: ep.SetIdentifier, | ||
| } | ||
| ep.RecordType = endpoint.RecordTypeA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the logic behind this function.
Setting alias, creating shallow copy of an endpoint, then changing outerscope/level-up ep endpoint to A record, and returning shallow copy. Is it even a helper function?
Not sure maybe deepcopy is not what we are after
// Simple, safe, maintainable
func createAAAATwin(ep *endpoint.Endpoint) *endpoint.Endpoint {
aaaa := ep.DeepCopy()
aaaa.RecordType = endpoint.RecordTypeAAAA
return aaaa
}
As currently
cnameAliasCase := func(ep *endpoint.Endpoint) *endpoint.Endpoint {
setAliasConf(ep)
result := &endpoint.Endpoint{
DNSName: ep.DNSName,
Targets: ep.Targets, // Shallow copy - shared!
RecordType: endpoint.RecordTypeAAAA,
RecordTTL: ep.RecordTTL,
Labels: ep.Labels, // Shallow copy - shared!
ProviderSpecific: ep.ProviderSpecific, // Shallow copy - shared!
SetIdentifier: ep.SetIdentifier,
}
ep.RecordType = endpoint.RecordTypeA
return result
}so every time we apply functions with side effects, it could lead to state change.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| log.Debugf("Modifying endpoint: %v, setting ttl=%v", ep, defaultTTL) | ||
| ep.RecordTTL = defaultTTL | ||
| } | ||
| if prop, ok := ep.GetProviderSpecificProperty(providerSpecificEvaluateTargetHealth); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aligned with this change and think it needs reconsideration.
|
Due to initial complexity of aws.go, I'm struggling with the code. I think it's the size and all this features that aws supports. Currently there is endpoint object and dozens of functional operations happening. something like // EndpointAdjuster handles endpoint adjustments for AWS Route53
type EndpointAdjuster struct {
evaluateTargetHealth bool
preferCNAME bool
}
// AdjustmentResult encapsulates the result of adjusting an endpoint
type AdjustmentResult struct {
ModifiedEndpoint *endpoint.Endpoint
AdditionalAAAA *endpoint.Endpoint
}
// Adjust processes an endpoint and returns the adjustment result
func (a *EndpointAdjuster) Adjust(ep *endpoint.Endpoint) *AdjustmentResult {
result := &AdjustmentResult{
ModifiedEndpoint: ep,
}
handler := a.getHandlerForRecordType(ep.RecordType)
result.AdditionalAAAA = handler.Handle(ep, a)
adjustGeoProximityLocationEndpoint(ep)
return result
}
// RecordTypeHandler defines the interface for handling different record types
type RecordTypeHandler interface {
Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint
}
func (h *ARecordHandler) Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint {
aliasString, ok := ep.GetProviderSpecificProperty(providerSpecificAlias)
switch aliasString {
.. do magic...
}
return nil // A/AAAA records don't create additional endpoints
}
// CNAMERecordHandler handles CNAME records
type CNAMERecordHandler struct{}
func (h *CNAMERecordHandler) Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint {
aliasString, _ := ep.GetProviderSpecificProperty(providerSpecificAlias)
switch aliasString {
......
}
}
// DefaultRecordHandler handles all other record types
type DefaultRecordHandler struct{}
func (h *DefaultRecordHandler) Handle(ep *endpoint.Endpoint, adjuster *EndpointAdjuster) *endpoint.Endpoint {
aliasString, _ := ep.GetProviderSpecificProperty(providerSpecificAlias)
....
}
// Helper methods on EndpointAdjuster
func (a *EndpointAdjuster) getHandlerForRecordType(recordType string) RecordTypeHandler {
switch recordType {
case endpoint.RecordTypeA, endpoint.RecordTypeAAAA:
return &ARecordHandler{}
case endpoint.RecordTypeCNAME:
return &CNAMERecordHandler{}
default:
return &DefaultRecordHandler{}
}
}
func (a *EndpointAdjuster) convertCNAMEToAliasWithAAAA(ep *endpoint.Endpoint) *endpoint.Endpoint {
a.applyAliasConfiguration(ep)
// Create AAAA copy before modifying the original
aaaa := &endpoint.Endpoint{
DNSName: ep.DNSName,
Targets: ep.Targets,
RecordType: endpoint.RecordTypeAAAA,
RecordTTL: ep.RecordTTL,
Labels: ep.Labels,
ProviderSpecific: ep.ProviderSpecific,
SetIdentifier: ep.SetIdentifier,
}
// Convert CNAME to A
ep.RecordType = endpoint.RecordTypeA
return aaaa
}Where each record type has its own handler class (ARecordHandler, CNAMERecordHandler, DefaultRecordHandler). Each handler is independently testable + easy to add new record types or modify behavior. Each handler focuses on one record type + separation between configuration and execution + EndpointAdjuster coordinates but doesn't implement logic. Explicit steate management
and in NewAWSProvider() {
adjuster := NewEndpointAdjuster(p.evaluateTargetHealth, p.preferCNAME)
}
// Updated AWSProvider method
func (p *AWSProvider) AdjustEndpoints(endpoints []*endpoint.Endpoint) ([]*endpoint.Endpoint, error) {
var additionalEndpoints []*endpoint.Endpoint
for _, ep := range endpoints {
result := p.adjuster.Adjust(ep)
if result.AdditionalAAAA != nil {
additionalEndpoints = append(additionalEndpoints, result.AdditionalAAAA)
}
}
return append(endpoints, additionalEndpoints...), nil
}Or simple single EndpointAdjuster struct with methods, and replace all clousures with named methods. I do agree that functional has less bolierplate and less interface overhead + all logic in single function. |
|
There are multiple solutions as to any problem. Probably your current approach is just good enough. But if we could have 2 PRs, and w8 for other reviewers to decide what approach they prefere. |


What does it do?
This PR refactors the alias-handling logic inside
adjustEndpointAndNewAaaaIfNeededand prepares the codebase for upcoming fixes in #5997 and #6017.To ensure safety and correctness, the refactoring was performed in multiple verifiable stages:
Extracted the existing logic
In commit 7fda240, the original implementation was split into
oldAdjustEndpointAndNewAaaaIfNeededand a newadjustEndpointAndNewAaaaIfNeeded,with the latter simply delegating to the former.
Added comprehensive compatibility tests
These tests verified that both functions returned identical results for all meaningful input patterns.
This test suite ensured that extraction did not introduce any behavioral changes.
Refactored the new implementation
In commit b50b0a5, the new function was rewritten for clarity and testability.
The compatibility tests confirmed the refactored logic remained fully identical to the old version.
Removed the old logic and updated tests
After strict behavioral parity was confirmed, the old function and compatibility tests were removed.
The test suite was rewritten to validate the intended behavior directly.
This staged process allowed the refactoring to be completed safely while maintaining full functional equivalence throughout the transition.
Motivation
This work originates from the discussion here:
#5997 (comment)
To make future fixes #5997 and #6017 easier and safer to apply, the logic needed to be isolated, simplified, and thoroughly tested.
More