client_routes: simplify concurrency, fix stale entries, add sticky routing#850
client_routes: simplify concurrency, fix stale entries, add sticky routing#850sylwiaszunejko wants to merge 3 commits intoscylladb:masterfrom
Conversation
0a5f713 to
179bf31
Compare
179bf31 to
73467e8
Compare
73467e8 to
8de13e0
Compare
Remove the entire ResolvedClientRoute type hierarchy, ClientRoutesResolver interface, simpleClientRoutesResolver with DNS cache, and resolveAndUpdateInPlace. Replace atomic.Pointer with sync.RWMutex for route storage. Routes are now stored as unexported clientRouteList with Merge and FindByHostID methods. TranslateHost resolves DNS on every call via DNSResolver.LookupIP rather than maintaining cached resolved addresses. Rename UnresolvedClientRoute/UnresolvedClientRouteList to unexported clientRoute/clientRouteList since the resolved counterpart no longer exists. Also deprecates ResolveHealthyEndpointPeriod, MaxResolverConcurrency, and ResolverCacheDuration config fields and their WithXxx option functions.
8de13e0 to
dd49949
Compare
| @@ -119,169 +115,54 @@ type UnresolvedClientRoute struct { | |||
| } | |||
|
|
|||
| // Similar returns true if both records targets same host and connection id | |||
| func (r UnresolvedClientRoute) Similar(o UnresolvedClientRoute) bool { | |||
| func (r clientRoute) Similar(o clientRoute) bool { | |||
| return r.ConnectionID == o.ConnectionID && r.HostID == o.HostID | |||
| } | |||
|
|
|||
| // Equal returns true if both records are exactly the same | |||
| func (r UnresolvedClientRoute) Equal(o UnresolvedClientRoute) bool { | |||
| func (r clientRoute) Equal(o clientRoute) bool { | |||
| return r == o | |||
| } | |||
There was a problem hiding this comment.
❓ Is it acceptable/common in Go to have types names in camelCase instead of CapitalizedCamelCase?
There was a problem hiding this comment.
Coming from Rust - where types are CapitalizedCamelCase - this is confusing.
There was a problem hiding this comment.
in Go camelCase is for internal types and CapitalizedCamelCase is for public ones, I think UnresolvedClientRoute and ResolvedClientRoute were capitalized by mistake, I don't think anyone should use them externaly, but probably to not break the API I should left it as it is. Still I wanted to propose different naming and making it internal, maybe we could agree to break the API like that
There was a problem hiding this comment.
if not in this release then maybe it's something to do in v2
There was a problem hiding this comment.
- If it's API-breaking, then we technically mustn't do this without releasing v2.
- Did any public API reference those types? If not, we may consider them "exported by mistake", without any practical implications - if no one had any reason to reference them, then no one should suffer from deleting them.
- I vote for renaming then. @Lorak-mmk WDYT?
There was a problem hiding this comment.
If it's API-breaking, then we technically mustn't do this without releasing v2.
Is that the case? I remember @dkropachev saying that in gocql API breaks are fine in minor versions.
There was a problem hiding this comment.
Did any public API reference those types?
nope
There was a problem hiding this comment.
https://kn100.me/making-breaking-changes-in-go/
I don't know how up-to-date and correct this article is, but it makes me feel uneasy...
Add scopeConnectionIDs and scopeHostIDs parameters to Merge so it can prune stale entries before upserting: - Both non-empty (partial update): prune entries matching BOTH lists - Only scopeConnectionIDs (full refresh): prune all entries for those connections - Both empty: additive-only merge, no pruning updateHostPortMapping now passes connection/host IDs through to Merge, enabling automatic cleanup of decommissioned hosts.
When a host has multiple routes (one per connection), remember which connectionID was used on the first successful TranslateHost call and prefer it on subsequent calls via findPreferredRoute. If the preferred route is removed (e.g. connection pruned), fall back to FindByHostID. This avoids unnecessary connection churn when multiple PrivateLink endpoints serve the same host.
dd49949 to
23e0fd3
Compare
| } | ||
|
|
||
| type UnresolvedClientRouteList []UnresolvedClientRoute | ||
| type clientRouteList []clientRoute |
There was a problem hiding this comment.
💭 Why do we operate on slice instead of map?
- Slice makes lookup quadratic, which is acceptable only because client routes-related operations only take place on connecting, not on executing statements.
- Map is conceptually better to represent the mapping of host ids to client routes entries.
- See the Rust Driver's impl for how use of a map improves the logic.
| func (l *clientRouteList) Len() int { | ||
| return len(*l) | ||
| } | ||
|
|
||
| type ResolvedClientRoute struct { | ||
| updateTime time.Time | ||
| UnresolvedClientRoute | ||
| allKnownIPs []net.IP | ||
| currentIP net.IP | ||
| forcedResolve bool | ||
| } | ||
|
|
||
| func (r ResolvedClientRoute) String() string { | ||
| var ip string | ||
| if r.currentIP == nil { | ||
| ip = "<nil>" | ||
| } else { | ||
| ip = r.currentIP.String() | ||
| } | ||
|
|
||
| return fmt.Sprintf( | ||
| "ResolvedClientRoute{ConnectionID=%s, HostID=%s, Address=%s, CQLPort=%d, SecureCQLPort=%d, CurrentIP=%s}", | ||
| r.ConnectionID, | ||
| r.HostID, | ||
| r.Address, | ||
| r.CQLPort, | ||
| r.SecureCQLPort, | ||
| ip, | ||
| ) | ||
| } | ||
|
|
||
| func (r ResolvedClientRoute) Clone() ResolvedClientRoute { | ||
| res := r | ||
| if res.allKnownIPs != nil { | ||
| res.allKnownIPs = make([]net.IP, 0, len(r.allKnownIPs)) | ||
| for _, ip := range r.allKnownIPs { | ||
| res.allKnownIPs = append(res.allKnownIPs, slices.Clone(ip)) | ||
| } | ||
| } | ||
| if len(res.currentIP) != 0 { | ||
| copy(res.currentIP, r.currentIP) | ||
| res.currentIP = slices.Clone(res.currentIP) | ||
| } | ||
| return res | ||
| } | ||
|
|
||
| // Newer returns true if o is newer than r | ||
| func (r ResolvedClientRoute) Newer(o ResolvedClientRoute) bool { | ||
| if len(r.currentIP) == 0 && len(o.currentIP) != 0 { | ||
| return true | ||
| } | ||
| if len(r.allKnownIPs) == 0 && len(o.allKnownIPs) != 0 { | ||
| return true | ||
| // Merge upserts entries from incoming into the list. | ||
| // Before upserting it prunes stale entries within the query scope defined by | ||
| // scopeConnectionIDs and scopeHostIDs: | ||
| // - Both non-empty (partial update): prune entries matching BOTH lists. | ||
| // - Only scopeConnectionIDs (full refresh): prune all entries for those connections. | ||
| // | ||
| // scopeConnectionIDs must not be empty. | ||
| func (l *clientRouteList) Merge(incoming clientRouteList, scopeConnectionIDs, scopeHostIDs []string) { |
There was a problem hiding this comment.
As clientRouteList is now a private type, shouldn't the methods be private, too? (camelCase, not CapitalizedCamelCase).
Summary
Replaces the complex resolved-route caching and background DNS resolution system in
ClientRoutesHandlerwith a simpler design: store unresolved routes and resolve DNS on demand duringTranslateHost. Adds scoped pruning to automatically clean up stale entries, and sticky route tracking to avoid unnecessary connection churn.Motivation
The previous implementation maintained a
ResolvedClientRoutetype hierarchy with cached IP addresses, aClientRoutesResolverinterface, a DNS cache with configurable TTL, concurrent background resolution workers, and anatomic.Pointerwith CAS-loop updates. This was complex, hard to reason about, and prone to stale-data bugs (e.g. decommissioned hosts lingering in the route list).Fixes: #835