Description
This is a regression introduced in SM 3.7.0 by #4590.
That PR changed SM so that it no longer always performs a connectivity check on cluster update:
```go
// shouldValidateHostsConnectivityOnUpdate based on changed cluster params.
// It needs to be done when cluster params influencing connectivity over http to sm-agents have been updated.
// It doesn't need to be done when cluster params influencing connectivity over cql to scylla nodes have been updated,
// as cql connectivity is not required for all tasks.
func shouldValidateHostsConnectivityOnUpdate(c, old *Cluster) bool {
	return old.Host != c.Host || old.Port != c.Port || old.AuthToken != c.AuthToken
}
```

It turns out that the connectivity check is the place in our code which sets cluster.KnownHosts.
The problem is that Cluster.KnownHosts field is a strange one. It is kept in SM DB, but it's not exposed over SM API - in some cases, it needs to be fetched from DB or be recalculated from scratch.
When updating a cluster with PutCluster, Cluster.KnownHosts is not populated by the API middleware. If shouldValidateHostsConnectivityOnUpdate returns false, it is not populated by the connectivity check either.
After that, we simply insert cluster into DB. This results in inserting empty Cluster.KnownHosts which overwrites previously stored known hosts...
This is not the end of the world, because we still correctly store Cluster.Host (the initial contact point) in the DB. However, if we additionally run into connectivity issues with that host, SM stays in a bad state until those issues disappear. Cluster.KnownHosts is meant to help in exactly such scenarios, but since we cleared it, it can't.
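
To make the sequence concrete, here is a minimal, self-contained sketch of the overwrite and of one possible guard. It is not SM's actual code: the `store` map standing in for the SM DB, the `discover` callback standing in for the connectivity check, the `ID` and `Name` fields, and both `putCluster*` functions are hypothetical names used only for illustration. Only `shouldValidateHostsConnectivityOnUpdate` and the `Host`, `Port`, `AuthToken`, and `KnownHosts` fields come from the description above.

```go
package main

import "fmt"

// Cluster is a pared-down stand-in for SM's cluster model.
// ID and Name are assumptions added for this sketch.
type Cluster struct {
	ID         string
	Name       string
	Host       string
	Port       int
	AuthToken  string
	KnownHosts []string
}

// Copied from the issue description.
func shouldValidateHostsConnectivityOnUpdate(c, old *Cluster) bool {
	return old.Host != c.Host || old.Port != c.Port || old.AuthToken != c.AuthToken
}

// store is a hypothetical in-memory stand-in for the SM DB.
type store map[string]Cluster

// putClusterBuggy mimics the flow described above: KnownHosts is only set
// by the (hypothetical) discover callback, so when the check is skipped
// the empty slice coming from the API request is written to the store.
func putClusterBuggy(db store, c *Cluster, discover func(*Cluster) []string) {
	old := db[c.ID]
	if shouldValidateHostsConnectivityOnUpdate(c, &old) {
		c.KnownHosts = discover(c)
	}
	db[c.ID] = *c
}

// putClusterPreserving shows one possible guard (an assumption, not the
// project's chosen fix): carry over the stored KnownHosts whenever the
// connectivity check is skipped and the request did not provide any.
func putClusterPreserving(db store, c *Cluster, discover func(*Cluster) []string) {
	old := db[c.ID]
	if shouldValidateHostsConnectivityOnUpdate(c, &old) {
		c.KnownHosts = discover(c)
	} else if len(c.KnownHosts) == 0 {
		c.KnownHosts = append([]string(nil), old.KnownHosts...)
	}
	db[c.ID] = *c
}

func main() {
	seed := Cluster{ID: "c1", Name: "prod", Host: "10.0.0.1", Port: 10001,
		KnownHosts: []string{"10.0.0.1", "10.0.0.2"}}
	discover := func(*Cluster) []string { return []string{"10.0.0.1", "10.0.0.2"} }

	// The update only renames the cluster, so the connectivity check is skipped.
	db := store{"c1": seed}
	putClusterBuggy(db, &Cluster{ID: "c1", Name: "renamed", Host: "10.0.0.1", Port: 10001}, discover)
	fmt.Println(len(db["c1"].KnownHosts)) // prints 0: known hosts were wiped

	db = store{"c1": seed}
	putClusterPreserving(db, &Cluster{ID: "c1", Name: "renamed", Host: "10.0.0.1", Port: 10001}, discover)
	fmt.Println(len(db["c1"].KnownHosts)) // prints 2: known hosts survived
}
```

Fetching the stored known hosts before the insert is only one option; always recalculating them on update would also avoid the overwrite, at the cost of requiring connectivity on every update.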