Currently `s-m` does not track topology changes unless it is running a task on the cluster. In that case it does the following:
    hosts, err := s.discoverHosts(ctx, client)
    if err != nil {
    	return nil, errors.Wrap(err, "discover cluster topology")
    }
    if err := s.setKnownHosts(c, hosts); err != nil {
    	return nil, errors.Wrap(err, "update cluster")
    }
Reasons to have topology change tracking:
- `s-m` can lose access to a cluster that is alive and healthy after a major topology change, see "s-m can lose connection to a cluster if major topology change occurred between jobs" #3454.
- When a restore is suspended and resumed, it would be good for `s-m` to validate that no change incompatible with resuming was made on the cluster in the meantime.
These two reasons have different requirements:
- To avoid losing the connection to the cluster, it is enough to track the IP addresses of all nodes in the cluster.
- Restore needs to make sure that no nodes are gone and no nodes were replaced, so it has to track node IDs, IP addresses, and probably file system state.
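
To make the difference between the two requirements concrete, here is a minimal sketch of comparing two topology snapshots. It is not existing `s-m` code: `Node`, `TopologyDiff`, `CompareTopology` and both methods are hypothetical names, and it assumes a replaced node shows up with a new host ID. File system state is not covered.

```go
package main

import "fmt"

// Node is a hypothetical snapshot entry for one cluster node.
type Node struct {
	HostID string
	IP     string
}

// TopologyDiff lists host IDs grouped by the kind of change observed.
type TopologyDiff struct {
	Gone      []string // present in the old snapshot, missing now
	Added     []string // missing in the old snapshot, present now
	IPChanged []string // same host ID, different IP address
}

// CompareTopology reports what changed between two snapshots.
func CompareTopology(old, cur []Node) TopologyDiff {
	oldIPs := make(map[string]string, len(old))
	for _, n := range old {
		oldIPs[n.HostID] = n.IP
	}

	curIDs := make(map[string]struct{}, len(cur))
	var d TopologyDiff
	for _, n := range cur {
		curIDs[n.HostID] = struct{}{}
		ip, known := oldIPs[n.HostID]
		switch {
		case !known:
			d.Added = append(d.Added, n.HostID)
		case ip != n.IP:
			d.IPChanged = append(d.IPChanged, n.HostID)
		}
	}
	for _, n := range old {
		if _, ok := curIDs[n.HostID]; !ok {
			d.Gone = append(d.Gone, n.HostID)
		}
	}
	return d
}

// NeedsKnownHostsUpdate covers the connectivity requirement:
// any change to the set of addresses means known hosts are stale.
func (d TopologyDiff) NeedsKnownHostsUpdate() bool {
	return len(d.Gone)+len(d.Added)+len(d.IPChanged) > 0
}

// SafeToResumeRestore covers the restore requirement (assumption):
// a node that only changed its IP is tolerated, a gone or new node is not.
func (d TopologyDiff) SafeToResumeRestore() bool {
	return len(d.Gone) == 0 && len(d.Added) == 0
}

func main() {
	before := []Node{{"a", "10.0.0.1"}, {"b", "10.0.0.2"}, {"c", "10.0.0.3"}}
	after := []Node{{"a", "10.0.0.1"}, {"b", "10.0.0.9"}, {"d", "10.0.0.3"}}

	d := CompareTopology(before, after)
	fmt.Printf("%+v\n", d)
	fmt.Println(d.NeedsKnownHostsUpdate(), d.SafeToResumeRestore())
}
```

In this example node `c` was replaced by node `d` on the same IP address. Tracking IP addresses alone would miss that, which is why the restore case needs node IDs as well.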