avoid double DownIndirectlyConnected when indirectly-connected is between independently-known unreachable#32794
Conversation
…t change overall reachability
patriknw
left a comment
There was a problem hiding this comment.
Seems like a good optimization with the disclaimer that it might have unforeseen consequences.
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Add more tests that are covering similar thing for the other strategies? KeepOldest, DownAllNodes, LeaseMajority
There was a problem hiding this comment.
Would also be good to expand or add new multi-jvm test that cover this scenario. You can run locally with:
multi-jvm:testOnly akka.cluster.sbr.IndirectlyConnected5NodeSpec
Note that we also have SBR tests in akka-cluster-sharding/src/multi-jvm/scala/akka/cluster/sbr/
| val unreachableByDirectlyConnectedObservers = | ||
| reachability.records.iterator.flatMap { r => | ||
| if (directlyConnectedObservers(r.observer)) Some(r.subject) else None | ||
| }.toSet |
There was a problem hiding this comment.
isn't that a collect?
reachability.records.iterator.collect { case r if directlyConnectedObservers(r.observer) =>
r.subject
}.toSet
There was a problem hiding this comment.
Yeah, I tend to forget about collect...
| throw new IllegalStateException( | ||
| s"SBR double $additionalDecision decision, downing all instead. " + | ||
| s"originalReachability: [$originalReachability], filtered reachability [$reachability], " + | ||
| s"still indirectlyConnected: [$indirectlyConnected], seenBy: [$seenBy]") |
There was a problem hiding this comment.
perhaps we should keep this message around and include it in the final IllegalStateException, so that we have full information about both decisions?
Scenario:
Thus A is indirectly connected (both a subject and observer of unreachability) but B is not. SBR resolution will be
DownIndirectlyConnected(adding A to the downing decision) and then consider a secondary decision.In considering that decision, unreachability edges between nodes where both are in the set of observers and subjects are removed, as are all edges where both of "the subject has seen the latest gossip" and "observer has observed a subject which had seen the latest gossip" are true. If indirectly connected nodes remain, the decision effectively becomes
DownAll. These nodes can only be indirectly connected by the first criterion (intersection between observers and subjects): the definition of the second criterion (unreachable but seen latest gossip) is always removed (as any observers of that subject are also indirectly connected by the second criterion).In this scenario, the SBR on the other side from A & B will
DownAll: all nodes on either side of the partition after this are either crashed or downed.The double-down-indirectly-connected is safe, but it seems to be literal overkill.