Description
This is a duplicate of #3611, but since that issue has a long comment history and went stale, I decided to create a new one with updated argumentation.
One of the more important changes in the SM 3.2 repair was sticking to the "one job per host" rule. I believe that for a bigger cluster this might kill any parallelism on the node level. Let's analyze a big cluster: 2 DCs with 30 nodes each, where all nodes have `max_repair_ranges_in_parallel = 7`. By default, each keyspace in such a cluster consists of 60 * 256 = 15360 token ranges. Assuming that the keyspace has replication `{'dc1': 3, 'dc2': 3}`, there are (30!/(3! * 27!))^2 = 4060^2 = 16,483,600 possible replica sets. Assuming that token ranges are distributed uniformly across all possible replica sets, it is rather unlikely that a single replica set owns more than 1 token range. Combined with the fact that SM sends repair jobs only for a single replica set at a time, this results in SM sending only a single token range per repair job, despite `max_repair_ranges_in_parallel = 7`.
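The arithmetic above can be double-checked with a short sketch (the cluster numbers are the hypothetical ones from this issue, not measured from any real deployment):

```python
from math import comb

# Hypothetical cluster from the argument above:
# 2 DCs, 30 nodes each, 256 vnodes per node, RF 3 in each DC.
nodes_per_dc = 30
dcs = 2
vnodes_per_node = 256
rf = 3

# Total token ranges per keyspace: 60 nodes * 256 vnodes.
token_ranges = dcs * nodes_per_dc * vnodes_per_node

# Possible replica sets: choose 3 of 30 nodes in each DC independently.
replica_sets = comb(nodes_per_dc, rf) ** dcs

print(token_ranges)   # 15360
print(replica_sets)   # 16483600

# Expected token ranges per replica set under a uniform distribution:
print(token_ranges / replica_sets)  # ~0.00093, i.e. almost always 0 or 1
```

With ~1000x more replica sets than token ranges, a repair job scoped to a single replica set effectively always carries just one range.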
This behavior could be controlled by an additional flag or a repair config option in `scylla-manager.yaml`.
In terms of testing, it would be good to see a performance improvement on a big cluster, e.g.: 2 DCs with 15 nodes each, a keyspace with RF 3 in each DC, and a setup in which the repair indeed has to do some work (missing rows on some nodes). This bigger setup would definitely require help from QA.