Commit 2f3beb0
taylanisikdemir
[active-active] Fix failover version increment logic (cadence-workflow#7246)
**What changed?**
Active-active domain failovers (changes to the region-to-cluster map)
increment the failover version of the updated entries only. The rest of
the entries are left untouched. This causes a problem when a client tries
to start a workflow that is already running after the failover.
Example scenario:
- Domain's initial state
```
ActiveClustersByRegion: [
phx: {staging_phx 2}
dca: {staging_dca 0}
]
```
- A workflow with id `cron.phx` is running with version 2. It is active
on the staging_phx cluster.
- The domain is failed over from PHX to DCA by an operator or automation.
The map in the DB now looks like this:
```
ActiveClustersByRegion: [
phx: {staging_dca 10} # version is incremented from 2 to 10.
dca: {staging_dca 0}
]
```
- A `StartWorkflow(cron.phx)` request is made by a client.
- The call arrives at the PHX frontend.
- It checks the domain's `ActiveClustersByRegion` and decides to
forward the request to the DCA frontend.
- The DCA frontend receives the request and makes the corresponding
request to DCA history.
- The history engine responsible for the shard that owns `cron.phx`
processes the request.
- It gets a workflow-already-started error and compares the version of the
new mutable state with the existing one in the DB.
- The existing mutable state in the DB (which was replicated from the PHX
cluster) has version 2.
- The new mutable state in memory has version 0, which is the version DCA
uses.
- The new version is less than the existing version, so it returns a
domain-not-active error
[ref](https://github.com/cadence-workflow/cadence/blob/147489a7e507a04eade6594854234396daebcd8f/service/history/engine/engineimpl/start_workflow_execution.go#L253)
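The last three steps of the scenario can be sketched in Go as follows. This is a simplified illustration, not the code at the linked reference: `ClusterEntry`, `checkStartVersion`, and `errDomainNotActive` are hypothetical names introduced here, and the real check involves more state than two integers.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterEntry mirrors one value of the ActiveClustersByRegion map from the
// example. The type and field names are illustrative assumptions.
type ClusterEntry struct {
	Cluster         string
	FailoverVersion int64
}

// errDomainNotActive is a hypothetical stand-in for the domain-not-active error.
var errDomainNotActive = errors.New("domain not active")

// checkStartVersion is a hypothetical helper that models only the comparison
// described in the scenario: the start is rejected when the version of the
// new mutable state is lower than the version already stored in the DB.
func checkStartVersion(existingVersion, newVersion int64) error {
	if newVersion < existingVersion {
		return errDomainNotActive
	}
	return nil
}

func main() {
	// Map state after the failover, before this fix: both regions point to
	// staging_dca but carry different versions.
	afterFailover := map[string]ClusterEntry{
		"phx": {Cluster: "staging_dca", FailoverVersion: 10},
		"dca": {Cluster: "staging_dca", FailoverVersion: 0},
	}

	existingVersion := int64(2)                        // replicated from the PHX cluster
	newVersion := afterFailover["dca"].FailoverVersion // the version DCA uses

	fmt.Println(checkStartVersion(existingVersion, newVersion))
}
```

With the versions from the scenario (existing 2, new 0) the helper returns the domain-not-active error; if DCA used version 10 the comparison would pass.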
The problem is caused by multiple entries in the
`ActiveClustersByRegion` map pointing to the same cluster but having
different versions.
One way to prevent getting into this state is to increment failover
versions of all entries pointing to the same cluster when updating
ActiveClustersByRegion.
Before:
```
ActiveClustersByRegion: [
phx: {staging_phx 2}
dca: {staging_dca 0}
]
```
After:
```
ActiveClustersByRegion: [
phx: {staging_dca 10} # incremented from 2 to 10 so it points to dca cluster
dca: {staging_dca 10} # incremented from 0 to 10 so it's greater than or equal to the entry that was updated (above)
]
```
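A minimal Go sketch of the update rule illustrated by the Before/After maps. It is not the code added in this PR: `ClusterEntry` and `failoverRegion` are hypothetical names, and the new version is passed in as a parameter because the description does not say how it is computed.

```go
package main

import "fmt"

// ClusterEntry mirrors one value of the ActiveClustersByRegion map.
// The type and field names are illustrative assumptions.
type ClusterEntry struct {
	Cluster         string
	FailoverVersion int64
}

// failoverRegion is a hypothetical helper. It points region at targetCluster
// with newVersion, then raises every other entry that already points at
// targetCluster to at least newVersion, so that no two entries for the same
// cluster end up with different versions.
func failoverRegion(m map[string]ClusterEntry, region, targetCluster string, newVersion int64) {
	m[region] = ClusterEntry{Cluster: targetCluster, FailoverVersion: newVersion}

	// Overwriting existing keys while ranging over a map is safe in Go.
	for r, e := range m {
		if e.Cluster == targetCluster && e.FailoverVersion < newVersion {
			e.FailoverVersion = newVersion
			m[r] = e
		}
	}
}

func main() {
	m := map[string]ClusterEntry{
		"phx": {Cluster: "staging_phx", FailoverVersion: 2},
		"dca": {Cluster: "staging_dca", FailoverVersion: 0},
	}
	failoverRegion(m, "phx", "staging_dca", 10)
	fmt.Println(m["phx"], m["dca"])
}
```

Entries that point to a different cluster are left alone, which matches the description's scope of "all entries pointing to the same cluster".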
**Validation Change**
Another change in this PR restricts which updates are allowed for the
`ActiveClustersByRegion` map. Supporting multiple hops doesn't make sense,
and cycles should be prevented.
For example, the map below contains multiple hops and will not be allowed.
```
ActiveClustersByRegion: [
phx: {staging_dca}
dca: {staging_klm}
klm: {staging_klm}
]
```
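One way such a restriction could be checked is sketched below. This is an assumption about the rule, not the validation added in this PR: `validateNoMultiHop` is a hypothetical function, and the `regionOf` lookup (cluster name to home region) is invented for the example. The idea is that a region may delegate to a cluster in another region only if that other region is served by one of its own clusters.

```go
package main

import "fmt"

// validateNoMultiHop is a hypothetical check. active maps a region to its
// active cluster; regionOf maps a cluster to its home region (an assumed
// lookup, not part of the PR description). A region may point at a cluster
// in another region only if that region points at one of its own clusters.
// This rejects both multi-hop chains and cycles.
func validateNoMultiHop(active, regionOf map[string]string) error {
	for region, cluster := range active {
		target, ok := regionOf[cluster]
		if !ok {
			return fmt.Errorf("region %q: unknown cluster %q", region, cluster)
		}
		if target == region {
			continue // served locally, no hop
		}
		// One hop away: the target region must be served by itself.
		if next, ok := active[target]; ok && regionOf[next] != target {
			return fmt.Errorf("region %q -> %q -> %q is a multi-hop chain", region, cluster, next)
		}
	}
	return nil
}

func main() {
	regionOf := map[string]string{
		"staging_phx": "phx",
		"staging_dca": "dca",
		"staging_klm": "klm",
	}
	// The disallowed map from the example: phx hops to dca, dca hops to klm.
	multiHop := map[string]string{
		"phx": "staging_dca",
		"dca": "staging_klm",
		"klm": "staging_klm",
	}
	fmt.Println(validateNoMultiHop(multiHop, regionOf))
}
```

Under this rule a two-region cycle (phx to staging_dca, dca to staging_phx) is rejected for the same reason as the chain in the example.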
**How did you test it?**
- unit tests
- new simulation

1 parent d6c7517, commit 2f3beb0
File tree
9 files changed (+252, -6)
- .github/workflows
- common/domain
- config/dynamicconfig
- simulation/replication
- testdata