Commit 96dc4e3
migration: Send IsReady to all upstream full mat nodes
On restart, a race condition causes a panic while "recovering" the
`DfState`. It occurs when replay begins on a join with fully
materialized nodes. Before replay, every node is sent a `Ready`
message, but base nodes (which must open files from disk) "ready"
themselves asynchronously; that is, the Leader does not block until
they become ready.
The problem arises when a replay that contains a join node begins: the
`DomainMigrationPlan` ensures that only the left side is ready (by
sending an `IsReady` message). The right side has been sent a `Ready`
message, but nothing guarantees that it is actually ready.
This CL checks each `ReplayPath` in the `DomainMigrationPlan`'s `Plan`
phase. If the path contains fully materialized nodes, it checks
whether there is a join node. If there is a join, it walks up the DAG
to look for both base nodes and fully materialized nodes. The indices
of those nodes are added to a set, and a later step in
`DomainMigrationPlan::commit()` sends an `IsReady` message to those
additional nodes.
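The upstream walk described above can be sketched roughly as follows.
This is an illustration only, not the code in this change: `Node`,
`NodeIndex`, `upstream_nodes_to_ready`, and `extra_is_ready_targets`
are hypothetical names, the graph is a plain `HashMap`, and the sketch
assumes the walk keeps going past fully materialized nodes until it
runs out of ancestors.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical node index; the real code base has its own index types.
type NodeIndex = usize;

/// Hypothetical, simplified view of a dataflow node.
#[derive(Debug, Clone)]
struct Node {
    is_base: bool,
    is_fully_materialized: bool,
    is_join: bool,
    /// Direct ancestors (parents) of this node in the DAG.
    parents: Vec<NodeIndex>,
}

/// Walk upstream from `start` and collect every ancestor that is a base
/// node or a fully materialized node. `start` itself is not collected.
fn upstream_nodes_to_ready(
    graph: &HashMap<NodeIndex, Node>,
    start: NodeIndex,
) -> HashSet<NodeIndex> {
    let mut found = HashSet::new();
    let mut visited = HashSet::new();
    let mut stack = vec![start];
    while let Some(idx) = stack.pop() {
        // Skip nodes already seen (a DAG can reach a node by several routes).
        if !visited.insert(idx) {
            continue;
        }
        let node = match graph.get(&idx) {
            Some(node) => node,
            None => continue,
        };
        if idx != start && (node.is_base || node.is_fully_materialized) {
            found.insert(idx);
        }
        stack.extend(node.parents.iter().copied());
    }
    found
}

/// For one replay path: if the path contains a fully materialized node,
/// collect the upstream base / fully materialized nodes of every join on it.
fn extra_is_ready_targets(
    graph: &HashMap<NodeIndex, Node>,
    path: &[NodeIndex],
) -> HashSet<NodeIndex> {
    let has_full = path
        .iter()
        .any(|i| graph.get(i).map_or(false, |n| n.is_fully_materialized));
    if !has_full {
        return HashSet::new();
    }
    let mut targets = HashSet::new();
    for idx in path {
        if graph.get(idx).map_or(false, |n| n.is_join) {
            targets.extend(upstream_nodes_to_ready(graph, *idx));
        }
    }
    targets
}
```

With a join whose parents are two base nodes, a path through the left
parent still yields both parents, so the right side is also covered.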
This will not cause any additional traffic at startup, because
`DomainMigrationPlan::commit()` already keeps a `HashSet` that tracks
whether an `IsReady` has already been sent to a target node.
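A minimal sketch of that de-duplication, assuming a per-commit set of
already-contacted targets. `send_is_ready_once` and the `send` callback
are hypothetical stand-ins, not the actual `commit()` internals.

```rust
use std::collections::HashSet;

/// Hypothetical node index; the real code base has its own index types.
type NodeIndex = usize;

/// Send `IsReady` to each target at most once, using `already_sent` to
/// remember which targets were contacted. Returns how many messages were
/// actually sent by this call.
fn send_is_ready_once(
    targets: impl IntoIterator<Item = NodeIndex>,
    already_sent: &mut HashSet<NodeIndex>,
    mut send: impl FnMut(NodeIndex),
) -> usize {
    let mut sent = 0;
    for target in targets {
        // `HashSet::insert` returns false if the value was already present,
        // so repeated targets are skipped without a separate lookup.
        if already_sent.insert(target) {
            send(target);
            sent += 1;
        }
    }
    sent
}
```

Because the set outlives each call, targets added by the join check are
skipped whenever an earlier step already contacted them.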
Fixes: REA-5126
Change-Id: Icbc815ce0a4aa4ed1b03aa734c785122c0839cbb
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/8563
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
1 parent 3d43048 commit 96dc4e3
2 files changed: 55 additions, 0 deletions
Lines changed: 5 additions & 0 deletions
| Original file lines | Diff lines | Change |
|---|---|---|
| 1325-1327 | 1325-1327 | context |
| | 1328-1332 | added (5 lines) |
| 1328-1330 | 1333-1335 | context |
Lines changed: 50 additions & 0 deletions
| Original file lines | Diff lines | Change |
|---|---|---|
| 68-70 | 68-70 | context |
| | 71-76 | added (6 lines) |
| 71-73 | 77-79 | context |
| 286-288 | 292-294 | context |
| | 295-333 | added (39 lines) |
| 289-291 | 334-336 | context |
| 718-720 | 763-765 | context |
| | 766-769 | added (4 lines) |
| 721-723 | 770-772 | context |
| 742-744 | 791-793 | context |
| | 794 | added (1 line) |
| 745-747 | 795-797 | context |