@Michal-Leszczynski Michal-Leszczynski commented Nov 27, 2025

This PR contains the complete implementation of the new tablet repair command (#4644).
These changes need to be merged in several PRs because they touch different SM submodules; this PR uses `go work` as a workaround.
It serves as an overview of all the changes and as a way to run the tests early.

Here are some visual examples:

sctool tasks
miles@fedora:~/scylla-manager$ ./sctool.dev -c myc tasks
╭────────────────────────────────────────────────────┬────────┬──────────────┬────────┬───────────────┬─────────┬───────┬────────────────────────┬────────────┬────────┬────────────────────────╮
│ Task                                               │ Labels │ Schedule     │ Window │ Timezone      │ Success │ Error │ Last Success           │ Last Error │ Status │ Next                   │
├────────────────────────────────────────────────────┼────────┼──────────────┼────────┼───────────────┼─────────┼───────┼────────────────────────┼────────────┼────────┼────────────────────────┤
│ healthcheck/alternator                             │        │ * * * * *    │        │ Europe/Warsaw │ 5       │ 0     │ 27 Nov 25 22:27:00 CET │            │ DONE   │ 27 Nov 25 22:28:00 CET │
│ healthcheck/rest                                   │        │ * * * * *    │        │ Europe/Warsaw │ 5       │ 0     │ 27 Nov 25 22:27:00 CET │            │ DONE   │ 27 Nov 25 22:28:00 CET │
│ healthcheck/cql                                    │        │ * * * * *    │        │ Europe/Warsaw │ 5       │ 0     │ 27 Nov 25 22:27:00 CET │            │ DONE   │ 27 Nov 25 22:28:00 CET │
│ repair/all-weekly                                  │        │ 0 23 * * SAT │        │ Europe/Warsaw │ 0       │ 0     │                        │            │ NEW    │ 29 Nov 25 23:00:00 CET │
│ tablet_repair/36303fb8-dcdd-47a2-950a-dc48dd1f38ab │        │              │        │ Europe/Warsaw │ 1       │ 0     │ 27 Nov 25 22:23:30 CET │            │ DONE   │                        │
╰────────────────────────────────────────────────────┴────────┴──────────────┴────────┴───────────────┴─────────┴───────┴────────────────────────┴────────────┴────────┴────────────────────────╯

sctool progress (running)
miles@fedora:~/scylla-manager$ ./sctool.dev progress -c myc tablet_repair/36303fb8-dcdd-47a2-950a-dc48dd1f38ab
Run:            3b2c9f4a-cbd7-11f0-9fc2-0892040e83bb
Status:         RUNNING
Start time:     27 Nov 25 22:22:51 CET
Duration:       21s
╭────────────────────────┬─────────────┬──────────┬──────────╮
│ Keyspace               │       Table │ Progress │ Duration │
├────────────────────────┼─────────────┼──────────┼──────────┤
│ alternator_tab_1-.-.-. │ tab_1-.-.-. │ done     │ 5s       │
├────────────────────────┼─────────────┼──────────┼──────────┤
│ alternator_tab_2-.-.-. │ tab_2-.-.-. │ -        │ -        │
├────────────────────────┼─────────────┼──────────┼──────────┤
│ alternator_tab_3-.-.-. │ tab_3-.-.-. │ done     │ 5s       │
├────────────────────────┼─────────────┼──────────┼──────────┤
│ cql_ks                 │       tab_1 │ running  │ 4s       │
│ cql_ks                 │       tab_2 │ done     │ 5s       │
│ cql_ks                 │       tab_3 │ -        │ -        │
│ cql_ks                 │ tab_3$paxos │ -        │ -        │
╰────────────────────────┴─────────────┴──────────┴──────────╯

sctool progress (done)
miles@fedora:~/scylla-manager$ ./sctool.dev progress -c myc tablet_repair/36303fb8-dcdd-47a2-950a-dc48dd1f38ab
Run:            3b2c9f4a-cbd7-11f0-9fc2-0892040e83bb
Status:         DONE
Start time:     27 Nov 25 22:22:51 CET
End time:       27 Nov 25 22:23:30 CET
Duration:       38s
╭────────────────────────┬─────────────┬──────────┬──────────╮
│ Keyspace               │       Table │ Progress │ Duration │
├────────────────────────┼─────────────┼──────────┼──────────┤
│ alternator_tab_1-.-.-. │ tab_1-.-.-. │ done     │ 5s       │
├────────────────────────┼─────────────┼──────────┼──────────┤
│ alternator_tab_2-.-.-. │ tab_2-.-.-. │ done     │ 7s       │
├────────────────────────┼─────────────┼──────────┼──────────┤
│ alternator_tab_3-.-.-. │ tab_3-.-.-. │ done     │ 5s       │
├────────────────────────┼─────────────┼──────────┼──────────┤
│ cql_ks                 │       tab_1 │ done     │ 6s       │
│ cql_ks                 │       tab_2 │ done     │ 5s       │
│ cql_ks                 │       tab_3 │ done     │ 6s       │
│ cql_ks                 │ tab_3$paxos │ done     │ 0s       │
╰────────────────────────┴─────────────┴──────────┴──────────╯

sctool progress --details
miles@fedora:~/scylla-manager$ ./sctool.dev progress -c myc tablet_repair/36303fb8-dcdd-47a2-950a-dc48dd1f38ab --details
Run:            3b2c9f4a-cbd7-11f0-9fc2-0892040e83bb
Status:         DONE
Start time:     27 Nov 25 22:22:51 CET
End time:       27 Nov 25 22:23:30 CET
Duration:       38s
╭────────────────────────┬─────────────┬──────────┬──────────┬────────────────────────┬────────────────────────┬───────╮
│ Keyspace               │       Table │ Progress │ Duration │ Started at             │ Completed at           │ Error │
├────────────────────────┼─────────────┼──────────┼──────────┼────────────────────────┼────────────────────────┼───────┤
│ alternator_tab_1-.-.-. │ tab_1-.-.-. │ done     │ 5s       │ 27 Nov 25 22:22:57 CET │ 27 Nov 25 22:23:03 CET │       │
├────────────────────────┼─────────────┼──────────┼──────────┼────────────────────────┼────────────────────────┼───────┤
│ alternator_tab_2-.-.-. │ tab_2-.-.-. │ done     │ 7s       │ 27 Nov 25 22:23:22 CET │ 27 Nov 25 22:23:30 CET │       │
├────────────────────────┼─────────────┼──────────┼──────────┼────────────────────────┼────────────────────────┼───────┤
│ alternator_tab_3-.-.-. │ tab_3-.-.-. │ done     │ 5s       │ 27 Nov 25 22:22:52 CET │ 27 Nov 25 22:22:57 CET │       │
├────────────────────────┼─────────────┼──────────┼──────────┼────────────────────────┼────────────────────────┼───────┤
│ cql_ks                 │       tab_1 │ done     │ 6s       │ 27 Nov 25 22:23:09 CET │ 27 Nov 25 22:23:16 CET │       │
│ cql_ks                 │       tab_2 │ done     │ 5s       │ 27 Nov 25 22:23:03 CET │ 27 Nov 25 22:23:09 CET │       │
│ cql_ks                 │       tab_3 │ done     │ 6s       │ 27 Nov 25 22:23:16 CET │ 27 Nov 25 22:23:22 CET │       │
│ cql_ks                 │ tab_3$paxos │ done     │ 0s       │ 27 Nov 25 22:23:22 CET │ 27 Nov 25 22:23:22 CET │       │
╰────────────────────────┴─────────────┴──────────┴──────────┴────────────────────────┴────────────────────────┴───────╯

metrics
# HELP scylla_manager_tablet_repair_progress Tablet repair progress in percents (0-100).
# TYPE scylla_manager_tablet_repair_progress gauge
scylla_manager_tablet_repair_progress{cluster="2ed727e5-3ad8-413d-b5c2-559ce8e5b6e8",keyspace="alternator_tab_1-.-.-.",table="tab_1-.-.-."} 100
scylla_manager_tablet_repair_progress{cluster="2ed727e5-3ad8-413d-b5c2-559ce8e5b6e8",keyspace="alternator_tab_2-.-.-.",table="tab_2-.-.-."} 100
scylla_manager_tablet_repair_progress{cluster="2ed727e5-3ad8-413d-b5c2-559ce8e5b6e8",keyspace="alternator_tab_3-.-.-.",table="tab_3-.-.-."} 100
scylla_manager_tablet_repair_progress{cluster="2ed727e5-3ad8-413d-b5c2-559ce8e5b6e8",keyspace="cql_ks",table="tab_1"} 100
scylla_manager_tablet_repair_progress{cluster="2ed727e5-3ad8-413d-b5c2-559ce8e5b6e8",keyspace="cql_ks",table="tab_2"} 100
scylla_manager_tablet_repair_progress{cluster="2ed727e5-3ad8-413d-b5c2-559ce8e5b6e8",keyspace="cql_ks",table="tab_3"} 100
scylla_manager_tablet_repair_progress{cluster="2ed727e5-3ad8-413d-b5c2-559ce8e5b6e8",keyspace="cql_ks",table="tab_3$paxos"} 100

Some task manager tasks are local to a given node, meaning that
the progress of such a task must be checked on the node on which it
was scheduled. Tablet repair is cluster-wide, meaning that
any node can be queried for its progress. This commit makes the host
optional when scheduling tablet repair or checking its progress.
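
The distinction between node-local and cluster-wide tasks can be sketched roughly as below. This is only an illustration of the idea, not the code from this PR: `taskScope`, `pickHost`, and the node addresses are hypothetical names invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// taskScope is a hypothetical classification of task manager tasks.
type taskScope int

const (
	scopeNode    taskScope = iota // progress is known only to the scheduling node
	scopeCluster                  // progress can be read from any node
)

// pickHost returns the host to query for a task.
// An explicit host always wins. An empty host is accepted only for
// cluster-wide tasks, in which case any live node will do.
func pickHost(scope taskScope, host string, liveNodes []string) (string, error) {
	if host != "" {
		return host, nil
	}
	if scope == scopeNode {
		return "", errors.New("host is required for node-local tasks")
	}
	if len(liveNodes) == 0 {
		return "", errors.New("no live nodes to query")
	}
	return liveNodes[0], nil
}

func main() {
	nodes := []string{"192.168.200.11", "192.168.200.12"} // made-up addresses

	// Cluster-wide task (such as tablet repair): no host needed.
	h, err := pickHost(scopeCluster, "", nodes)
	fmt.Println(h, err)

	// Node-local task: omitting the host is an error.
	_, err = pickHost(scopeNode, "", nodes)
	fmt.Println(err)
}
```

Picking the first live node is an arbitrary choice made for the sketch; a real client would more likely rely on its existing host selection logic.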
…ient

This way it can be accessed from different pkgs without importing repair pkg.
This error is non-retryable, and we use it
to safely skip tablet repair of colocated tables.
This way it can be accessed from different pkgs without importing repair pkg.
It will be used for storing progress of the new tablet repair task.
This commit also runs `make generate`.
The tablet repair svc is supposed to handle tablet repair
in a lightweight way. Since tablet repair works
fine with topology changes on the Scylla side, the tablet repair
svc also makes sure to support them. It does not contain
any vnode-related bloat, which is not needed for tablets
and could cause problems with parallel topology changes.
This commit adds `sctool repair tablet` command.
It also updates progress commands to work with tablet repair.
…able_tablets

enable_tablets has been deprecated in favor of tablets_mode_for_new_keyspaces.
For simplicity, we can set both when the same scylla.yaml is used for multiple
Scylla versions. This allows controlling the default alternator table replication
(vnode/tablets) in the same way as for CQL tables.