Closed
20 commits
caf0dee
feat(taosd): implement phase1 repair CLI parsing and validation
hzcheng Mar 6, 2026
555e6d6
feat(repair): add meta force repair flow and coverage
hzcheng Mar 6, 2026
2ac1025
feat(repair): add tsdb force repair flow and coverage
hzcheng Mar 8, 2026
03eae54
feat(repair): refactor repair option structure and remove deprecated …
hzcheng Mar 9, 2026
c7b7f49
feat(meta): refactor meta repair to separate strategy functions
hzcheng Mar 9, 2026
9006f50
feat(repair): redesign CLI for multi-target data repair
hzcheng Mar 9, 2026
47ef776
refactor(dmRepair): replace generic target array with type-specific a…
hzcheng Mar 9, 2026
0277b27
feat(dnode): refactor repair options to support multiple node types
hzcheng Mar 10, 2026
c69c223
feat: remove dnode from node-type option and clean up generateNewMeta…
hzcheng Mar 10, 2026
14cca94
feat(meta): refactor meta repair logic and remove unused static variable
hzcheng Mar 10, 2026
03f767f
feat(meta): temporarily disable meta backup during forced repair
hzcheng Mar 10, 2026
a8080fe
feat(wal): add dynamic corruption handling with dmRepair integration
hzcheng Mar 10, 2026
bdc47aa
remove useless files
hzcheng Mar 10, 2026
452be5f
feat(repair): add vnode type and force mode detection functions
hzcheng Mar 10, 2026
a434412
feat(tsdb): add force repair functionality for file system integrity
hzcheng Mar 10, 2026
fdb9965
feat(tsdb): implement deep scan and fix for data file repair
hzcheng Mar 10, 2026
44fd1e3
docs: update TSDB repair strategies and documentation
hzcheng Mar 11, 2026
88df66f
docs: clarify TSDB repair strategy behavior and limitations
hzcheng Mar 11, 2026
25e38ba
docs: plan tsdb force repair test redesign
hzcheng Mar 11, 2026
0480ebc
feat(test): add comprehensive force repair test suite for TSDB
hzcheng Mar 12, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -73,3 +73,6 @@ test/screenlog*
test/output.tmp

CMakeUserPresets.json

.agents/
skills-lock.json
27 changes: 27 additions & 0 deletions docs/en/08-operation/04-maintenance.md
@@ -98,6 +98,33 @@ restore qnode on dnode <dnode_id>; # Restore qnode on dnode
- This feature restores data through the existing replication mechanism; it is not disaster recovery or backup recovery. The command can therefore only recover an mnode or vnode whose other two replicas are still functioning normally.
- This command cannot repair individual files in the data directory that are damaged or lost. For example, if individual files or data in an mnode or vnode are damaged, it is not possible to recover a specific file or block of data individually. In this case, you can choose to completely clear the data of that mnode/vnode and then perform recovery.

## Local Repair Mode

If the issue is limited to local files on one node and you want TDengine to perform repair checks during startup, you can start `taosd` in local repair mode:

```bash
taosd -r --mode force --node-type vnode \
--repair-target meta:vnode=3
```

You can also declare multiple repair targets in one startup:

```bash
taosd -r --mode force --node-type vnode --backup-path /tmp/repair-bak \
--repair-target meta:vnode=3 \
--repair-target tsdb:vnode=5:fileid=1809 \
--repair-target wal:vnode=6
```

Current limitations:

- Only `--mode force` is supported.
- Only `--node-type vnode` is supported.
- `tsdb` repair targets must include `fileid`.
- `wal` repair targets currently do not support `strategy`.

For the complete CLI grammar, supported keys, default strategies, and more examples, see [taosd Reference](../../tdengine-reference/components/taosd/).

## Splitting Virtual Groups

When a vgroup's CPU or disk usage becomes too high because it holds too many subtables, you can add a dnode and then use the `split vgroup` command to split the vgroup into two virtual groups. After the split, the two new vgroups take over the read and write services previously provided by the single vgroup. This command was first released in version 3.0.6.0; using the latest version whenever possible is recommended.
74 changes: 74 additions & 0 deletions docs/en/14-reference/01-components/01-taosd.md
@@ -15,10 +15,84 @@ The command line parameters for taosd are as follows:
- -s: Prints SDB information.
- -C: Prints configuration information.
- -e: Specifies environment variables, formatted like `-e 'TAOS_FQDN=td1'`.
- -r: Starts local repair mode. This option must be used together with `--mode force`, `--node-type vnode`, and at least one `--repair-target`.
- -k: Retrieves the machine code.
- -dm: Enables memory scheduling.
- -V: Prints version information.

## Repair Mode

Use `taosd -r` to start local repair mode. In the current phase, repair mode only supports `--mode force` and `--node-type vnode`.

### Syntax

```bash
taosd -r --mode force --node-type vnode [--backup-path <path>] \
--repair-target <target> [--repair-target <target>]...
```

### Repair Target Grammar

Each `--repair-target` value uses the following grammar:

```text
<file-type>:<key>=<value>[:<key>=<value>]...
```

Rules:

- `<file-type>` must be the first segment.
- Supported file types are `meta`, `tsdb`, and `wal`.
- Key order is not significant, but examples in this document use a consistent order.
- Repeating the same key in one target is invalid.
- Repeating the same repair object across multiple targets is invalid.
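
The grammar and the first four rules above can be sketched as a small parser. This is only an illustration under stated assumptions, not the parser added in this PR: `ParsedTarget`, `TargetPair`, `isSupportedFileType`, `parseRepairTarget`, and the size limits are hypothetical names and values chosen for the example. It checks that the file type comes first, that every later segment is a non-empty `key=value` pair, and that no key repeats within one target. Detecting the same repair object across several targets is not covered.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PAIRS 4   /* assumed upper bound on key=value pairs per target */
#define MAX_LEN   64  /* assumed upper bound on the length of one key or value */

typedef struct {
  char key[MAX_LEN];
  char value[MAX_LEN];
} TargetPair;

typedef struct {
  char       fileType[MAX_LEN];
  TargetPair pairs[MAX_PAIRS];
  int        numPairs;
} ParsedTarget;

/* Hypothetical helper: true for the three documented file types. */
static bool isSupportedFileType(const char *type) {
  return strcmp(type, "meta") == 0 || strcmp(type, "tsdb") == 0 || strcmp(type, "wal") == 0;
}

/* Parses "<file-type>:<key>=<value>[:<key>=<value>]..." into `out`.
 * Returns false on any grammar violation. */
static bool parseRepairTarget(const char *text, ParsedTarget *out) {
  char buf[256];
  if (text == NULL || out == NULL || strlen(text) >= sizeof(buf)) return false;
  strcpy(buf, text);
  memset(out, 0, sizeof(*out));

  char *seg = buf;
  bool  first = true;
  while (seg != NULL) {
    char *next = strchr(seg, ':');
    if (next != NULL) *next++ = '\0'; /* terminate this segment, step to the next one */

    if (first) {
      /* The file type must be the first segment. */
      if (!isSupportedFileType(seg)) return false;
      strcpy(out->fileType, seg);
      first = false;
    } else {
      char *eq = strchr(seg, '=');
      /* Both the key and the value must be non-empty. */
      if (eq == NULL || eq == seg || eq[1] == '\0') return false;
      *eq = '\0';
      const char *key = seg;
      const char *value = eq + 1;
      if (strlen(key) >= MAX_LEN || strlen(value) >= MAX_LEN) return false;
      if (out->numPairs == MAX_PAIRS) return false;
      /* A key may appear only once per target; order is otherwise free. */
      for (int i = 0; i < out->numPairs; i++) {
        if (strcmp(out->pairs[i].key, key) == 0) return false;
      }
      strcpy(out->pairs[out->numPairs].key, key);
      strcpy(out->pairs[out->numPairs].value, value);
      out->numPairs++;
    }
    seg = next;
  }
  return out->numPairs > 0; /* the grammar requires at least one key=value pair */
}
```

With this sketch, `meta:vnode=3` parses into file type `meta` with one pair, while `vnode=3:meta` and `meta:vnode=3:vnode=4` are rejected.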

### Supported Targets

| File Type | Required Keys | Optional Keys | Default Strategy | Supported Strategies |
| --- | --- | --- | --- | --- |
| `meta` | `vnode` | `strategy` | `from_uid` | `from_uid`, `from_redo` |
| `tsdb` | `vnode`, `fileid` | `strategy` | `shallow_repair` | `shallow_repair`, `deep_repair` |
| `wal` | `vnode` | none | none | none |

Additional notes:

- `fileid` is only valid for `tsdb`, and it is required in the current phase.
- `strategy` is not currently supported for `wal`.
- `--backup-path` is global for the whole repair startup, not per target.
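
The table and notes above can be read as a lookup table plus a few checks. The following is a minimal sketch under that reading, not the validation code from this PR: `TargetRule`, `kRules`, `findRule`, and `resolveStrategy` are hypothetical names. It takes the file type, whether `vnode` and `fileid` were given, and the optional `strategy` value, and reports whether the combination is allowed and which strategy would apply.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  const char *fileType;
  bool        needsFileId;     /* `fileid` is required for tsdb and invalid elsewhere */
  const char *defaultStrategy; /* NULL when `strategy` is not supported (wal) */
  const char *strategies[2];   /* accepted `strategy` values */
} TargetRule;

/* One row per line of the "Supported Targets" table. */
static const TargetRule kRules[] = {
  {"meta", false, "from_uid",       {"from_uid", "from_redo"}},
  {"tsdb", true,  "shallow_repair", {"shallow_repair", "deep_repair"}},
  {"wal",  false, NULL,             {NULL, NULL}},
};

static const TargetRule *findRule(const char *fileType) {
  for (size_t i = 0; i < sizeof(kRules) / sizeof(kRules[0]); i++) {
    if (strcmp(kRules[i].fileType, fileType) == 0) return &kRules[i];
  }
  return NULL;
}

/* Returns true if the key combination is valid for `fileType`.
 * On success, *effective is the strategy to use (NULL for wal).
 * `strategy` is the value of the `strategy` key, or NULL if it was omitted. */
static bool resolveStrategy(const char *fileType, bool hasVnode, bool hasFileId,
                            const char *strategy, const char **effective) {
  const TargetRule *rule = findRule(fileType);
  if (rule == NULL || !hasVnode) return false;        /* `vnode` is always required */
  if (hasFileId != rule->needsFileId) return false;   /* required for tsdb, invalid otherwise */

  if (strategy == NULL) {
    *effective = rule->defaultStrategy;               /* fall back to the default */
    return true;
  }
  if (rule->defaultStrategy == NULL) return false;    /* wal does not accept `strategy` */
  for (int i = 0; i < 2; i++) {
    if (rule->strategies[i] != NULL && strcmp(rule->strategies[i], strategy) == 0) {
      *effective = rule->strategies[i];
      return true;
    }
  }
  return false;                                       /* unknown strategy for this file type */
}
```

Keeping the rules in one table means a new file type or strategy would only add a row, which matches how the documentation itself presents them.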

### Limitations

- Only `--mode force` is supported.
- Only `--node-type vnode` is supported.
- `taosd -r` is invalid if any of `--mode`, `--node-type`, or `--repair-target` is missing.
- The older repair parameters `--file-type`, `--vnode-id`, and `--replica-node` have been removed from this interface.
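
The first three limitations amount to a check on the top-level options once the command line has been read. A minimal sketch, assuming the options have already been collected into a struct: `RepairCliArgs`, `checkRepairArgs`, and the error strings are hypothetical and do not reflect the messages `taosd` actually prints.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
  const char *mode;       /* value of --mode, or NULL if the option is absent */
  const char *nodeType;   /* value of --node-type, or NULL if the option is absent */
  int         numTargets; /* number of --repair-target options given */
} RepairCliArgs;

/* Returns NULL when the options are acceptable for `taosd -r`,
 * otherwise a short description of the first problem found. */
static const char *checkRepairArgs(const RepairCliArgs *args) {
  if (args->mode == NULL) return "missing --mode";
  if (strcmp(args->mode, "force") != 0) return "unsupported --mode";
  if (args->nodeType == NULL) return "missing --node-type";
  if (strcmp(args->nodeType, "vnode") != 0) return "unsupported --node-type";
  if (args->numTargets < 1) return "missing --repair-target";
  return NULL;
}
```

For example, `{"force", "vnode", 1}` passes, while the same arguments with `numTargets` set to 0 are rejected.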

### Examples

Repair meta on one vnode and use the default strategy:

```bash
taosd -r --mode force --node-type vnode \
--repair-target meta:vnode=3
```

Repair one TSDB file set and use an explicit strategy:

```bash
taosd -r --mode force --node-type vnode \
--repair-target tsdb:vnode=5:fileid=1809:strategy=deep_repair
```

Repair multiple targets in one startup:

```bash
taosd -r --mode force --node-type vnode --backup-path /tmp/repair-bak \
--repair-target meta:vnode=3 \
--repair-target tsdb:vnode=5:fileid=1809 \
--repair-target wal:vnode=6
```

## Configuration Parameters

Configuration parameters are divided into two categories:
27 changes: 27 additions & 0 deletions docs/zh/08-operation/05-maintenance.md
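@@ -100,6 +100,33 @@ restore qnode on dnode <dnode_id>;# Restore the qnode on the dnode
- This feature restores data through the existing replication mechanism; it is not disaster recovery or backup recovery. For the mnode or vnode being restored, the command therefore requires that the other two replicas of that mnode or vnode are still working normally.
- This command cannot repair damage to or loss of individual files in the data directory. For example, if individual files or data in an mnode or vnode are damaged, a single damaged file or data block cannot be restored on its own. In that case, you can clear all data of that mnode/vnode and then perform the restore.

## Local Repair Mode

If the problem only involves local files on a single node and you want repair checks to run during startup, you can start `taosd` as follows: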

```bash
taosd -r --mode force --node-type vnode \
--repair-target meta:vnode=3
```

You can also declare multiple repair targets in one startup:

```bash
taosd -r --mode force --node-type vnode --backup-path /tmp/repair-bak \
--repair-target meta:vnode=3 \
--repair-target tsdb:vnode=5:fileid=1809 \
--repair-target wal:vnode=6
```

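Current limitations:

- Only `--mode force` is currently supported.
- Only `--node-type vnode` is currently supported.
- `tsdb` repair targets must specify `fileid` explicitly.
- `wal` repair targets currently do not support `strategy`.

For the complete command-line grammar, field constraints, default strategies, and more examples, see the [taosd reference manual](../../reference/components/taosd/).

## Splitting Virtual Groups

When a vgroup's CPU or disk usage becomes too high because it holds too many subtables, you can add a dnode and then use the `split vgroup` command to split that vgroup into two virtual groups. After the split, the two new vgroups take over the read and write services previously provided by the single vgroup. This command was first released in version 3.0.6.0; using the latest version whenever possible is recommended.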
74 changes: 74 additions & 0 deletions docs/zh/14-reference/01-components/01-taosd.md
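@@ -17,10 +17,84 @@ The taosd command-line parameters are as follows:
- -e: Specifies environment variables as a string, for example `-e 'TAOS_FQDN=td1'`.
- -E: Specifies the path of an environment variable file. The default is `./.env`; the .env file can contain entries such as `TAOS_FQDN=td1`.
- -o: Specifies the log output mode. Valid values are `stdout`, `stderr`, `/dev/null`, `<directory>`, `<directory>/<filename>`, and `<filename>`.
- -r: Starts local repair mode. This option must be used together with `--mode force`, `--node-type vnode`, and at least one `--repair-target`.
- -k: Retrieves the machine code.
- -dm: Enables memory scheduling.
- -V: Prints version information.

## Repair Mode

Use `taosd -r` to enter local repair mode. The current phase only supports `--mode force` and `--node-type vnode`.

### Syntax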

```bash
taosd -r --mode force --node-type vnode [--backup-path <path>] \
--repair-target <target> [--repair-target <target>]...
```

### `--repair-target` Syntax

Each `--repair-target` value has the following format:

```text
<file-type>:<key>=<value>[:<key>=<value>]...
```

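Rules:

- `<file-type>` must be the first segment.
- The currently supported file types are `meta`, `tsdb`, and `wal`.
- The order of `key=value` pairs does not affect the meaning, but the examples in this document use a fixed order.
- A key must not be repeated within one target.
- If multiple targets refer to the same repair object, an error is reported immediately.

### Currently Supported Targets

| File Type | Required Fields | Optional Fields | Default Strategy | Supported Strategies |
| --- | --- | --- | --- | --- |
| `meta` | `vnode` | `strategy` | `from_uid` | `from_uid`, `from_redo` |
| `tsdb` | `vnode`, `fileid` | `strategy` | `shallow_repair` | `shallow_repair`, `deep_repair` |
| `wal` | `vnode` | none | none | none |

Additional notes:

- `fileid` is only allowed for `tsdb`, and it must be specified explicitly in the current phase.
- `wal` does not support `strategy` in the current phase.
- `--backup-path` is a global parameter for the whole repair startup and does not belong to any particular target.

### Current Limitations

- Only `--mode force` is currently supported.
- Only `--node-type vnode` is currently supported.
- `taosd -r` reports an error immediately if `--mode`, `--node-type`, or `--repair-target` is missing.
- The older repair parameters `--file-type`, `--vnode-id`, and `--replica-node` have been removed from this interface.

### Examples

Repair the meta of one vnode and use the default strategy: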

```bash
taosd -r --mode force --node-type vnode \
--repair-target meta:vnode=3
```

Repair one TSDB file set and specify the strategy explicitly:

```bash
taosd -r --mode force --node-type vnode \
--repair-target tsdb:vnode=5:fileid=1809:strategy=deep_repair
```

Declare multiple repair targets in one startup:

```bash
taosd -r --mode force --node-type vnode --backup-path /tmp/repair-bak \
--repair-target meta:vnode=3 \
--repair-target tsdb:vnode=5:fileid=1809 \
--repair-target wal:vnode=6
```

## Configuration Parameters

Configuration parameters are divided by scope into two categories, global and local:
55 changes: 55 additions & 0 deletions include/common/dmRepair.h
@@ -0,0 +1,55 @@
/*
* Copyright (c) 2019 TAOS Data, Inc. <jhtao@taosdata.com>
*
* This program is free software: you can use, redistribute, and/or modify
* it under the terms of the GNU Affero General Public License, version 3
* or later ("AGPL"), as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE.
*
* You should have received a copy of the GNU Affero General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#ifndef _TD_DM_REPAIR_H_
#define _TD_DM_REPAIR_H_

#include "tdef.h"

#ifdef __cplusplus
extern "C" {
#endif

typedef enum {
  DM_REPAIR_STRATEGY_NONE = 0,
  DM_REPAIR_STRATEGY_META_FROM_UID,
  DM_REPAIR_STRATEGY_META_FROM_REDO,
  DM_REPAIR_STRATEGY_TSDB_SHALLOW_REPAIR,
  DM_REPAIR_STRATEGY_TSDB_DEEP_REPAIR,
} EDmRepairStrategy;

typedef struct {
  EDmRepairStrategy strategy;
} SRepairMetaVnodeOpt;

typedef struct {
  EDmRepairStrategy strategy;
} SRepairTsdbFileOpt;

bool                       dmRepairFlowEnabled(void);
bool                       dmRepairNodeTypeIsVnode(void);
bool                       dmRepairModeIsForce(void);
bool                       dmRepairHasBackupPath(void);
const char                *dmRepairBackupPath(void);
const SRepairMetaVnodeOpt *dmRepairGetMetaVnodeOpt(int32_t vnodeId);
bool                       dmRepairNeedTsdbRepair(int32_t vnodeId);
const SRepairTsdbFileOpt  *dmRepairGetTsdbFileOpt(int32_t vnodeId, int32_t fileId);
bool                       dmRepairNeedWalRepair(int32_t vnodeId);

#ifdef __cplusplus
}
#endif

#endif /*_TD_DM_REPAIR_H_*/
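
The enum members above appear to correspond one-to-one with the `strategy` values listed in the taosd reference changes in this PR (`from_uid`, `from_redo`, `shallow_repair`, `deep_repair`). A minimal sketch of that correspondence, assuming the mapping follows the member names: `repairStrategyName` is a hypothetical helper that is not part of this header, and the enum is repeated only so the example compiles on its own.

```c
#include <stddef.h>

/* Repeated from dmRepair.h so that this sketch is self-contained. */
typedef enum {
  DM_REPAIR_STRATEGY_NONE = 0,
  DM_REPAIR_STRATEGY_META_FROM_UID,
  DM_REPAIR_STRATEGY_META_FROM_REDO,
  DM_REPAIR_STRATEGY_TSDB_SHALLOW_REPAIR,
  DM_REPAIR_STRATEGY_TSDB_DEEP_REPAIR,
} EDmRepairStrategy;

/* Hypothetical helper: returns the command-line name of a strategy,
 * or NULL for DM_REPAIR_STRATEGY_NONE and unknown values. */
static const char *repairStrategyName(EDmRepairStrategy strategy) {
  switch (strategy) {
    case DM_REPAIR_STRATEGY_META_FROM_UID:
      return "from_uid";
    case DM_REPAIR_STRATEGY_META_FROM_REDO:
      return "from_redo";
    case DM_REPAIR_STRATEGY_TSDB_SHALLOW_REPAIR:
      return "shallow_repair";
    case DM_REPAIR_STRATEGY_TSDB_DEEP_REPAIR:
      return "deep_repair";
    case DM_REPAIR_STRATEGY_NONE:
    default:
      return NULL;
  }
}
```

A helper of this kind could be useful in log messages, so that the strategy reported at repair time reads the same as the value passed in `--repair-target`.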