feat: data force repair by hzcheng · Pull Request #34707 · taosdata/TDengine

hzcheng · 2026-03-08T04:49:05Z

enable tsdb repair to enter the real execution path
- add vnode/tsdb force repair handling and backup logic
- add tsdb force repair pytest coverage and design notes
- normalize timezone strings for repair-related tests

Description

Issue(s)

Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

Is the user manual updated?
Are the test cases passed and automated?
Is there no significant decrease in test coverage?

gemini-code-assist · 2026-03-08T04:50:18Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

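This pull request introduces key data repair functionality for TDengine. It refactors the parameter layer of the taosd -r command so that it supports finer-grained force repair of vnode and TSDB files. The core work includes designing and implementing recovery logic for metadata and TSDB files so that targeted repair is possible when data is corrupted, while a crash-safe mechanism and a detailed backup strategy improve the system's robustness and recoverability. In addition, extensive test cases and improvements to test environment stability are meant to ensure the reliability of the new functionality.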

Highlights

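Command-line parameter refactoring: Refactored the argument parsing logic of the taosd -r command to support new repair options while remaining compatible with the old behavior.
META force repair: Implemented force repair of vnode metadata (meta), including repair of a specified vnode or of all vnodes, plus an external backup mechanism.
TSDB force repair: Introduced force repair of TSDB files (tsdb), which can analyze file groups, back up affected file sets, handle missing STT/core files, and rebuild core file groups from valid blocks.
Crash-safe mechanism: During TSDB repair, updates to current.json use a transactional manifest switch to keep data consistent if the system crashes.
Test coverage and environment stability: Added a large number of pytest cases covering parameter parsing, metadata repair, and TSDB repair scenarios, and fixed test stability problems related to ASAN/LSAN environments as well as the timezone string normalization issue.
Detailed documentation and planning: Added several detailed documents, including designs, implementation plans, findings, and progress logs, providing a full record and guidance for developing the data repair feature.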

Changelog

.gitignore
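- Added ignore entries for .agents/ and skills-lock.json.
AGENTS.md
- Added a document outlining repository guidelines, project structure, build commands, coding style, testing guidelines, and commit/PR conventions.
docs/data_repair/01-参数重构/findings.md
- Added a document detailing the findings, decisions, and problems encountered in phase 1 of data repair (parameter refactoring).
docs/data_repair/01-参数重构/progress.md
- Added a progress log recording the actions taken, test results, and error resolutions for phase 1 of data repair (parameter refactoring).
docs/data_repair/01-参数重构/task_plan.md
- Added a task plan for phase 1, the refactoring of the taosd -r parameter layer.
docs/data_repair/02-META_repair/findings.md
- Added a document detailing the findings, decisions, and implementation notes for phase 2 (META force repair).
docs/data_repair/02-META_repair/progress.md
- Added a progress log recording the sessions, actions, and verification steps for phase 2 (META force repair).
docs/data_repair/02-META_repair/task_plan.md
- Added a task plan for phase 2 (META force repair).
docs/data_repair/03-TSDB_repair/2026-03-07-next-session-handoff-prompt.md
- Added a detailed handoff document for the next stage of TSDB force repair, outlining context, current state, and priorities.
docs/data_repair/03-TSDB_repair/findings.md
- Added a document detailing the findings, confirmed scope, codebase insights, and crash-safety analysis for phase 3 (TSDB force repair).
docs/data_repair/03-TSDB_repair/progress.md
- Added a progress log recording the sessions, actions, and verification steps for phase 3 (TSDB force repair).
docs/data_repair/03-TSDB_repair/task_plan.md
- Added a task plan for phase 3 (TSDB force repair).
docs/plans/2026-03-06-meta-force-repair-design.md
- Added a design document for META force repair.
docs/plans/2026-03-06-meta-force-repair-plan.md
- Added an implementation plan for META force repair.
docs/plans/2026-03-07-tsdb-force-repair-design.md
- Added a design document for TSDB force repair.
docs/plans/2026-03-07-tsdb-force-repair-plan.md
- Added an implementation plan for TSDB force repair.
source/dnode/mgmt/exe/dmMain.c
- Refactored command-line argument parsing to support the new repair options, deprecated --force, added repair-specific help, and introduced functions that manage repair flow state.
source/dnode/mgmt/node_mgmt/inc/dmMgmt.h
- Added extern declarations for the new repair option accessor functions.
source/dnode/vnode/CMakeLists.txt
- Updated the CMake configuration to include the new vnodeRepair.c source file.
source/dnode/vnode/src/meta/metaOpen.c
- Integrated the new repair flow logic, including vnode-specific repair matching, external metadata backup before repair, and marking repaired vnodes to prevent repeated execution.
source/dnode/vnode/src/tsdb/tsdbFS2.c
- Implemented the TSDB force repair logic, including file-set analysis, backup of affected file sets, crash-safe manifest updates, and repair/rebuild of core file groups.
source/dnode/vnode/src/vnd/vnodeRepair.c
- Added weak symbol definitions for the repair option accessor functions so that dmMain.c can override them.
source/os/src/osTimezone.c
- Modified truncateTimezoneString to strip the leading slash from timezone strings, and applied it in getTimezoneStr.
source/os/test/osTimeTests.cpp
- Added a unit test for truncateTimezoneString that verifies the leading slash is removed.
test/cases/80-Components/01-Taosd/test_com_cmdline.py
- Updated _run_taosd to ensure ASAN_OPTIONS includes detect_leaks=0, and added a comprehensive test suite for taosd -r command-line parsing.
test/cases/80-Components/01-Taosd/test_meta_force_repair.py
- Added pytest cases for META force repair, covering repair of all vnodes, placeholder behavior for non-meta repair, and creation of a real vnode metadata backup.
test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py
- Added pytest cases for TSDB force repair, covering dispatch in open_fs, entry into the real execution path, backup of affected file sets, removal of missing STT/core files, crash-safe manifest updates, and rebuilding core file groups from valid blocks.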

Activity

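The pull request description outlines the key changes: enabling TSDB repair, adding vnode/TSDB force repair handling and backup logic, and adding pytest coverage and design notes.
The description also mentions normalizing timezone strings for repair-related tests.
The author provided a checklist covering user manual updates, test case automation, and test coverage, indicating a comprehensive approach to the change.
Detailed findings.md, progress.md, and task_plan.md files exist for each repair phase (01-参数重构, 02-META_repair, 03-TSDB_repair), indicating an iterative development process with ongoing documentation and review.
The 2026-03-07-next-session-handoff-prompt.md file indicates a structured development handoff process, highlighting specific files, the current task scope, boundaries, and key findings from previous sessions.
The progress.md files record in detail the specific actions taken, test results (including red/green phases), error logs, and resolutions, showing an active development and debugging process. For example, LSAN issues, pytest environment issues, and printf buffering issues were fixed.
The findings.md files record the confirmed scope, facts about the existing implementation, user constraints, design themes, newly confirmed decisions, implementation notes, verification notes, and long-term memory/lessons learned from debugging a complex test environment.
The task_plan.md files outline the goals, current phase, detailed steps, key questions, and decisions made, indicating careful development planning.
The test_com_cmdline.py, test_meta_force_repair.py, and test_tsdb_force_repair.py files show active test case development to cover the new repair functionality and ensure regressions are caught.
The changes to source/os/src/osTimezone.c and source/os/test/osTimeTests.cpp indicate a fix for a timezone parsing problem that caused taosd startup issues, showing responsiveness to identified issues.
The changes to source/dnode/vnode/src/tsdb/tsdbFS2.c include a fix for printf buffering and a key fix so that tsdbRepairDropCoreOnTmpFSet correctly generates TSDB_FOP_REMOVE operations, indicating detailed debugging and correction of the core logic.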

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a significant new feature: data force repair for meta and tsdb data, including backup logic and detailed execution paths. While the implementation is comprehensive, a security audit identified two medium-severity issues related to how backups are handled: the default backup location in the system's temporary directory may expose sensitive data, and user-provided backup paths are not sanitized for path traversal, potentially allowing arbitrary directory creation. Furthermore, two critical race conditions related to tracking repaired vnodes in a multi-threaded context and a bug in a test helper function need to be addressed.

gemini-code-assist · 2026-03-08T04:53:57Z

source/dnode/vnode/src/meta/metaOpen.c

+static void metaMarkForceRepairDone(int32_t vgId) {
+  char vnodeText[32] = {0};
+  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);
+
+  if (tsMetaRepairDoneVnodeId[0] == '\0') {
+    tstrncpy(tsMetaRepairDoneVnodeId, vnodeText, sizeof(tsMetaRepairDoneVnodeId));
+    return;
+  }
+
+  if (metaRepairListContains(tsMetaRepairDoneVnodeId, vgId)) {
+    return;
+  }
+
+  int32_t offset = (int32_t)strlen(tsMetaRepairDoneVnodeId);
+  snprintf(tsMetaRepairDoneVnodeId + offset, sizeof(tsMetaRepairDoneVnodeId) - offset, ",%s", vnodeText);
+}


The global static variable tsMetaRepairDoneVnodeId is accessed and modified here without any synchronization. Since metaOpen is called in parallel for different vnodes, this will lead to a race condition when multiple threads try to update this shared string simultaneously. This can cause memory corruption or incorrect behavior.

Please protect all accesses to tsMetaRepairDoneVnodeId (in this function and in metaForceRepairMatchesVnode) with a mutex. A static TdThreadMutex would be appropriate here.

gemini-code-assist · 2026-03-08T04:53:57Z

source/dnode/vnode/src/tsdb/tsdbFS2.c

+static void tsdbMarkForceRepairDone(int32_t vgId) {
+  char vnodeText[32] = {0};
+  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);
+
+  if (tsTsdbRepairDoneVnodeId[0] == '\0') {
+    tstrncpy(tsTsdbRepairDoneVnodeId, vnodeText, sizeof(tsTsdbRepairDoneVnodeId));
+    return;
+  }
+
+  if (tsdbRepairListContains(tsTsdbRepairDoneVnodeId, vgId)) {
+    return;
+  }
+
+  int32_t offset = (int32_t)strlen(tsTsdbRepairDoneVnodeId);
+  snprintf(tsTsdbRepairDoneVnodeId + offset, sizeof(tsTsdbRepairDoneVnodeId) - offset, ",%s", vnodeText);
+}


Similar to the issue in metaOpen.c, the global static variable tsTsdbRepairDoneVnodeId is modified here without thread synchronization. The tsdbDispatchForceRepair function, which calls this, is part of the parallel vnode startup process. This creates a race condition that can lead to memory corruption or incorrect repair behavior.

Please add a mutex to protect all reads and writes to tsTsdbRepairDoneVnodeId.

gemini-code-assist · 2026-03-08T04:53:57Z

test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py

+                return int(value)
+        tdLog.exit(f"failed to resolve vnode id from show {dbname}.vgroups result: {row}")
+
+    def _start_repair_process(self, args):


The variable extra_env is used within this function but is not defined as a parameter or a local variable. This will cause a NameError if the function is ever called.

Although this helper function does not seem to be used in the current test suite, this is a latent bug that should be fixed. Please add extra_env=None to the function signature.

Suggested change

-    def _start_repair_process(self, args):
+    def _start_repair_process(self, args, extra_env=None):

gemini-code-assist · 2026-03-08T04:53:57Z

source/dnode/vnode/src/meta/metaOpen.c

+}
+
+static int32_t metaBuildRepairBackupDir(SVnode *pVnode, char *buf, int32_t bufLen) {
+  const char *root = dmRepairHasBackupPath() ? dmRepairBackupPath() : TD_TMP_DIR_PATH;


The data repair process defaults to backing up sensitive database metadata to the system's temporary directory (/tmp) when --backup-path is not specified. On multi-user systems, /tmp is typically world-readable, which could allow unauthorized local users to access sensitive database information. It is recommended to use a more secure default location (e.g., a subdirectory within the TDengine data directory with restricted permissions) or explicitly set restrictive permissions (e.g., 0700) on the created backup directories.

gemini-code-assist · 2026-03-08T04:53:57Z

source/dnode/vnode/src/tsdb/tsdbFS2.c

+}
+
+static int32_t tsdbRepairBuildBackupFSetDir(STFileSystem *fs, int32_t fid, char *buf, int32_t bufLen) {
+  const char *root = dmRepairHasBackupPath() ? dmRepairBackupPath() : TD_TMP_DIR_PATH;


The data repair process defaults to backing up sensitive database files to the system's temporary directory (/tmp) when --backup-path is not specified. On multi-user systems, /tmp is typically world-readable, which could allow unauthorized local users to access sensitive database information. It is recommended to use a more secure default location or explicitly set restrictive permissions on the created backup directories.

gemini-code-assist · 2026-03-08T04:53:57Z

source/dnode/vnode/src/meta/metaOpen.c

+  snprintf(buf, bufLen, "%s%staos_backup_%s%svnode%d%smeta", root, sep, dateBuf, TD_DIRSEP, TD_VID(pVnode),
+           TD_DIRSEP);


The user-provided --backup-path is used to construct filesystem paths for data backups without sanitization for path traversal sequences (e.g., ..). This allows a user with the ability to run the repair command to create directories and write backup files in arbitrary locations on the filesystem. While the impact is limited by the fixed subdirectory structure appended to the path, it is best practice to sanitize the input or validate that the resolved path remains within an authorized base directory.

gemini-code-assist · 2026-03-08T04:53:57Z

source/dnode/vnode/src/tsdb/tsdbFS2.c

+  snprintf(buf, bufLen, "%s%staos_backup_%s%svnode%d%stsdb%sfid_%d", root, sep, dateBuf, TD_DIRSEP,
+           TD_VID(fs->tsdb->pVnode), TD_DIRSEP, TD_DIRSEP, fid);


The user-provided --backup-path is used to construct filesystem paths for data backups without sanitization for path traversal sequences (e.g., ..). This allows a user with the ability to run the repair command to create directories and write backup files in arbitrary locations on the filesystem. It is recommended to sanitize the input or validate that the resolved path remains within an authorized base directory.
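The containment check recommended above can be sketched as follows. This is an illustration in Python rather than the C used in metaOpen.c and tsdbFS2.c, and both the function name `resolve_backup_root` and the notion of an `allowed_base` directory are assumptions made for the example, not part of the PR.

```python
import os


def resolve_backup_root(user_path, allowed_base):
    """Resolve user_path and reject it if it escapes allowed_base.

    Hypothetical helper: neither the name nor the allowed-base policy
    comes from the PR; it only illustrates the validation idea.
    """
    base = os.path.realpath(allowed_base)
    # realpath collapses ".." segments and follows symlinks, so the
    # comparison below is made on the final location, not the raw input.
    resolved = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, resolved]) != base:
        raise ValueError(f"backup path escapes {base}: {user_path!r}")
    return resolved
```

A path such as `2026/vnode2` resolves inside the base directory, while `../../etc` or an absolute path outside the base raises `ValueError`. A C implementation would need an equivalent canonicalization step before building the backup directory name.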

Copilot

Pull request overview

This PR enables real execution for taosd -r --node-type vnode --file-type tsdb --mode force by dispatching repair logic inside vnode/TSDB open paths, adds backup/manifest/log handling for affected TSDB file-sets, and introduces end-to-end pytest coverage plus supporting docs/design notes. It also normalizes timezone strings (e.g., /UTC → UTC) to stabilize repair-related startup/tests.

Changes:

Add TSDB force-repair dispatch in tsdb open fs, including affected file-set backup (manifest + original files) and core-group drop/rebuild logic.
Add META force-repair dispatch in metaOpen() with external backup support and CLI parameter flow accessors.
Add/extend pytest E2E coverage for repair flows and add a timezone normalization fix + unit test.

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 5 comments.

File	Description
test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py	New E2E suite for TSDB force repair dispatch, backup/log behavior, crash-safe manifest staging, and rebuild scenarios.
test/cases/80-Components/01-Taosd/test_meta_force_repair.py	New E2E coverage for META force repair vnode targeting and backup directory creation.
test/cases/80-Components/01-Taosd/test_com_cmdline.py	Adds CLI parameter-layer regression matrix for `taosd -r` (phase1 behaviors).
source/os/src/osTimezone.c	Normalizes timezone strings via `truncateTimezoneString()` to handle leading `/`.
source/os/test/osTimeTests.cpp	Adds a unit test for the timezone string normalization behavior.
source/dnode/mgmt/exe/dmMain.c	Implements repair option parsing/validation and exposes repair accessors for vnode layers.
source/dnode/mgmt/node_mgmt/inc/dmMgmt.h	Declares repair option accessor APIs.
source/dnode/vnode/src/meta/metaOpen.c	Dispatches meta force repair per vnode and adds external backup copy logic.
source/dnode/vnode/src/tsdb/tsdbFS2.c	Adds TSDB force repair dispatch during FS open, backup/manifest/logging, and core repair actions.
source/dnode/vnode/src/vnd/vnodeRepair.c	Provides weak fallback stubs for dmRepair* accessors for linkability in libvnode contexts.
source/dnode/vnode/CMakeLists.txt	Adds the new `vnodeRepair.c` compilation unit.
docs/plans/2026-03-07-tsdb-force-repair-*.md	Adds TSDB force repair design + implementation plan notes.
docs/plans/2026-03-06-meta-force-repair-*.md	Adds META force repair design + implementation plan notes.
docs/data_repair/03-TSDB_repair/*	Adds phase3 task plan/findings/progress/handoff documentation for TSDB repair workstream.
docs/data_repair/02-META_repair/*	Adds/updates phase2 task plan/findings/progress documentation for META repair.
AGENTS.md	Adds repository guidelines for structure/build/test/style.
.gitignore	Ignores `.agents/` and `skills-lock.json`.

Copilot · 2026-03-08T04:56:06Z

source/dnode/vnode/src/meta/metaOpen.c

+static void metaMarkForceRepairDone(int32_t vgId) {
+  char vnodeText[32] = {0};
+  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);
+
+  if (tsMetaRepairDoneVnodeId[0] == '\0') {
+    tstrncpy(tsMetaRepairDoneVnodeId, vnodeText, sizeof(tsMetaRepairDoneVnodeId));
+    return;
+  }
+
+  if (metaRepairListContains(tsMetaRepairDoneVnodeId, vgId)) {
+    return;
+  }
+
+  int32_t offset = (int32_t)strlen(tsMetaRepairDoneVnodeId);
+  snprintf(tsMetaRepairDoneVnodeId + offset, sizeof(tsMetaRepairDoneVnodeId) - offset, ",%s", vnodeText);
+}
+
+static bool metaForceRepairMatchesVnode(int32_t vgId) {
+  if (!dmRepairFlowEnabled() || metaRepairListContains(tsMetaRepairDoneVnodeId, vgId)) {
+    return false;


The force-repair "done" tracking uses a process-global CSV string (tsMetaRepairDoneVnodeId) that is read/modified from metaOpen(). Vnodes are opened concurrently (vmOpenVnodes launches multiple threads), so this introduces a data race and possible string corruption, leading to repeated or skipped repairs. Protect the read/check/append with a mutex (or replace the string with a thread-safe set/bitmap keyed by vgId).

Copilot · 2026-03-08T04:56:07Z

source/dnode/vnode/src/tsdb/tsdbFS2.c

@@ -32,12 +44,666 @@ typedef struct {
  STFileHashEntry **buckets;
 } STFileHash;

+typedef enum {
+  TSDB_REPAIR_CORE_KEEP = 0,
+  TSDB_REPAIR_CORE_DROP,
+  TSDB_REPAIR_CORE_REBUILD,
+} ECoreRepairAction;
+
+typedef struct {
+  int32_t           fid;
+  bool              affected;
+  bool              dropStt;
+  ECoreRepairAction coreAction;
+  bool              staged;
+  int32_t           totalBlocks;
+  int32_t           keptBlocks;
+  int32_t           droppedBlocks;
+  char              coreReason[64];
+} STsdbRepairPlan;
+
+static int32_t save_json(const cJSON *json, const char *fname);
+
+static int32_t tsdbFSDupState(STFileSystem *fs);
+
+static int32_t commit_edit(STFileSystem *fs);
+
+static void tsdbRepairPlanInit(const STFileSet *fset, STsdbRepairPlan *plan) {
+  memset(plan, 0, sizeof(*plan));
+  plan->fid = fset->fid;
+}
+
+static void tsdbRepairPlanSetCore(STsdbRepairPlan *plan, ECoreRepairAction action, const char *reason) {
+  if (plan->coreAction == TSDB_REPAIR_CORE_DROP) {
+    return;
+  }
+  if (action == TSDB_REPAIR_CORE_DROP || plan->coreAction == TSDB_REPAIR_CORE_KEEP) {
+    plan->coreAction = action;
+    if (reason != NULL) {
+      tstrncpy(plan->coreReason, reason, sizeof(plan->coreReason));
+    }
+  }
+  plan->affected = plan->dropStt || (plan->coreAction != TSDB_REPAIR_CORE_KEEP);
+}
+
 static const char *gCurrentFname[] = {
    [TSDB_FCURRENT] = "current.json",
    [TSDB_FCURRENT_C] = "current.c.json",
    [TSDB_FCURRENT_M] = "current.m.json",
 };

+static bool tsdbRepairListContains(const char *csv, int32_t vgId) {
+  if (csv == NULL || csv[0] == '\0') {
+    return false;
+  }
+
+  char buf[PATH_MAX] = {0};
+  tstrncpy(buf, csv, sizeof(buf));
+
+  char *savePtr = NULL;
+  for (char *token = strtok_r(buf, ",", &savePtr); token != NULL; token = strtok_r(NULL, ",", &savePtr)) {
+    if (atoi(token) == vgId) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
+static void tsdbMarkForceRepairDone(int32_t vgId) {
+  char vnodeText[32] = {0};
+  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);
+
+  if (tsTsdbRepairDoneVnodeId[0] == '\0') {
+    tstrncpy(tsTsdbRepairDoneVnodeId, vnodeText, sizeof(tsTsdbRepairDoneVnodeId));
+    return;
+  }
+
+  if (tsdbRepairListContains(tsTsdbRepairDoneVnodeId, vgId)) {
+    return;
+  }
+
+  int32_t offset = (int32_t)strlen(tsTsdbRepairDoneVnodeId);
+  snprintf(tsTsdbRepairDoneVnodeId + offset, sizeof(tsTsdbRepairDoneVnodeId) - offset, ",%s", vnodeText);
+}
+
+static bool tsdbShouldForceRepair(STFileSystem *fs) {
+  int32_t vgId = TD_VID(fs->tsdb->pVnode);
+
+  if (!dmRepairFlowEnabled() || tsdbRepairListContains(tsTsdbRepairDoneVnodeId, vgId)) {
+    return false;


The TSDB force-repair "done" tracking uses a process-global CSV string (tsTsdbRepairDoneVnodeId) that is read/modified during open_fs(). Vnodes are opened concurrently, so this is not thread-safe and can corrupt the buffer or cause incorrect de-dup decisions. Guard accesses with a mutex (or use a concurrent set/bitmap keyed by vgId).

Copilot · 2026-03-08T04:56:07Z

test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py

+    def _start_repair_process(self, args):
+        bin_path = self._get_taosd_bin()
+        cmd = [bin_path, "-c", self._get_cfg_dir()] + shlex.split(args)
+        tdLog.info("run repair cmd: %s" % " ".join(cmd))
+        env = os.environ.copy()
+        asan_options = env.get("ASAN_OPTIONS", "")
+        if "detect_leaks=" not in asan_options:
+            env["ASAN_OPTIONS"] = (
+                "detect_leaks=0" if not asan_options else asan_options + ":detect_leaks=0"
+            )
+        env.setdefault("LSAN_OPTIONS", "detect_leaks=0")
+        if extra_env:
+            env.update(extra_env)
+        return subprocess.Popen(


_start_repair_process() references extra_env but the function has no such parameter/variable in scope, so this will raise a NameError at runtime when the helper is used. Add an extra_env parameter (default None) and apply it to env, or remove the dead code path if not needed.

Copilot · 2026-03-08T04:56:07Z

test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py

+    def _run_taosd_with_cfg(self, args, timeout_sec=None, extra_env=None):
+        bin_path = self._get_taosd_bin()
+        cmd = [bin_path, "-c", self._get_cfg_dir()] + shlex.split(args)
+        tdLog.info("run cmd: %s" % " ".join(cmd))
+        env = os.environ.copy()
+        asan_options = env.get("ASAN_OPTIONS", "")
+        if "detect_leaks=" not in asan_options:
+            env["ASAN_OPTIONS"] = (
+                "detect_leaks=0" if not asan_options else asan_options + ":detect_leaks=0"
+            )
+        env.setdefault("LSAN_OPTIONS", "detect_leaks=0")
+        proc = subprocess.Popen(
+            cmd,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.STDOUT,
+            text=True,
+            encoding="utf-8",
+            env=env,
+        )


_run_taosd_with_cfg() accepts an extra_env argument but never applies it to the subprocess environment. Either remove the parameter or update the function to merge extra_env into env so callers can reliably inject test-only environment variables.
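One way to address both `extra_env` comments is to build the subprocess environment in a single helper that both test methods call. The sketch below is a minimal version of that idea; the name `build_taosd_env` and its `base_env` parameter are hypothetical and do not appear in the PR.

```python
import os


def build_taosd_env(base_env=None, extra_env=None):
    """Return an environment dict with leak detection off and extra_env merged.

    Hypothetical helper mirroring the env setup in the quoted test code.
    """
    env = dict(os.environ if base_env is None else base_env)
    asan_options = env.get("ASAN_OPTIONS", "")
    if "detect_leaks=" not in asan_options:
        env["ASAN_OPTIONS"] = (
            "detect_leaks=0" if not asan_options else asan_options + ":detect_leaks=0"
        )
    env.setdefault("LSAN_OPTIONS", "detect_leaks=0")
    # Merge caller overrides last so test-only variables take precedence.
    if extra_env:
        env.update(extra_env)
    return env
```

With such a helper, `_run_taosd_with_cfg()` and `_start_repair_process()` could each pass `env=build_taosd_env(extra_env=extra_env)` to `subprocess.Popen`, which removes the duplicated setup and the undefined-name path.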

Copilot · 2026-03-08T04:56:07Z

test/cases/80-Components/01-Taosd/test_com_cmdline.py

+    def _run_taosd(self, args):
+        bin_path = self._get_taosd_bin()
+        cmd = [bin_path] + shlex.split(args)
+        tdLog.info("run cmd: %s" % " ".join(cmd))
+        env = os.environ.copy()
+        asan_options = env.get("ASAN_OPTIONS", "")
+        if "detect_leaks=" not in asan_options:
+            env["ASAN_OPTIONS"] = "detect_leaks=0" if not asan_options else asan_options + ":detect_leaks=0"
+        proc = subprocess.run(
+            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, encoding="utf-8", env=env
+        )
+        output = proc.stdout or ""
+        tdLog.info("ret=%s output=%s" % (proc.returncode, output[:500].replace("\n", "\\n")))
+        return proc.returncode, output


_run_taosd() uses subprocess.run() without a timeout. Several cases invoke taosd -r ... without -V and could hang if the CLI behavior regresses, which would stall CI. Add a reasonable timeout and include the timeout handling in the assertion helper.
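A hedged sketch of the timeout handling suggested here, using only the standard `subprocess` API. The function name `run_with_timeout`, the 60-second default, and the choice to return `None` as the exit code on timeout are assumptions for illustration.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_sec=60, env=None):
    """Run cmd and return (returncode, output); returncode is None on timeout."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            encoding="utf-8",
            env=env,
            timeout=timeout_sec,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising; the partial
        # output may be bytes or str depending on the Python version.
        output = exc.stdout or ""
        if isinstance(output, bytes):
            output = output.decode("utf-8", errors="replace")
        return None, output
    return proc.returncode, proc.stdout or ""


if __name__ == "__main__":
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_sec=30))
```

An assertion helper can then treat a `None` return code as an explicit failure instead of letting the CI job stall.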

…force flag
- Simplify repair mode help text and remove deprecated `--force` option validation
- Add detailed comments to `SDmRepairOption` structure for better clarity
- Remove phase1 execution restriction for non-meta/tsdb file types
- Update repair help text to reflect current compatibility rules
- Refactor meta repair functions with improved strategy handling

- Split metaForceRepair into strategy-specific functions (metaForceRepairFromUid)
- Simplify metaGetRepairStrategy to return default strategy directly
- Improve code organization and maintainability for meta repair operations

Copilot

Pull request overview

Copilot reviewed 26 out of 27 changed files in this pull request and generated 4 comments.

Copilot · 2026-03-09T06:21:19Z

source/dnode/vnode/src/meta/metaOpen.c

+    metaError("vgId:%d, %s failed at %s:%d since %s", TD_VID(pVnode), __func__, __FILE__, __LINE__, tstrerror(code));
+    return code;
+  }
+


metaMarkForceRepairDone() is never called after a successful meta force repair. Without marking the vnode as repaired, repeated metaOpen() calls in the same process could re-run force repair unexpectedly. After a successful metaForceRepair() (or after metaForceRepairIfShould() returns success), record completion via metaMarkForceRepairDone(TD_VID(pVnode)).

Suggested change

+  metaMarkForceRepairDone(TD_VID(pVnode));

Copilot · 2026-03-09T06:21:20Z

test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py

+        code, output = self._run_taosd_with_cfg(
+            f"-r --node-type vnode --file-type tsdb --vnode-id {vnode_id} --mode force --log-output /dev/null"
+        )
+
+        tdSql.checkEqual("repair execution is not enabled in this phase" in output, False)
+        tdSql.checkEqual(code == 0 and "repair parameter validation succeeded (phase1)" in output, False)


The test docstring says taosd is stopped before running tsdb force repair, but the test doesn't actually stop the running dnode before launching taosd -r .... This can make the test flaky (port already in use / data directory lock) and diverges from the intended scenario. Stop taosd (e.g., tdDnodes.stop(1)) before invoking _run_taosd_with_cfg, and restart in a finally block like the other tests in this file.

Suggested change

-        code, output = self._run_taosd_with_cfg(
-            f"-r --node-type vnode --file-type tsdb --vnode-id {vnode_id} --mode force --log-output /dev/null"
-        )
-
-        tdSql.checkEqual("repair execution is not enabled in this phase" in output, False)
-        tdSql.checkEqual(code == 0 and "repair parameter validation succeeded (phase1)" in output, False)
+        tdDnodes.stop(1)
+        try:
+            code, output = self._run_taosd_with_cfg(
+                f"-r --node-type vnode --file-type tsdb --vnode-id {vnode_id} --mode force --log-output /dev/null"
+            )
+
+            tdSql.checkEqual("repair execution is not enabled in this phase" in output, False)
+            tdSql.checkEqual(code == 0 and "repair parameter validation succeeded (phase1)" in output, False)
+        finally:
+            tdDnodes.start(1)

Copilot · 2026-03-09T06:21:20Z

source/dnode/mgmt/exe/dmMain.c

+static int32_t dmParseRepairOption(int32_t argc, char const *argv[], int32_t *pIndex, bool *pParsed) {
+  int32_t  code = TSDB_CODE_SUCCESS;
+  int32_t  index = *pIndex;
+  const char *arg = argv[index];
+  bool        matched = false;
+  bool        optMatched = false;
+  SDmRepairOption *pOpt = &global.repairOpt;
+
+  *pParsed = false;


dmParseRepairOption() declares const char *arg = argv[index]; but never uses it. This can trigger -Wunused-variable warnings in builds that treat warnings as errors. Remove the unused variable (or use it) to keep the build clean.

Copilot · 2026-03-09T06:21:20Z

source/dnode/vnode/src/meta/metaOpen.c

+static int32_t metaBackupCurrentMeta(SVnode *pVnode) {
+  char metaDir[TSDB_FILENAME_LEN] = {0};
+  char backupDir[TSDB_FILENAME_LEN] = {0};
+
+  vnodeGetMetaPath(pVnode, VNODE_META_DIR, metaDir);
+
+  int32_t code = metaBuildRepairBackupDir(pVnode, backupDir, sizeof(backupDir));
+  if (code != 0) {
+    return code;
+  }
+
+  if (taosCheckExistFile(backupDir)) {
+    metaError("vgId:%d repair backup dir already exists: %s", TD_VID(pVnode), backupDir);
+    return TSDB_CODE_FS_FILE_ALREADY_EXISTS;
+  }
+
+  code = metaCopyDirRecursive(metaDir, backupDir);
+  if (code != 0) {
+    metaError("vgId:%d failed to back up meta from %s to %s, reason:%s", TD_VID(pVnode), metaDir, backupDir,
+              tstrerror(code));
+    return code;
+  }
+
+  metaInfo("vgId:%d backed up meta to %s", TD_VID(pVnode), backupDir);
+  return TSDB_CODE_SUCCESS;
+}


metaBackupCurrentMeta() is defined but never called. As a result, meta force repair won't create the external backup directory that the tests (and design notes) expect. Call metaBackupCurrentMeta(pVnode) when metaShouldForceRepair() is true (ideally before any local rename/switch operations) and handle/propagate its error code.

- Introduce new `--repair-target` parameter to replace single-profile options (`--file-type`, `--vnode-id`)
- Support multiple repair targets in a single `taosd -r` invocation
- Define grammar: `<file-type>:<key>=<value>[:<key>=<value>]...` with file types: meta, tsdb, wal
- Enforce validation: requires `-r`, `--mode force`, `--node-type vnode`, and at least one `--repair-target`
- Remove backward compatibility for old CLI options, treat them as invalid
- Update error messages and test cases to reflect new interface
- Centralize parsing in `dmMain.c` with normalized target structures
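
The `--repair-target` grammar in this commit message can be illustrated with a small parser. This is a Python sketch, not the C parser in `dmMain.c`: the function name `parse_repair_target` is hypothetical, the commit message does not list the accepted keys (so none are enforced), and rejecting duplicate keys is an assumption.

```python
VALID_FILE_TYPES = {"meta", "tsdb", "wal"}


def parse_repair_target(text):
    """Parse '<file-type>:<key>=<value>[:<key>=<value>]...' into (type, options)."""
    parts = text.split(":")
    file_type, pairs = parts[0], parts[1:]
    if file_type not in VALID_FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type!r}")
    if not pairs:
        raise ValueError("at least one <key>=<value> pair is required")
    options = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed pair: {pair!r}")
        if key in options:
            # Assumption: duplicates are treated as errors in this sketch.
            raise ValueError(f"duplicate key: {key!r}")
        options[key] = value
    return file_type, options
```

For example, with purely illustrative key names, `parse_repair_target("tsdb:vnode=2:fileid=1809")` yields the file type `tsdb` and a dictionary holding the two key/value pairs, while an unknown file type or a pair without `=` raises `ValueError`.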

…ggregated structures
- Replace `SArray<SDmRepairTarget>` runtime model with three aggregated structures: `meta` (vnodeId -> strategy), `wal` (vnodeId -> presence), and `tsdb` (vnodeId -> fileId -> strategy)
- Update `dmRepair.h` to expose only leaf configurations and dedicated accessors, hiding underlying containers
- Modify `metaOpen.c` and `tsdbFS2.c` to use new accessors instead of scanning generic target lists
- Update documentation in `findings.md` and `progress.md` to reflect the refined data model
- Ensure parser test cases (`test_com_cmdline.py -k repair_cmdline_repair_target`) continue to pass after refactoring

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 4 comments.

Copilot · 2026-03-09T11:23:05Z

source/dnode/vnode/src/tsdb/tsdbFS2.c

+  for (int32_t ftype = TSDB_FTYPE_HEAD; ftype <= TSDB_FTYPE_SMA; ++ftype) {
+    if (fset->farr[ftype] != NULL) {
+      STFileOp op = {.optype = TSDB_FOP_REMOVE, .fid = fset->fid, .of = fset->farr[ftype]->f[0]};
+      code = TARRAY2_APPEND(fopArr, op);
+      if (code != 0) goto _exit;
+
+      TAOS_UNUSED(tsdbTFileObjUnref(fset->farr[ftype]));
+      fset->farr[ftype] = NULL;
+    }


tsdbRepairDropCoreOnTmpFSet() clears/unrefs fset->farr[ftype] before calling tsdbTFileSetEdit() with a TSDB_FOP_REMOVE op. tsdbTFileSetEdit()'s remove path dereferences fset->farr[op->of.type] to unref it, so setting it to NULL here will lead to a NULL dereference (and also double-unref if it were non-NULL). Let tsdbTFileSetEdit() own the unref/NULL assignment (i.e., remove the manual tsdbTFileObjUnref + farr[ftype]=NULL), or apply the edit before mutating fset->farr.

Copilot · 2026-03-09T11:23:05Z

source/dnode/vnode/src/meta/metaOpen.c

+  // Open a new meta for organization
+  code = metaOpenImpl(pMeta->pVnode, &pNewMeta, VNODE_META_TMP_DIR, false);
+  if (code) {
+    return code;
+  }
+
+  code = metaBegin(pNewMeta, META_BEGIN_HEAP_NIL);
+  if (code) {
+    return code;
+  }


metaForceRepair() has multiple early-return error paths after successfully opening pNewMeta (and/or starting a transaction) that don't close pNewMeta or otherwise clean up the temp meta state. This can leak resources and potentially leave meta_tmp in a partially initialized state. Consider switching to a single goto _exit cleanup path that always metaClose(&pNewMeta) (and aborts/rolls back any started transaction) before returning.

Copilot · 2026-03-09T11:23:05Z

source/dnode/vnode/src/meta/metaOpen.c

+static void metaMarkForceRepairDone(int32_t vgId) {
+  char vnodeText[32] = {0};
+  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);
+
+  if (tsMetaRepairDoneVnodeId[0] == '\0') {
+    tstrncpy(tsMetaRepairDoneVnodeId, vnodeText, sizeof(tsMetaRepairDoneVnodeId));
+    return;
+  }
+
+  if (metaRepairListContains(tsMetaRepairDoneVnodeId, vgId)) {
+    return;
+  }
+
+  int32_t offset = (int32_t)strlen(tsMetaRepairDoneVnodeId);
+  snprintf(tsMetaRepairDoneVnodeId + offset, sizeof(tsMetaRepairDoneVnodeId) - offset, ",%s", vnodeText);
+}


metaMarkForceRepairDone() / tsMetaRepairDoneVnodeId are introduced and used for the “already repaired in this process” check, but the code never calls metaMarkForceRepairDone() after a successful force repair. As a result, the done-list check will never take effect, and the helper itself is currently dead code. Either call metaMarkForceRepairDone(TD_VID(pVnode)) after metaForceRepair() succeeds (so repeated metaOpen() calls don't re-run repair), or remove the unused tracking code.

Copilot · 2026-03-09T11:23:06Z

source/dnode/vnode/src/meta/metaOpen.c

+static int32_t metaForceRepairIfShould(SVnode *pVnode, SMeta **ppMeta) {
+  int32_t           code = TSDB_CODE_SUCCESS;
+  EDmRepairStrategy strategy = DM_REPAIR_STRATEGY_META_FROM_UID;
+  bool              shouldForceRepair = metaShouldForceRepair(pVnode, &strategy);
+
+  // Check if meta should repair
+  if (!shouldForceRepair) {
+    metaDebug("vgId:%d, meta should not repair!", TD_VID(pVnode));
+    return code;
+  }
+
+  // Do repair
+  code = metaForceRepair(ppMeta, strategy);
+  if (code) {
+    metaError("vgId:%d, %s failed at %s:%d since %s", TD_VID(pVnode), __func__, __FILE__, __LINE__, tstrerror(code));
+    return code;
+  }
+
+  return code;


After a successful repair in metaForceRepairIfShould(), the vnode isn't marked as “repaired” for this process, so a later metaOpen() for the same vnode (e.g., reopen paths) could re-run force repair again. This is especially important since metaForceRepairMatchesVnode() checks tsMetaRepairDoneVnodeId but that list is never updated. Mark the vnode as done on success (or remove the done-list gate entirely).

- Introduce SRepairVnodeOpt structure to encapsulate vnode-specific repair options
- Update SDmRepairOption to use vnodeOpt and prepare for mnode/snode support
- Adjust cleanup and target insertion functions to work with new structure
- Clarify node-type options in comments and add new repair modes (copy, replica)

… flag
- Remove 'dnode' from the node-type option comment in dmMain.c as it is no longer supported
- Eliminate the global variable generateNewMeta and all its references across dmMain.c and metaOpen.c
- Simplify repair option handling by removing unnecessary flag resets
- Update code to reflect current supported node types: vnode, mnode, snode

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 3 comments.

Copilot · 2026-03-10T03:01:01Z

docs/plans/2026-03-09-vnode-repair-target-cli-plan.md

+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+


These plan docs include assistant-specific instructions (e.g. “For Claude: REQUIRED SUB-SKILL…”). This is likely to confuse human readers and may become stale quickly. Consider moving AI-runbook content into AGENTS.md (or an internal-only doc) and keep repository plans focused on the technical design/tasks.

Suggested change

-> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Copilot · 2026-03-10T03:01:02Z

docs/plans/2026-03-07-tsdb-force-repair-plan.md

+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+


This plan doc contains assistant-specific instructions (“For Claude…Use superpowers…”). If these files are intended for general engineering handoff, consider removing or relocating the AI runbook content to AGENTS.md so the plan remains tool-agnostic and easier to maintain.

Suggested change

-> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Copilot · 2026-03-10T03:01:02Z

docs/plans/2026-03-06-meta-force-repair-plan.md

+# META Force Repair Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Add real execution for `taosd -r --mode force --file-type meta` by enhancing the existing `metaGenerateNewMeta()` path so each vnode decides during `metaOpen()` whether it should run force repair, with external backup support and crash-safe local directory switching.
+


This plan doc includes assistant-specific directions (“For Claude…Use superpowers…”). If the intent is a reusable engineering plan, consider removing those lines or moving them into an AI-specific runbook (e.g., AGENTS.md) to avoid confusing non-AI readers.

- Remove tsMetaRepairDoneVnodeId static variable and related helper functions
- Simplify metaShouldForceRepair by directly checking repair flow and vnode options
- Eliminate redundant metaForceRepairMatchesVnode function
- Improve code clarity by removing unnecessary indirection in repair condition checks

Disable the meta backup step in metaForceRepair by commenting it out with #if 0. This change is likely a temporary measure to allow the repair process to proceed without performing a backup, possibly for debugging or to avoid backup-related failures during forced repairs. The statistics reset step remains active.

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 3 comments.

Copilot · 2026-03-10T03:23:29Z

source/dnode/vnode/src/meta/metaOpen.c

+#if 0
+  code = metaBackupCurrentMeta(pVnode);
+  if (code != 0) {
+    metaError("vgId:%d failed to back up current meta, reason:%s", TD_VID(pVnode), tstrerror(code));
+    return code;
+  }
 #endif


metaBackupCurrentMeta() is never executed because the call is wrapped in #if 0 inside metaForceRepair(). This makes the external backup feature effectively disabled and will also break the E2E expectation that a backup directory is created during meta force repair. Please remove the compile-time guard (or gate it on dmRepairHasBackupPath() / repair mode) so backups are actually produced in repair runs.

Suggested change

-#if 0
-  code = metaBackupCurrentMeta(pVnode);
-  if (code != 0) {
-    metaError("vgId:%d failed to back up current meta, reason:%s", TD_VID(pVnode), tstrerror(code));
-    return code;
-  }
-#endif
+  if (dmRepairHasBackupPath()) {
+    code = metaBackupCurrentMeta(pVnode);
+    if (code != 0) {
+      metaError("vgId:%d failed to back up current meta, reason:%s", TD_VID(pVnode), tstrerror(code));
+      return code;
+    }
+  }

Copilot · 2026-03-10T03:23:29Z

source/dnode/vnode/src/tsdb/tsdbFS2.c

@@ -32,12 +36,679 @@ typedef struct {
  STFileHashEntry **buckets;
 } STFileHash;

+typedef enum {
+  TSDB_REPAIR_CORE_KEEP = 0,
+  TSDB_REPAIR_CORE_DROP,
+  TSDB_REPAIR_CORE_REBUILD,
+} ECoreRepairAction;
+
+typedef struct {
+  int32_t           fid;
+  bool              affected;
+  bool              dropStt;
+  ECoreRepairAction coreAction;
+  bool              staged;
+  int32_t           totalBlocks;
+  int32_t           keptBlocks;
+  int32_t           droppedBlocks;
+  char              coreReason[64];
+} STsdbRepairPlan;
+
+static int32_t save_json(const cJSON *json, const char *fname);
+
+static int32_t tsdbFSDupState(STFileSystem *fs);
+
+static int32_t commit_edit(STFileSystem *fs);
+
+static void tsdbRepairPlanInit(const STFileSet *fset, STsdbRepairPlan *plan) {
+  memset(plan, 0, sizeof(*plan));
+  plan->fid = fset->fid;
+}
+
+static void tsdbRepairPlanSetCore(STsdbRepairPlan *plan, ECoreRepairAction action, const char *reason) {
+  if (plan->coreAction == TSDB_REPAIR_CORE_DROP) {
+    return;
+  }
+  if (action == TSDB_REPAIR_CORE_DROP || plan->coreAction == TSDB_REPAIR_CORE_KEEP) {
+    plan->coreAction = action;
+    if (reason != NULL) {
+      tstrncpy(plan->coreReason, reason, sizeof(plan->coreReason));
+    }
+  }
+  plan->affected = plan->dropStt || (plan->coreAction != TSDB_REPAIR_CORE_KEEP);
+}
+
 static const char *gCurrentFname[] = {
    [TSDB_FCURRENT] = "current.json",
    [TSDB_FCURRENT_C] = "current.c.json",
    [TSDB_FCURRENT_M] = "current.m.json",
 };

+static bool tsdbRepairListContains(const char *csv, int32_t vgId) {
+  if (csv == NULL || csv[0] == '\0') {
+    return false;
+  }
+
+  char buf[PATH_MAX] = {0};
+  tstrncpy(buf, csv, sizeof(buf));
+
+  char *savePtr = NULL;
+  for (char *token = strtok_r(buf, ",", &savePtr); token != NULL; token = strtok_r(NULL, ",", &savePtr)) {
+    if (atoi(token) == vgId) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
+static void tsdbMarkForceRepairDone(int32_t vgId) {
+  char vnodeText[32] = {0};
+  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);
+
+  if (tsTsdbRepairDoneVnodeId[0] == '\0') {
+    tstrncpy(tsTsdbRepairDoneVnodeId, vnodeText, sizeof(tsTsdbRepairDoneVnodeId));
+    return;
+  }
+
+  if (tsdbRepairListContains(tsTsdbRepairDoneVnodeId, vgId)) {
+    return;
+  }
+
+  int32_t offset = (int32_t)strlen(tsTsdbRepairDoneVnodeId);
+  snprintf(tsTsdbRepairDoneVnodeId + offset, sizeof(tsTsdbRepairDoneVnodeId) - offset, ",%s", vnodeText);
+}


tsTsdbRepairDoneVnodeId is a process-global mutable buffer used to track which vnodes have been repaired. Vnodes are opened concurrently (see vmOpenVnodes() spawning threads in source/dnode/mgmt/mgmt_vnode/src/vmInt.c), so reads/writes to this global (via tsdbShouldForceRepair() / tsdbMarkForceRepairDone()) can race and corrupt the CSV or cause missed/duplicate repairs. Please make this tracking thread-safe (e.g., a mutex-protected hash/set keyed by vgId, or store the flag on SVnode/STsdb instance) or otherwise ensure the open path is single-threaded in repair mode.
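One of the options named above, a mutex-protected set keyed by vgId, could be sketched like this. The names (`repair_try_claim`, `gDoneIds`, `REPAIR_DONE_CAP`) and the fixed capacity are assumptions for illustration only. Note that this variant merges "check" and "mark" into one atomic claim, which differs from the diff's approach of marking after a successful repair.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-capacity set; a real implementation would size or
 * hash this according to the maximum number of vnodes. */
#define REPAIR_DONE_CAP 1024

static pthread_mutex_t gDoneLock = PTHREAD_MUTEX_INITIALIZER;
static int32_t         gDoneIds[REPAIR_DONE_CAP];
static int32_t         gDoneCount = 0;

/* Atomically checks and records vgId. Returns true only for the first
 * caller, so at most one thread proceeds to repair a given vnode.
 * Also returns false when the set is full (nothing is recorded then). */
static bool repair_try_claim(int32_t vgId) {
  bool claimed = false;

  pthread_mutex_lock(&gDoneLock);
  bool seen = false;
  for (int32_t i = 0; i < gDoneCount; ++i) {
    if (gDoneIds[i] == vgId) {
      seen = true;
      break;
    }
  }
  if (!seen && gDoneCount < REPAIR_DONE_CAP) {
    gDoneIds[gDoneCount++] = vgId;
    claimed = true;
  }
  pthread_mutex_unlock(&gDoneLock);

  return claimed;
}
```

Storing a flag on the per-vnode instance, the other option in the comment, would avoid shared state entirely and needs no lock.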

Copilot · 2026-03-10T03:23:29Z

docs/plans/2026-03-06-meta-force-repair-plan.md

@@ -0,0 +1,352 @@
+# META Force Repair Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.


This plan document includes assistant-specific meta text ("For Claude" / "superpowers:executing-plans"). Please remove or replace with repo/tool-agnostic guidance to avoid confusion for future maintainers.

Suggested change

-> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+> Note: This implementation plan is intended to be executed task-by-task, either manually or by automation tooling.

Introduce walShouldDeleteCorruption inline function to conditionally delete corrupted WAL files based on tsWalDeleteOnCorruption flag or dmRepairNeedWalRepair status. This replaces direct checks of tsWalDeleteOnCorruption in walLogEntriesComplete and walCheckAndRepairMeta, enabling more flexible corruption handling that considers both global configuration and vgId-specific repair requirements.

Copilot

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 3 comments.

Copilot · 2026-03-10T04:41:31Z

source/dnode/vnode/src/tsdb/tsdbFS2.c

+static char tsTsdbRepairDoneVnodeId[PATH_MAX] = {0};
+


tsTsdbRepairDoneVnodeId is a file-static global that is mutated by tsdbShouldForceRepair() / tsdbMarkForceRepairDone() without synchronization. Vnodes are opened concurrently (multiple threads in vmOpenVnodes), so this is a data race and can corrupt the CSV buffer or lead to repeated/partial dispatch decisions. Please make the “already repaired” state per-vnode/per-STFileSystem (or guard shared state with a mutex/atomic).

Copilot · 2026-03-10T04:41:31Z

source/libs/wal/src/walMeta.c

+static FORCE_INLINE bool walShouldDeleteCorruption(const SWal* pWal) {
+  return tsWalDeleteOnCorruption || dmRepairNeedWalRepair(pWal->cfg.vgId);
+}


walShouldDeleteCorruption() calls dmRepairNeedWalRepair(), but in this PR that symbol is only implemented in the taosd executable (dmMain.c) and in vnodeRepair.c (vnode library). The WAL unit test (walTest) links wal without those objects, so this introduces an undefined symbol at link time for test builds (and any other binary that links wal but not vnode/mgmt). Provide weak/default dmRepair implementations from a library that wal always links (e.g., common), or otherwise ensure the symbol is available wherever wal is used.
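A weak default of the kind suggested above might look like the following sketch. The signature `bool dmRepairNeedWalRepair(int32_t vgId)` is inferred from the call site in the diff and is an assumption, as is the helper `wal_should_delete_corruption`, which takes the config flag as a parameter so the example stands alone. `__attribute__((weak))` is a GCC/Clang extension and is not available in this form on MSVC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Weak default: binaries that do not link the repair module (for example a
 * WAL-only unit test) resolve the symbol here and never request WAL repair.
 * A strong definition elsewhere overrides it at link time. */
__attribute__((weak)) bool dmRepairNeedWalRepair(int32_t vgId) {
  (void)vgId;
  return false;
}

/* Hypothetical mirror of the helper in the diff, with the global config
 * flag passed in as a parameter so the sketch is self-contained. */
static bool wal_should_delete_corruption(bool deleteOnCorruption, int32_t vgId) {
  return deleteOnCorruption || dmRepairNeedWalRepair(vgId);
}
```

With only the weak default linked, the helper reduces to the plain config flag, which matches the behavior before this change.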

Copilot · 2026-03-10T04:41:31Z

test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py

+        code, output = self._run_taosd_with_cfg(
+            self._tsdb_repair_args(vnode_id, repair_fid, extra_args="--log-output /dev/null")
+        )


This test runs taosd -r ... via _run_taosd_with_cfg() while the dnode is still running (unlike the other tests in this file which stop the dnode first). Starting a second taosd against the same config/data dir is likely to fail (port conflict / file lock / concurrent data access) and makes the test flaky. Stop tdDnodes before invoking the repair process and restart it in a finally block, consistent with test_tsdb_force_repair_dispatches_in_open_fs().

- Add `dmRepairNodeTypeIsVnode()` and `dmRepairModeIsForce()` functions to header and implementation
- Introduce `tsdbShouldForceRepair()` and `tsdbForceRepair()` function declarations for TSDB repair logic
- Refactor TSDB repair flow to conditionally apply force repair based on node type and mode
- Improve repair plan handling with detailed block validation and error reporting

- Implement `tsdbForceRepair` to handle missing or corrupted files in TSDB
- Add helper functions for detecting bad files and performing deep scans
- Include STT file validation and data part verification mechanisms
- Introduce commit change process with proper locking for thread safety
- Structure repair operations to maintain data consistency during recovery

Copilot

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.

Copilot · 2026-03-10T10:31:43Z

source/dnode/vnode/src/tsdb/tsdbRepair.c

+static int32_t tsdbDeepScanAndFixSttFile(STFileset *pFileSet, STFileObj *pStt) {
+  int32_t             code = TSDB_CODE_SUCCESS;
+  SSttFileReader     *reader;
+  const TSttBlkArray *sttBlkArray = NULL;
+
+  // Open
+  SSttFileReaderConfig config = {
+      // TODO
+
+  };
+  code = tsdbSttFileReaderOpen(pStt->fname, &config, &reader);
+  if (code) {
+    // TODO: error handle, need to delete this file
+  }
+
+  // read the index part
+  code = tsdbSttFileReadSttBlk(reader, &sttBlkArray);
+  if (code) {
+    // TODO: error handle, need to delete this file
+  }
+
+  // Loop to read each data part
+  for (int32_t i = 0; i < sttBlkArray->size; i++) {
+    SSttBlk *pSttBlk = ;
+    code = tsdbReadFile(STsdbFD * pFD, int64_t offset, uint8_t *pBuf, int64_t size, int64_t szHint,
+                        SEncryptData *encryptData);
+    if (code) {
+      // TODO: find a bad block, need to eliminate it
+    }
+  };


tsdbRepair.c currently contains incomplete/invalid C code (e.g., wrong type STFileset, empty initializer SSttBlk *pSttBlk = ;, and a placeholder call to tsdbReadFile(...) with a function signature pasted into the call). This will not compile and needs to be completed or removed (e.g., gate unfinished code behind #if 0 and enable the implemented path).

Copilot · 2026-03-10T10:31:43Z

source/dnode/vnode/src/tsdb/tsdbRepair.c

+static int32_t tsdbForceRepairFileSet(STFileSystem *pFS, STFileSet *pFileSet, TFileOpArray *opArr, bool *hasChange) {
+  int32_t code = TSDB_CODE_SUCCESS;
+
+  // TODO: if .head or .data is missing, just delete the data
+  code = tsdbForceRepairFileSetBadFiles(pFS);
+  if (code) {
+    // TODO
+    return code;
+  }
+
+  // TODO: if deep scan and fix the data, do deep scan and fix
+  code = tsdbForceRepairFileSetDeepScanAndFix(pFS);
+  if (code) {


tsdbForceRepairFileSetBadFiles / tsdbForceRepairFileSetDeepScanAndFix are called with the wrong argument lists here (missing pFileSet, opArr, hasChange, etc.), so this code cannot compile and also can’t apply any edits. Please fix the function calls/signatures consistently and ensure opArr is populated with the intended file operations.

Copilot · 2026-03-10T10:31:43Z

source/dnode/vnode/src/tsdb/tsdbRepair.c

+int32_t tsdbForceRepair(STFileSystem *fs) {
+  int32_t code = TSDB_CODE_SUCCESS;
+
+  bool         hasChange = false;
+  TFileOpArray opArr = {0};
+
+  // Loop to force repair each file set
+  STFileSet *pFileSet = NULL;
+  TARRAY2_FOREACH(fs->fSetArr, pFileSet) {
+    code = tsdbForceRepairFileSet(fs, pFileSet, &hasChange);
+    if (code) {
+      tsdbError("vgId:%d %s failed to force repair file set, fid:%d since %s, code:%d", TD_VID(fs->tsdb->pVnode),
+                __func__, pFileSet->fid, tstrerror(code), code);
+      return code;
+    }
+  }
+
+  code = tsdbForceRepairCommitChange(fs, &opArr);
+  if (code) {


In tsdbForceRepair, the loop calls tsdbForceRepairFileSet(fs, pFileSet, &hasChange) but tsdbForceRepairFileSet is declared with four parameters (pFS, pFileSet, opArr, hasChange). This is a compile-time error; please pass the correct arguments and ensure the TFileOpArray opArr is properly initialized (e.g., via the project’s TARRAY2_INIT/append helpers) before using it in tsdbFSEditBegin/commit.

Copilot · 2026-03-10T10:31:44Z

source/dnode/vnode/src/tsdb/tsdbRepair.c

+int32_t tsdbForceRepair(STFileSystem *fs) {
+  int32_t code = TSDB_CODE_SUCCESS;
+
+  bool         hasChange = false;
+  TFileOpArray opArr = {0};
+
+  // Loop to force repair each file set
+  STFileSet *pFileSet = NULL;
+  TARRAY2_FOREACH(fs->fSetArr, pFileSet) {
+    code = tsdbForceRepairFileSet(fs, pFileSet, &hasChange);
+    if (code) {
+      tsdbError("vgId:%d %s failed to force repair file set, fid:%d since %s, code:%d", TD_VID(fs->tsdb->pVnode),
+                __func__, pFileSet->fid, tstrerror(code), code);
+      return code;
+    }
+  }
+
+  code = tsdbForceRepairCommitChange(fs, &opArr);
+  if (code) {
+    // TODO: output error log
+    return code;
+  }
+
+#if 0
+  int32_t code = tsdbFSDupState(fs);
+  if (code != 0) {
+    return code;
+  }
+
+  bool             changed = false;
+  const STFileSet *srcFset = NULL;
+  TARRAY2_FOREACH(fs->fSetArr, srcFset) {
+    EDmRepairStrategy repairStrategy = DM_REPAIR_STRATEGY_NONE;
+    if (!tsdbRepairMatchTargetForFid(TD_VID(fs->tsdb->pVnode), srcFset->fid, &repairStrategy)) {
+      continue;
+    }
+    TAOS_UNUSED(repairStrategy);
+
+    STsdbRepairPlan plan;
+    code = tsdbRepairAnalyzeFileSet(fs, srcFset, &plan);
+    if (code != 0) {
+      return code;
+    }
+    if (!plan.affected) {
+      continue;
+    }
+
+    code = tsdbRepairBackupAffectedFileSet(fs, srcFset, &plan);
+    if (code != 0) {
+      return code;
+    }
+
+    STFileSet *dstFset = tsdbRepairFindTmpFSet(fs, srcFset->fid);
+    if (dstFset == NULL) {
+      return TSDB_CODE_FAILED;
+    }
+
+    if (plan.dropStt) {
+      tsdbRepairDropSttOnTmpFSet(dstFset);
+      changed = true;
+    }
+
+    if (plan.coreAction == TSDB_REPAIR_CORE_DROP) {
+      code = tsdbRepairDropCoreOnTmpFSet(fs, dstFset);
+      if (code != 0) return code;
+      changed = true;
+    } else if (plan.coreAction == TSDB_REPAIR_CORE_REBUILD) {
+      code = tsdbRepairRebuildCoreOnTmpFSet(fs, srcFset, dstFset, &plan);
+      if (code != 0) {
+        return code;
+      }
+      changed = true;
+    }
+  }
+
+  if (!changed) {
+    printf("tsdb force repair dispatch: vnode%d\n", TD_VID(fs->tsdb->pVnode));
+    fflush(stdout);
+    tsdbMarkForceRepairDone(TD_VID(fs->tsdb->pVnode));
+    return 0;
+  }
+
+  code = tsdbRepairCommitStagedCurrent(fs);
+  if (code != 0) {
+    return code;
+  }
+
+  printf("tsdb force repair dispatch: vnode%d\n", TD_VID(fs->tsdb->pVnode));
+  fflush(stdout);
+  tsdbMarkForceRepairDone(TD_VID(fs->tsdb->pVnode));
+#endif
+  return code;
+}


The current tsdbForceRepair implementation doesn’t print the dispatch marker that the new pytest cases assert on (e.g., "tsdb force repair dispatch"). The only printf/marker emission is in the #if 0 block below, so the tests will fail even if this compiles. Please either re-enable the implemented dispatch/logging path or update the tests to assert on the actual output produced by the enabled repair flow.

Copilot · 2026-03-10T10:31:44Z

test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py

+        code, output = self._run_taosd_with_cfg(
+            self._tsdb_repair_args(vnode_id, repair_fid, extra_args="--log-output /dev/null")
+        )
+


This test runs taosd -r --mode force ... without stopping the existing tdDnodes taosd instance first. Launching a second taosd process against the same data/cfg directories can fail due to locks or (worse) corrupt state. Consider stopping tdDnodes (as done in other repair tests) before invoking the repair-mode taosd, then restarting in finally.

Add deep scanning logic to tsdbRepair.c to detect and fix corrupted data blocks within brin blocks. The new tsdbDeepScanAndFixDataPart function reads brin blocks, validates data blocks, and skips corrupted entries. This enhances data integrity during repair operations by isolating and handling bad data segments without affecting the entire dataset.
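The skip-on-corruption idea in this commit message can be illustrated with a small sketch. Everything here is hypothetical: the `Block` layout, the toy checksum, and `keep_valid_blocks` are stand-ins, since the real brin/data block format and validation routine are not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block layout: payload plus a stored checksum. */
typedef struct {
  const uint8_t *data;
  size_t         len;
  uint32_t       checksum;
} Block;

/* Toy checksum used only for illustration. */
static uint32_t block_checksum(const uint8_t *data, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i < len; ++i) sum = sum * 31u + data[i];
  return sum;
}

/* Compacts valid blocks to the front of the array, preserving order.
 * Returns the number kept; *dropped receives the number skipped. */
static size_t keep_valid_blocks(Block *blocks, size_t n, size_t *dropped) {
  size_t kept = 0;
  for (size_t i = 0; i < n; ++i) {
    if (block_checksum(blocks[i].data, blocks[i].len) != blocks[i].checksum) {
      continue; /* corrupted: skip it without failing the whole scan */
    }
    blocks[kept++] = blocks[i];
  }
  if (dropped != NULL) *dropped = n - kept;
  return kept;
}
```

The kept and dropped counts correspond to the kind of bookkeeping that `keptBlocks` and `droppedBlocks` in the earlier `STsdbRepairPlan` struct suggest.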

- Change default TSDB repair strategy from `shallow_repair` to `drop_invalid_only`
- Rename TSDB repair strategies: `shallow_repair` → `drop_invalid_only`, `deep_repair` → `head_only_rebuild`
- Add new `full_rebuild` strategy for complete core data reconstruction
- Update documentation in both English and Chinese versions with detailed strategy descriptions
- Modify internal enum values and code references to reflect new strategy naming
- Update command examples to use new strategy names
- Add test configurations for new repair functionality

The changes provide more granular control over TSDB repair operations with clearer strategy semantics.
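As a rough sketch of the naming after this change, the strategy strings could map to an enum as below. The enum and function names are hypothetical (the real internal enum values are not shown here), and whether the old names remain accepted as aliases is not stated in the thread; this sketch simply rejects them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; not the real internal strategy enum. */
typedef enum {
  REPAIR_STRATEGY_INVALID = -1,
  REPAIR_DROP_INVALID_ONLY = 0, /* default strategy */
  REPAIR_HEAD_ONLY_REBUILD,
  REPAIR_FULL_REBUILD,
} RepairStrategy;

/* Maps a strategy name to its enum value. NULL or "" selects the default. */
static RepairStrategy parse_tsdb_strategy(const char *name) {
  if (name == NULL || name[0] == '\0') return REPAIR_DROP_INVALID_ONLY;
  if (strcmp(name, "drop_invalid_only") == 0) return REPAIR_DROP_INVALID_ONLY;
  if (strcmp(name, "head_only_rebuild") == 0) return REPAIR_HEAD_ONLY_REBUILD;
  if (strcmp(name, "full_rebuild") == 0) return REPAIR_FULL_REBUILD;
  return REPAIR_STRATEGY_INVALID;
}
```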

Copilot

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.

Copilot · 2026-03-11T08:38:06Z

source/dnode/vnode/test/CMakeLists.txt

+ADD_EXECUTABLE(tsdbRepairTest tsdbRepairTest.cpp)
+DEP_ext_gtest(tsdbRepairTest)
+TARGET_LINK_LIBRARIES(
+         tsdbRepairTest
+         PUBLIC os util common vnode
+)
+
+TARGET_INCLUDE_DIRECTORIES(
+         tsdbRepairTest
+         PUBLIC "${TD_SOURCE_DIR}/include/common"
+         PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/inc"
+         PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/tsdb"
+         PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../inc"
+)


tsdbRepairTest is added unconditionally, but tsdbRepairTest.cpp includes/uses POSIX-only APIs (e.g. sys/syscall.h, mkstemp, SYS_close). This will fail to build on Windows. Suggest guarding the ADD_EXECUTABLE(tsdbRepairTest ...) block with IF(NOT TD_WINDOWS) (similar to tqTest) or providing a Windows-compatible implementation.

Suggested change

-ADD_EXECUTABLE(tsdbRepairTest tsdbRepairTest.cpp)
-DEP_ext_gtest(tsdbRepairTest)
-TARGET_LINK_LIBRARIES(
-         tsdbRepairTest
-         PUBLIC os util common vnode
-)
-
-TARGET_INCLUDE_DIRECTORIES(
-         tsdbRepairTest
-         PUBLIC "${TD_SOURCE_DIR}/include/common"
-         PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/inc"
-         PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/tsdb"
-         PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../inc"
-)
+IF(NOT TD_WINDOWS)
+    ADD_EXECUTABLE(tsdbRepairTest tsdbRepairTest.cpp)
+    DEP_ext_gtest(tsdbRepairTest)
+    TARGET_LINK_LIBRARIES(
+             tsdbRepairTest
+             PUBLIC os util common vnode
+    )
+
+    TARGET_INCLUDE_DIRECTORIES(
+             tsdbRepairTest
+             PUBLIC "${TD_SOURCE_DIR}/include/common"
+             PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/inc"
+             PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/tsdb"
+             PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../inc"
+    )
+ENDIF()

- Add note that default `drop_invalid_only` strategy only handles missing-file damage
- Specify that size-mismatch corruption requires explicit deep strategies (`head_only_rebuild` or `full_rebuild`)
- Update both English and Chinese documentation consistently
- Move repair-related source file from vnode to common directory for better code organization

- Introduce suite groups for metadata, core_e2e, and stt_e2e tests
- Add helper methods for running force repair operations and verifying results
- Implement test fixtures for core and STT file corruption scenarios
- Include assertions for database writability and repair log validation
- Refactor existing tests to use new fixture-based approach
- Add temporary file handling and improved error recovery mechanisms

Copilot

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 4 comments.

Copilot · 2026-03-12T01:44:39Z

source/dnode/vnode/src/meta/metaOpen.c

+#if 0
+  code = metaBackupCurrentMeta(pVnode);
+  if (code != 0) {
+    metaError("vgId:%d failed to back up current meta, reason:%s", TD_VID(pVnode), tstrerror(code));
+    return code;
+  }
 #endif


The metaBackupCurrentMeta function is defined but the call site is wrapped in #if 0 (line 558), so it's dead code. If this is intentional for a future phase, consider adding a comment explaining when it will be enabled; otherwise remove it to avoid confusion.

Suggested change

-#if 0
-  code = metaBackupCurrentMeta(pVnode);
-  if (code != 0) {
-    metaError("vgId:%d failed to back up current meta, reason:%s", TD_VID(pVnode), tstrerror(code));
-    return code;
-  }
-#endif

Copilot · 2026-03-12T01:44:39Z

source/dnode/vnode/src/meta/metaOpen.c

+  // Open a new meta for organization
+  code = metaOpenImpl(pMeta->pVnode, &pNewMeta, VNODE_META_TMP_DIR, false);
+  if (code) {
+    return code;
+  }
+
+  code = metaBegin(pNewMeta, META_BEGIN_HEAP_NIL);
+  if (code) {
+    return code;
+  }
+
+  EMetaRepairStrategy strategy = metaGetRepairStrategy(repairStrategy);
+  if (strategy == E_META_REPAIR_FROM_UID) {
+    code = metaForceRepairFromUid(pVnode, pMeta, pNewMeta);
+    if (code) {
+      metaError("vgId:%d, %s failed at %s:%d since %s", TD_VID(pVnode), __func__, __FILE__, __LINE__, tstrerror(code));
+      return code;
+    }
+  } else if (strategy == E_META_REPAIR_FROM_REDO) {
+    code = metaForceRepairFromRedo(pVnode, pMeta, pNewMeta);
+    if (code) {
+      metaError("vgId:%d, %s failed at %s:%d since %s", TD_VID(pVnode), __func__, __FILE__, __LINE__, tstrerror(code));
+      return code;
+    }
+  }


In metaForceRepair, if metaBegin fails after metaOpenImpl has succeeded, pNewMeta is leaked (it's opened but never closed on the error path). Similarly, if metaForceRepairFromUid or metaForceRepairFromRedo fails at lines 582-591, pNewMeta is not closed before returning.

Copilot · 2026-03-12T01:44:39Z

source/dnode/mgmt/exe/dmMain.c

+  }
+
+  if (code == TSDB_CODE_OPS_NOT_SUPPORT) {
+    return 1;


dmFinalizeRepairOption returns 1 (line 788) for TSDB_CODE_OPS_NOT_SUPPORT, which is an ad-hoc non-standard error code that gets propagated from dmParseArgs. This breaks the convention where the function otherwise returns TSDB_CODE_* values. The caller in mainWindows checks code != 0 and may misinterpret this as an unrelated error. Consider returning a proper TSDB_CODE_* constant instead.

Suggested change

-    return 1;
+    return TSDB_CODE_OPS_NOT_SUPPORT;

Copilot · 2026-03-12T01:44:40Z

test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py

+    def _prepare_stt_fixture(self, total_rows=4000):
+        dbname = f"tsdb_repair_stt_fixture_{time.time_ns()}"
+        ts0 = 1700000000000
+        table_name = "d0"
+
+        tdSql.execute(f"drop database if exists {dbname}")
+        tdSql.execute(f"create database {dbname} vgroups 1 stt_trigger 1 minrows 10 maxrows 200")
+        tdSql.execute(f"drop table if exists {dbname}.meters")
+        tdSql.execute(f"create table {dbname}.meters (ts timestamp, c1 int, c2 float) tags(t1 int)")
+        tdSql.execute(f"create table {dbname}.{table_name} using {dbname}.meters tags(1)")
+
+        sql = f"insert into {dbname}.{table_name} values "
+        sql += ",".join(f"({ts0 + i}, 1, 0.1)" for i in range(100))
+        tdSql.execute(sql)
+        tdSql.execute(f"flush database {dbname}")
+
+        sql = f"insert into {dbname}.{table_name} values "
+        sql += ",".join(f"({ts0 + 99 + i}, 1, 0.1)" for i in range(100))
+        tdSql.execute(sql)
+        tdSql.execute(f"flush database {dbname}")
+
+        tdSql.execute(f"insert into {dbname}.{table_name} values({ts0 + 1000}, 2, 1.0)")
+        tdSql.execute(f"flush database {dbname}")
+        time.sleep(2)
+
+        tdSql.query(f"select count(*) from {dbname}.{table_name}")
+        tdSql.checkData(0, 0, 200)
+
+        vnode_id = self._get_vnode_id_for_db(dbname, table_name=table_name)
+        stt_path, stt_entries = self._wait_for_stt_file(dbname, vnode_id, timeout_sec=90)
+        if stt_path is None or stt_entries <= 0:
+            pytest.skip("real stt fixture was not materialized in time")
+
+        fid = self._parse_fid_from_tsdb_path(stt_path)
+        tdSql.checkEqual(fid is not None, True)
+        return {
+            "dbname": dbname,
+            "vnode_id": vnode_id,
+            "fid": fid,
+            "row_count": 200,
+            "table_name": table_name,
+            "stt_path": stt_path,
+            "stt_entries": stt_entries,
+        }


The _prepare_stt_fixture method accepts a total_rows=4000 parameter but never uses it. The actual row count is hardcoded to 200: two batches of 100 rows that overlap at one timestamp (ts0 + 99), plus 1 extra row, and row_count is returned as 200. The total_rows parameter is misleading and should either be removed or actually used.

hzcheng added 3 commits March 6, 2026 15:02

feat(taosd): implement phase1 repair CLI parsing and validation

caf0dee

feat(repair): add meta force repair flow and coverage

555e6d6

feat(repair): add tsdb force repair flow and coverage

2ac1025

- enable tsdb repair to enter the real execution path - add vnode/tsdb force repair handling and backup logic - add tsdb force repair pytest coverage and design notes - normalize timezone strings for repair-related tests

Copilot AI review requested due to automatic review settings March 8, 2026 04:49

hzcheng requested review from a team, dapan1121, guanshengliang and zitsen as code owners March 8, 2026 04:49

taosdata-bot bot added the team engine label Mar 8, 2026

Copilot started reviewing on behalf of hzcheng March 8, 2026 04:49 View session

gemini-code-assist bot reviewed Mar 8, 2026

View reviewed changes

Copilot AI reviewed Mar 8, 2026

View reviewed changes

hzcheng added 2 commits March 9, 2026 13:37

Copilot AI review requested due to automatic review settings March 9, 2026 06:15

Copilot started reviewing on behalf of hzcheng March 9, 2026 06:16 View session

Copilot AI reviewed Mar 9, 2026

View reviewed changes

hzcheng added 2 commits March 9, 2026 17:12

Copilot AI review requested due to automatic review settings March 9, 2026 11:16

Copilot started reviewing on behalf of hzcheng March 9, 2026 11:16 View session

Copilot AI reviewed Mar 9, 2026

View reviewed changes

hzcheng added 2 commits March 10, 2026 10:28

Copilot AI review requested due to automatic review settings March 10, 2026 02:56

Copilot started reviewing on behalf of hzcheng March 10, 2026 02:57 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

hzcheng added 2 commits March 10, 2026 11:11

Copilot AI review requested due to automatic review settings March 10, 2026 03:19

Copilot started reviewing on behalf of hzcheng March 10, 2026 03:19

Copilot AI reviewed Mar 10, 2026

hzcheng added 2 commits March 10, 2026 11:58

remove useless files

bdc47aa

Copilot AI review requested due to automatic review settings March 10, 2026 04:34

Copilot started reviewing on behalf of hzcheng March 10, 2026 04:35

Copilot AI reviewed Mar 10, 2026

hzcheng added 2 commits March 10, 2026 16:53

Copilot AI review requested due to automatic review settings March 10, 2026 10:22

Copilot started reviewing on behalf of hzcheng March 10, 2026 10:23

Copilot AI reviewed Mar 10, 2026

hzcheng added 2 commits March 10, 2026 19:18

Copilot AI review requested due to automatic review settings March 11, 2026 08:31

Copilot started reviewing on behalf of hzcheng March 11, 2026 08:31

Copilot AI reviewed Mar 11, 2026

hzcheng added 3 commits March 11, 2026 18:33

docs: plan tsdb force repair test redesign

25e38ba

Copilot AI review requested due to automatic review settings March 12, 2026 01:41

Copilot started reviewing on behalf of hzcheng March 12, 2026 01:42

Copilot AI reviewed Mar 12, 2026

hzcheng closed this Mar 12, 2026

	def _start_repair_process(self, args):
	def _start_repair_process(self, args, extra_env=None):

		snprintf(buf, bufLen, "%s%staos_backup_%s%svnode%d%smeta", root, sep, dateBuf, TD_DIRSEP, TD_VID(pVnode),
		TD_DIRSEP);

		snprintf(buf, bufLen, "%s%staos_backup_%s%svnode%d%stsdb%sfid_%d", root, sep, dateBuf, TD_DIRSEP,
		TD_VID(fs->tsdb->pVnode), TD_DIRSEP, TD_DIRSEP, fid);
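The two format strings above imply a backup layout of taos_backup_<date>/vnode<id>/meta for META and taos_backup_<date>/vnode<id>/tsdb/fid_<fid> for TSDB file sets. A test could compute the expected directories with a helper like the one below. This is a sketch under assumptions: the function names are hypothetical, sep is assumed to be empty when root already ends with a path separator, and date_str stands for whatever the C code places in dateBuf (its format is not shown in these fragments).

```python
import os


def backup_root(root, date_str):
    """Mirror "%s%staos_backup_%s": root, optional separator, date suffix."""
    # Assumption: the C code passes an empty sep when root ends with a separator.
    sep = "" if root.endswith(os.sep) else os.sep
    return f"{root}{sep}taos_backup_{date_str}"


def meta_backup_dir(root, date_str, vnode_id):
    """Expected META backup directory for one vnode."""
    return os.path.join(backup_root(root, date_str), f"vnode{vnode_id}", "meta")


def tsdb_backup_dir(root, date_str, vnode_id, fid):
    """Expected TSDB backup directory for one file set of one vnode."""
    return os.path.join(
        backup_root(root, date_str), f"vnode{vnode_id}", "tsdb", f"fid_{fid}"
    )
```

Keeping the layout in one helper means the tests only need updating in one place if the backup directory naming changes.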

		> For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

		@@ -0,0 +1,352 @@
		# META Force Repair Implementation Plan

		> For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

	> For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
	> Note: This implementation plan is intended to be executed task-by-task, either manually or by automation tooling.
