feat: data force repair #34707

Closed
hzcheng wants to merge 20 commits into 3.0 from feat/hzcheng/data_repair

@hzcheng hzcheng commented Mar 8, 2026

  • enable tsdb repair to enter the real execution path
  • add vnode/tsdb force repair handling and backup logic
  • add tsdb force repair pytest coverage and design notes
  • normalize timezone strings for repair-related tests

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

hzcheng added 3 commits March 6, 2026 15:02
  - enable tsdb repair to enter the real execution path
  - add vnode/tsdb force repair handling and backup logic
  - add tsdb force repair pytest coverage and design notes
  - normalize timezone strings for repair-related tests
Copilot AI review requested due to automatic review settings March 8, 2026 04:49
@hzcheng hzcheng requested review from a team, dapan1121, guanshengliang and zitsen as code owners March 8, 2026 04:49
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

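This pull request introduces key data-repair functionality for TDengine. It refactors the parameter layer of the taosd -r command so that it supports finer-grained force repair of vnode and TSDB files. The core work designs and implements failure-recovery logic for metadata and TSDB files so that corrupted data can be repaired in a targeted way, and it adds a crash-safe mechanism and a detailed backup strategy to improve robustness and recoverability. Extensive test cases and test-environment stability fixes back the reliability of the new feature.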

Highlights

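  • Command-line parameter refactor: Refactored the argument-parsing logic of taosd -r to support the new repair options while staying compatible with the old behavior.
  • META force repair: Implemented force repair of vnode metadata (meta), covering repair of a specified vnode or of all vnodes, plus an external backup mechanism.
  • TSDB force repair: Introduced force repair of TSDB files (tsdb), which analyzes file groups, backs up affected file sets, handles missing STT/core files, and supports rebuilding core file groups from valid blocks.
  • Crash-safe mechanism: During TSDB repair, updates to current.json use a transactional manifest switch so that data stays consistent if the system crashes.
  • Test coverage and environment stability: Added many pytest cases covering argument parsing, metadata repair, and TSDB repair scenarios, and fixed test-stability issues related to ASAN/LSAN environments as well as timezone-string normalization.
  • Detailed documentation and planning: Added several detailed documents (design, implementation plan, findings, and progress logs) that record and guide development of the data-repair feature.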

Changelog
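  • .gitignore
    • Added an ignore entry for .agents/skills-lock.json.
  • AGENTS.md
    • Added a document outlining repository guidelines, project structure, build commands, coding style, testing guidelines, and commit/PR conventions.
  • docs/data_repair/01-参数重构/findings.md
    • Added a document detailing the findings, decisions, and problems encountered in phase 1 of data repair (parameter refactor).
  • docs/data_repair/01-参数重构/progress.md
    • Added a progress log recording the actions taken, test results, and error resolutions in phase 1 of data repair (parameter refactor).
  • docs/data_repair/01-参数重构/task_plan.md
    • Added a task plan for phase 1, the refactor of the taosd -r parameter layer.
  • docs/data_repair/02-META_repair/findings.md
    • Added a document detailing the findings, decisions, and implementation notes for phase 2, META force repair.
  • docs/data_repair/02-META_repair/progress.md
    • Added a progress log recording the sessions, actions, and verification steps for phase 2, META force repair.
  • docs/data_repair/02-META_repair/task_plan.md
    • Added a task plan for phase 2, META force repair.
  • docs/data_repair/03-TSDB_repair/2026-03-07-next-session-handoff-prompt.md
    • Added a detailed handoff document for the next stage of TSDB force repair, outlining context, current state, and priorities.
  • docs/data_repair/03-TSDB_repair/findings.md
    • Added a document detailing the findings, confirmed scope, codebase insights, and crash-safety analysis for phase 3, TSDB force repair.
  • docs/data_repair/03-TSDB_repair/progress.md
    • Added a progress log recording the sessions, actions, and verification steps for phase 3, TSDB force repair.
  • docs/data_repair/03-TSDB_repair/task_plan.md
    • Added a task plan for phase 3, TSDB force repair.
  • docs/plans/2026-03-06-meta-force-repair-design.md
    • Added a design document for META force repair.
  • docs/plans/2026-03-06-meta-force-repair-plan.md
    • Added an implementation plan for META force repair.
  • docs/plans/2026-03-07-tsdb-force-repair-design.md
    • Added a design document for TSDB force repair.
  • docs/plans/2026-03-07-tsdb-force-repair-plan.md
    • Added an implementation plan for TSDB force repair.
  • source/dnode/mgmt/exe/dmMain.c
    • Refactored command-line argument parsing to support the new repair options, deprecated --force, added repair-specific help, and introduced functions that manage repair-flow state.
  • source/dnode/mgmt/node_mgmt/inc/dmMgmt.h
    • Added extern declarations for the new repair-option accessor functions.
  • source/dnode/vnode/CMakeLists.txt
    • Updated the CMake configuration to include the new vnodeRepair.c source file.
  • source/dnode/vnode/src/meta/metaOpen.c
    • Integrated the new repair-flow logic, including vnode-specific repair matching, external metadata backup before repair, and marking repaired vnodes to prevent repeated execution.
  • source/dnode/vnode/src/tsdb/tsdbFS2.c
    • Implemented the TSDB force-repair logic, including file-set analysis, backup of affected file sets, crash-safe manifest updates, and repair/rebuild of core file groups.
  • source/dnode/vnode/src/vnd/vnodeRepair.c
    • Added weak-symbol definitions for the repair-option accessor functions so that dmMain.c can override them.
  • source/os/src/osTimezone.c
    • Modified truncateTimezoneString to strip the leading slash from timezone strings, and applied it in getTimezoneStr.
  • source/os/test/osTimeTests.cpp
    • Added a unit test for truncateTimezoneString that verifies removal of the leading slash.
  • test/cases/80-Components/01-Taosd/test_com_cmdline.py
    • Updated _run_taosd to ensure ASAN_OPTIONS contains detect_leaks=0, and added a comprehensive test suite for taosd -r command-line parsing.
  • test/cases/80-Components/01-Taosd/test_meta_force_repair.py
    • Added pytest cases for META force repair, including tests for all-vnode repair, placeholder behavior for non-meta repair, and creation of a real vnode metadata backup.
  • test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py
    • Added pytest cases for TSDB force repair, covering dispatch in open_fs, entry into the real execution path, backup of affected file sets, removal of missing STT/core files, crash-safe manifest updates, and rebuilding core file groups from valid blocks.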
Activity
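  • The pull request description outlines the key changes, including enabling TSDB repair, adding vnode/TSDB force-repair handling and backup logic, and adding pytest coverage and design notes.
  • The description also mentions normalizing timezone strings for repair-related tests.
  • The author provided a checklist covering user-manual updates, test-case automation, and test coverage, indicating a thorough approach to the change.
  • Detailed findings.md, progress.md, and task_plan.md files exist for each repair phase (01-参数重构, 02-META_repair, 03-TSDB_repair), indicating an iterative development process with ongoing documentation and review.
  • The 2026-03-07-next-session-handoff-prompt.md file indicates a structured development handoff, highlighting specific files, the current task scope, boundaries, and key findings from earlier sessions.
  • The progress.md files record the specific actions taken, test results (including red/green phases), error logs, and resolutions, showing active development and debugging. Examples include fixes for an LSAN issue, a pytest environment issue, and a printf buffering issue.
  • The findings.md files record the confirmed scope, existing implementation facts, user constraints, design themes, newly confirmed decisions, implementation notes, verification notes, and long-term memory/lessons learned from debugging a complex test environment.
  • The task_plan.md files outline the goals, current phase, detailed steps, key questions, and decisions made, indicating careful development planning.
  • test_com_cmdline.py, test_meta_force_repair.py, and test_tsdb_force_repair.py show active test-case development to cover the new repair functionality and catch regressions.
  • The changes to source/os/src/osTimezone.c and source/os/test/osTimeTests.cpp indicate a fix for a timezone-parsing problem that caused taosd startup issues, showing responsiveness to identified problems.
  • The changes to source/dnode/vnode/src/tsdb/tsdbFS2.c include a printf buffering fix and a key fix so that tsdbRepairDropCoreOnTmpFSet correctly generates TSDB_FOP_REMOVE operations, indicating detailed debugging and correction of the core logic.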

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant new feature: data force repair for meta and tsdb data, including backup logic and detailed execution paths. While the implementation is comprehensive, a security audit identified two medium-severity issues related to how backups are handled: the default backup location in the system's temporary directory may expose sensitive data, and user-provided backup paths are not sanitized for path traversal, potentially allowing arbitrary directory creation. Furthermore, two critical race conditions related to tracking repaired vnodes in a multi-threaded context and a bug in a test helper function need to be addressed.

Comment on lines +308 to +323
static void metaMarkForceRepairDone(int32_t vgId) {
  char vnodeText[32] = {0};
  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);

  if (tsMetaRepairDoneVnodeId[0] == '\0') {
    tstrncpy(tsMetaRepairDoneVnodeId, vnodeText, sizeof(tsMetaRepairDoneVnodeId));
    return;
  }

  if (metaRepairListContains(tsMetaRepairDoneVnodeId, vgId)) {
    return;
  }

  int32_t offset = (int32_t)strlen(tsMetaRepairDoneVnodeId);
  snprintf(tsMetaRepairDoneVnodeId + offset, sizeof(tsMetaRepairDoneVnodeId) - offset, ",%s", vnodeText);
}

critical

The global static variable tsMetaRepairDoneVnodeId is accessed and modified here without any synchronization. Since metaOpen is called in parallel for different vnodes, this will lead to a race condition when multiple threads try to update this shared string simultaneously. This can cause memory corruption or incorrect behavior.

Please protect all accesses to tsMetaRepairDoneVnodeId (in this function and in metaForceRepairMatchesVnode) with a mutex. A static TdThreadMutex would be appropriate here.

Comment on lines +113 to +128
static void tsdbMarkForceRepairDone(int32_t vgId) {
  char vnodeText[32] = {0};
  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);

  if (tsTsdbRepairDoneVnodeId[0] == '\0') {
    tstrncpy(tsTsdbRepairDoneVnodeId, vnodeText, sizeof(tsTsdbRepairDoneVnodeId));
    return;
  }

  if (tsdbRepairListContains(tsTsdbRepairDoneVnodeId, vgId)) {
    return;
  }

  int32_t offset = (int32_t)strlen(tsTsdbRepairDoneVnodeId);
  snprintf(tsTsdbRepairDoneVnodeId + offset, sizeof(tsTsdbRepairDoneVnodeId) - offset, ",%s", vnodeText);
}

critical

Similar to the issue in metaOpen.c, the global static variable tsTsdbRepairDoneVnodeId is modified here without thread synchronization. The tsdbDispatchForceRepair function, which calls this, is part of the parallel vnode startup process. This creates a race condition that can lead to memory corruption or incorrect repair behavior.

Please add a mutex to protect all reads and writes to tsTsdbRepairDoneVnodeId.

return int(value)
tdLog.exit(f"failed to resolve vnode id from show {dbname}.vgroups result: {row}")

def _start_repair_process(self, args):

high

The variable extra_env is used within this function but is not defined as a parameter or a local variable. This will cause a NameError if the function is ever called.

Although this helper function does not seem to be used in the current test suite, this is a latent bug that should be fixed. Please add extra_env=None to the function signature.

Suggested change
-    def _start_repair_process(self, args):
+    def _start_repair_process(self, args, extra_env=None):

}

static int32_t metaBuildRepairBackupDir(SVnode *pVnode, char *buf, int32_t bufLen) {
  const char *root = dmRepairHasBackupPath() ? dmRepairBackupPath() : TD_TMP_DIR_PATH;

security-medium medium

The data repair process defaults to backing up sensitive database metadata to the system's temporary directory (/tmp) when --backup-path is not specified. On multi-user systems, /tmp is typically world-readable, which could allow unauthorized local users to access sensitive database information. It is recommended to use a more secure default location (e.g., a subdirectory within the TDengine data directory with restricted permissions) or explicitly set restrictive permissions (e.g., 0700) on the created backup directories.
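
The recommendation to set restrictive permissions on the backup directory can be sketched as follows. This is an illustrative Python sketch rather than the C change the PR would need; the helper name make_private_backup_dir is hypothetical, and it assumes a POSIX filesystem where mode bits are honored.

```python
import os
import stat
import tempfile


def make_private_backup_dir(base_dir, name):
    """Create base_dir/name accessible only by the owner (hypothetical helper)."""
    path = os.path.join(base_dir, name)
    # exist_ok=False raises FileExistsError, so an existing backup is never reused.
    os.makedirs(path, mode=0o700, exist_ok=False)
    # makedirs applies the process umask, so set the final mode explicitly too.
    os.chmod(path, 0o700)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        created = make_private_backup_dir(base, "taos_backup_example")
        print(oct(stat.S_IMODE(os.stat(created).st_mode)))
```

Refusing to reuse an existing directory keeps a second repair run from silently mixing files into an earlier backup.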

}

static int32_t tsdbRepairBuildBackupFSetDir(STFileSystem *fs, int32_t fid, char *buf, int32_t bufLen) {
  const char *root = dmRepairHasBackupPath() ? dmRepairBackupPath() : TD_TMP_DIR_PATH;

security-medium medium

The data repair process defaults to backing up sensitive database files to the system's temporary directory (/tmp) when --backup-path is not specified. On multi-user systems, /tmp is typically world-readable, which could allow unauthorized local users to access sensitive database information. It is recommended to use a more secure default location or explicitly set restrictive permissions on the created backup directories.

Comment on lines +404 to +405
  snprintf(buf, bufLen, "%s%staos_backup_%s%svnode%d%smeta", root, sep, dateBuf, TD_DIRSEP, TD_VID(pVnode),
           TD_DIRSEP);

security-medium medium

The user-provided --backup-path is used to construct filesystem paths for data backups without sanitization for path traversal sequences (e.g., ..). This allows a user with the ability to run the repair command to create directories and write backup files in arbitrary locations on the filesystem. While the impact is limited by the fixed subdirectory structure appended to the path, it is best practice to sanitize the input or validate that the resolved path remains within an authorized base directory.
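
One way to validate that a resolved path stays within an authorized base directory is sketched below. It is a Python illustration of the check, not code from the PR: the function name resolve_backup_root and the notion of a configured allowed_base are assumptions, and whether taosd should restrict --backup-path to a base directory at all is a policy decision for the maintainers.

```python
import os
import tempfile


def resolve_backup_root(user_path, allowed_base):
    """Return the resolved backup root; raise ValueError if it escapes allowed_base."""
    base = os.path.realpath(allowed_base)
    # Join first so that relative input is interpreted under the base directory;
    # an absolute user_path replaces the base and is caught by the check below.
    resolved = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, resolved]) != base:
        raise ValueError("backup path escapes allowed base: %r" % user_path)
    return resolved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(resolve_backup_root("backups/run1", base))
        try:
            resolve_backup_root("../outside", base)
        except ValueError as exc:
            print("rejected:", exc)
```

Resolving with realpath before comparing also covers symlinks that point outside the base, which a plain string check for ".." would miss.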

Comment on lines +163 to +164
  snprintf(buf, bufLen, "%s%staos_backup_%s%svnode%d%stsdb%sfid_%d", root, sep, dateBuf, TD_DIRSEP,
           TD_VID(fs->tsdb->pVnode), TD_DIRSEP, TD_DIRSEP, fid);

security-medium medium

The user-provided --backup-path is used to construct filesystem paths for data backups without sanitization for path traversal sequences (e.g., ..). This allows a user with the ability to run the repair command to create directories and write backup files in arbitrary locations on the filesystem. It is recommended to sanitize the input or validate that the resolved path remains within an authorized base directory.

Copilot AI left a comment

Pull request overview

This PR enables real execution for taosd -r --node-type vnode --file-type tsdb --mode force by dispatching repair logic inside vnode/TSDB open paths, adds backup/manifest/log handling for affected TSDB file-sets, and introduces end-to-end pytest coverage plus supporting docs/design notes. It also normalizes timezone strings (e.g., /UTC to UTC) to stabilize repair-related startup/tests.

Changes:

  • Add TSDB force-repair dispatch in tsdb open fs, including affected file-set backup (manifest + original files) and core-group drop/rebuild logic.
  • Add META force-repair dispatch in metaOpen() with external backup support and CLI parameter flow accessors.
  • Add/extend pytest E2E coverage for repair flows and add a timezone normalization fix + unit test.

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 5 comments.

File | Description
test/cases/80-Components/01-Taosd/test_tsdb_force_repair.py | New E2E suite for TSDB force repair dispatch, backup/log behavior, crash-safe manifest staging, and rebuild scenarios.
test/cases/80-Components/01-Taosd/test_meta_force_repair.py | New E2E coverage for META force repair vnode targeting and backup directory creation.
test/cases/80-Components/01-Taosd/test_com_cmdline.py | Adds CLI parameter-layer regression matrix for taosd -r (phase1 behaviors).
source/os/src/osTimezone.c | Normalizes timezone strings via truncateTimezoneString() to handle leading /.
source/os/test/osTimeTests.cpp | Adds a unit test for the timezone string normalization behavior.
source/dnode/mgmt/exe/dmMain.c | Implements repair option parsing/validation and exposes repair accessors for vnode layers.
source/dnode/mgmt/node_mgmt/inc/dmMgmt.h | Declares repair option accessor APIs.
source/dnode/vnode/src/meta/metaOpen.c | Dispatches meta force repair per vnode and adds external backup copy logic.
source/dnode/vnode/src/tsdb/tsdbFS2.c | Adds TSDB force repair dispatch during FS open, backup/manifest/logging, and core repair actions.
source/dnode/vnode/src/vnd/vnodeRepair.c | Provides weak fallback stubs for dmRepair* accessors for linkability in libvnode contexts.
source/dnode/vnode/CMakeLists.txt | Adds the new vnodeRepair.c compilation unit.
docs/plans/2026-03-07-tsdb-force-repair-*.md | Adds TSDB force repair design + implementation plan notes.
docs/plans/2026-03-06-meta-force-repair-*.md | Adds META force repair design + implementation plan notes.
docs/data_repair/03-TSDB_repair/* | Adds phase3 task plan/findings/progress/handoff documentation for TSDB repair workstream.
docs/data_repair/02-META_repair/* | Adds/updates phase2 task plan/findings/progress documentation for META repair.
AGENTS.md | Adds repository guidelines for structure/build/test/style.
.gitignore | Ignores .agents/ and skills-lock.json.

Comment on lines +308 to +327
static void metaMarkForceRepairDone(int32_t vgId) {
  char vnodeText[32] = {0};
  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);

  if (tsMetaRepairDoneVnodeId[0] == '\0') {
    tstrncpy(tsMetaRepairDoneVnodeId, vnodeText, sizeof(tsMetaRepairDoneVnodeId));
    return;
  }

  if (metaRepairListContains(tsMetaRepairDoneVnodeId, vgId)) {
    return;
  }

  int32_t offset = (int32_t)strlen(tsMetaRepairDoneVnodeId);
  snprintf(tsMetaRepairDoneVnodeId + offset, sizeof(tsMetaRepairDoneVnodeId) - offset, ",%s", vnodeText);
}

static bool metaForceRepairMatchesVnode(int32_t vgId) {
  if (!dmRepairFlowEnabled() || metaRepairListContains(tsMetaRepairDoneVnodeId, vgId)) {
    return false;

Copilot AI Mar 8, 2026

The force-repair "done" tracking uses a process-global CSV string (tsMetaRepairDoneVnodeId) that is read/modified from metaOpen(). Vnodes are opened concurrently (vmOpenVnodes launches multiple threads), so this introduces a data race and possible string corruption, leading to repeated or skipped repairs. Protect the read/check/append with a mutex (or replace the string with a thread-safe set/bitmap keyed by vgId).
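
The suggested alternative, a thread-safe set keyed by vgId, can be sketched like this. The example is in Python (the language of the PR's test suite) purely to show the pattern; the class name RepairDoneSet is hypothetical, and the actual C fix would more likely wrap the existing helpers in a TdThreadMutex or use a bitmap.

```python
import threading


class RepairDoneSet:
    """Thread-safe record of vnode ids already force-repaired (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = set()

    def mark_if_new(self, vg_id):
        """Return True only for the first caller that marks vg_id."""
        # The membership check and the insert happen under one lock, so two
        # threads can never both decide that the same vnode still needs repair.
        with self._lock:
            if vg_id in self._done:
                return False
            self._done.add(vg_id)
            return True

    def contains(self, vg_id):
        with self._lock:
            return vg_id in self._done


if __name__ == "__main__":
    done = RepairDoneSet()
    print(done.mark_if_new(2), done.mark_if_new(2), done.contains(2))
```

Combining check and append in a single locked operation matters here: locking the read and the write separately would still leave a window between them.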

Copilot uses AI. Check for mistakes.
Comment on lines +34 to +134
@@ -32,12 +44,666 @@ typedef struct {
  STFileHashEntry **buckets;
} STFileHash;

typedef enum {
  TSDB_REPAIR_CORE_KEEP = 0,
  TSDB_REPAIR_CORE_DROP,
  TSDB_REPAIR_CORE_REBUILD,
} ECoreRepairAction;

typedef struct {
  int32_t fid;
  bool affected;
  bool dropStt;
  ECoreRepairAction coreAction;
  bool staged;
  int32_t totalBlocks;
  int32_t keptBlocks;
  int32_t droppedBlocks;
  char coreReason[64];
} STsdbRepairPlan;

static int32_t save_json(const cJSON *json, const char *fname);

static int32_t tsdbFSDupState(STFileSystem *fs);

static int32_t commit_edit(STFileSystem *fs);

static void tsdbRepairPlanInit(const STFileSet *fset, STsdbRepairPlan *plan) {
  memset(plan, 0, sizeof(*plan));
  plan->fid = fset->fid;
}

static void tsdbRepairPlanSetCore(STsdbRepairPlan *plan, ECoreRepairAction action, const char *reason) {
  if (plan->coreAction == TSDB_REPAIR_CORE_DROP) {
    return;
  }
  if (action == TSDB_REPAIR_CORE_DROP || plan->coreAction == TSDB_REPAIR_CORE_KEEP) {
    plan->coreAction = action;
    if (reason != NULL) {
      tstrncpy(plan->coreReason, reason, sizeof(plan->coreReason));
    }
  }
  plan->affected = plan->dropStt || (plan->coreAction != TSDB_REPAIR_CORE_KEEP);
}

static const char *gCurrentFname[] = {
    [TSDB_FCURRENT] = "current.json",
    [TSDB_FCURRENT_C] = "current.c.json",
    [TSDB_FCURRENT_M] = "current.m.json",
};

static bool tsdbRepairListContains(const char *csv, int32_t vgId) {
  if (csv == NULL || csv[0] == '\0') {
    return false;
  }

  char buf[PATH_MAX] = {0};
  tstrncpy(buf, csv, sizeof(buf));

  char *savePtr = NULL;
  for (char *token = strtok_r(buf, ",", &savePtr); token != NULL; token = strtok_r(NULL, ",", &savePtr)) {
    if (atoi(token) == vgId) {
      return true;
    }
  }

  return false;
}

static void tsdbMarkForceRepairDone(int32_t vgId) {
  char vnodeText[32] = {0};
  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);

  if (tsTsdbRepairDoneVnodeId[0] == '\0') {
    tstrncpy(tsTsdbRepairDoneVnodeId, vnodeText, sizeof(tsTsdbRepairDoneVnodeId));
    return;
  }

  if (tsdbRepairListContains(tsTsdbRepairDoneVnodeId, vgId)) {
    return;
  }

  int32_t offset = (int32_t)strlen(tsTsdbRepairDoneVnodeId);
  snprintf(tsTsdbRepairDoneVnodeId + offset, sizeof(tsTsdbRepairDoneVnodeId) - offset, ",%s", vnodeText);
}

static bool tsdbShouldForceRepair(STFileSystem *fs) {
  int32_t vgId = TD_VID(fs->tsdb->pVnode);

  if (!dmRepairFlowEnabled() || tsdbRepairListContains(tsTsdbRepairDoneVnodeId, vgId)) {
    return false;

Copilot AI Mar 8, 2026

The TSDB force-repair "done" tracking uses a process-global CSV string (tsTsdbRepairDoneVnodeId) that is read/modified during open_fs(). Vnodes are opened concurrently, so this is not thread-safe and can corrupt the buffer or cause incorrect de-dup decisions. Guard accesses with a mutex (or use a concurrent set/bitmap keyed by vgId).

Comment on lines +99 to +112
    def _start_repair_process(self, args):
        bin_path = self._get_taosd_bin()
        cmd = [bin_path, "-c", self._get_cfg_dir()] + shlex.split(args)
        tdLog.info("run repair cmd: %s" % " ".join(cmd))
        env = os.environ.copy()
        asan_options = env.get("ASAN_OPTIONS", "")
        if "detect_leaks=" not in asan_options:
            env["ASAN_OPTIONS"] = (
                "detect_leaks=0" if not asan_options else asan_options + ":detect_leaks=0"
            )
        env.setdefault("LSAN_OPTIONS", "detect_leaks=0")
        if extra_env:
            env.update(extra_env)
        return subprocess.Popen(

Copilot AI Mar 8, 2026

_start_repair_process() references extra_env but the function has no such parameter/variable in scope, so this will raise a NameError at runtime when the helper is used. Add an extra_env parameter (default None) and apply it to env, or remove the dead code path if not needed.

Comment on lines +138 to +156
    def _run_taosd_with_cfg(self, args, timeout_sec=None, extra_env=None):
        bin_path = self._get_taosd_bin()
        cmd = [bin_path, "-c", self._get_cfg_dir()] + shlex.split(args)
        tdLog.info("run cmd: %s" % " ".join(cmd))
        env = os.environ.copy()
        asan_options = env.get("ASAN_OPTIONS", "")
        if "detect_leaks=" not in asan_options:
            env["ASAN_OPTIONS"] = (
                "detect_leaks=0" if not asan_options else asan_options + ":detect_leaks=0"
            )
        env.setdefault("LSAN_OPTIONS", "detect_leaks=0")
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            encoding="utf-8",
            env=env,
        )

Copilot AI Mar 8, 2026

_run_taosd_with_cfg() accepts an extra_env argument but never applies it to the subprocess environment. Either remove the parameter or update the function to merge extra_env into env so callers can reliably inject test-only environment variables.
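
Merging extra_env into the subprocess environment could be factored into one shared helper so that both launchers behave the same way. The sketch below is an assumption about how that might look, not code from the PR; build_repair_env and its base_env parameter are hypothetical names introduced to make the logic testable in isolation.

```python
import os


def build_repair_env(extra_env=None, base_env=None):
    """Build the environment used to launch taosd in tests (illustrative helper)."""
    env = dict(os.environ if base_env is None else base_env)
    asan_options = env.get("ASAN_OPTIONS", "")
    if "detect_leaks=" not in asan_options:
        env["ASAN_OPTIONS"] = (
            "detect_leaks=0" if not asan_options else asan_options + ":detect_leaks=0"
        )
    env.setdefault("LSAN_OPTIONS", "detect_leaks=0")
    if extra_env:
        env.update(extra_env)  # caller-supplied variables override the defaults
    return env


if __name__ == "__main__":
    env = build_repair_env({"EXAMPLE_TEST_FLAG": "1"}, base_env={})
    print(env["ASAN_OPTIONS"], env["LSAN_OPTIONS"], env["EXAMPLE_TEST_FLAG"])
```

Applying extra_env last is a deliberate choice in this sketch: a test that needs leak detection back on can still pass its own ASAN_OPTIONS.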

Comment on lines +48 to +61
    def _run_taosd(self, args):
        bin_path = self._get_taosd_bin()
        cmd = [bin_path] + shlex.split(args)
        tdLog.info("run cmd: %s" % " ".join(cmd))
        env = os.environ.copy()
        asan_options = env.get("ASAN_OPTIONS", "")
        if "detect_leaks=" not in asan_options:
            env["ASAN_OPTIONS"] = "detect_leaks=0" if not asan_options else asan_options + ":detect_leaks=0"
        proc = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, encoding="utf-8", env=env
        )
        output = proc.stdout or ""
        tdLog.info("ret=%s output=%s" % (proc.returncode, output[:500].replace("\n", "\\n")))
        return proc.returncode, output

Copilot AI Mar 8, 2026

_run_taosd() uses subprocess.run() without a timeout. Several cases invoke taosd -r ... without -V and could hang if the CLI behavior regresses, which would stall CI. Add a reasonable timeout and include the timeout handling in the assertion helper.
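
A timeout-aware wrapper around subprocess.run() might look like the sketch below. The function name run_with_timeout and the convention of returning None as the return code on timeout are assumptions for illustration; the PR's helper could instead raise or fail the test directly.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_sec):
    """Run cmd and return (returncode, output); returncode is None on timeout."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            encoding="utf-8",
            timeout=timeout_sec,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() kills the child before re-raising TimeoutExpired.
        output = exc.stdout or ""
        if isinstance(output, bytes):  # partial output may arrive undecoded
            output = output.decode("utf-8", errors="replace")
        return None, output
    return proc.returncode, proc.stdout or ""


if __name__ == "__main__":
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_sec=30))
```

An assertion helper can then treat a None return code as a distinct failure, so a hung taosd shows up as a timeout rather than stalling CI.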

hzcheng added 2 commits March 9, 2026 13:37
…force flag

- Simplify repair mode help text and remove deprecated `--force` option validation
- Add detailed comments to `SDmRepairOption` structure for better clarity
- Remove phase1 execution restriction for non-meta/tsdb file types
- Update repair help text to reflect current compatibility rules
- Refactor meta repair functions with improved strategy handling
- Split metaForceRepair into strategy-specific functions (metaForceRepairFromUid)
- Simplify metaGetRepairStrategy to return default strategy directly
- Improve code organization and maintainability for meta repair operations
Copilot AI review requested due to automatic review settings March 9, 2026 06:15
Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 27 changed files in this pull request and generated 4 comments.


metaError("vgId:%d, %s failed at %s:%d since %s", TD_VID(pVnode), __func__, __FILE__, __LINE__, tstrerror(code));
return code;
}


Copilot AI Mar 9, 2026

metaMarkForceRepairDone() is never called after a successful meta force repair. Without marking the vnode as repaired, repeated metaOpen() calls in the same process could re-run force repair unexpectedly. After a successful metaForceRepair() (or after metaForceRepairIfShould() returns success), record completion via metaMarkForceRepairDone(TD_VID(pVnode)).

Suggested change
metaMarkForceRepairDone(TD_VID(pVnode));

Comment on lines +239 to +244
        code, output = self._run_taosd_with_cfg(
            f"-r --node-type vnode --file-type tsdb --vnode-id {vnode_id} --mode force --log-output /dev/null"
        )

        tdSql.checkEqual("repair execution is not enabled in this phase" in output, False)
        tdSql.checkEqual(code == 0 and "repair parameter validation succeeded (phase1)" in output, False)

Copilot AI Mar 9, 2026

The test docstring says taosd is stopped before running tsdb force repair, but the test doesn't actually stop the running dnode before launching taosd -r .... This can make the test flaky (port already in use / data directory lock) and diverges from the intended scenario. Stop taosd (e.g., tdDnodes.stop(1)) before invoking _run_taosd_with_cfg, and restart in a finally block like the other tests in this file.

Suggested change
-        code, output = self._run_taosd_with_cfg(
-            f"-r --node-type vnode --file-type tsdb --vnode-id {vnode_id} --mode force --log-output /dev/null"
-        )
-        tdSql.checkEqual("repair execution is not enabled in this phase" in output, False)
-        tdSql.checkEqual(code == 0 and "repair parameter validation succeeded (phase1)" in output, False)
+        tdDnodes.stop(1)
+        try:
+            code, output = self._run_taosd_with_cfg(
+                f"-r --node-type vnode --file-type tsdb --vnode-id {vnode_id} --mode force --log-output /dev/null"
+            )
+            tdSql.checkEqual("repair execution is not enabled in this phase" in output, False)
+            tdSql.checkEqual(code == 0 and "repair parameter validation succeeded (phase1)" in output, False)
+        finally:
+            tdDnodes.start(1)

Comment on lines +348 to +356
static int32_t dmParseRepairOption(int32_t argc, char const *argv[], int32_t *pIndex, bool *pParsed) {
  int32_t code = TSDB_CODE_SUCCESS;
  int32_t index = *pIndex;
  const char *arg = argv[index];
  bool matched = false;
  bool optMatched = false;
  SDmRepairOption *pOpt = &global.repairOpt;

  *pParsed = false;

Copilot AI Mar 9, 2026

dmParseRepairOption() declares const char *arg = argv[index]; but never uses it. This can trigger -Wunused-variable warnings in builds that treat warnings as errors. Remove the unused variable (or use it) to keep the build clean.

Comment on lines +408 to +433
static int32_t metaBackupCurrentMeta(SVnode *pVnode) {
  char metaDir[TSDB_FILENAME_LEN] = {0};
  char backupDir[TSDB_FILENAME_LEN] = {0};

  vnodeGetMetaPath(pVnode, VNODE_META_DIR, metaDir);

  int32_t code = metaBuildRepairBackupDir(pVnode, backupDir, sizeof(backupDir));
  if (code != 0) {
    return code;
  }

  if (taosCheckExistFile(backupDir)) {
    metaError("vgId:%d repair backup dir already exists: %s", TD_VID(pVnode), backupDir);
    return TSDB_CODE_FS_FILE_ALREADY_EXISTS;
  }

  code = metaCopyDirRecursive(metaDir, backupDir);
  if (code != 0) {
    metaError("vgId:%d failed to back up meta from %s to %s, reason:%s", TD_VID(pVnode), metaDir, backupDir,
              tstrerror(code));
    return code;
  }

  metaInfo("vgId:%d backed up meta to %s", TD_VID(pVnode), backupDir);
  return TSDB_CODE_SUCCESS;
}

Copilot AI Mar 9, 2026

metaBackupCurrentMeta() is defined but never called. As a result, meta force repair won't create the external backup directory that the tests (and design notes) expect. Call metaBackupCurrentMeta(pVnode) when metaShouldForceRepair() is true (ideally before any local rename/switch operations) and handle/propagate its error code.

hzcheng added 2 commits March 9, 2026 17:12
- Introduce new `--repair-target` parameter to replace single-profile options (`--file-type`, `--vnode-id`)
- Support multiple repair targets in a single `taosd -r` invocation
- Define grammar: `<file-type>:<key>=<value>[:<key>=<value>]...` with file types: meta, tsdb, wal
- Enforce validation: requires `-r`, `--mode force`, `--node-type vnode`, and at least one `--repair-target`
- Remove backward compatibility for old CLI options, treat them as invalid
- Update error messages and test cases to reflect new interface
- Centralize parsing in `dmMain.c` with normalized target structures
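
The --repair-target grammar from the commit message above can be sketched as a small parser. This is a Python illustration only: the real parsing lives in dmMain.c, the key names used in the example ("vnode", "fileid") are hypothetical, and rejecting duplicate keys is an assumption rather than a documented rule.

```python
VALID_FILE_TYPES = {"meta", "tsdb", "wal"}


def parse_repair_target(text):
    """Parse '<file-type>:<key>=<value>[:<key>=<value>]...' into (file_type, options)."""
    parts = text.split(":")
    file_type = parts[0]
    if file_type not in VALID_FILE_TYPES:
        raise ValueError("unknown file type: %r" % file_type)
    if len(parts) < 2:
        raise ValueError("at least one key=value pair is required: %r" % text)
    options = {}
    for pair in parts[1:]:
        key, sep, value = pair.partition("=")
        if not sep or not key or not value:
            raise ValueError("malformed key=value pair: %r" % pair)
        if key in options:  # assumed rule: a key may appear only once
            raise ValueError("duplicate key: %r" % key)
        options[key] = value
    return file_type, options


if __name__ == "__main__":
    # Key names below are placeholders, not the PR's documented keys.
    print(parse_repair_target("tsdb:vnode=3:fileid=1809"))
```
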
…ggregated structures

- Replace `SArray<SDmRepairTarget>` runtime model with three aggregated structures: `meta` (vnodeId -> strategy), `wal` (vnodeId -> presence), and `tsdb` (vnodeId -> fileId -> strategy)
- Update `dmRepair.h` to expose only leaf configurations and dedicated accessors, hiding underlying containers
- Modify `metaOpen.c` and `tsdbFS2.c` to use new accessors instead of scanning generic target lists
- Update documentation in `findings.md` and `progress.md` to reflect the refined data model
- Ensure parser test cases (`test_com_cmdline.py -k repair_cmdline_repair_target`) continue to pass after refactoring
Copilot AI review requested due to automatic review settings March 9, 2026 11:16
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 4 comments.



Comment on lines +485 to +493
  for (int32_t ftype = TSDB_FTYPE_HEAD; ftype <= TSDB_FTYPE_SMA; ++ftype) {
    if (fset->farr[ftype] != NULL) {
      STFileOp op = {.optype = TSDB_FOP_REMOVE, .fid = fset->fid, .of = fset->farr[ftype]->f[0]};
      code = TARRAY2_APPEND(fopArr, op);
      if (code != 0) goto _exit;

      TAOS_UNUSED(tsdbTFileObjUnref(fset->farr[ftype]));
      fset->farr[ftype] = NULL;
    }

Copilot AI Mar 9, 2026

tsdbRepairDropCoreOnTmpFSet() clears/unrefs fset->farr[ftype] before calling tsdbTFileSetEdit() with a TSDB_FOP_REMOVE op. tsdbTFileSetEdit()'s remove path dereferences fset->farr[op->of.type] to unref it, so setting it to NULL here will lead to a NULL dereference (and also double-unref if it were non-NULL). Let tsdbTFileSetEdit() own the unref/NULL assignment (i.e., remove the manual tsdbTFileObjUnref + farr[ftype]=NULL), or apply the edit before mutating fset->farr.
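
The ownership rule suggested in this comment can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `FileObjSketch`, `FileSetSketch`, `RemoveOpSketch`, `fileSetApplyRemove`, and `dropCoreFiles` are hypothetical stand-ins, not the TDengine types or functions, and the sketch applies each remove op immediately instead of queuing it in an op array. The point it shows is that only the edit step unrefs the object and clears the slot, while the caller never touches the slot itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { SLOT_HEAD = 0, SLOT_DATA, SLOT_SMA, SLOT_MAX };

typedef struct {
  int type;     /* which slot this file occupies */
  int refCount; /* object is freed when this reaches zero */
} FileObjSketch;

typedef struct {
  FileObjSketch *slots[SLOT_MAX];
} FileSetSketch;

typedef struct {
  int type; /* slot to remove */
} RemoveOpSketch;

FileObjSketch *fileObjNew(int type) {
  FileObjSketch *obj = calloc(1, sizeof(*obj));
  if (obj != NULL) {
    obj->type = type;
    obj->refCount = 1;
  }
  return obj;
}

static void fileObjUnref(FileObjSketch *obj) {
  assert(obj != NULL && obj->refCount > 0);
  if (--obj->refCount == 0) free(obj);
}

/* The edit step is the single owner of "unref and clear the slot". */
int fileSetApplyRemove(FileSetSketch *fset, const RemoveOpSketch *op) {
  if (op->type < 0 || op->type >= SLOT_MAX) return -1;
  if (fset->slots[op->type] == NULL) return -1; /* nothing to remove */
  fileObjUnref(fset->slots[op->type]);
  fset->slots[op->type] = NULL;
  return 0;
}

/* Caller: build the op and hand it to the edit step.
 * It does not unref or NULL the slot on its own.
 * Returns the number of removed files, or -1 on error. */
int dropCoreFiles(FileSetSketch *fset) {
  int removed = 0;
  for (int type = SLOT_HEAD; type < SLOT_MAX; ++type) {
    if (fset->slots[type] == NULL) continue;
    RemoveOpSketch op = {.type = type};
    if (fileSetApplyRemove(fset, &op) != 0) return -1;
    ++removed;
  }
  return removed;
}
```

If the caller cleared the slot before the edit ran, `fileSetApplyRemove` would find nothing to remove in this model. In the real code, per the comment above, the equivalent situation would be a NULL dereference.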

Comment on lines +609 to +618
  // Open a new meta for organization
  code = metaOpenImpl(pMeta->pVnode, &pNewMeta, VNODE_META_TMP_DIR, false);
  if (code) {
    return code;
  }

  code = metaBegin(pNewMeta, META_BEGIN_HEAP_NIL);
  if (code) {
    return code;
  }

Copilot AI Mar 9, 2026

metaForceRepair() has multiple early-return error paths after successfully opening pNewMeta (and/or starting a transaction) that don't close pNewMeta or otherwise clean up the temp meta state. This can leak resources and potentially leave meta_tmp in a partially initialized state. Consider switching to a single goto _exit cleanup path that always metaClose(&pNewMeta) (and aborts/rolls back any started transaction) before returning.
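
The single-exit cleanup pattern recommended here can be sketched as follows. This is a toy model, not the real `metaForceRepair()`: `MetaSketch`, the `*Sketch` functions, and the `failAtRebuild` switch are assumptions introduced so that the leak is observable through a handle counter. On success, ownership of the new handle moves to the caller; on any failure, the `_exit` path aborts the open transaction and closes the handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
  bool inTxn; /* true between begin and commit/abort */
} MetaSketch;

static int gOpenHandles = 0; /* counts live handles so leaks are visible */

int openHandleCount(void) { return gOpenHandles; }

static int metaOpenSketch(MetaSketch **out) {
  *out = calloc(1, sizeof(**out));
  if (*out == NULL) return -1;
  ++gOpenHandles;
  return 0;
}

void metaCloseSketch(MetaSketch **pp) {
  if (*pp == NULL) return;
  free(*pp);
  *pp = NULL;
  --gOpenHandles;
}

static int  metaBeginSketch(MetaSketch *m) { m->inTxn = true; return 0; }
static void metaAbortSketch(MetaSketch *m) { m->inTxn = false; }
static int  metaCommitSketch(MetaSketch *m) { m->inTxn = false; return 0; }

/* failAtRebuild simulates an error in the rebuild step. */
int forceRepairSketch(bool failAtRebuild, MetaSketch **out) {
  int         code = 0;
  MetaSketch *newMeta = NULL;
  assert(out != NULL);

  code = metaOpenSketch(&newMeta);
  if (code != 0) goto _exit;

  code = metaBeginSketch(newMeta);
  if (code != 0) goto _exit;

  if (failAtRebuild) {
    code = -2; /* hypothetical error code */
    goto _exit;
  }

  code = metaCommitSketch(newMeta);
  if (code != 0) goto _exit;

  *out = newMeta; /* success: ownership moves to the caller */
  newMeta = NULL;

_exit:
  if (newMeta != NULL) { /* only reached with a live handle on failure */
    if (newMeta->inTxn) metaAbortSketch(newMeta);
    metaCloseSketch(&newMeta);
  }
  return code;
}
```

Every error path funnels through the same label, so adding a new step later does not require repeating the cleanup code at each early return.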

Comment on lines +300 to +315
static void metaMarkForceRepairDone(int32_t vgId) {
  char vnodeText[32] = {0};
  snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);

  if (tsMetaRepairDoneVnodeId[0] == '\0') {
    tstrncpy(tsMetaRepairDoneVnodeId, vnodeText, sizeof(tsMetaRepairDoneVnodeId));
    return;
  }

  if (metaRepairListContains(tsMetaRepairDoneVnodeId, vgId)) {
    return;
  }

  int32_t offset = (int32_t)strlen(tsMetaRepairDoneVnodeId);
  snprintf(tsMetaRepairDoneVnodeId + offset, sizeof(tsMetaRepairDoneVnodeId) - offset, ",%s", vnodeText);
}

Copilot AI Mar 9, 2026

metaMarkForceRepairDone() / tsMetaRepairDoneVnodeId are introduced and used for the “already repaired in this process” check, but the code never calls metaMarkForceRepairDone() after a successful force repair. As a result, the done-list check will never take effect, and the helper itself is currently dead code. Either call metaMarkForceRepairDone(TD_VID(pVnode)) after metaForceRepair() succeeds (so repeated metaOpen() calls don't re-run repair), or remove the unused tracking code.

Comment on lines +720 to +738
static int32_t metaForceRepairIfShould(SVnode *pVnode, SMeta **ppMeta) {
  int32_t           code = TSDB_CODE_SUCCESS;
  EDmRepairStrategy strategy = DM_REPAIR_STRATEGY_META_FROM_UID;
  bool              shouldForceRepair = metaShouldForceRepair(pVnode, &strategy);

  // Check if meta should repair
  if (!shouldForceRepair) {
    metaDebug("vgId:%d, meta should not repair!", TD_VID(pVnode));
    return code;
  }

  // Do repair
  code = metaForceRepair(ppMeta, strategy);
  if (code) {
    metaError("vgId:%d, %s failed at %s:%d since %s", TD_VID(pVnode), __func__, __FILE__, __LINE__, tstrerror(code));
    return code;
  }

  return code;

Copilot AI Mar 9, 2026

After a successful repair in metaForceRepairIfShould(), the vnode isn't marked as “repaired” for this process, so a later metaOpen() for the same vnode (e.g., reopen paths) could re-run force repair again. This is especially important since metaForceRepairMatchesVnode() checks tsMetaRepairDoneVnodeId but that list is never updated. Mark the vnode as done on success (or remove the done-list gate entirely).

hzcheng added 2 commits March 10, 2026 10:28
- Introduce SRepairVnodeOpt structure to encapsulate vnode-specific repair options
- Update SDmRepairOption to use vnodeOpt and prepare for mnode/snode support
- Adjust cleanup and target insertion functions to work with new structure
- Clarify node-type options in comments and add new repair modes (copy, replica)
… flag

- Remove 'dnode' from the node-type option comment in dmMain.c as it is no longer supported
- Eliminate the global variable generateNewMeta and all its references across dmMain.c and metaOpen.c
- Simplify repair option handling by removing unnecessary flag resets
- Update code to reflect current supported node types: vnode, mnode, snode
Copilot AI review requested due to automatic review settings March 10, 2026 02:56
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 3 comments.



Comment on lines +3 to +4
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.


Copilot AI Mar 10, 2026

These plan docs include assistant-specific instructions (e.g. “For Claude: REQUIRED SUB-SKILL…”). This is likely to confuse human readers and may become stale quickly. Consider moving AI-runbook content into AGENTS.md (or an internal-only doc) and keep repository plans focused on the technical design/tasks.

Suggested change
-> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Comment on lines +3 to +4
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.


Copilot AI Mar 10, 2026

This plan doc contains assistant-specific instructions (“For Claude…Use superpowers…”). If these files are intended for general engineering handoff, consider removing or relocating the AI runbook content to AGENTS.md so the plan remains tool-agnostic and easier to maintain.

Suggested change
-> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Comment on lines +1 to +6
# META Force Repair Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add real execution for `taosd -r --mode force --file-type meta` by enhancing the existing `metaGenerateNewMeta()` path so each vnode decides during `metaOpen()` whether it should run force repair, with external backup support and crash-safe local directory switching.


Copilot AI Mar 10, 2026

This plan doc includes assistant-specific directions (“For Claude…Use superpowers…”). If the intent is a reusable engineering plan, consider removing those lines or moving them into an AI-specific runbook (e.g., AGENTS.md) to avoid confusing non-AI readers.

hzcheng added 2 commits March 10, 2026 11:11
- Remove tsMetaRepairDoneVnodeId static variable and related helper functions
- Simplify metaShouldForceRepair by directly checking repair flow and vnode options
- Eliminate redundant metaForceRepairMatchesVnode function
- Improve code clarity by removing unnecessary indirection in repair condition checks
Disable the meta backup step in metaForceRepair by commenting it out with #if 0. This change is likely a temporary measure to allow the repair process to proceed without performing a backup, possibly for debugging or to avoid backup-related failures during forced repairs. The statistics reset step remains active.
Copilot AI review requested due to automatic review settings March 10, 2026 03:19
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 3 comments.



Comment on lines +558 to 564
#if 0
  code = metaBackupCurrentMeta(pVnode);
  if (code != 0) {
    metaError("vgId:%d failed to back up current meta, reason:%s", TD_VID(pVnode), tstrerror(code));
    return code;
  }
#endif

Copilot AI Mar 10, 2026

metaBackupCurrentMeta() is never executed because the call is wrapped in #if 0 inside metaForceRepair(). This makes the external backup feature effectively disabled and will also break the E2E expectation that a backup directory is created during meta force repair. Please remove the compile-time guard (or gate it on dmRepairHasBackupPath() / repair mode) so backups are actually produced in repair runs.

Suggested change
-#if 0
-  code = metaBackupCurrentMeta(pVnode);
-  if (code != 0) {
-    metaError("vgId:%d failed to back up current meta, reason:%s", TD_VID(pVnode), tstrerror(code));
-    return code;
-  }
-#endif
+  if (dmRepairHasBackupPath()) {
+    code = metaBackupCurrentMeta(pVnode);
+    if (code != 0) {
+      metaError("vgId:%d failed to back up current meta, reason:%s", TD_VID(pVnode), tstrerror(code));
+      return code;
+    }
+  }

Comment on lines +26 to +120
@@ -32,12 +36,679 @@ typedef struct {
STFileHashEntry **buckets;
} STFileHash;

typedef enum {
TSDB_REPAIR_CORE_KEEP = 0,
TSDB_REPAIR_CORE_DROP,
TSDB_REPAIR_CORE_REBUILD,
} ECoreRepairAction;

typedef struct {
int32_t fid;
bool affected;
bool dropStt;
ECoreRepairAction coreAction;
bool staged;
int32_t totalBlocks;
int32_t keptBlocks;
int32_t droppedBlocks;
char coreReason[64];
} STsdbRepairPlan;

static int32_t save_json(const cJSON *json, const char *fname);

static int32_t tsdbFSDupState(STFileSystem *fs);

static int32_t commit_edit(STFileSystem *fs);

static void tsdbRepairPlanInit(const STFileSet *fset, STsdbRepairPlan *plan) {
memset(plan, 0, sizeof(*plan));
plan->fid = fset->fid;
}

static void tsdbRepairPlanSetCore(STsdbRepairPlan *plan, ECoreRepairAction action, const char *reason) {
if (plan->coreAction == TSDB_REPAIR_CORE_DROP) {
return;
}
if (action == TSDB_REPAIR_CORE_DROP || plan->coreAction == TSDB_REPAIR_CORE_KEEP) {
plan->coreAction = action;
if (reason != NULL) {
tstrncpy(plan->coreReason, reason, sizeof(plan->coreReason));
}
}
plan->affected = plan->dropStt || (plan->coreAction != TSDB_REPAIR_CORE_KEEP);
}

static const char *gCurrentFname[] = {
[TSDB_FCURRENT] = "current.json",
[TSDB_FCURRENT_C] = "current.c.json",
[TSDB_FCURRENT_M] = "current.m.json",
};

static bool tsdbRepairListContains(const char *csv, int32_t vgId) {
if (csv == NULL || csv[0] == '\0') {
return false;
}

char buf[PATH_MAX] = {0};
tstrncpy(buf, csv, sizeof(buf));

char *savePtr = NULL;
for (char *token = strtok_r(buf, ",", &savePtr); token != NULL; token = strtok_r(NULL, ",", &savePtr)) {
if (atoi(token) == vgId) {
return true;
}
}

return false;
}

static void tsdbMarkForceRepairDone(int32_t vgId) {
char vnodeText[32] = {0};
snprintf(vnodeText, sizeof(vnodeText), "%d", vgId);

if (tsTsdbRepairDoneVnodeId[0] == '\0') {
tstrncpy(tsTsdbRepairDoneVnodeId, vnodeText, sizeof(tsTsdbRepairDoneVnodeId));
return;
}

if (tsdbRepairListContains(tsTsdbRepairDoneVnodeId, vgId)) {
return;
}

int32_t offset = (int32_t)strlen(tsTsdbRepairDoneVnodeId);
snprintf(tsTsdbRepairDoneVnodeId + offset, sizeof(tsTsdbRepairDoneVnodeId) - offset, ",%s", vnodeText);
}

Copilot AI Mar 10, 2026

tsTsdbRepairDoneVnodeId is a process-global mutable buffer used to track which vnodes have been repaired. Vnodes are opened concurrently (see vmOpenVnodes() spawning threads in source/dnode/mgmt/mgmt_vnode/src/vmInt.c), so reads/writes to this global (via tsdbShouldForceRepair() / tsdbMarkForceRepairDone()) can race and corrupt the CSV or cause missed/duplicate repairs. Please make this tracking thread-safe (e.g., a mutex-protected hash/set keyed by vgId, or store the flag on SVnode/STsdb instance) or otherwise ensure the open path is single-threaded in repair mode.
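
One of the options mentioned here, a mutex-protected set keyed by vgId, can be sketched as below. This is a minimal illustration under assumptions: `repairMarkDoneOnce`, `repairIsDone`, and the fixed-size table with `MAX_TRACKED_VNODES` are hypothetical names for this example, and a linear scan stands in for a real hash set. It uses POSIX threads for the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TRACKED_VNODES 1024 /* hypothetical capacity for this sketch */

static pthread_mutex_t gDoneLock = PTHREAD_MUTEX_INITIALIZER;
static int32_t         gDoneIds[MAX_TRACKED_VNODES];
static int32_t         gDoneCount = 0;

/* Caller must hold gDoneLock. */
static bool doneListContainsLocked(int32_t vgId) {
  for (int32_t i = 0; i < gDoneCount; ++i) {
    if (gDoneIds[i] == vgId) return true;
  }
  return false;
}

/* Records vgId as repaired.
 * Returns true only for the call that inserted it, so check-and-mark is one
 * atomic step. Returns false if vgId was already recorded or the table is full. */
bool repairMarkDoneOnce(int32_t vgId) {
  bool inserted = false;
  pthread_mutex_lock(&gDoneLock);
  if (!doneListContainsLocked(vgId) && gDoneCount < MAX_TRACKED_VNODES) {
    gDoneIds[gDoneCount++] = vgId;
    inserted = true;
  }
  assert(gDoneCount <= MAX_TRACKED_VNODES);
  pthread_mutex_unlock(&gDoneLock);
  return inserted;
}

/* Returns true if vgId has been recorded as repaired. */
bool repairIsDone(int32_t vgId) {
  pthread_mutex_lock(&gDoneLock);
  bool found = doneListContainsLocked(vgId);
  pthread_mutex_unlock(&gDoneLock);
  return found;
}
```

Storing integer IDs instead of a CSV string also removes the string parsing from the hot path. Keeping a flag on the per-vnode object, the other option named in the comment, would avoid shared state entirely.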

@@ -0,0 +1,352 @@
# META Force Repair Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Copilot AI Mar 10, 2026

This plan document includes assistant-specific meta text ("For Claude" / "superpowers:executing-plans"). Please remove or replace with repo/tool-agnostic guidance to avoid confusion for future maintainers.

Suggested change
-> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+> Note: This implementation plan is intended to be executed task-by-task, either manually or by automation tooling.

hzcheng added 2 commits March 10, 2026 11:58
Introduce walShouldDeleteCorruption inline function to conditionally delete corrupted WAL files based on tsWalDeleteOnCorruption flag or dmRepairNeedWalRepair status. This replaces direct checks of tsWalDeleteOnCorruption in walLogEntriesComplete and walCheckAndRepairMeta, enabling more flexible corruption handling that considers both global configuration and vgId-specific repair requirements.
Copilot AI review requested due to automatic review settings March 10, 2026 04:34
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 3 comments.



Comment on lines +26 to +27
static char tsTsdbRepairDoneVnodeId[PATH_MAX] = {0};


Copilot AI Mar 10, 2026

tsTsdbRepairDoneVnodeId is a file-static global that is mutated by tsdbShouldForceRepair() / tsdbMarkForceRepairDone() without synchronization. Vnodes are opened concurrently (multiple threads in vmOpenVnodes), so this is a data race and can corrupt the CSV buffer or lead to repeated/partial dispatch decisions. Please make the “already repaired” state per-vnode/per-STFileSystem (or guard shared state with a mutex/atomic).

Comment on lines +524 to +526
static FORCE_INLINE bool walShouldDeleteCorruption(const SWal* pWal) {
  return tsWalDeleteOnCorruption || dmRepairNeedWalRepair(pWal->cfg.vgId);
}

Copilot AI Mar 10, 2026

walShouldDeleteCorruption() calls dmRepairNeedWalRepair(), but in this PR that symbol is only implemented in the taosd executable (dmMain.c) and in vnodeRepair.c (vnode library). The WAL unit test (walTest) links wal without those objects, so this introduces an undefined symbol at link time for test builds (and any other binary that links wal but not vnode/mgmt). Provide weak/default dmRepair implementations from a library that wal always links (e.g., common), or otherwise ensure the symbol is available wherever wal is used.

Comment on lines +264 to +266
code, output = self._run_taosd_with_cfg(
    self._tsdb_repair_args(vnode_id, repair_fid, extra_args="--log-output /dev/null")
)

Copilot AI Mar 10, 2026

This test runs taosd -r ... via _run_taosd_with_cfg() while the dnode is still running (unlike the other tests in this file which stop the dnode first). Starting a second taosd against the same config/data dir is likely to fail (port conflict / file lock / concurrent data access) and makes the test flaky. Stop tdDnodes before invoking the repair process and restart it in a finally block, consistent with test_tsdb_force_repair_dispatches_in_open_fs().

hzcheng added 2 commits March 10, 2026 16:53
- Add `dmRepairNodeTypeIsVnode()` and `dmRepairModeIsForce()` functions to header and implementation
- Introduce `tsdbShouldForceRepair()` and `tsdbForceRepair()` function declarations for TSDB repair logic
- Refactor TSDB repair flow to conditionally apply force repair based on node type and mode
- Improve repair plan handling with detailed block validation and error reporting
- Implement `tsdbForceRepair` to handle missing or corrupted files in TSDB
- Add helper functions for detecting bad files and performing deep scans
- Include STT file validation and data part verification mechanisms
- Introduce commit change process with proper locking for thread safety
- Structure repair operations to maintain data consistency during recovery
Copilot AI review requested due to automatic review settings March 10, 2026 10:22
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.



Comment on lines +82 to +111
static int32_t tsdbDeepScanAndFixSttFile(STFileset *pFileSet, STFileObj *pStt) {
int32_t code = TSDB_CODE_SUCCESS;
SSttFileReader *reader;
const TSttBlkArray *sttBlkArray = NULL;

// Open
SSttFileReaderConfig config = {
// TODO

};
code = tsdbSttFileReaderOpen(pStt->fname, &config, &reader);
if (code) {
// TODO: error handle, need to delete this file
}

// read the index part
code = tsdbSttFileReadSttBlk(reader, &sttBlkArray);
if (code) {
// TODO: error handle, need to delete this file
}

// Loop to read each data part
for (int32_t i = 0; i < sttBlkArray->size; i++) {
SSttBlk *pSttBlk = ;
code = tsdbReadFile(STsdbFD * pFD, int64_t offset, uint8_t *pBuf, int64_t size, int64_t szHint,
SEncryptData *encryptData);
if (code) {
// TODO: find a bad block, need to eliminate it
}
};

Copilot AI Mar 10, 2026

tsdbRepair.c currently contains incomplete/invalid C code (e.g., wrong type STFileset, empty initializer SSttBlk *pSttBlk = ;, and a placeholder call to tsdbReadFile(...) with a function signature pasted into the call). This will not compile and needs to be completed or removed (e.g., gate unfinished code behind #if 0 and enable the implemented path).

Comment on lines +155 to +167
static int32_t tsdbForceRepairFileSet(STFileSystem *pFS, STFileSet *pFileSet, TFileOpArray *opArr, bool *hasChange) {
int32_t code = TSDB_CODE_SUCCESS;

// TODO: if .head or .data is missing, just delete the data
code = tsdbForceRepairFileSetBadFiles(pFS);
if (code) {
// TODO
return code;
}

// TODO: if deep scan and fix the data, do deep scan and fix
code = tsdbForceRepairFileSetDeepScanAndFix(pFS);
if (code) {

Copilot AI Mar 10, 2026

tsdbForceRepairFileSetBadFiles / tsdbForceRepairFileSetDeepScanAndFix are called with the wrong argument lists here (missing pFileSet, opArr, hasChange, etc.), so this code cannot compile and also can’t apply any edits. Please fix the function calls/signatures consistently and ensure opArr is populated with the intended file operations.

Comment on lines +198 to +216
int32_t tsdbForceRepair(STFileSystem *fs) {
int32_t code = TSDB_CODE_SUCCESS;

bool hasChange = false;
TFileOpArray opArr = {0};

// Loop to force repair each file set
STFileSet *pFileSet = NULL;
TARRAY2_FOREACH(fs->fSetArr, pFileSet) {
code = tsdbForceRepairFileSet(fs, pFileSet, &hasChange);
if (code) {
tsdbError("vgId:%d %s failed to force repair file set, fid:%d since %s, code:%d", TD_VID(fs->tsdb->pVnode),
__func__, pFileSet->fid, tstrerror(code), code);
return code;
}
}

code = tsdbForceRepairCommitChange(fs, &opArr);
if (code) {

Copilot AI Mar 10, 2026

In tsdbForceRepair, the loop calls tsdbForceRepairFileSet(fs, pFileSet, &hasChange) but tsdbForceRepairFileSet is declared with four parameters (pFS, pFileSet, opArr, hasChange). This is a compile-time error; please pass the correct arguments and ensure the TFileOpArray opArr is properly initialized (e.g., via the project’s TARRAY2_INIT/append helpers) before using it in tsdbFSEditBegin/commit.

Comment on lines +198 to +290
int32_t tsdbForceRepair(STFileSystem *fs) {
int32_t code = TSDB_CODE_SUCCESS;

bool hasChange = false;
TFileOpArray opArr = {0};

// Loop to force repair each file set
STFileSet *pFileSet = NULL;
TARRAY2_FOREACH(fs->fSetArr, pFileSet) {
code = tsdbForceRepairFileSet(fs, pFileSet, &hasChange);
if (code) {
tsdbError("vgId:%d %s failed to force repair file set, fid:%d since %s, code:%d", TD_VID(fs->tsdb->pVnode),
__func__, pFileSet->fid, tstrerror(code), code);
return code;
}
}

code = tsdbForceRepairCommitChange(fs, &opArr);
if (code) {
// TODO: output error log
return code;
}

#if 0
int32_t code = tsdbFSDupState(fs);
if (code != 0) {
return code;
}

bool changed = false;
const STFileSet *srcFset = NULL;
TARRAY2_FOREACH(fs->fSetArr, srcFset) {
EDmRepairStrategy repairStrategy = DM_REPAIR_STRATEGY_NONE;
if (!tsdbRepairMatchTargetForFid(TD_VID(fs->tsdb->pVnode), srcFset->fid, &repairStrategy)) {
continue;
}
TAOS_UNUSED(repairStrategy);

STsdbRepairPlan plan;
code = tsdbRepairAnalyzeFileSet(fs, srcFset, &plan);
if (code != 0) {
return code;
}
if (!plan.affected) {
continue;
}

code = tsdbRepairBackupAffectedFileSet(fs, srcFset, &plan);
if (code != 0) {
return code;
}

STFileSet *dstFset = tsdbRepairFindTmpFSet(fs, srcFset->fid);
if (dstFset == NULL) {
return TSDB_CODE_FAILED;
}

if (plan.dropStt) {
tsdbRepairDropSttOnTmpFSet(dstFset);
changed = true;
}

if (plan.coreAction == TSDB_REPAIR_CORE_DROP) {
code = tsdbRepairDropCoreOnTmpFSet(fs, dstFset);
if (code != 0) return code;
changed = true;
} else if (plan.coreAction == TSDB_REPAIR_CORE_REBUILD) {
code = tsdbRepairRebuildCoreOnTmpFSet(fs, srcFset, dstFset, &plan);
if (code != 0) {
return code;
}
changed = true;
}
}

if (!changed) {
printf("tsdb force repair dispatch: vnode%d\n", TD_VID(fs->tsdb->pVnode));
fflush(stdout);
tsdbMarkForceRepairDone(TD_VID(fs->tsdb->pVnode));
return 0;
}

code = tsdbRepairCommitStagedCurrent(fs);
if (code != 0) {
return code;
}

printf("tsdb force repair dispatch: vnode%d\n", TD_VID(fs->tsdb->pVnode));
fflush(stdout);
tsdbMarkForceRepairDone(TD_VID(fs->tsdb->pVnode));
#endif
return code;
}

Copilot AI Mar 10, 2026

The current tsdbForceRepair implementation doesn’t print the dispatch marker that the new pytest cases assert on (e.g., "tsdb force repair dispatch"). The only printf/marker emission is in the #if 0 block below, so the tests will fail even if this compiles. Please either re-enable the implemented dispatch/logging path or update the tests to assert on the actual output produced by the enabled repair flow.

Comment on lines +264 to +267
code, output = self._run_taosd_with_cfg(
    self._tsdb_repair_args(vnode_id, repair_fid, extra_args="--log-output /dev/null")
)


Copilot AI Mar 10, 2026

This test runs taosd -r --mode force ... without stopping the existing tdDnodes taosd instance first. Launching a second taosd process against the same data/cfg directories can fail due to locks or (worse) corrupt state. Consider stopping tdDnodes (as done in other repair tests) before invoking the repair-mode taosd, then restarting in finally.

hzcheng added 2 commits March 10, 2026 19:18
Add deep scanning logic to tsdbRepair.c to detect and fix corrupted data blocks within brin blocks. The new tsdbDeepScanAndFixDataPart function reads brin blocks, validates data blocks, and skips corrupted entries. This enhances data integrity during repair operations by isolating and handling bad data segments without affecting the entire dataset.
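
The idea of validating each block and skipping corrupted ones can be sketched with a toy model. This is not the `tsdbDeepScanAndFixDataPart` implementation: `BlockSketch`, `ScanResultSketch`, `blockChecksumSketch`, and `deepScanSketch` are hypothetical, and the simple polynomial checksum is a stand-in for whatever validation the real code performs. The counter names follow the `totalBlocks`, `keptBlocks`, and `droppedBlocks` fields of `STsdbRepairPlan` shown earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  const uint8_t *data;           /* block payload */
  size_t         size;           /* payload size in bytes */
  uint32_t       storedChecksum; /* checksum recorded when the block was written */
} BlockSketch;

typedef struct {
  int32_t totalBlocks;
  int32_t keptBlocks;
  int32_t droppedBlocks;
} ScanResultSketch;

/* Stand-in checksum (assumption): a simple polynomial hash over the payload. */
uint32_t blockChecksumSketch(const uint8_t *data, size_t size) {
  uint32_t sum = 0;
  for (size_t i = 0; i < size; ++i) sum = sum * 31u + data[i];
  return sum;
}

/* Scans every block, keeps those whose checksum matches, and compacts the
 * kept blocks to the front of the array in their original order. */
ScanResultSketch deepScanSketch(BlockSketch *blocks, int32_t numBlocks) {
  assert(blocks != NULL || numBlocks == 0);
  ScanResultSketch res = {0};

  for (int32_t i = 0; i < numBlocks; ++i) {
    ++res.totalBlocks;
    const BlockSketch blk = blocks[i];
    if (blockChecksumSketch(blk.data, blk.size) == blk.storedChecksum) {
      blocks[res.keptBlocks++] = blk; /* valid: keep */
    } else {
      ++res.droppedBlocks; /* corrupted: skip, do not abort the scan */
    }
  }
  return res;
}
```

A bad block only increments the dropped counter, so one corrupted segment does not prevent the remaining valid blocks from being preserved.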
- Change default TSDB repair strategy from `shallow_repair` to `drop_invalid_only`
- Rename TSDB repair strategies: `shallow_repair` → `drop_invalid_only`, `deep_repair` → `head_only_rebuild`
- Add new `full_rebuild` strategy for complete core data reconstruction
- Update documentation in both English and Chinese versions with detailed strategy descriptions
- Modify internal enum values and code references to reflect new strategy naming
- Update command examples to use new strategy names
- Add test configurations for new repair functionality

The changes provide more granular control over TSDB repair operations with clearer strategy semantics.
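
The renamed strategies can be illustrated with a small name-to-enum lookup. This is a sketch under assumptions: `TsdbStrategySketch` and `parseTsdbStrategySketch` are hypothetical and do not reflect the internal enum values in the PR. Only the three strategy names and the `drop_invalid_only` default come from the commit message. Whether the old names remain accepted as aliases is not stated, so this sketch treats them as unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
  TSDB_STRATEGY_SKETCH_DROP_INVALID_ONLY = 0, /* default per the commit message */
  TSDB_STRATEGY_SKETCH_HEAD_ONLY_REBUILD,
  TSDB_STRATEGY_SKETCH_FULL_REBUILD,
} TsdbStrategySketch;

/* Maps a strategy name to its enum value.
 * A NULL or empty name selects the default.
 * Returns 0 on success and -1 for an unknown name. */
int parseTsdbStrategySketch(const char *name, TsdbStrategySketch *out) {
  static const struct {
    const char        *name;
    TsdbStrategySketch value;
  } table[] = {
      {"drop_invalid_only", TSDB_STRATEGY_SKETCH_DROP_INVALID_ONLY},
      {"head_only_rebuild", TSDB_STRATEGY_SKETCH_HEAD_ONLY_REBUILD},
      {"full_rebuild", TSDB_STRATEGY_SKETCH_FULL_REBUILD},
  };
  assert(out != NULL);

  if (name == NULL || name[0] == '\0') {
    *out = TSDB_STRATEGY_SKETCH_DROP_INVALID_ONLY;
    return 0;
  }
  for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); ++i) {
    if (strcmp(name, table[i].name) == 0) {
      *out = table[i].value;
      return 0;
    }
  }
  return -1; /* unknown name; in this sketch that includes the old names */
}
```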
Copilot AI review requested due to automatic review settings March 11, 2026 08:31
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.



Comment on lines +45 to +58
ADD_EXECUTABLE(tsdbRepairTest tsdbRepairTest.cpp)
DEP_ext_gtest(tsdbRepairTest)
TARGET_LINK_LIBRARIES(
tsdbRepairTest
PUBLIC os util common vnode
)

TARGET_INCLUDE_DIRECTORIES(
tsdbRepairTest
PUBLIC "${TD_SOURCE_DIR}/include/common"
PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/inc"
PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/tsdb"
PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../inc"
)

Copilot AI Mar 11, 2026

tsdbRepairTest is added unconditionally, but tsdbRepairTest.cpp includes/uses POSIX-only APIs (e.g. sys/syscall.h, mkstemp, SYS_close). This will fail to build on Windows. Suggest guarding the ADD_EXECUTABLE(tsdbRepairTest ...) block with IF(NOT TD_WINDOWS) (similar to tqTest) or providing a Windows-compatible implementation.

Suggested change
-ADD_EXECUTABLE(tsdbRepairTest tsdbRepairTest.cpp)
-DEP_ext_gtest(tsdbRepairTest)
-TARGET_LINK_LIBRARIES(
-    tsdbRepairTest
-    PUBLIC os util common vnode
-)
-TARGET_INCLUDE_DIRECTORIES(
-    tsdbRepairTest
-    PUBLIC "${TD_SOURCE_DIR}/include/common"
-    PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/inc"
-    PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/tsdb"
-    PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../inc"
-)
+IF(NOT TD_WINDOWS)
+    ADD_EXECUTABLE(tsdbRepairTest tsdbRepairTest.cpp)
+    DEP_ext_gtest(tsdbRepairTest)
+    TARGET_LINK_LIBRARIES(
+        tsdbRepairTest
+        PUBLIC os util common vnode
+    )
+    TARGET_INCLUDE_DIRECTORIES(
+        tsdbRepairTest
+        PUBLIC "${TD_SOURCE_DIR}/include/common"
+        PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/inc"
+        PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../src/tsdb"
+        PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/../inc"
+    )
+ENDIF()

hzcheng added 3 commits March 11, 2026 18:33
- Add note that default `drop_invalid_only` strategy only handles missing-file damage
- Specify that size-mismatch corruption requires explicit deep strategies (`head_only_rebuild` or `full_rebuild`)
- Update both English and Chinese documentation consistently
- Move repair-related source file from vnode to common directory for better code organization
- Introduce suite groups for metadata, core_e2e, and stt_e2e tests
- Add helper methods for running force repair operations and verifying results
- Implement test fixtures for core and STT file corruption scenarios
- Include assertions for database writability and repair log validation
- Refactor existing tests to use new fixture-based approach
- Add temporary file handling and improved error recovery mechanisms
Copilot AI review requested due to automatic review settings March 12, 2026 01:41
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 4 comments.



Comment on lines +558 to 564
#if 0
  code = metaBackupCurrentMeta(pVnode);
  if (code != 0) {
    metaError("vgId:%d failed to back up current meta, reason:%s", TD_VID(pVnode), tstrerror(code));
    return code;
  }
#endif

Copilot AI Mar 12, 2026

The metaBackupCurrentMeta function is defined but the call site is wrapped in #if 0 (line 558), so it's dead code. If this is intentional for a future phase, consider adding a comment explaining when it will be enabled; otherwise remove it to avoid confusion.

Suggested change
-#if 0
-  code = metaBackupCurrentMeta(pVnode);
-  if (code != 0) {
-    metaError("vgId:%d failed to back up current meta, reason:%s", TD_VID(pVnode), tstrerror(code));
-    return code;
-  }
-#endif

Comment on lines +568 to +592
  // Open a new meta for organization
  code = metaOpenImpl(pMeta->pVnode, &pNewMeta, VNODE_META_TMP_DIR, false);
  if (code) {
    return code;
  }

  code = metaBegin(pNewMeta, META_BEGIN_HEAP_NIL);
  if (code) {
    return code;
  }

  EMetaRepairStrategy strategy = metaGetRepairStrategy(repairStrategy);
  if (strategy == E_META_REPAIR_FROM_UID) {
    code = metaForceRepairFromUid(pVnode, pMeta, pNewMeta);
    if (code) {
      metaError("vgId:%d, %s failed at %s:%d since %s", TD_VID(pVnode), __func__, __FILE__, __LINE__, tstrerror(code));
      return code;
    }
  } else if (strategy == E_META_REPAIR_FROM_REDO) {
    code = metaForceRepairFromRedo(pVnode, pMeta, pNewMeta);
    if (code) {
      metaError("vgId:%d, %s failed at %s:%d since %s", TD_VID(pVnode), __func__, __FILE__, __LINE__, tstrerror(code));
      return code;
    }
  }

Copilot AI Mar 12, 2026

In metaForceRepair, if metaOpenImpl or metaBegin fails, pNewMeta is leaked (it's opened but never closed on the error path). Similarly, if metaForceRepairFromUid or metaForceRepairFromRedo fails at lines 582-591, pNewMeta is not closed before returning.
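One common way to address this kind of leak is a single cleanup label that every error path jumps to. The sketch below is illustrative only: `DemoMeta`, `demoMetaOpen`, `demoMetaBegin`, `demoRepairStep`, and `demoMetaClose` are hypothetical stand-ins, not the real TDengine functions, and the actual fix in `metaForceRepair` may need to look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the temporary meta handle; the real type differs. */
typedef struct DemoMeta {
  int begun;
} DemoMeta;

/* Counts handles that have been opened but not yet closed. */
static int g_open_handles = 0;

static int demoMetaOpen(DemoMeta **ppMeta) {
  assert(ppMeta != NULL);
  *ppMeta = calloc(1, sizeof(DemoMeta));
  if (*ppMeta == NULL) return -1;
  g_open_handles++;
  return 0;
}

/* Safe to call with a NULL handle, so the cleanup path needs no extra checks. */
static void demoMetaClose(DemoMeta **ppMeta) {
  if (ppMeta != NULL && *ppMeta != NULL) {
    free(*ppMeta);
    *ppMeta = NULL;
    g_open_handles--;
  }
}

/* failCode simulates an error injected into this step (0 means success). */
static int demoMetaBegin(DemoMeta *pMeta, int failCode) {
  if (failCode != 0) return failCode;
  pMeta->begun = 1;
  return 0;
}

static int demoRepairStep(DemoMeta *pNewMeta, int failCode) {
  (void)pNewMeta;
  return failCode;
}

/* Every failure jumps to cleanup, so the temporary handle is closed
 * exactly once on all paths. */
int demoForceRepair(int beginFail, int repairFail) {
  DemoMeta *pNewMeta = NULL;
  int       code = demoMetaOpen(&pNewMeta);
  if (code != 0) goto cleanup;

  code = demoMetaBegin(pNewMeta, beginFail);
  if (code != 0) goto cleanup;

  code = demoRepairStep(pNewMeta, repairFail);
  if (code != 0) goto cleanup;

  /* On success, real code would go on to use the rebuilt meta;
   * this demo only releases it. */
cleanup:
  demoMetaClose(&pNewMeta);
  return code;
}

int demoOpenHandles(void) { return g_open_handles; }
```

After any call to `demoForceRepair`, successful or not, `demoOpenHandles()` is back at 0, which is the property the review comment asks for.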

}

if (code == TSDB_CODE_OPS_NOT_SUPPORT) {
  return 1;

Copilot AI Mar 12, 2026

dmFinalizeRepairOption returns 1 (line 788) for TSDB_CODE_OPS_NOT_SUPPORT, which is an ad-hoc non-standard error code that gets propagated from dmParseArgs. This breaks the convention where the function otherwise returns TSDB_CODE_* values. The caller in mainWindows checks code != 0 and may misinterpret this as an unrelated error. Consider returning a proper TSDB_CODE_* constant instead.

Suggested change
- return 1;
+ return TSDB_CODE_OPS_NOT_SUPPORT;
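
To illustrate why a named code is easier for callers than a bare `1`, here is a minimal sketch. All names and numeric values (`DEMO_CODE_*`, `demoFinalizeRepairOption`, `demoDescribe`, `demoExitStatus`) are hypothetical and chosen for the example; they are not the real `TSDB_CODE_*` definitions or the real `dmFinalizeRepairOption` logic.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical codes for illustration only; the real TSDB_CODE_* values
 * are defined by TDengine and are not reproduced here. */
enum {
  DEMO_CODE_SUCCESS = 0,
  DEMO_CODE_OPS_NOT_SUPPORT = -2,
};

/* Reports failures with a named code, never an ad-hoc literal. */
int demoFinalizeRepairOption(const char *target) {
  assert(target != NULL); /* a NULL target is a caller bug, not a user error */
  if (strcmp(target, "meta") == 0 || strcmp(target, "tsdb") == 0) {
    return DEMO_CODE_SUCCESS;
  }
  return DEMO_CODE_OPS_NOT_SUPPORT;
}

/* Because the code is named, a caller can recognize "not supported"
 * instead of treating it as an unrelated failure. */
const char *demoDescribe(int code) {
  switch (code) {
    case DEMO_CODE_SUCCESS:
      return "ok";
    case DEMO_CODE_OPS_NOT_SUPPORT:
      return "operation not supported";
    default:
      return "unknown error";
  }
}

/* Collapsing to a process exit status happens only at the outermost
 * boundary, after the specific code has been logged or handled. */
int demoExitStatus(int code) { return code == DEMO_CODE_SUCCESS ? 0 : 1; }
```

With this shape, the value `1` appears only as a process exit status, while every internal function keeps returning codes from one consistent set.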

Comment on lines +828 to +871
def _prepare_stt_fixture(self, total_rows=4000):
    dbname = f"tsdb_repair_stt_fixture_{time.time_ns()}"
    ts0 = 1700000000000
    table_name = "d0"

    tdSql.execute(f"drop database if exists {dbname}")
    tdSql.execute(f"create database {dbname} vgroups 1 stt_trigger 1 minrows 10 maxrows 200")
    tdSql.execute(f"drop table if exists {dbname}.meters")
    tdSql.execute(f"create table {dbname}.meters (ts timestamp, c1 int, c2 float) tags(t1 int)")
    tdSql.execute(f"create table {dbname}.{table_name} using {dbname}.meters tags(1)")

    sql = f"insert into {dbname}.{table_name} values "
    sql += ",".join(f"({ts0 + i}, 1, 0.1)" for i in range(100))
    tdSql.execute(sql)
    tdSql.execute(f"flush database {dbname}")

    sql = f"insert into {dbname}.{table_name} values "
    sql += ",".join(f"({ts0 + 99 + i}, 1, 0.1)" for i in range(100))
    tdSql.execute(sql)
    tdSql.execute(f"flush database {dbname}")

    tdSql.execute(f"insert into {dbname}.{table_name} values({ts0 + 1000}, 2, 1.0)")
    tdSql.execute(f"flush database {dbname}")
    time.sleep(2)

    tdSql.query(f"select count(*) from {dbname}.{table_name}")
    tdSql.checkData(0, 0, 200)

    vnode_id = self._get_vnode_id_for_db(dbname, table_name=table_name)
    stt_path, stt_entries = self._wait_for_stt_file(dbname, vnode_id, timeout_sec=90)
    if stt_path is None or stt_entries <= 0:
        pytest.skip("real stt fixture was not materialized in time")

    fid = self._parse_fid_from_tsdb_path(stt_path)
    tdSql.checkEqual(fid is not None, True)
    return {
        "dbname": dbname,
        "vnode_id": vnode_id,
        "fid": fid,
        "row_count": 200,
        "table_name": table_name,
        "stt_path": stt_path,
        "stt_entries": stt_entries,
    }

Copilot AI Mar 12, 2026

The _prepare_stt_fixture method accepts a total_rows=4000 parameter but never uses it — the actual row count is hardcoded to 200 (two batches of 100 + 1 extra row, and row_count is returned as 200). The total_rows parameter is misleading and should either be removed or actually used.

@hzcheng hzcheng closed this Mar 12, 2026