
feat: index implementation under MVCC concurrency control #22

Merged
Masttf merged 44 commits into main from masttf on Jun 28, 2025


Conversation

@Masttf (Contributor) commented Jun 19, 2025

Adds concurrent index access under MVCC concurrency control, using the latch crabbing protocol.
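For readers unfamiliar with latch crabbing, here is a minimal sketch on a toy B+ tree. It assumes a hypothetical `Node` type with one `std::shared_mutex` per node and a hypothetical capacity `kMaxKeys`; it is not the project's `IxNodeHandle`/`Page` API. Readers latch the child before releasing the parent; writers keep exclusive latches on ancestors only until they reach a node that cannot split.

```cpp
#include <cassert>
#include <cstddef>
#include <shared_mutex>
#include <vector>

// Toy B+ tree node for illustration only (hypothetical, not IxNodeHandle).
struct Node {
    std::shared_mutex latch;
    bool is_leaf = false;
    std::vector<int> keys;        // separator keys (internal) or data keys (leaf)
    std::vector<Node*> children;  // keys.size() + 1 entries for internal nodes
};

constexpr std::size_t kMaxKeys = 3;  // hypothetical node capacity

// Index of the child that covers `key`.
std::size_t child_index(const Node* n, int key) {
    std::size_t i = 0;
    while (i < n->keys.size() && key >= n->keys[i]) ++i;
    return i;
}

// Read-side crabbing: latch the child, then release the parent.
// Returns the leaf with its shared latch still held; the caller unlocks it.
Node* find_leaf_for_read(Node* root, int key) {
    Node* node = root;
    node->latch.lock_shared();
    while (!node->is_leaf) {
        Node* child = node->children[child_index(node, key)];
        child->latch.lock_shared();   // 1. grab the child
        node->latch.unlock_shared();  // 2. only then release the parent
        node = child;
    }
    return node;
}

// A node is "safe" for insert if one more key cannot make it split.
bool safe_for_insert(const Node* n) { return n->keys.size() < kMaxKeys; }

// Write-side crabbing: hold exclusive latches down the path and drop every
// ancestor as soon as the current node is safe. Returns the latched path;
// the caller unlocks these nodes after finishing the insert.
std::vector<Node*> find_leaf_for_insert(Node* root, int key) {
    std::vector<Node*> held;
    root->latch.lock();
    held.push_back(root);
    Node* node = root;
    while (!node->is_leaf) {
        Node* child = node->children[child_index(node, key)];
        child->latch.lock();
        if (safe_for_insert(child)) {
            for (Node* n : held) n->latch.unlock();  // a split cannot reach the ancestors
            held.clear();
        }
        held.push_back(child);
        node = child;
    }
    return held;
}
```

The write path only keeps the root latched when the target leaf is full, which is what lets concurrent inserts into non-full leaves proceed in parallel.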
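Index maintenance under MVCC:

  • update = delete + insert: every update inserts a new key-value pair into the index.
  • Index uniqueness checks in the presence of deletes: overflow pages handle the insertion of duplicate index keys, and a visibility check determines whether an insert succeeded.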

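Because of MVCC, duplicate entries must be allowed to exist physically in the index: they may point to successive versions of a single logical row. The behavior we actually want to enforce is that no MVCC snapshot may contain two rows with the same index key. When inserting a new row into a unique index, the cases that need to be checked break down as follows:

If a conflicting valid row has already been deleted by the current transaction, that is fine (in particular because an UPDATE always deletes the old version before inserting the new one, which allows an UPDATE on a row that does not change the key).

If a conflicting row has been inserted by a transaction that has not yet committed, we take the simple approach and abort immediately instead of waiting.

Similarly, if a conflicting valid row has been deleted by a transaction that has not yet committed, we take the simple approach and abort immediately instead of waiting.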
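The cases above can be condensed into a decision function. This is a simplified sketch: `ExistingEntry`, `check_unique_insert`, and `UniqueCheck` are hypothetical names, each conflicting index entry is assumed to carry the ids and commit status of its inserting and deleting transactions, and snapshot timing is ignored. Treating a committed delete as "dead version, insert allowed" and a live version as a true duplicate are assumptions beyond the three cases listed.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using txn_id_t = std::int64_t;

// Hypothetical summary of an existing index entry with the same key.
struct ExistingEntry {
    txn_id_t inserter;                // transaction that inserted this version
    bool inserter_committed;
    std::optional<txn_id_t> deleter;  // transaction that deleted it, if any
    bool deleter_committed;           // meaningful only when deleter is set
};

enum class UniqueCheck { kOk, kAbort, kDuplicateKey };

UniqueCheck check_unique_insert(const ExistingEntry& e, txn_id_t self) {
    // Deleted by the current transaction (e.g. UPDATE = delete old + insert new): fine.
    if (e.deleter && *e.deleter == self) return UniqueCheck::kOk;
    // Inserted by another, not-yet-committed transaction: abort instead of waiting.
    if (e.inserter != self && !e.inserter_committed) return UniqueCheck::kAbort;
    // Deleted by another, not-yet-committed transaction: abort instead of waiting.
    if (e.deleter && !e.deleter_committed) return UniqueCheck::kAbort;
    // Assumption: a committed delete leaves only a dead version behind.
    if (e.deleter) return UniqueCheck::kOk;
    // A live version with the same key: a genuine unique-key violation.
    return UniqueCheck::kDuplicateKey;
}
```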

Summary by CodeRabbit

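  • New features

    • Introduces a logical-deletion mechanism: records can be marked as deleted via a delete bitmap without being physically removed.
    • Adds an MVCC index scan executor, improving index queries under multi-version concurrency control.
    • Adds a debug-only shared mutex class and a B+ tree structure debug-print function to ease concurrency and index debugging.
    • Adds tool scripts that generate SQL in bulk, for testing with large data sets and indexes.
    • Adds system manager interfaces for inserting and deleting index entries, strengthening index maintenance.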
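  • Improvements

    • DML operations (insert, update, delete) now all go through the MVCC executors, improving consistency under concurrency.
    • Indexes are maintained and queried correctly after records are logically deleted.
    • Log manager improved: smart pointers now manage memory automatically, improving stability.
    • Gap locks are now released on transaction commit/rollback, so all lock resources are freed.
    • Index management improved: closing an index and dropping an index are now distinct operations.
    • MVCC delete, insert, and update flows refined, with finer-grained lock management and logging.
    • Some MVCC-related function interfaces simplified and refactored for maintainability.
    • Some MVCC record-operation functions removed in favor of explicit step-by-step control.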
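  • Performance and concurrency

    • All B+ tree index operations now use lock management and concurrency control, improving data consistency and concurrency safety.
    • The page buffer pool manager adds latch reset and detailed logging to aid debugging and concurrent processing.
    • The Page class gains read-write latch support for better concurrent access control.
    • Page latching and release during index scans improved to avoid resource leaks.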
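  • Fixes and adjustments

    • Data page layout and bitmap calculation corrected to support dual bitmaps (primary bitmap + delete bitmap).
    • Version-chain update interfaces in transaction management simplified and refactored, removing redundant parameters.
    • Recovery manager corrected to manage log records with smart pointers, improving safety.
    • Page latch management in the index scan iterator improved to prevent deadlocks and resource leaks.
    • Lock manager return-value handling adjusted to simplify the code.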
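  • Other

    • Detailed debug and trace logging added to help locate concurrency- and index-related issues.
    • SQL execution flow improved; CREATE INDEX and DROP INDEX commands are supported.
    • Some unused code and comments removed for a cleaner codebase.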
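A debug-only shared mutex of the kind listed under new features could look roughly like this. It is a hypothetical sketch, not the project's `DebugSharedMutex`: it wraps `std::shared_mutex`, records the exclusive owner's thread id and the reader count, and asserts on self-deadlock and mismatched unlocks. Note that `std::shared_mutex` itself forbids recursive `lock_shared` from one thread, so the wrapper cannot make that safe.

```cpp
#include <atomic>
#include <cassert>
#include <shared_mutex>
#include <thread>

// Hypothetical sketch of a lock-state-tracking shared mutex for debugging.
class DebugSharedMutex {
public:
    void lock() {
        // Catch self-deadlock: re-locking exclusively from the owning thread.
        assert(owner_.load() != std::this_thread::get_id() && "recursive exclusive lock");
        mu_.lock();
        owner_.store(std::this_thread::get_id());
    }
    void unlock() {
        assert(owner_.load() == std::this_thread::get_id() && "unlock by non-owner");
        owner_.store(std::thread::id{});
        mu_.unlock();
    }
    void lock_shared() {
        mu_.lock_shared();
        readers_.fetch_add(1);
    }
    void unlock_shared() {
        assert(readers_.load() > 0 && "unlock_shared without lock_shared");
        readers_.fetch_sub(1);
        mu_.unlock_shared();
    }
    // Introspection helpers, e.g. for assertions inside B+ tree code.
    bool held_exclusively_by_me() const { return owner_.load() == std::this_thread::get_id(); }
    int reader_count() const { return readers_.load(); }

private:
    std::shared_mutex mu_;
    std::atomic<std::thread::id> owner_{std::thread::id{}};
    std::atomic<int> readers_{0};
};
```

Because it exposes `lock`/`unlock`/`lock_shared`/`unlock_shared`, it can be used with `std::unique_lock` and `std::shared_lock` in place of a plain `std::shared_mutex`.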

@coderabbitai Bot commented Jun 19, 2025

Walkthrough

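This update substantially refactors core modules including MVCC execution, indexing, log recovery, and lock management. In summary: some wrapper functions for MVCC operations are removed, and the executors now manage record insertion, deletion, and update, along with the associated logging and index operations, by hand; detailed latching and debug tracing are added to the index and buffer pool managers; a logical-deletion bitmap implements soft deletes; the log manager now uses smart pointers throughout for memory management, with the return types of related interfaces adjusted; and a new MVCC index scan executor is added. Some interface and struct signatures are adjusted or trimmed, strengthening concurrency control and debugging support.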

Changes

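| File/module | Summary of changes |
|---|---|
| src/execution/execution_common.{cpp,h} | Removes MVCC-related operation functions and the message-output function; conflict detection refactored to throw exceptions; some function signatures adjusted. |
| src/execution/executor_mvcc_{delete,insert,update}.h | MVCC executors no longer call wrapper functions; they manage records, indexes, undo logs, and logging manually. |
| src/execution/executor_mvcc_index_scan.h | New MVCC index scan executor supporting MVCC visibility checks and index range scans. |
| src/execution/executor_index_scan.h | Scan iterator type adjusted; unlatch operation added. |
| src/execution/executor_seq_scan.h | Redundant blank lines removed; no functional change. |
| src/execution/execution_manager.cpp | Index creation/drop commands are now actually executed. |
| src/portal.h | DML executors now uniformly use the MVCC versions; scan executor selection logic adjusted. |
| src/record/rm_file_handle.{cpp,h} | Introduces delete_bitmap for logical deletion; related interfaces gain soft-delete support. |
| src/record/rm_manager.h | Page layout comment and records-per-page/bitmap formulas corrected for the dual-bitmap layout. |
| src/storage/buffer_pool_manager.cpp | TRACE_FUNCTION tracing added; update_page now resets the latch. |
| src/storage/page.h | Page class gains a read-write latch and related methods, including latch reset. |
| src/common/debug_shared_mutex.h | New debug shared mutex class DebugSharedMutex with detailed lock-state tracing. |
| src/index/ix_index_handle.{cpp,h} | All index operations now take latches, with flow tracing, debug printing, and helper methods; signatures of find_leaf_page and other interfaces adjusted. |
| src/index/ix_scan.{cpp,h} | Scan iterator manages a now pointer, with explicit unlatch and page latch management. |
| src/index/ix_manager.h | drop_index renamed to close_index_without_flush. |
| src/recovery/log_manager.{cpp,h} | Log manager interfaces return void instead of lsn_t; log reading uses unique_ptr; global lsn member removed. |
| src/recovery/log_recovery.cpp | Recovery flow adapted to smart pointers; checkpoint logic completed. |
| src/system/sm_manager.{cpp,h} | New insert_index, insert_index_without_rollback, and delete_index interfaces; create_index/drop_index logic adjusted. |
| src/transaction/transaction_manager.{cpp,h} | Undo/version link update interfaces simplified; add_insert/update/delete_undo_log interfaces added; index maintenance on rollback completed. |
| src/transaction/transaction.h | UndoLog struct comments improved and unused members removed; Transaction::clear now also clears gap locks. |
| src/transaction/concurrency/lock_manager.cpp | get_gap_condition return simplified; std::move removed. |
| index_test/index_gen1.py | New script that generates insert/query/create-index SQL in bulk. |
| index_test/index_gen2.py | New script that generates create-table, insert, and query SQL for unique indexes that include a float column. |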
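One way to read the log_manager change (append functions returning void, records held in `unique_ptr`) is that the log manager stamps LSNs itself. The sketch below assumes exactly that; `ToyLogManager`, this `LogRecord` layout, and the 32-bit `lsn_t` are hypothetical and not the project's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

using lsn_t = std::int32_t;  // assumption: the project's lsn_t may differ

// Hypothetical log record; real records also carry type, prev_lsn, payload, ...
struct LogRecord {
    lsn_t lsn = -1;
    int txn_id = 0;
};

// The LSN is assigned under the manager's own mutex, so the append API can
// return void and callers never touch a shared LSN counter directly.
class ToyLogManager {
public:
    void add_log_to_buffer(std::unique_ptr<LogRecord> rec) {
        std::lock_guard<std::mutex> guard(mu_);
        rec->lsn = next_lsn_++;
        buffer_.push_back(std::move(rec));
    }
    // Records are owned by unique_ptr, so nothing needs a manual delete.
    const LogRecord& at(std::size_t i) const { return *buffer_.at(i); }
    std::size_t size() const { return buffer_.size(); }

private:
    std::mutex mu_;
    lsn_t next_lsn_ = 0;
    std::vector<std::unique_ptr<LogRecord>> buffer_;
};
```

A caller that still needs the LSN, for example to stamp a page, would read it back from the record after appending.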

Sequence Diagram(s)

New MVCC Delete executor flow

```mermaid
sequenceDiagram
    participant Executor as MvccDeleteExecutor
    participant LockMgr as LockManager
    participant FH as RmFileHandle
    participant TxnMgr as TransactionManager
    participant LogMgr as LogManager

    Executor->>LockMgr: Acquire gap lock
    loop For each record to delete
        Executor->>LockMgr: get_lock_and_check_conflict
        LockMgr-->>Executor: Success / exception
        Executor->>FH: get_record_with_delete_tag
        alt Already deleted
            Executor-->>Executor: Skip
        else Not deleted
            Executor->>FH: delete_record_tag
            Executor->>TxnMgr: add_delete_undo_log
            Executor->>TxnMgr: Append delete to write set
            Executor->>LogMgr: add_delete_log
        end
    end
```
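The delete flow in this diagram can be approximated with in-memory stand-ins. This is a sketch under stated assumptions: `Slot`, `ToyTxn`, and `mvcc_delete` are hypothetical, a per-slot `locked_by` field replaces the lock manager, the gap-lock step is omitted, and the redo log is a vector of strings.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for RmFileHandle slots and a transaction.
struct Slot {
    std::string data;
    bool deleted = false;  // delete tag (logical deletion)
    int locked_by = -1;    // id of the transaction holding the row lock, -1 if none
};

struct ToyTxn {
    int id;
    std::vector<std::pair<int, std::string>> undo;  // (slot, before-image)
    std::vector<int> write_set;                     // slots this transaction deleted
};

// Per record: lock + conflict check (throws), skip rows already tagged as
// deleted, set the delete tag, then undo log, write set, and redo log.
void mvcc_delete(std::vector<Slot>& table, const std::vector<int>& targets, ToyTxn& txn,
                 std::vector<std::string>& redo_log) {
    for (int rid : targets) {
        Slot& s = table.at(rid);
        if (s.locked_by != -1 && s.locked_by != txn.id)
            throw std::runtime_error("write-write conflict");  // stands in for get_lock_and_check_conflict
        s.locked_by = txn.id;
        if (s.deleted) continue;                                // already deleted: skip
        s.deleted = true;                                       // delete_record_tag
        txn.undo.emplace_back(rid, s.data);                     // add_delete_undo_log
        txn.write_set.push_back(rid);                           // append delete to write set
        redo_log.push_back("DELETE " + std::to_string(rid));    // add_delete_log
    }
}
```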

New MVCC Insert executor flow

```mermaid
sequenceDiagram
    participant Executor as MvccInsertExecutor
    participant FH as RmFileHandle
    participant LockMgr as LockManager
    participant SmMgr as SmManager
    participant TxnMgr as TransactionManager
    participant LogMgr as LogManager

    Executor->>FH: insert_record
    Executor->>LockMgr: Acquire lock
    Executor->>SmMgr: insert_index
    alt Index insert fails
        Executor->>FH: delete_record
        Executor->>TxnMgr: abort
        Executor-->>Executor: Throw exception
    else
        Executor->>TxnMgr: add_insert_undo_log
        Executor->>TxnMgr: Append insert to write set
        Executor->>LogMgr: add_insert_log
    end
```
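The failure branch of the insert flow can be sketched as follows. `ToyTable`, `TxnAborted`, and `mvcc_insert` are hypothetical, and a plain `std::map` stands in for the unique index, so uniqueness here is a simple key lookup rather than an MVCC visibility check; the success-path logging steps are only noted in a comment.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical heap file (slot -> key, nullopt = free slot) plus a unique index.
struct ToyTable {
    std::vector<std::optional<int>> heap;
    std::map<int, int> unique_index;  // key -> slot
};

struct TxnAborted : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Insert the record first, then the index entry; if the index insert fails,
// physically remove the record again and abort.
int mvcc_insert(ToyTable& t, int key) {
    t.heap.push_back(key);                                     // insert_record
    int rid = static_cast<int>(t.heap.size()) - 1;
    bool inserted = t.unique_index.emplace(key, rid).second;   // insert_index
    if (!inserted) {
        t.heap[rid].reset();                                   // delete_record (physical)
        throw TxnAborted("duplicate key, transaction aborted");
    }
    // Success path would continue with add_insert_undo_log, the write set, and add_insert_log.
    return rid;
}
```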

Logical-deletion bitmap mechanism

```mermaid
sequenceDiagram
    participant FH as RmFileHandle
    participant Page as Page

    FH->>Page: Physically insert record
    FH->>Page: Set delete_bitmap bit to 0 (not deleted)
    FH->>Page: delete_record_tag sets the bit to 1 (logical delete)
    FH->>Page: delete_record physically deletes (clears both bitmap and delete_bitmap)
```
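The three bitmap transitions in this diagram map onto a small page model. `TwoBitmapPage` and its byte-per-8-slots layout are assumptions for illustration, not the project's `RmPageHandle`; a slot holds a live row only when it is set in the primary bitmap and clear in the delete bitmap.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical page with a primary bitmap (slot occupied) and a delete bitmap
// (slot logically deleted).
class TwoBitmapPage {
public:
    explicit TwoBitmapPage(int num_slots)
        : used_((num_slots + 7) / 8, 0), deleted_((num_slots + 7) / 8, 0) {}

    // Physical insert: mark occupied and not deleted.
    void insert(int slot) { set(used_, slot, true); set(deleted_, slot, false); }
    // Logical delete: the row stays on the page, only the delete bit is set.
    void delete_tag(int slot) { set(deleted_, slot, true); }
    // Physical delete: clear both bitmaps.
    void physical_delete(int slot) { set(used_, slot, false); set(deleted_, slot, false); }

    bool occupied(int slot) const { return get(used_, slot); }
    bool logically_deleted(int slot) const { return get(deleted_, slot); }
    bool live(int slot) const { return occupied(slot) && !logically_deleted(slot); }

private:
    static bool get(const std::vector<std::uint8_t>& bm, int i) {
        return (bm[i / 8] >> (i % 8)) & 1;
    }
    static void set(std::vector<std::uint8_t>& bm, int i, bool v) {
        if (v) bm[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        else   bm[i / 8] &= static_cast<std::uint8_t>(~(1u << (i % 8)));
    }
    std::vector<std::uint8_t> used_;
    std::vector<std::uint8_t> deleted_;
};
```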

Poem

🌱

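Latch shadows dappled, debugging runs busy,
Twin bitmaps mirrored for soft deletes,
Indexes latched, each step kept steady,
Logs guarded by smart pointers.
A new MVCC flow, the data safer still,
Code like poetry, collaboration like a painting,
Together we admire the database's evolving bloom!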

🔒🌳📝


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Free

📥 Commits

Reviewing files that changed from the base of the PR and between 5b0fb1a and 3dbd892.

📒 Files selected for processing (1)
  • src/storage/buffer_pool_manager.cpp (15 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/storage/buffer_pool_manager.cpp


@gemini-code-assist Bot left a comment

Summary of Changes

Hello @Masttf, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on a substantial refactoring of the Multi-Version Concurrency Control (MVCC) implementation, particularly concerning data manipulation operations and their interaction with the transaction and logging subsystems. The changes aim to improve code organization, enhance memory safety, and streamline the process of managing undo logs, laying a foundation for more robust concurrency control.

Highlights

  • MVCC DML Refactoring: The core logic for handling MVCC (Multi-Version Concurrency Control) operations like insert, update, and delete has been significantly refactored. The creation and management of undo logs for these operations are now centralized within the TransactionManager through new dedicated helper functions (add_insert_undo_log, add_update_undo_log, add_delete_undo_log). This streamlines the MVCC executors by moving complex undo log logic out of them.
  • Code Cleanup and Simplification: Several unused or deprecated common MVCC helper functions (ReconstructTuple, IsWriteWriteConflict, mvcc_insert_record, mvcc_delete_record, mvcc_update_record, message_out) have been removed from execution_common.cpp and execution_common.h. The check_conflict function was also renamed to get_lock_and_check_conflict to better reflect its role in acquiring locks.
  • Improved Logging and Memory Safety: The LogManager has been updated to use std::unique_ptr for LogRecord objects when reading logs from disk, enhancing memory safety and simplifying memory management. Additionally, the log addition functions (add_log_to_buffer, add_insert_log, etc.) now return void instead of lsn_t, implying a change in how LSNs are managed or exposed.
  • Transaction Manager API Simplification: The UpdateUndoLink and UpdateVersionLink methods in TransactionManager have been simplified by removing the check function parameters, making their interfaces cleaner.

@gemini-code-assist Bot left a comment

Code Review

The pull request introduces MVCC concurrency control with index support. The changes include refactoring undo log management and integrating locking mechanisms. The code appears well-structured, but careful attention should be paid to LSN assignment and potential code duplication.

Comment thread src/recovery/log_manager.cpp
Comment thread src/transaction/transaction_manager.cpp Outdated
Comment on lines +400 to +401

```cpp
std::vector<Value> values
) {
```

medium

Consider adding input validation to ensure values is not empty before moving it, to prevent potential issues if the vector is unexpectedly empty.

Comment on lines +64 to +71

```cpp
auto rec = fh_->get_record(rid, context_);
// fh_->delete_record(rid, context_);
std::vector<Value> values = convert_record_to_values(rec, tab_.cols);
txn_mgr_->add_delete_undo_log(context_->txn_, rid, std::move(values));
context_->txn_->append_write_record(
    std::make_unique<WriteRecord>(WType::DELETE_TUPLE, tab_.name, rid, *rec)
);
context_->log_mgr_->add_delete_log(context_->txn_->get_transaction_id(), *rec, rid, tab_.name);
```

medium

Consider extracting the common logging and write record appending logic into a separate helper function to reduce code duplication across different executors.
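A minimal sketch of the suggested extraction, which also folds in the earlier suggestion to validate `values` before moving it. All types and the `record_dml` name are hypothetical stand-ins, not the project's `WriteRecord`/`LogManager` API; the empty-values guard is skipped for inserts on the assumption that an insert has no before-image.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal types for illustration.
enum class WType { INSERT_TUPLE, DELETE_TUPLE, UPDATE_TUPLE };
struct WriteRecord { WType type; std::string table; int rid; };
struct ToyTxn {
    std::vector<std::unique_ptr<WriteRecord>> write_set;
    std::vector<std::vector<int>> undo;  // before-images
};

// The "undo log + write set + redo log" tail that each MVCC DML executor
// repeats, collected in one place.
void record_dml(ToyTxn& txn, std::vector<std::string>& redo, WType type,
                const std::string& table, int rid, std::vector<int> before_image) {
    // Deletes and updates must carry a before-image; validate before moving it.
    if (type != WType::INSERT_TUPLE && before_image.empty())
        throw std::invalid_argument("empty before-image");
    txn.undo.push_back(std::move(before_image));
    txn.write_set.push_back(std::make_unique<WriteRecord>(WriteRecord{type, table, rid}));
    redo.push_back(table + ":" + std::to_string(rid));
}
```

Each executor would then call one helper after its table-specific work, so adding a step (for example a new log field) touches a single function.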

@Masttf (Contributor, Author) commented Jun 25, 2025

@coderabbitai

@coderabbitai Bot commented Jun 25, 2025

You are on the CodeRabbit Free Plan. In order to use the Chat feature, please upgrade to CodeRabbit Pro.

@Masttf (Contributor, Author) commented Jun 25, 2025

@coderabbitai code review

@coderabbitai Bot commented Jun 25, 2025

You are on the CodeRabbit Free Plan. In order to use the Chat feature, please upgrade to CodeRabbit Pro.

@Masttf (Contributor, Author) commented Jun 25, 2025

/gemini code review

@gemini-code-assist Bot left a comment

Code Review

The code changes introduce significant features for MVCC and concurrent index support. The implementation of the crabbing protocol for B+Tree concurrency and the refactoring of DML executors for MVCC are well-executed. The code is generally of high quality with good structure and comments. I've identified a potential miscalculation in page capacity that could lead to page overflows, and a resource leak in a newly added debugging function for the B+Tree.

Comment thread src/record/rm_manager.h
Comment on lines 84 to 86
```diff
 file_hdr.num_records_per_page =
-    (BITMAP_WIDTH * (PAGE_SIZE - 1 - (int)sizeof(RmPageHdr)) + 1) / (1 + record_size * BITMAP_WIDTH);
+    (BITMAP_WIDTH * (PAGE_SIZE - 1 - (int)sizeof(RmPageHdr) - (int)sizeof(lsn_t)) + 1) / (1 + record_size * BITMAP_WIDTH);
 file_hdr.bitmap_size = (file_hdr.num_records_per_page + BITMAP_WIDTH - 1) / BITMAP_WIDTH;
```

high

The formula for num_records_per_page appears to only account for the overhead of a single bitmap. However, the accompanying comment and the feature description mention the use of two bitmaps. If two bitmaps are indeed used, the current calculation underestimates the total overhead, which could lead to an incorrect number of records per page and potentially cause page overflows when the page is full. The overhead per record for bitmaps should be 2 bits, not 1.

Suggested change

```cpp
file_hdr.num_records_per_page =
    (BITMAP_WIDTH * (PAGE_SIZE - 1 - (int)sizeof(RmPageHdr) - (int)sizeof(lsn_t)) + 1) / (2 + record_size * BITMAP_WIDTH);
```
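To see what the denominator change means in bytes, the closed form can be checked against the space actually used. The constants are assumptions for illustration (4096-byte pages, 8 slots per bitmap byte, an 8-byte `RmPageHdr`, a 4-byte `lsn_t`), leaving 4084 bytes. With 4-byte records, the one-bitmap formula gives 989 slots using 4080 bytes, but storing a second bitmap at that count would need 4204 bytes; the two-bitmap denominator gives 960 slots using 4080 bytes. Which case applies depends on whether a second bitmap is really stored on the page.

```cpp
#include <cassert>

// Assumed values for illustration; the real constants come from the project's headers.
constexpr int kPageSize = 4096;    // PAGE_SIZE
constexpr int kBitmapWidth = 8;    // BITMAP_WIDTH: slots tracked per bitmap byte
constexpr int kPageHdrSize = 8;    // sizeof(RmPageHdr), assumed
constexpr int kLsnSize = 4;        // sizeof(lsn_t), assumed
constexpr int kUsable = kPageSize - kPageHdrSize - kLsnSize;  // bytes for bitmaps + slots

// The review's closed form, generalized to `num_bitmaps` one-bit-per-slot bitmaps
// (1 = the merged formula, 2 = the suggested change).
int records_per_page(int record_size, int num_bitmaps) {
    return (kBitmapWidth * (kUsable - 1) + 1) / (num_bitmaps + record_size * kBitmapWidth);
}

// Bytes actually consumed by n slots: each bitmap rounds up to whole bytes.
int bytes_used(int n, int record_size, int num_bitmaps) {
    int bitmap_bytes = (n + kBitmapWidth - 1) / kBitmapWidth;
    return num_bitmaps * bitmap_bytes + n * record_size;
}
```

For two bitmaps the closed form is safe but can be conservative by one slot for some record sizes, since it only charges a fraction of a bitmap byte per slot.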

@Masttf (Contributor, Author) replied:

A single bitmap is used.

Comment on lines +290 to +292

```cpp
child_page->rlatch();
page->runlatch();
buffer_pool_manager_->unpin_page(page->get_page_id(), false);
```

medium

The debug_print_leaf_chain function has a resource leak. The fetch_node function allocates a new IxNodeHandle on the heap and pins a page in the buffer pool. In the while loops within this function, the now pointer is reassigned without deleting the previously allocated IxNodeHandle or unpinning the page it holds. This will lead to memory leaks and will exhaust the buffer pool frames if the B+Tree is deep or has many leaf nodes, as pages will remain pinned indefinitely. To fix this, you should unpin_page and delete the IxNodeHandle object before reassigning the now pointer in each loop iteration.

Suggested change

```cpp
auto next_page = now->get_next_leaf();
buffer_pool_manager_->unpin_page(now->get_page_id(), false);
delete now;
now = fetch_node(next_page);
```
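An alternative to the manual unpin/delete pair is an RAII handle that unpins on destruction, so reassigning the loop pointer cannot leak. This sketch uses a hypothetical `ToyBufferPool` that only tracks pin counts and a hypothetical `PinnedLeaf`; it is not the project's `BufferPoolManager`/`IxNodeHandle`.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Toy buffer pool: tracks pin counts and the leaf chain only.
struct ToyBufferPool {
    std::map<int, int> pins;       // page id -> pin count
    std::map<int, int> next_leaf;  // page id -> next leaf page id (-1 = end of chain)
    void pin(int pid) { ++pins[pid]; }
    void unpin(int pid) { --pins[pid]; }
};

// RAII handle: pins on construction, unpins on destruction.
struct PinnedLeaf {
    ToyBufferPool* bpm;
    int pid;
    PinnedLeaf(ToyBufferPool* b, int p) : bpm(b), pid(p) { bpm->pin(pid); }
    ~PinnedLeaf() { bpm->unpin(pid); }
    PinnedLeaf(const PinnedLeaf&) = delete;
    PinnedLeaf& operator=(const PinnedLeaf&) = delete;
};

// Walks the leaf chain; each reassignment of `now` unpins the page it previously held.
int count_leaves(ToyBufferPool& bpm, int first_leaf) {
    int count = 0;
    auto now = std::make_unique<PinnedLeaf>(&bpm, first_leaf);
    while (true) {
        ++count;
        int next = bpm.next_leaf.at(now->pid);
        if (next == -1) break;
        // The next page is pinned first, then the old handle is destroyed and unpins its page.
        now = std::make_unique<PinnedLeaf>(&bpm, next);
    }
    return count;
}
```

Pinning the next page before releasing the current one also matches the crabbing order used elsewhere in the index code.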

@Masttf (Contributor, Author) replied:

The debug function doesn't matter; it is only used for internal debugging.

Masttf self-assigned this on Jun 26, 2025
Masttf marked this pull request as ready for review on June 28, 2025 15:26
@sourcery-ai Bot left a comment

Hi @Masttf! 👋

Your private repo does not have access to Sourcery.

Please upgrade to continue using Sourcery ✨

Masttf merged commit 133b11b into main on Jun 28, 2025
1 check passed
