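Walkthrough
This update substantially refactors several core modules: MVCC execution, indexing, log recovery, and lock management. The main changes are as follows. Several wrapper functions for MVCC operations are removed, and the executors now manage record insertion, deletion, and update by hand, together with the related log and index operations. The index and the buffer pool manager gain detailed locking and debug tracing. A logical-delete bitmap is introduced to implement soft deletion. The log manager now uses smart pointers for all memory management, and the return types of the related interfaces are adjusted. A new MVCC index scan executor is added. Some interface and struct signatures are adjusted or trimmed, which strengthens concurrency control and debugging capability.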
Sequence Diagram(s)
New flow of the MVCC delete executor
sequenceDiagram
participant Executor as MvccDeleteExecutor
participant LockMgr as LockManager
participant FH as RmFileHandle
participant TxnMgr as TransactionManager
participant LogMgr as LogManager
Executor->>LockMgr: acquire gap lock
loop for each record to delete
Executor->>LockMgr: get_lock_and_check_conflict
LockMgr-->>Executor: success / exception
Executor->>FH: get_record_with_delete_tag
alt already deleted
Executor-->>Executor: skip
else not deleted
Executor->>FH: delete_record_tag
Executor->>TxnMgr: add_delete_undo_log
Executor->>TxnMgr: append delete to write set
Executor->>LogMgr: add_delete_log
end
end
New flow of the MVCC insert executor
sequenceDiagram
participant Executor as MvccInsertExecutor
participant FH as RmFileHandle
participant LockMgr as LockManager
participant SmMgr as SmManager
participant TxnMgr as TransactionManager
participant LogMgr as LogManager
Executor->>FH: insert_record
Executor->>LockMgr: acquire lock
Executor->>SmMgr: insert_index
alt index insertion fails
Executor->>FH: delete_record
Executor->>TxnMgr: abort
Executor-->>Executor: throw exception
else
Executor->>TxnMgr: add_insert_undo_log
Executor->>TxnMgr: append insert to write set
Executor->>LogMgr: add_insert_log
end
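The insert flow above (insert the record, then the index entry, and undo the record insert if the index rejects the key) can be sketched in C++ as follows. This is only an illustration under stated assumptions: `ToyTable`, `insert_row`, and the `std::map`-based unique index are hypothetical stand-ins rather than the PR's `RmFileHandle` or `SmManager` API, and locking and redo logging are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the heap file, a unique index, and the undo log.
struct ToyTable {
    std::vector<std::string> heap;             // record slots
    std::map<int, std::size_t> unique_index;   // key -> slot
    std::vector<std::size_t> undo_log;         // slots inserted by this txn
};

// Insert the record first, then the index entry. If the index rejects the
// key, physically remove the record again and signal that the transaction
// must abort.
std::size_t insert_row(ToyTable& t, int key, const std::string& payload) {
    t.heap.push_back(payload);
    std::size_t slot = t.heap.size() - 1;
    auto result = t.unique_index.emplace(key, slot);
    if (!result.second) {
        t.heap.pop_back();  // compensate for the record insert
        throw std::runtime_error("unique index conflict: abort transaction");
    }
    t.undo_log.push_back(slot);  // only successful inserts get an undo entry
    return slot;
}
```

The ordering matters: the undo entry is appended only after the index insert succeeds, so a failed insert leaves no trace in the heap, the index, or the undo log.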
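Logical-delete bitmap mechanism
sequenceDiagram
participant FH as RmFileHandle
participant Page as Page
FH->>Page: physically insert the record
FH->>Page: set delete_bitmap bit to 0 (not deleted)
FH->>Page: delete_record_tag sets the bit to 1 (logically deleted)
FH->>Page: delete_record physically deletes (bitmap and delete_bitmap bits both set to 0)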
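The three page operations in this diagram can be modelled with two bitsets. The sketch below is a simplified illustration: `ToyPage`, `kSlots`, and `is_visible` are hypothetical names, and only the method names `insert_record`, `delete_record_tag`, and `delete_record` are taken from the diagram.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical page with two bitmaps, one bit per record slot in each.
struct ToyPage {
    static const std::size_t kSlots = 8;
    std::bitset<kSlots> bitmap;         // 1 = slot is physically occupied
    std::bitset<kSlots> delete_bitmap;  // 1 = record is logically deleted

    // Physical insert: occupy the slot and mark it as not deleted.
    void insert_record(std::size_t slot) {
        bitmap.set(slot);
        delete_bitmap.reset(slot);
    }
    // Soft delete: the record bytes stay in place for older snapshots.
    void delete_record_tag(std::size_t slot) { delete_bitmap.set(slot); }
    // Physical delete: clear both bits so the slot can be reused.
    void delete_record(std::size_t slot) {
        bitmap.reset(slot);
        delete_bitmap.reset(slot);
    }
    // A slot holds a live record only if occupied and not soft-deleted.
    bool is_visible(std::size_t slot) const {
        return bitmap.test(slot) && !delete_bitmap.test(slot);
    }
};
```

Keeping the soft-delete flag in a separate bitmap means a logically deleted record still occupies its slot, which is what lets an undo or an older reader find the bytes again.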
Summary of Changes
Hello @Masttf, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request primarily focuses on a substantial refactoring of the Multi-Version Concurrency Control (MVCC) implementation, particularly concerning data manipulation operations and their interaction with the transaction and logging subsystems. The changes aim to improve code organization, enhance memory safety, and streamline the process of managing undo logs, laying a foundation for more robust concurrency control.
Highlights
- MVCC DML Refactoring: The core logic for handling MVCC (Multi-Version Concurrency Control) operations like insert, update, and delete has been significantly refactored. The creation and management of undo logs for these operations are now centralized within the `TransactionManager` through new dedicated helper functions (`add_insert_undo_log`, `add_update_undo_log`, `add_delete_undo_log`). This streamlines the MVCC executors by moving complex undo log logic out of them.
- Code Cleanup and Simplification: Several unused or deprecated common MVCC helper functions (`ReconstructTuple`, `IsWriteWriteConflict`, `mvcc_insert_record`, `mvcc_delete_record`, `mvcc_update_record`, `message_out`) have been removed from `execution_common.cpp` and `execution_common.h`. The `check_conflict` function was also renamed to `get_lock_and_check_conflict` to better reflect its role in acquiring locks.
- Improved Logging and Memory Safety: The `LogManager` has been updated to use `std::unique_ptr` for `LogRecord` objects when reading logs from disk, enhancing memory safety and simplifying memory management. Additionally, the log addition functions (`add_log_to_buffer`, `add_insert_log`, etc.) now return `void` instead of `lsn_t`, implying a change in how LSNs are managed or exposed.
- Transaction Manager API Simplification: The `UpdateUndoLink` and `UpdateVersionLink` methods in `TransactionManager` have been simplified by removing the `check` function parameters, making their interfaces cleaner.
Code Review
The pull request introduces MVCC concurrency control with index support. The changes include refactoring undo log management and integrating locking mechanisms. The code appears well-structured, but careful attention should be paid to LSN assignment and potential code duplication.
std::vector<Value> values
) {
auto rec = fh_->get_record(rid, context_);
// fh_->delete_record(rid, context_);
std::vector<Value> values = convert_record_to_values(rec, tab_.cols);
txn_mgr_->add_delete_undo_log(context_->txn_, rid, std::move(values));
context_->txn_->append_write_record(
    std::make_unique<WriteRecord>(WType::DELETE_TUPLE, tab_.name, rid, *rec)
);
context_->log_mgr_->add_delete_log(context_->txn_->get_transaction_id(), *rec, rid, tab_.name);
You are on the CodeRabbit Free Plan. In order to use the Chat feature, please upgrade to CodeRabbit Pro.
@coderabbitai code review
You are on the CodeRabbit Free Plan. In order to use the Chat feature, please upgrade to CodeRabbit Pro.
/gemini code review
Code Review
The code changes introduce significant features for MVCC and concurrent index support. The implementation of the crabbing protocol for B+Tree concurrency and the refactoring of DML executors for MVCC are well-executed. The code is generally of high quality with good structure and comments. I've identified a potential miscalculation in page capacity that could lead to page overflows, and a resource leak in a newly added debugging function for the B+Tree.
  file_hdr.num_records_per_page =
-     (BITMAP_WIDTH * (PAGE_SIZE - 1 - (int)sizeof(RmPageHdr)) + 1) / (1 + record_size * BITMAP_WIDTH);
+     (BITMAP_WIDTH * (PAGE_SIZE - 1 - (int)sizeof(RmPageHdr) - (int)sizeof(lsn_t)) + 1) / (1 + record_size * BITMAP_WIDTH);
  file_hdr.bitmap_size = (file_hdr.num_records_per_page + BITMAP_WIDTH - 1) / BITMAP_WIDTH;
The formula for num_records_per_page appears to only account for the overhead of a single bitmap. However, the accompanying comment and the feature description mention the use of two bitmaps. If two bitmaps are indeed used, the current calculation underestimates the total overhead, which could lead to an incorrect number of records per page and potentially cause page overflows when the page is full. The overhead per record for bitmaps should be 2 bits, not 1.
Suggested change:
file_hdr.num_records_per_page =
    (BITMAP_WIDTH * (PAGE_SIZE - 1 - (int)sizeof(RmPageHdr) - (int)sizeof(lsn_t)) + 1) / (2 + record_size * BITMAP_WIDTH);
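To make the arithmetic behind this comment concrete, here is a small C++ sketch that evaluates the two-bitmap formula and checks that the resulting slot count fits. The function names and `kBitmapWidth` are hypothetical, `usable_bytes` stands for the page size minus all fixed headers, and the numbers used below are illustrative rather than the project's real constants.

```cpp
#include <cassert>

// Illustrative constant: number of slots tracked per bitmap byte.
const int kBitmapWidth = 8;

// Slot count from the suggested formula: each record costs record_size
// bytes plus 2 bitmap bits (one bit in each of the two bitmaps).
int records_per_page_two_bitmaps(int usable_bytes, int record_size) {
    return (kBitmapWidth * (usable_bytes - 1) + 1) /
           (2 + record_size * kBitmapWidth);
}

// Bytes needed by one bitmap that tracks n slots, rounded up.
int bitmap_bytes(int n) { return (n + kBitmapWidth - 1) / kBitmapWidth; }

// True if n records plus both bitmaps fit into usable_bytes.
bool fits(int usable_bytes, int record_size, int n) {
    return n * record_size + 2 * bitmap_bytes(n) <= usable_bytes;
}
```

For example, with 4000 usable bytes and 4-byte records, the single-bitmap divisor `1 + record_size * BITMAP_WIDTH` gives 969 slots, which would need 969 * 4 + 2 * 122 = 4120 bytes once a second bitmap is stored. The two-bitmap divisor gives 940 slots, which need 940 * 4 + 2 * 118 = 3996 bytes and therefore fit.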
child_page->rlatch();
page->runlatch();
buffer_pool_manager_->unpin_page(page->get_page_id(), false);
The debug_print_leaf_chain function has a resource leak. The fetch_node function allocates a new IxNodeHandle on the heap and pins a page in the buffer pool. In the while loops within this function, the now pointer is reassigned without deleting the previously allocated IxNodeHandle or unpinning the page it holds. This will lead to memory leaks and will exhaust the buffer pool frames if the B+Tree is deep or has many leaf nodes, as pages will remain pinned indefinitely. To fix this, you should unpin_page and delete the IxNodeHandle object before reassigning the now pointer in each loop iteration.
Suggested change:
auto next_page = now->get_next_leaf();
buffer_pool_manager_->unpin_page(now->get_page_id(), false);
delete now;
now = fetch_node(next_page);
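One way to make this kind of leak harder to reintroduce is to tie the unpin to the handle's lifetime. The sketch below illustrates that idea and is not the PR's code: `ToyPool`, `NodeGuard`, and `walk_leaf_chain` are hypothetical, and the pool only counts pins instead of managing real frames.

```cpp
#include <cassert>
#include <memory>

// Hypothetical buffer pool that only counts currently pinned pages.
struct ToyPool {
    int pinned = 0;
    void pin(int /*page_no*/) { ++pinned; }
    void unpin(int /*page_no*/) { --pinned; }
};

// Hypothetical node handle: pins on construction, unpins on destruction.
class NodeGuard {
public:
    NodeGuard(ToyPool& pool, int page_no) : pool_(pool), page_no_(page_no) {
        pool_.pin(page_no_);
    }
    ~NodeGuard() { pool_.unpin(page_no_); }
    NodeGuard(const NodeGuard&) = delete;
    NodeGuard& operator=(const NodeGuard&) = delete;
    // Toy leaf chain: pages are numbered consecutively up to last_leaf.
    int next_leaf(int last_leaf) const {
        return page_no_ < last_leaf ? page_no_ + 1 : -1;
    }
private:
    ToyPool& pool_;
    int page_no_;
};

// Walk the leaves first..last and return how many were visited.
int walk_leaf_chain(ToyPool& pool, int first, int last) {
    int visited = 0;
    std::unique_ptr<NodeGuard> now(new NodeGuard(pool, first));
    while (now) {
        ++visited;
        int next = now->next_leaf(last);
        now.reset();  // unpin and free the current node before moving on
        if (next >= 0) now.reset(new NodeGuard(pool, next));
    }
    return visited;
}
```

With this shape, at most one page is pinned at any time during the walk, and the pin count returns to zero on every exit path, including early returns and exceptions.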
Adds MVCC concurrency control with support for concurrent index access, using the crabbing (latch coupling) protocol.
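The read path of latch crabbing follows the same order as the `child_page->rlatch(); page->runlatch();` lines quoted in the review: take the child's latch first, and only then release the parent's. The sketch below shows just that ordering. `ToyNode` and `find_leaf` are hypothetical, each node has a single child to keep the descent trivial, and the toy latch records events instead of blocking, so this does not exercise real concurrency.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node whose read latch only records events, so the latch
// order can be inspected without real threads.
struct ToyNode {
    int id;
    ToyNode* child;  // nullptr marks a leaf
    std::vector<std::string>* trace;
    void rlatch() { trace->push_back("latch " + std::to_string(id)); }
    void runlatch() { trace->push_back("unlatch " + std::to_string(id)); }
};

// Read-only descent with latch crabbing: latch the child first and only
// then release the parent, so the path is never left unprotected.
ToyNode* find_leaf(ToyNode* root) {
    ToyNode* node = root;
    node->rlatch();
    while (node->child != nullptr) {
        ToyNode* child = node->child;
        child->rlatch();
        node->runlatch();
        node = child;
    }
    return node;  // still read-latched; the caller must call runlatch()
}
```

For a two-level tree with root 1 and leaf 2, the recorded order is latch 1, latch 2, unlatch 1, and finally unlatch 2 once the caller releases the leaf.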
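Index maintenance under MVCC
Because of MVCC, duplicate entries must be allowed to exist physically in an index: the entries may point to successive versions of a single logical row. The behavior that actually needs to be enforced is that no MVCC snapshot may contain two rows with the same index key. The cases that must be checked when inserting a new row into a unique index break down as follows:
- If a conflicting valid row has been deleted by the current transaction, this is fine. (In particular, because an UPDATE always deletes the old version before inserting the new version, this allows an UPDATE on a row that does not change the key.)
- If a conflicting row has been inserted by a transaction that has not yet committed, this implementation keeps things simple and aborts immediately instead of waiting.
- Similarly, if a conflicting valid row has been deleted by a transaction that is still preparing to commit, this implementation keeps things simple and aborts immediately instead of waiting.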
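The three cases above can be written down as a small decision function. This is a sketch of the rule as described, not the PR's implementation: `RowState`, `Decision`, and `check_unique_conflict` are hypothetical names, and the `CommittedLive` case (a plain unique violation) is an assumption added for completeness, since the list above does not cover it.

```cpp
#include <cassert>

// State of an existing row that has the same index key as the new row.
enum class RowState {
    CommittedLive,        // assumption: committed and not deleted
    DeletedByMe,          // deleted by the inserting transaction itself
    InsertedUncommitted,  // inserted by another, uncommitted transaction
    DeletedUncommitted    // deleted by another, uncommitted transaction
};

enum class Decision { Proceed, UniqueViolation, Abort };

// Decide what the inserting transaction does about a conflicting entry.
Decision check_unique_conflict(RowState conflicting) {
    switch (conflicting) {
        case RowState::DeletedByMe:
            return Decision::Proceed;  // e.g. UPDATE that keeps the key
        case RowState::InsertedUncommitted:
        case RowState::DeletedUncommitted:
            return Decision::Abort;    // simple policy: abort, do not wait
        case RowState::CommittedLive:
            return Decision::UniqueViolation;  // assumed, see lead-in
    }
    return Decision::Abort;  // unreachable for valid enum values
}
```

Aborting instead of waiting trades some unnecessary aborts for simplicity, because the index code then needs no wait queue for in-flight transactions.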
Summary by CodeRabbit
New Features
Enhancements
Performance and Concurrency
Fixes and Adjustments
Other