B+ trees #2
Merged
…d fielddescriptors, and then additionally what should BPlusTree handle and what should RecordType handle
…tor started, need to think more on how exactly variable length records are going to be handled by slotted page. Also, currently the BTree will interface with slottedpage and diskmanager and record and fielddescriptors, perhaps later down the line BTree as a class needs to be simplified
… copy CMU's 'bustub' type system, since this project is focused on making a database not a type system
…tem, and instead I'm just going to make my own. May take a week or two, but that's okay. on the weekends of my IBM internship I'll go ahead and continue this project through the summer.
… minimum reqs for btree impl. If it is not fulfilled, perhaps I will make an arena allocator for temporary data for query processing... may take a while though...
…ensible design, done by my imagination first and then claude looking up what some contemporary methods are
…ty to deserialize and serialize records
…le/column, implement schema, implement btree, AND CONTINUE. git add -A! MWEHEHEHEHEHEH git add -A!
Pull request overview
This PR appears to be a first pass at introducing B+ tree indexing scaffolding plus a supporting “type/value” layer and a storage visualization/demo program, along with build-system updates to compile these pieces.
Changes:
- Adds an initial B+ tree API skeleton and related design notes under `index/`.
- Introduces multiple “Value/Type” implementations (`value.h`, `type/`, and `type2/`) plus schema/column scaffolding.
- Updates CMake to build `index_logic` and a `storage_viz` executable, and includes various generated build outputs.
Reviewed changes
Copilot reviewed 80 out of 93 changed files in this pull request and generated 28 comments.
| File | Description |
|---|---|
| value.h | Adds a BusTub-style Value class wrapper with type-driven operations and fmt formatters. |
| type2/value.h | Adds an alternate Value class stub for type2. |
| type2/type.h | Adds a Type interface + TypeId/CmpBool for type2. |
| type2/type.cpp | Adds (incomplete) Type registry initialization and default virtual method implementations for type2. |
| type2/integer_parent_type.h | Adds an (incomplete) integer type base helper for type2. |
| type/varchar_type.h | Adds a VarcharType implementation for the type/ type system. |
| type/value.h | Adds a simplified tagged-union Value for the type/ type system. |
| type/type.h | Adds the Type interface for the type/ type system. |
| type/type.cpp | Adds a singleton-based type registry for type/. |
| type/schema.h | Adds Schema API scaffolding and record layout notes. |
| type/schema.cpp | Adds partial Schema implementation (currently incomplete). |
| type/README.md | Notes about the simplified width-parameterized type/value approach. |
| type/numeric_type.h | Adds NumericType implementation (all integer widths). |
| type/float_type.h | Adds FloatType implementation (float/double widths). |
| type/column.h | Adds a simplified Column model for the type/ schema system. |
| type/boolean_type.h | Adds BooleanType implementation. |
| tests/storage_viz.cpp | Adds a demo executable for disk/page/record operations and visualization. |
| storage/manifest.md | Removes prior storage-layer design narrative. |
| storage/DiskManager.h | Removes an older DiskManager header. |
| storage/DiskManager.cpp | Removes an older DiskManager source stub. |
| storage/disk_manager2.h | Removes an incomplete alternative DiskManager header. |
| storage/disk_manager.cpp | Minor comment cleanup in readPage doc block. |
| scratch.txt | Adds scratch notes/prototypes for record serialization ideas. |
| plan.md | Adjusts B+ tree plan snippet (currently leaves an invalid class closing). |
| index/type_system_consideration.md | Adds notes about how the B+ tree should interact with the type system. |
| index/plan.md | Adds B+ tree planning notes. |
| index/disk_manager_consideration.md | Adds notes about buffer borrowing and DiskManager usage. |
| index/b_plus_tree.h | Adds the BPlusTree class skeleton/API. |
| index/b_plus_tree.cpp | Adds placeholder implementations for BPlusTree methods. |
| index/archive.txt | Adds archived prototype code snippets. |
| common/string_util.h | Adds a StringUtil utility header. |
| common/string_util.cpp | Adds implementations for StringUtil utilities. |
| common/README.md | Adds common-module TODO notes. |
| common/macros.h | Replaces old include guards with macros/utilities (assert/log/etc.). |
| common/exception.h | Adds an Exception type and exception categories. |
| common/column.h | Adds a BusTub-derived Column implementation (catalog-style). |
| common/column.cpp | Adds Column::ToString implementation. |
| CMakeLists.txt | Adds index_logic library and storage_viz executable; updates targets. |
| CMakeFiles/Makefile.cmake | Removes generated CMake file from repo root. |
| CMakeFiles/CMakeDirectoryInformation.cmake | Removes generated CMake file from repo root. |
| build/Testing/Temporary/LastTestsFailed.log | Removes generated test log. |
| build/Testing/Temporary/LastTest.log | Removes generated test log. |
| build/Testing/Temporary/CTestCostData.txt | Removes generated test cost data. |
| build/Makefile | Updates generated Makefile targets for new libs/executable. |
| build/CMakeFiles/TargetDirectories.txt | Updates generated target directory list. |
| build/CMakeFiles/Storage.dir/progress.make | Removes generated target progress file for old executable. |
| build/CMakeFiles/Storage.dir/link.txt | Removes generated link command file for old executable. |
| build/CMakeFiles/Storage.dir/cmake_clean.cmake | Removes generated clean script for old executable. |
| build/CMakeFiles/storage_viz.dir/tests/storage_viz.cpp.o.d | Adds generated dependency file for storage_viz. |
| build/CMakeFiles/storage_viz.dir/progress.make | Adds generated progress file for storage_viz. |
| build/CMakeFiles/storage_viz.dir/link.txt | Adds generated link command for storage_viz. |
| build/CMakeFiles/storage_viz.dir/flags.make | Adds generated compile flags for storage_viz. |
| build/CMakeFiles/storage_viz.dir/DependInfo.cmake | Adds generated dependency info for storage_viz. |
| build/CMakeFiles/storage_viz.dir/depend.make | Adds generated depend makefile for storage_viz. |
| build/CMakeFiles/storage_viz.dir/compiler_depend.ts | Adds generated compiler timestamp for storage_viz. |
| build/CMakeFiles/storage_viz.dir/compiler_depend.make | Adds generated compiler depend makefile for storage_viz. |
| build/CMakeFiles/storage_viz.dir/compiler_depend.internal | Adds generated compiler dependency internal file. |
| build/CMakeFiles/storage_viz.dir/cmake_clean.cmake | Adds generated clean script for storage_viz. |
| build/CMakeFiles/storage_viz.dir/build.make | Adds generated build rules for storage_viz. |
| build/CMakeFiles/progress.marks | Updates generated progress marks. |
| build/CMakeFiles/Makefile2 | Updates generated recursive make rules. |
| build/CMakeFiles/Makefile.cmake | Updates generated CMake makefile metadata. |
| build/CMakeFiles/index_logic.dir/progress.make | Adds generated progress file for index_logic. |
| build/CMakeFiles/index_logic.dir/link.txt | Adds generated link script for index_logic. |
| build/CMakeFiles/index_logic.dir/index/b_plus_tree.cpp.o.d | Updates generated dependency file for index_logic. |
| build/CMakeFiles/index_logic.dir/flags.make | Adds generated flags file for index_logic. |
| build/CMakeFiles/index_logic.dir/DependInfo.cmake | Updates generated dependency info for index_logic. |
| build/CMakeFiles/index_logic.dir/depend.make | Updates generated depend makefile for index_logic. |
| build/CMakeFiles/index_logic.dir/compiler_depend.ts | Updates generated compiler timestamp for index_logic. |
| build/CMakeFiles/index_logic.dir/compiler_depend.internal | Updates generated internal dependency list for index_logic. |
| build/CMakeFiles/index_logic.dir/cmake_clean.cmake | Adds generated clean script for index_logic. |
| build/CMakeFiles/index_logic.dir/cmake_clean_target.cmake | Adds generated target clean script for index_logic. |
| build/CMakeFiles/index_logic.dir/build.make | Adds generated build rules for index_logic. |
| build/CMakeFiles/3.28.3/CMakeSystem.cmake | Updates generated system info file (kernel version). |
| build/_deps/googletest-subbuild/CMakeFiles/CMakeConfigureLog.yaml | Updates generated configure log system version. |
| build/_deps/googletest-subbuild/CMakeFiles/3.28.3/CMakeSystem.cmake | Updates generated system info file in subbuild. |
| build/_deps/googletest-build/googletest/CMakeFiles/gtest.dir/progress.make | Updates generated progress numbers. |
| build/_deps/googletest-build/googletest/CMakeFiles/gtest_main.dir/progress.make | Updates generated progress numbers. |
| build/_deps/googletest-build/googlemock/CMakeFiles/gmock.dir/progress.make | Updates generated progress numbers. |
| build/_deps/googletest-build/googlemock/CMakeFiles/gmock_main.dir/progress.make | Updates generated progress numbers. |
| .vscode/c_cpp_properties.json | Adds VS Code IntelliSense configuration. |
Comments suppressed due to low confidence (1)
index/b_plus_tree.cpp:107
- This `.cpp` file ends with `#endif` even though it does not start with a matching `#if`. This will cause a preprocessor error. Remove the stray `#endif` (header guards should only be in headers).
Comment on lines +75 to +76
| auto GetColumn() const -> Column; | ||
|
|
Comment on lines +174 to +190
| template <typename T> | ||
| struct fmt::formatter<T, std::enable_if_t<std::is_base_of<bustub::Value, T>::value, char>> | ||
| : fmt::formatter<std::string> { | ||
| template <typename FormatCtx> | ||
| auto format(const bustub::Value &x, FormatCtx &ctx) const { | ||
| return fmt::formatter<std::string>::format(x.ToString(), ctx); | ||
| } | ||
| }; | ||
|
|
||
| template <typename T> | ||
| struct fmt::formatter<std::unique_ptr<T>, std::enable_if_t<std::is_base_of<bustub::Value, T>::value, char>> | ||
| : fmt::formatter<std::string> { | ||
| template <typename FormatCtx> | ||
| auto format(const std::unique_ptr<bustub::Value> &x, FormatCtx &ctx) const { | ||
| return fmt::formatter<std::string>::format(x->ToString(), ctx); | ||
| } | ||
| }; |
Comment on lines +173 to +176
|
|
||
| template <typename T> | ||
| struct fmt::formatter<T, std::enable_if_t<std::is_base_of<bustub::Value, T>::value, char>> | ||
| : fmt::formatter<std::string> { |
Comment on lines +21 to +26
| explicit Value(const TypeId type) : manage_data_(false), type_id_(type) { size_.len_ = BUSTUB_VALUE_NULL; } | ||
| Value(TypeId type, int8_t i); | ||
| // BOOLEAN and TINYINT | ||
| Value(TypeId type, int8_t i); | ||
| // DECIMAL | ||
| Value(TypeId type, double d); |
Comment on lines +10 to +46
| class Value{ | ||
| friend class Type; | ||
| friend class IntegerParentType; | ||
| friend class TinyintType; | ||
| friend class SmallintType; | ||
| friend class IntegerType; | ||
| friend class BigintType; | ||
| friend class DecimalType; | ||
| friend class BooleanType; | ||
| friend class VarcharType; | ||
|
|
||
| explicit Value(const TypeId type) : manage_data_(false), type_id_(type) { size_.len_ = BUSTUB_VALUE_NULL; } | ||
| Value(TypeId type, int8_t i); | ||
| // BOOLEAN and TINYINT | ||
| Value(TypeId type, int8_t i); | ||
| // DECIMAL | ||
| Value(TypeId type, double d); | ||
| Value(TypeId type, float f); | ||
| // SMALLINT | ||
| Value(TypeId type, int16_t i); | ||
| // INTEGER | ||
| Value(TypeId type, int32_t i); | ||
| // BIGINT | ||
| Value(TypeId type, int64_t i); | ||
| // TIMESTAMP | ||
| Value(TypeId type, uint64_t i); | ||
| // VARCHAR | ||
| Value(TypeId type, const char *data, uint32_t len, bool manage_data); | ||
| Value(TypeId type, const std::string &data); | ||
|
|
||
| Value() : Value(TypeId::INVALID) {} | ||
| Value(const Value &other); | ||
| auto operator=(Value other) -> Value &; | ||
| ~Value(); | ||
|
|
||
| inline auto GetTypeId() const -> TypeId { return type_id_; } | ||
|
|
Comment on lines +42 to +46
| #define ENSURE(expr, message) \ | ||
| if(!(expr)){ \ | ||
| std::cerr << "ERROR: " << (message) << std::endl; \ | ||
| std::terminate(); \ | ||
| } |
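The review text attached to this snippet is not preserved here. One common concern with statement-like macros of this shape is that a bare `if (...) { ... }` body does not compose with a caller's own `if`/`else`: written as above, `if (c) ENSURE(...); else ...` fails to compile because the trailing `;` ends the statement before the `else`. A minimal sketch of the usual `do { ... } while (0)` wrapper follows; `clampToNonNegative` is a hypothetical caller added only to show the macro inside an `if`/`else`, and is not part of the PR.

```cpp
#include <cassert>
#include <exception>
#include <iostream>

// Same behavior as the quoted ENSURE, but wrapped so that the macro
// expands to exactly one statement that requires a trailing semicolon.
#define ENSURE(expr, message)                             \
  do {                                                    \
    if (!(expr)) {                                        \
      std::cerr << "ERROR: " << (message) << std::endl;   \
      std::terminate();                                   \
    }                                                     \
  } while (0)

// Hypothetical caller: uses ENSURE as the unbraced body of an `if`
// that has an `else`, which the unwrapped form cannot support.
inline int clampToNonNegative(int x) {
  if (x < 0)
    ENSURE(x > -1000, "value far out of range");
  else
    return x;
  return 0;  // negative but within range: clamp to zero
}
```

With this form the macro can be used anywhere a single statement is allowed, and forgetting the semicolon becomes a compile error instead of a silent change in control flow.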
Comment on lines +42 to +53
| class Exception: public std::runtime_error{ | ||
| public: | ||
| explicit Exception(const std::string &message, bool print = True): std::runtime_error(message), type(ExceptionType::INVALID) { | ||
| #ifndef NDEBUG | ||
| if(print){ | ||
| std::string exception_message = "Message:: " + message + "\n"; | ||
| std::cerr << exception_message; | ||
| } | ||
| #endif | ||
| } | ||
| auto GetType() const -> ExceptionType { return type_; } | ||
| static auto ExceptionTypeToString(ExceptionType type) -> std::string { |
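As quoted, the constructor looks like it would not compile: `True` is not a C++ literal (the keyword is `true`), and the initializer list names `type` while `GetType()` returns `type_`. A minimal corrected sketch is below, assuming the member is meant to be called `type_`. The one-entry `ExceptionType` enum is a stand-in so the example is self-contained; the PR's real categories are not shown in this excerpt.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>

// Stand-in for the PR's enum; only INVALID appears in the quoted code.
enum class ExceptionType { INVALID = 0 };

class Exception : public std::runtime_error {
 public:
  // `true` (lowercase) is the C++ boolean literal.
  explicit Exception(const std::string &message, bool print = true)
      : std::runtime_error(message), type_(ExceptionType::INVALID) {
#ifndef NDEBUG
    // Debug builds echo the message to stderr unless the caller opts out.
    if (print) {
      std::cerr << "Message:: " << message << "\n";
    }
#endif
  }

  auto GetType() const -> ExceptionType { return type_; }

 private:
  ExceptionType type_;  // assumed name, matching what GetType() returns
};
```

Passing `print = false` keeps the constructor quiet, which is convenient in tests that expect an exception to be thrown.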
Comment on lines +39 to +65
| class BPlusTree { | ||
| //records are raw uint8_t* and interpreted via "Schema" interface | ||
| //uses sibling pointers + strict min-key | ||
| //will implement latch crabbing for concurrency | ||
| //database overall is IoT (Index Organized Tables) | ||
| //records stored as uint8_t*, interpreted via "schema" interface | ||
| bool insert(uint8_t* record); | ||
| bool remove(uint8_t* record); | ||
| uint8_t* get(Key target); | ||
| std::vector<uint8_t*> scan(Key start, Key end); | ||
| // range scan — returns all values where key is in [start, end] | ||
| private: | ||
| void splitChild(page_id_t parent_node, Key child); | ||
| //take child, split into two. | ||
| //remember to add/update key stuff to parent node (strict min-key) | ||
| void merge(page_id_t parent_node, Key left_child); //could also input right child | ||
| //take nodes left_child and left_child+1=right_child, and put keys into left_child. destroy right_child | ||
| //remember to delete right_child key, shouldn't affect left_child key(strict min-key) | ||
| void redistribute(page_id_t parent_node, Key child); | ||
| //take stuff in child, give to siblings (sibling pointers!) | ||
| //remember to update parent keys (strict min-key) | ||
| DiskManager* disk_manager_; | ||
| // uint16_t primary_key_index; | ||
| uint32_t root_page_id; | ||
| uint32_t schema_page_id; | ||
| Schema schema; | ||
| }; |
Comment on lines +34 to +64
| bool BPlusTree::insert(uint8_t* record){ | ||
| //find correct leaf page and slot_id_x | ||
| //get page via disk manager | ||
| //read page as slottedpage | ||
| //if enough space to insert | ||
| //insert record at slot_id_x - you have to shift the rest of slots through memmove (cheap) | ||
| //else | ||
| //split the current page, giving you a new page_id | ||
| //find if you should insert at this page or the new page, and then insert!! | ||
| //update keys in ancestral line via BTStack | ||
| } | ||
| bool BPlusTree::remove(uint8_t* record){ | ||
| //find correct leaf page and slot_id_x | ||
| //get page via disk manager | ||
| //read page as slottedpage | ||
| //delete slot_id_x | ||
| //if this + sibling (via sibling pointer) can fit in one page, merge | ||
| //not so sure if this is a good idea?? | ||
| //update keys in ancestral line via BTStack | ||
| } | ||
| uint8_t* BPlusTree::get(Key target){ | ||
| //find correct leaf page and slot_id_x | ||
| //get page via disk manager | ||
| //read page as slottedpage | ||
| //return slot_id_x's record as uint8_t* | ||
| } | ||
| std::vector<uint8_t*> BPlusTree::scan(Key start, Key end){ | ||
| //find correct start page and record | ||
| //find correct end page and record | ||
| //iterate from start to end via sibling pointers, scanning uint8_t* into vector | ||
| } |
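The `scan` outline above (locate the start, then follow sibling pointers until the end key is passed) can be sketched on a purely in-memory leaf chain. This is an illustration under simplifying assumptions, not the PR's design: `Leaf`, `scanLeaves`, and the `int32_t` alias for `Key` are hypothetical, keys stand in for the `uint8_t*` records, and the chain is walked from a given first leaf rather than from a leaf found by descending from the root through `DiskManager` pages.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Key = std::int32_t;  // assumption: the PR's Key type is not shown

// Hypothetical in-memory leaf: sorted keys plus a right-sibling pointer.
struct Leaf {
  std::vector<Key> keys;  // kept in ascending order
  Leaf *next = nullptr;   // nullptr marks the rightmost leaf
};

// Returns every key k with start <= k <= end, in ascending order,
// by walking the chain left to right and stopping once k exceeds end.
std::vector<Key> scanLeaves(const Leaf *first, Key start, Key end) {
  std::vector<Key> out;
  for (const Leaf *leaf = first; leaf != nullptr; leaf = leaf->next) {
    for (Key k : leaf->keys) {
      if (k > end) {
        return out;  // keys are sorted, so nothing further can match
      }
      if (k >= start) {
        out.push_back(k);
      }
    }
  }
  return out;
}
```

Because the bounds are inclusive, as in the header comment for `scan`, a query whose `start` and `end` are equal returns at most the one matching key.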
Comment on lines 38 to 41
| void splitChild(page_id_t parent_id, int child_index); | ||
| void mergeOrRedistribute(page_id_t node_id); // called on underflow after delete | ||
| }; | ||
| } | ||
| The split and merge logic is where you'll spend most of your time. scan is important to implement here because it validates that your leaf-level sibling pointers are correct — a common place for bugs to hide. |
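The sibling-pointer bookkeeping during a leaf split, and the way a full scan exposes mistakes in it, can be sketched in memory as follows. Everything here is hypothetical scaffolding (`LeafNode`, `splitLeaf`, `collectChain`, an `int32_t` `Key`); the PR's `splitChild` takes a parent `page_id_t` and works on slotted pages, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using Key = std::int32_t;  // assumption: the PR's Key type is not shown

// Hypothetical in-memory leaf with a right-sibling pointer.
struct LeafNode {
  std::vector<Key> keys;  // ascending
  LeafNode *next = nullptr;
};

// Moves the upper half of `left` into a new right sibling and splices
// that sibling into the chain. The caller owns the returned node; its
// first key is the separator a parent node would need to record.
std::unique_ptr<LeafNode> splitLeaf(LeafNode &left) {
  std::unique_ptr<LeafNode> right(new LeafNode());
  const auto mid = static_cast<std::ptrdiff_t>(left.keys.size() / 2);
  right->keys.assign(left.keys.begin() + mid, left.keys.end());
  left.keys.erase(left.keys.begin() + mid, left.keys.end());
  right->next = left.next;  // new node inherits the old right neighbor
  left.next = right.get();  // old node now points at the new node
  return right;
}

// Walks the whole chain; if a split wired the pointers incorrectly,
// the result is missing keys or is no longer sorted.
std::vector<Key> collectChain(const LeafNode *first) {
  std::vector<Key> out;
  for (const LeafNode *n = first; n != nullptr; n = n->next) {
    out.insert(out.end(), n->keys.begin(), n->keys.end());
  }
  return out;
}
```

Setting `right->next` before overwriting `left.next` matters: doing it in the other order makes the new leaf point at itself, and a chain walk never terminates.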