
B+ trees #2

Merged
IsaacBurns7 merged 17 commits into main from B+_Trees
May 21, 2026

Conversation

@IsaacBurns7

Owner

TEST!!!

IsaacBurns7 and others added 17 commits April 22, 2026 21:43
…d fielddescriptors, and then additionally what should BPlusTree handle and what should RecordType handle
…tor started, need to think more on how exactly variable length records are going to be handled by slotted page. Also, currently the BTree will interface with slottedpage and diskmanager and record and fielddescriptors, perhaps later down the line BTree as a class needs to be simplified
… copy CMU's 'bustub' type system, since this project is focused on making a database not a type system
…tem, and instead I'm just going to make my own. May take a week or two, but that's okay. on the weekends of my IBM internship I'll go ahead and continue this project through the summer.
… minimum reqs for btree impl. If it is not fulfilled, perhaps I will make an arena allocator for temporary data for query processing... may take a while though...
…ensible design, done by my imagination first and then claude looking up what some contemporary methods are
…le/column, implemenet schema, implement btree, AND CONTINUERgit add -A! MWEHEHEHEHEHEHgit add -A!
Copilot AI review requested due to automatic review settings May 21, 2026 23:54
@IsaacBurns7 IsaacBurns7 merged commit 8e39759 into main May 21, 2026

Copilot AI left a comment

Pull request overview

This PR appears to be a first pass at introducing B+ tree indexing scaffolding plus a supporting “type/value” layer and a storage visualization/demo program, along with build-system updates to compile these pieces.

Changes:

  • Adds an initial B+ tree API skeleton and related design notes under index/.
  • Introduces multiple “Value/Type” implementations (value.h, type/, and type2/) plus schema/column scaffolding.
  • Updates CMake to build index_logic and a storage_viz executable, and includes various generated build outputs.

Reviewed changes

Copilot reviewed 80 out of 93 changed files in this pull request and generated 28 comments.

| File | Description |
| --- | --- |
| value.h | Adds a BusTub-style Value class wrapper with type-driven operations and fmt formatters. |
| type2/value.h | Adds an alternate Value class stub for type2. |
| type2/type.h | Adds a Type interface + TypeId/CmpBool for type2. |
| type2/type.cpp | Adds (incomplete) Type registry initialization and default virtual method implementations for type2. |
| type2/integer_parent_type.h | Adds an (incomplete) integer type base helper for type2. |
| type/varchar_type.h | Adds a VarcharType implementation for the type/ type system. |
| type/value.h | Adds a simplified tagged-union Value for the type/ type system. |
| type/type.h | Adds the Type interface for the type/ type system. |
| type/type.cpp | Adds a singleton-based type registry for type/. |
| type/schema.h | Adds Schema API scaffolding and record layout notes. |
| type/schema.cpp | Adds partial Schema implementation (currently incomplete). |
| type/README.md | Notes about the simplified width-parameterized type/value approach. |
| type/numeric_type.h | Adds NumericType implementation (all integer widths). |
| type/float_type.h | Adds FloatType implementation (float/double widths). |
| type/column.h | Adds a simplified Column model for the type/ schema system. |
| type/boolean_type.h | Adds BooleanType implementation. |
| tests/storage_viz.cpp | Adds a demo executable for disk/page/record operations and visualization. |
| storage/manifest.md | Removes prior storage-layer design narrative. |
| storage/DiskManager.h | Removes an older DiskManager header. |
| storage/DiskManager.cpp | Removes an older DiskManager source stub. |
| storage/disk_manager2.h | Removes an incomplete alternative DiskManager header. |
| storage/disk_manager.cpp | Minor comment cleanup in readPage doc block. |
| scratch.txt | Adds scratch notes/prototypes for record serialization ideas. |
| plan.md | Adjusts B+ tree plan snippet (currently leaves an invalid class closing). |
| index/type_system_consideration.md | Adds notes about how the B+ tree should interact with the type system. |
| index/plan.md | Adds B+ tree planning notes. |
| index/disk_manager_consideration.md | Adds notes about buffer borrowing and DiskManager usage. |
| index/b_plus_tree.h | Adds the BPlusTree class skeleton/API. |
| index/b_plus_tree.cpp | Adds placeholder implementations for BPlusTree methods. |
| index/archive.txt | Adds archived prototype code snippets. |
| common/string_util.h | Adds a StringUtil utility header. |
| common/string_util.cpp | Adds implementations for StringUtil utilities. |
| common/README.md | Adds common-module TODO notes. |
| common/macros.h | Replaces old include guards with macros/utilities (assert/log/etc.). |
| common/exception.h | Adds an Exception type and exception categories. |
| common/column.h | Adds a BusTub-derived Column implementation (catalog-style). |
| common/column.cpp | Adds Column::ToString implementation. |
| CMakeLists.txt | Adds index_logic library and storage_viz executable; updates targets. |
| CMakeFiles/Makefile.cmake | Removes generated CMake file from repo root. |
| CMakeFiles/CMakeDirectoryInformation.cmake | Removes generated CMake file from repo root. |
| build/Testing/Temporary/LastTestsFailed.log | Removes generated test log. |
| build/Testing/Temporary/LastTest.log | Removes generated test log. |
| build/Testing/Temporary/CTestCostData.txt | Removes generated test cost data. |
| build/Makefile | Updates generated Makefile targets for new libs/executable. |
| build/CMakeFiles/TargetDirectories.txt | Updates generated target directory list. |
| build/CMakeFiles/Storage.dir/progress.make | Removes generated target progress file for old executable. |
| build/CMakeFiles/Storage.dir/link.txt | Removes generated link command file for old executable. |
| build/CMakeFiles/Storage.dir/cmake_clean.cmake | Removes generated clean script for old executable. |
| build/CMakeFiles/storage_viz.dir/tests/storage_viz.cpp.o.d | Adds generated dependency file for storage_viz. |
| build/CMakeFiles/storage_viz.dir/progress.make | Adds generated progress file for storage_viz. |
| build/CMakeFiles/storage_viz.dir/link.txt | Adds generated link command for storage_viz. |
| build/CMakeFiles/storage_viz.dir/flags.make | Adds generated compile flags for storage_viz. |
| build/CMakeFiles/storage_viz.dir/DependInfo.cmake | Adds generated dependency info for storage_viz. |
| build/CMakeFiles/storage_viz.dir/depend.make | Adds generated depend makefile for storage_viz. |
| build/CMakeFiles/storage_viz.dir/compiler_depend.ts | Adds generated compiler timestamp for storage_viz. |
| build/CMakeFiles/storage_viz.dir/compiler_depend.make | Adds generated compiler depend makefile for storage_viz. |
| build/CMakeFiles/storage_viz.dir/compiler_depend.internal | Adds generated compiler dependency internal file. |
| build/CMakeFiles/storage_viz.dir/cmake_clean.cmake | Adds generated clean script for storage_viz. |
| build/CMakeFiles/storage_viz.dir/build.make | Adds generated build rules for storage_viz. |
| build/CMakeFiles/progress.marks | Updates generated progress marks. |
| build/CMakeFiles/Makefile2 | Updates generated recursive make rules. |
| build/CMakeFiles/Makefile.cmake | Updates generated CMake makefile metadata. |
| build/CMakeFiles/index_logic.dir/progress.make | Adds generated progress file for index_logic. |
| build/CMakeFiles/index_logic.dir/link.txt | Adds generated link script for index_logic. |
| build/CMakeFiles/index_logic.dir/index/b_plus_tree.cpp.o.d | Updates generated dependency file for index_logic. |
| build/CMakeFiles/index_logic.dir/flags.make | Adds generated flags file for index_logic. |
| build/CMakeFiles/index_logic.dir/DependInfo.cmake | Updates generated dependency info for index_logic. |
| build/CMakeFiles/index_logic.dir/depend.make | Updates generated depend makefile for index_logic. |
| build/CMakeFiles/index_logic.dir/compiler_depend.ts | Updates generated compiler timestamp for index_logic. |
| build/CMakeFiles/index_logic.dir/compiler_depend.internal | Updates generated internal dependency list for index_logic. |
| build/CMakeFiles/index_logic.dir/cmake_clean.cmake | Adds generated clean script for index_logic. |
| build/CMakeFiles/index_logic.dir/cmake_clean_target.cmake | Adds generated target clean script for index_logic. |
| build/CMakeFiles/index_logic.dir/build.make | Adds generated build rules for index_logic. |
| build/CMakeFiles/3.28.3/CMakeSystem.cmake | Updates generated system info file (kernel version). |
| build/_deps/googletest-subbuild/CMakeFiles/CMakeConfigureLog.yaml | Updates generated configure log system version. |
| build/_deps/googletest-subbuild/CMakeFiles/3.28.3/CMakeSystem.cmake | Updates generated system info file in subbuild. |
| build/_deps/googletest-build/googletest/CMakeFiles/gtest.dir/progress.make | Updates generated progress numbers. |
| build/_deps/googletest-build/googletest/CMakeFiles/gtest_main.dir/progress.make | Updates generated progress numbers. |
| build/_deps/googletest-build/googlemock/CMakeFiles/gmock.dir/progress.make | Updates generated progress numbers. |
| build/_deps/googletest-build/googlemock/CMakeFiles/gmock_main.dir/progress.make | Updates generated progress numbers. |
| .vscode/c_cpp_properties.json | Adds VS Code IntelliSense configuration. |

Comments suppressed due to low confidence (1)

index/b_plus_tree.cpp:107

  • This .cpp file ends with #endif even though it does not start with a matching #if. This will cause a preprocessor error. Remove the stray #endif (header guards should only be in headers).

Comment thread value.h
Comment on lines +75 to +76
auto GetColumn() const -> Column;

Comment thread value.h
Comment on lines +174 to +190
template <typename T>
struct fmt::formatter<T, std::enable_if_t<std::is_base_of<bustub::Value, T>::value, char>>
    : fmt::formatter<std::string> {
  template <typename FormatCtx>
  auto format(const bustub::Value &x, FormatCtx &ctx) const {
    return fmt::formatter<std::string>::format(x.ToString(), ctx);
  }
};

template <typename T>
struct fmt::formatter<std::unique_ptr<T>, std::enable_if_t<std::is_base_of<bustub::Value, T>::value, char>>
    : fmt::formatter<std::string> {
  template <typename FormatCtx>
  auto format(const std::unique_ptr<bustub::Value> &x, FormatCtx &ctx) const {
    return fmt::formatter<std::string>::format(x->ToString(), ctx);
  }
};
Comment thread value.h
Comment on lines +173 to +176

template <typename T>
struct fmt::formatter<T, std::enable_if_t<std::is_base_of<bustub::Value, T>::value, char>>
    : fmt::formatter<std::string> {
Comment thread type2/value.h
Comment on lines +21 to +26
  explicit Value(const TypeId type) : manage_data_(false), type_id_(type) { size_.len_ = BUSTUB_VALUE_NULL; }
  Value(TypeId type, int8_t i);
  // BOOLEAN and TINYINT
  Value(TypeId type, int8_t i);
  // DECIMAL
  Value(TypeId type, double d);
Comment thread type2/value.h
Comment on lines +10 to +46
class Value{
  friend class Type;
  friend class IntegerParentType;
  friend class TinyintType;
  friend class SmallintType;
  friend class IntegerType;
  friend class BigintType;
  friend class DecimalType;
  friend class BooleanType;
  friend class VarcharType;

  explicit Value(const TypeId type) : manage_data_(false), type_id_(type) { size_.len_ = BUSTUB_VALUE_NULL; }
  Value(TypeId type, int8_t i);
  // BOOLEAN and TINYINT
  Value(TypeId type, int8_t i);
  // DECIMAL
  Value(TypeId type, double d);
  Value(TypeId type, float f);
  // SMALLINT
  Value(TypeId type, int16_t i);
  // INTEGER
  Value(TypeId type, int32_t i);
  // BIGINT
  Value(TypeId type, int64_t i);
  // TIMESTAMP
  Value(TypeId type, uint64_t i);
  // VARCHAR
  Value(TypeId type, const char *data, uint32_t len, bool manage_data);
  Value(TypeId type, const std::string &data);

  Value() : Value(TypeId::INVALID) {}
  Value(const Value &other);
  auto operator=(Value other) -> Value &;
  ~Value();

  inline auto GetTypeId() const -> TypeId { return type_id_; }
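
Two things stand out in the excerpts above: `Value(TypeId type, int8_t i)` is declared twice, which C++ rejects as a member redeclaration, and with no `public:` specifier every constructor of a `class` is private. Below is a minimal sketch of a trimmed-down `Value` that avoids both, assuming a single `int8_t` overload is meant to serve BOOLEAN and TINYINT. The `TypeId` subset, the storage members, and the `IsNull`/`AsInt`/`AsString` accessors are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical subset of the type ids; the PR's real enum is not shown here.
enum class TypeId { INVALID, BOOLEAN, TINYINT, INTEGER, VARCHAR };

class Value {
 public:  // without this specifier, all members of a `class` are private
  // A value constructed from a type alone is treated as NULL.
  explicit Value(TypeId type) : type_id_(type), is_null_(true) {}
  // BOOLEAN and TINYINT share one int8_t overload, declared exactly once.
  Value(TypeId type, int8_t i) : type_id_(type), int_(i) {}
  // INTEGER
  Value(TypeId type, int32_t i) : type_id_(type), int_(i) {}
  // VARCHAR
  Value(TypeId type, std::string s) : type_id_(type), str_(std::move(s)) {}
  Value() : Value(TypeId::INVALID) {}

  auto GetTypeId() const -> TypeId { return type_id_; }
  auto IsNull() const -> bool { return is_null_; }
  auto AsInt() const -> int64_t { return int_; }
  auto AsString() const -> const std::string & { return str_; }

 private:
  TypeId type_id_;
  bool is_null_ = false;
  int64_t int_ = 0;   // simplified storage, assumed for illustration only
  std::string str_;
};
```

With this shape, `Value(TypeId::TINYINT, int8_t{7})` and `Value(TypeId::BOOLEAN, int8_t{1})` resolve to the same constructor, and the type id alone distinguishes them.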

Comment thread common/macros.h
Comment on lines +42 to +46
#define ENSURE(expr, message) \
    if(!(expr)){ \
        std::cerr << "ERROR: " << (message) << std::endl; \
        std::terminate(); \
    }
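
A macro that expands to a bare `if` block does not behave like a single statement: used as `ENSURE(...);` in the body of an unbraced `if` that has an `else`, the trailing semicolon leaves the `else` without a matching `if`. A common remedy, sketched here as one possible fix rather than the author's intended one, is to wrap the body in `do { ... } while (0)`. The `classify` helper is hypothetical and exists only to show the macro composing with `if`/`else`.

```cpp
#include <cassert>
#include <exception>
#include <iostream>

// Wrapping the body in do/while(0) makes the macro expand to exactly one
// statement, so `ENSURE(...);` is safe anywhere a statement is allowed.
#define ENSURE(expr, message)                             \
  do {                                                    \
    if (!(expr)) {                                        \
      std::cerr << "ERROR: " << (message) << std::endl;   \
      std::terminate();                                   \
    }                                                     \
  } while (0)

// Hypothetical helper: the unbraced if/else compiles and the `else`
// pairs with the outer `if`, not with the `if` inside the macro.
inline int classify(int x) {
  if (x >= 0)
    ENSURE(x < 1000, "x out of range");
  else
    return -1;
  return 1;
}
```

The header that defines the macro would also need `<iostream>` and `<exception>` for `std::cerr` and `std::terminate`.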
Comment thread common/exception.h
Comment on lines +42 to +53
class Exception: public std::runtime_error{
public:
  explicit Exception(const std::string &message, bool print = True): std::runtime_error(message), type(ExceptionType::INVALID) {
#ifndef NDEBUG
    if(print){
      std::string exception_message = "Message:: " + message + "\n";
      std::cerr << exception_message;
    }
#endif
  }
  auto GetType() const -> ExceptionType { return type_; }
  static auto ExceptionTypeToString(ExceptionType type) -> std::string {
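
In the excerpt above, `True` is not a C++ literal (the keyword is `true`), and the initializer list names `type` while `GetType()` returns `type_`. Below is a minimal sketch of a version that compiles, assuming the member is meant to be called `type_`. The two-value `ExceptionType` enum, the second constructor, and the returned strings are hypothetical placeholders, not the PR's definitions.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical subset of categories, for illustration only.
enum class ExceptionType { INVALID = 0, OUT_OF_RANGE = 1 };

class Exception : public std::runtime_error {
 public:
  explicit Exception(const std::string &message, bool print = true)
      : std::runtime_error(message), type_(ExceptionType::INVALID) {
#ifndef NDEBUG
    if (print) {
      std::cerr << "Message:: " << message << "\n";
    }
#endif
  }

  // Assumed convenience constructor that also records a category.
  Exception(ExceptionType type, const std::string &message, bool print = true)
      : Exception(message, print) {
    type_ = type;
  }

  auto GetType() const -> ExceptionType { return type_; }

  static auto ExceptionTypeToString(ExceptionType type) -> std::string {
    switch (type) {
      case ExceptionType::INVALID:
        return "Invalid";
      case ExceptionType::OUT_OF_RANGE:
        return "Out of Range";
    }
    return "Unknown";
  }

 private:
  ExceptionType type_;
};
```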
Comment thread index/b_plus_tree.h
Comment on lines +39 to +65
class BPlusTree {
    //records are raw uint8_t* and interpreted via "Schema" interface
    //uses sibling pointers + strict min-key
    //will implement latch crabbing for concurrency
    //database overall is IoT (Index Organized Tables)
    //records stored as uint8_t*, interpreted via "schema" interface
    bool insert(uint8_t* record);
    bool remove(uint8_t* record);
    uint8_t* get(Key target);
    std::vector<uint8_t*> scan(Key start, Key end);
    // range scan - returns all values where key is in [start, end]
private:
    void splitChild(page_id_t parent_node, Key child);
    //take child, split into two.
    //remember to add/update key stuff to parent node (strict min-key)
    void merge(page_id_t parent_node, Key left_child); //could also input right child
    //take nodes left_child and left_child+1=right_child, and put keys into left_child. destroy right_child
    //remember to delete right_child key, shouldn't affect left_child key(strict min-key)
    void redistribute(page_id_t parent_node, Key child);
    //take stuff in child, give to siblings (sibling pointers!)
    //remember to update parent keys (strict min-key)
    DiskManager* disk_manager_;
    // uint16_t primary_key_index;
    uint32_t root_page_id;
    uint32_t schema_page_id;
    Schema schema;
};
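
As excerpted, the class has no `public:` specifier before `insert`, `remove`, `get`, and `scan`, so callers outside the class cannot use them. Below is a rough sketch of the same four-method surface with a public section, backed by `std::map` purely as a stand-in for the on-disk tree. The `Key` alias, the "first four bytes are the key" record layout, and the names `extractKey` and `BPlusTreeSketch` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

using Key = uint32_t;  // assumption: the PR's Key type is not shown

// Hypothetical layout: the first 4 bytes of a record hold its key.
inline Key extractKey(const uint8_t *record) {
  Key k;
  std::memcpy(&k, record, sizeof(k));
  return k;
}

class BPlusTreeSketch {
 public:
  // Returns false when a record with the same key already exists.
  bool insert(const uint8_t *record) {
    return rows_.emplace(extractKey(record), record).second;
  }
  // Returns true when a record with this key was present and erased.
  bool remove(const uint8_t *record) {
    return rows_.erase(extractKey(record)) == 1;
  }
  // Returns nullptr when the key is absent.
  const uint8_t *get(Key target) const {
    auto it = rows_.find(target);
    return it == rows_.end() ? nullptr : it->second;
  }
  // Range scan: records whose key lies in [start, end], in key order.
  std::vector<const uint8_t *> scan(Key start, Key end) const {
    std::vector<const uint8_t *> out;
    for (auto it = rows_.lower_bound(start);
         it != rows_.end() && it->first <= end; ++it) {
      out.push_back(it->second);
    }
    return out;
  }

 private:
  // Stand-in for pages reached through a DiskManager; stores borrowed pointers.
  std::map<Key, const uint8_t *> rows_;
};
```

A stand-in like this can also serve as a reference model in tests: the real tree and the map should agree on every `get` and `scan` after the same sequence of operations.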
Comment thread index/b_plus_tree.cpp
Comment on lines +34 to +64
bool BPlusTree::insert(uint8_t* record){
    //find correct leaf page and slot_id_x
    //get page via disk manager
    //read page as slottedpage
    //if enough space to insert
        //insert record at slot_id_x - you have to shift the rest of slots through memmove (cheap)
    //else
        //split the current page, giving you a new page_id
        //find if you should insert at this page or the new page, and then insert!!
    //update keys in ancestral line via BTStack
}
bool BPlusTree::remove(uint8_t* record){
    //find correct leaf page and slot_id_x
    //get page via disk manager
    //read page as slottedpage
    //delete slot_id_x
    //if this + sibling (via sibling pointer) can fit in one page, merge
        //not so sure if this is a good idea??
    //update keys in ancestral line via BTStack
}
uint8_t* BPlusTree::get(Key target){
    //find correct leaf page and slot_id_x
    //get page via disk manager
    //read page as slottedpage
    //return slot_id_x's record as uint8_t*
}
std::vector<uint8_t*> BPlusTree::scan(Key start, Key end){
    //find correct start page and record
    //find correct end page and record
    //iterate from start to end via sibling pointers, scanning uint8_t* into vector
}
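
All four placeholder bodies above have a non-void return type but no `return` statement, so calling them is undefined behavior until they are filled in. The "find slot, shift with memmove, else split" step of `insert` could look roughly like the sketch below. It works on an in-memory array of keys rather than a slotted page; `Leaf`, `kLeafCapacity`, `lowerBound`, `leafInsert`, and `InsertResult` are hypothetical names, and a real page would move slot entries rather than bare keys.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int kLeafCapacity = 4;  // hypothetical; a real page derives this from its size

struct Leaf {
  uint32_t keys[kLeafCapacity];  // kept sorted in ascending order
  int count = 0;
};

// Index of the first slot whose key is >= key (binary search).
inline int lowerBound(const Leaf &leaf, uint32_t key) {
  int lo = 0;
  int hi = leaf.count;
  while (lo < hi) {
    int mid = lo + (hi - lo) / 2;
    if (leaf.keys[mid] < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

enum class InsertResult { Ok, Duplicate, NeedsSplit };

inline InsertResult leafInsert(Leaf &leaf, uint32_t key) {
  int slot = lowerBound(leaf, key);
  if (slot < leaf.count && leaf.keys[slot] == key) {
    return InsertResult::Duplicate;
  }
  if (leaf.count == kLeafCapacity) {
    return InsertResult::NeedsSplit;  // caller splits the page, then retries
  }
  // Shift slots [slot, count) one position to the right to open a gap.
  // memmove is required because source and destination overlap.
  std::memmove(leaf.keys + slot + 1, leaf.keys + slot,
               static_cast<std::size_t>(leaf.count - slot) * sizeof(uint32_t));
  leaf.keys[slot] = key;
  ++leaf.count;
  return InsertResult::Ok;
}
```

Returning an explicit `NeedsSplit` keeps the leaf-level routine free of tree logic; the caller decides which of the two pages receives the record after the split.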
Comment thread plan.md
Comment on lines 38 to 41
void splitChild(page_id_t parent_id, int child_index);
void mergeOrRedistribute(page_id_t node_id); // called on underflow after delete
};
}
The split and merge logic is where you'll spend most of your time. scan is important to implement here because it validates that your leaf-level sibling pointers are correct — a common place for bugs to hide.
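
To make the point about `scan` concrete, here is a small in-memory sketch, not the PR's page-based design: a leaf split that relinks the sibling pointers, and a range scan that walks them. If the split forgets to set either `next` pointer, the scan skips or loses keys, which is why it works as a check. `LeafNode`, `splitLeaf`, and `scanLeaves` are hypothetical names, and keys are plain `int`s for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LeafNode {
  std::vector<int> keys;     // sorted ascending
  LeafNode *next = nullptr;  // right sibling, or nullptr for the last leaf
};

// Moves the upper half of `left` into the empty leaf `right` and links
// left -> right -> (left's old sibling). Returns the separator key, i.e.
// the smallest key in `right`, which the parent would store.
inline int splitLeaf(LeafNode &left, LeafNode &right) {
  assert(left.keys.size() >= 2 && right.keys.empty());
  auto mid = static_cast<std::ptrdiff_t>(left.keys.size() / 2);
  right.keys.assign(left.keys.begin() + mid, left.keys.end());
  left.keys.resize(static_cast<std::size_t>(mid));
  right.next = left.next;  // relink before overwriting left.next
  left.next = &right;
  return right.keys.front();
}

// Range scan over [start, end], following sibling pointers from `first`.
inline std::vector<int> scanLeaves(const LeafNode *first, int start, int end) {
  std::vector<int> out;
  for (const LeafNode *n = first; n != nullptr; n = n->next) {
    for (int k : n->keys) {
      if (k > end) {
        return out;  // keys are globally sorted, so nothing further matches
      }
      if (k >= start) {
        out.push_back(k);
      }
    }
  }
  return out;
}
```

A test that inserts enough keys to force several splits and then compares a full-range scan against the sorted input would exercise exactly this linkage.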