Added columnar bruh format and some overall improvements #5
Merged
…codebase general structure
Pull request overview
This pull request adds a new columnar binary file format called "BRUH" (inspired by Apache Parquet), along with significant refactoring to improve code organization and performance. Columns now keep their data in contiguous memory and track nulls with a BitVector, and the CSV implementation moves from header files into source files.
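To illustrate the null-tracking idea, here is a minimal sketch of a bit vector backed by `uint64_t` words. `BitVectorSketch` and its method names are hypothetical and were chosen for this example; the actual class in `include/util/bit_vector.h` may expose a different interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the PR's BitVector: one bit per row,
// packed 64 rows to a word.
class BitVectorSketch {
 public:
  // Appends one bit, adding a new 64-bit word whenever the last one is full.
  void PushBack(bool value) {
    if (size_ % 64 == 0) words_.push_back(0);
    if (value) words_[size_ / 64] |= (std::uint64_t{1} << (size_ % 64));
    ++size_;
  }

  // Returns the bit stored at index i.
  bool Get(std::size_t i) const {
    assert(i < size_);
    return ((words_[i / 64] >> (i % 64)) & std::uint64_t{1}) != 0;
  }

  // Overwrites the bit at index i.
  void Set(std::size_t i, bool value) {
    assert(i < size_);
    const std::uint64_t mask = std::uint64_t{1} << (i % 64);
    if (value) {
      words_[i / 64] |= mask;
    } else {
      words_[i / 64] &= ~mask;
    }
  }

  std::size_t Size() const { return size_; }
  std::size_t WordCount() const { return words_.size(); }

 private:
  std::vector<std::uint64_t> words_;  // packed bits, least significant first
  std::size_t size_ = 0;              // number of bits in use
};
```

With this layout a null mask for 130 rows occupies three 64-bit words, compared with 130 bytes for a byte-per-row mask.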
Changes:
- Introduces the BRUH columnar binary format with reader/writer implementations supporting Int64, Double, Bool, and String data types
- Refactors column implementations to use BitVector for null masks and contiguous memory storage for strings
- Moves CSV reader/writer implementations from header files to source files and renames SchemaReader to SchemaManager
- Adds a command-line converter application for converting between CSV and BRUH formats
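The string storage change in the list above can be sketched roughly as follows: all characters live in one buffer, and each row records an offset and a length into it. `StringColumnSketch` is a hypothetical name, and `std::vector<bool>` stands in for the PR's BitVector to keep the example self-contained; the real `string_column.h` may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of a string column with a single character buffer.
class StringColumnSketch {
 public:
  // Appends a non-null value by copying its bytes to the end of the buffer.
  void Append(const std::string& value) {
    offsets_.push_back(chars_.size());
    lengths_.push_back(value.size());
    chars_.insert(chars_.end(), value.begin(), value.end());
    is_null_.push_back(false);
  }

  // Appends a null row; it occupies no bytes in the character buffer.
  void AppendNull() {
    offsets_.push_back(chars_.size());
    lengths_.push_back(0);
    is_null_.push_back(true);
  }

  bool IsNull(std::size_t row) const {
    assert(row < is_null_.size());
    return is_null_[row];
  }

  // Returns a copy of the value at row (an empty string for null rows).
  std::string Get(std::size_t row) const {
    assert(row < offsets_.size());
    const auto first = chars_.begin() + static_cast<std::ptrdiff_t>(offsets_[row]);
    return std::string(first, first + static_cast<std::ptrdiff_t>(lengths_[row]));
  }

  std::size_t Size() const { return offsets_.size(); }
  std::size_t CharBytes() const { return chars_.size(); }

 private:
  std::vector<char> chars_;           // all values back to back
  std::vector<std::size_t> offsets_;  // start of each row in chars_
  std::vector<std::size_t> lengths_;  // byte length of each row
  std::vector<bool> is_null_;         // stand-in for the PR's BitVector
};
```

Compared with a `std::vector<std::string>`, this keeps one allocation for the character data, which also makes it straightforward to write the buffer to disk in a single call.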
Reviewed changes
Copilot reviewed 30 out of 32 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tests/test_bruh.cpp | Comprehensive test suite for BRUH format reader/writer functionality |
| tests/CMakeLists.txt | Adds test_bruh.cpp to test compilation |
| src/util/parse.cpp | Moves LowercaseEquals implementation from header to source file |
| src/csv/schema_manager.cpp | Implements SchemaManager methods (renamed from SchemaReader) |
| src/csv/csv_row_reader.cpp | Moves CSV row parsing logic from header to source file |
| src/csv/csv_batch_writer.cpp | Moves CSV batch writing logic from header to source file |
| src/csv/csv_batch_reader.cpp | Moves CSV batch reading logic from header to source file |
| src/bruh/bruh_batch_writer.cpp | Implements BRUH format writer with columnar data serialization |
| src/bruh/bruh_batch_reader.cpp | Implements BRUH format reader with metadata parsing and deserialization |
| include/util/stream_helper.h | Provides binary I/O utilities for reading/writing trivially copyable types |
| include/util/parse.h | Changes LowercaseEquals from inline to declared function |
| include/util/bit_vector.h | Adds BitVector class for compact boolean storage using uint64_t backing |
| include/csv/schema_reader.h | Deleted - renamed to schema_manager.h |
| include/csv/schema_manager.h | Declares SchemaManager class for schema serialization |
| include/csv/csv_row_reader.h | Converts ReadRow to declaration only |
| include/csv/csv_batch_writer.h | Converts Write/WriteHeader/WriteField to declarations, adds final specifier |
| include/csv/csv_batch_reader.h | Converts ReadNext to declaration only, adds final specifier |
| include/csv/csv.h | Updates include from schema_reader.h to schema_manager.h |
| include/core/columns/string_column.h | Refactors to use contiguous char storage with offsets/lengths and BitVector for nulls |
| include/core/columns/numeric_column.h | Updates to use BitVector for null masks |
| include/core/columns/bool_column.h | Updates to use BitVector for both data and null masks |
| include/bruh/format.h | Defines BRUH file format structures and constants |
| include/bruh/bruh_batch_writer.h | Declares BruhBatchWriter for writing batches to BRUH format |
| include/bruh/bruh_batch_reader.h | Declares BruhBatchReader for reading batches from BRUH format |
| include/bruh/bruh.h | Main include file for BRUH format functionality |
| cmake/deps.cmake | Adds DOWNLOAD_EXTRACT_TIMESTAMP and ABSL_PROPAGATE_CXX_STD options |
| benchmarks/bench_bruh.cpp | Adds benchmarks for BRUH reader/writer performance |
| benchmarks/CMakeLists.txt | Adds bench_bruh.cpp to benchmark compilation |
| apps/converter/main.cpp | Implements CSV-to-BRUH and BRUH-to-CSV converter command-line tool |
| CMakeLists.txt | Adds new source files to columnar_lib compilation |
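For the `include/util/stream_helper.h` row, binary I/O for trivially copyable types usually comes down to a pair of templates like the ones below. This is a sketch under assumptions: `WritePod`, `ReadPod`, and `RoundTrip` are hypothetical names, not the helpers defined in the PR, and the bytes are written in host byte order, so files would not be portable across endianness without an extra conversion step.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Writes the raw bytes of a trivially copyable value (host byte order).
template <typename T>
void WritePod(std::ostream& out, const T& value) {
  static_assert(std::is_trivially_copyable<T>::value,
                "WritePod requires a trivially copyable type");
  out.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Reads sizeof(T) bytes into value; returns false on a short or failed read.
template <typename T>
bool ReadPod(std::istream& in, T& value) {
  static_assert(std::is_trivially_copyable<T>::value,
                "ReadPod requires a trivially copyable type");
  in.read(reinterpret_cast<char*>(&value), sizeof(T));
  return static_cast<bool>(in);
}

// Round-trips a value through an in-memory binary stream.
template <typename T>
T RoundTrip(const T& value) {
  std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
  WritePod(buffer, value);
  T decoded{};
  const bool ok = ReadPod(buffer, decoded);
  assert(ok);
  (void)ok;  // silences the unused-variable warning when NDEBUG is defined
  return decoded;
}
```

The `static_assert` rejects types such as `std::string` at compile time, so variable-length data has to go through an explicit length-plus-bytes path instead.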
No description provided.