
Added columnar bruh format and some overall improvements #5

Merged: Irval1337 merged 2 commits into main from added-bruh-format on Jan 23, 2026


Conversation

@Irval1337 (Owner)

No description provided.


Copilot AI left a comment


Pull request overview

This pull request adds a new columnar binary file format called "BRUH" (inspired by Apache Parquet), along with significant refactoring to improve code organization and performance. It introduces contiguous memory storage for columns, using a BitVector for compact null tracking, and cleans up the CSV implementation by moving code from headers into source files.
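The summary does not show BitVector's interface, so the following is only a minimal sketch, assuming a null mask backed by `uint64_t` words with one bit per row (here a set bit marks a null row). The class name matches the PR's header, but the method names (`Get`, `Set`, `PushBack`, `Count`) are hypothetical illustrations, not the actual API of include/util/bit_vector.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a bit-packed boolean vector; not the PR's implementation.
class BitVector {
public:
    BitVector() = default;

    // Creates n bits, all set to `value`.
    explicit BitVector(std::size_t n, bool value = false)
        : words_((n + 63) / 64, value ? ~std::uint64_t{0} : 0), size_(n) {
        ClearTrailingBits();  // keep bits past size_ at zero
    }

    std::size_t Size() const { return size_; }

    bool Get(std::size_t i) const {
        assert(i < size_);
        return (words_[i / 64] >> (i % 64)) & 1u;
    }

    void Set(std::size_t i, bool value) {
        assert(i < size_);
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (value) {
            words_[i / 64] |= mask;
        } else {
            words_[i / 64] &= ~mask;
        }
    }

    void PushBack(bool value) {
        if (size_ % 64 == 0) {
            words_.push_back(0);  // current word is full (or none yet): start a new one
        }
        ++size_;
        Set(size_ - 1, value);
    }

    // Number of set bits (e.g. the null count); valid because unused bits stay zero.
    std::size_t Count() const {
        std::size_t total = 0;
        for (std::uint64_t w : words_) {
            while (w != 0) {
                w &= w - 1;  // clear the lowest set bit
                ++total;
            }
        }
        return total;
    }

private:
    void ClearTrailingBits() {
        const std::size_t tail = size_ % 64;
        if (tail != 0) {
            words_.back() &= (std::uint64_t{1} << tail) - 1;
        }
    }

    std::vector<std::uint64_t> words_;
    std::size_t size_ = 0;
};
```

Packing 64 flags per word costs one bit per row instead of a byte per row for a `std::vector<char>` mask, and unlike `std::vector<bool>` it exposes whole words for fast scans. Keeping the bits past `Size()` at zero is what lets `Count()` scan complete words without extra masking.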

Changes:

  • Introduces the BRUH columnar binary format with reader/writer implementations supporting Int64, Double, Bool, and String data types
  • Refactors column implementations to use BitVector for null masks and contiguous memory storage for strings
  • Moves CSV reader/writer implementations from header files to source files and renames SchemaReader to SchemaManager
  • Adds a command-line converter application for converting between CSV and BRUH formats
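To illustrate the contiguous string layout listed above, here is a hypothetical sketch (the class and method names are assumptions, not the PR's string_column.h): every non-null value is appended to one shared `char` buffer, and each row records an offset and a length into it. A `std::vector<bool>` stands in for the PR's BitVector null mask so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical string column: one contiguous buffer plus per-row offset/length.
class StringColumnSketch {
public:
    void Append(std::string_view value) {
        offsets_.push_back(data_.size());
        lengths_.push_back(value.size());
        data_.insert(data_.end(), value.begin(), value.end());
        nulls_.push_back(false);
    }

    // Null rows occupy no bytes in the buffer.
    void AppendNull() {
        offsets_.push_back(data_.size());
        lengths_.push_back(0);
        nulls_.push_back(true);
    }

    std::size_t Size() const { return offsets_.size(); }
    bool IsNull(std::size_t row) const { return nulls_[row]; }

    // The returned view points into the shared buffer and is invalidated by
    // later appends, since the buffer may reallocate.
    std::optional<std::string_view> Get(std::size_t row) const {
        assert(row < Size());
        if (nulls_[row]) {
            return std::nullopt;
        }
        return std::string_view(data_.data() + offsets_[row], lengths_[row]);
    }

    std::size_t BufferBytes() const { return data_.size(); }

private:
    std::vector<char> data_;            // all string bytes, back to back
    std::vector<std::size_t> offsets_;  // start of each row in data_
    std::vector<std::size_t> lengths_;  // byte length of each row
    std::vector<bool> nulls_;           // stand-in for the PR's BitVector
};
```

One buffer avoids a separate `std::string` object (and possibly a heap allocation) per row, and keeps the bytes ready to be written out as a single block. An offsets-only layout (n + 1 offsets, with length = offsets[i + 1] - offsets[i]) would also work; the sketch keeps explicit lengths only to mirror the summary's "offsets/lengths" wording.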

Reviewed changes

Copilot reviewed 30 out of 32 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| tests/test_bruh.cpp | Comprehensive test suite for BRUH format reader/writer functionality |
| tests/CMakeLists.txt | Adds test_bruh.cpp to test compilation |
| src/util/parse.cpp | Moves LowercaseEquals implementation from header to source file |
| src/csv/schema_manager.cpp | Implements SchemaManager methods (renamed from SchemaReader) |
| src/csv/csv_row_reader.cpp | Moves CSV row parsing logic from header to source file |
| src/csv/csv_batch_writer.cpp | Moves CSV batch writing logic from header to source file |
| src/csv/csv_batch_reader.cpp | Moves CSV batch reading logic from header to source file |
| src/bruh/bruh_batch_writer.cpp | Implements BRUH format writer with columnar data serialization |
| src/bruh/bruh_batch_reader.cpp | Implements BRUH format reader with metadata parsing and deserialization |
| include/util/stream_helper.h | Provides binary I/O utilities for reading/writing trivially copyable types |
| include/util/parse.h | Changes LowercaseEquals from an inline definition to a declaration |
| include/util/bit_vector.h | Adds BitVector class for compact boolean storage using uint64_t backing |
| include/csv/schema_reader.h | Deleted (renamed to schema_manager.h) |
| include/csv/schema_manager.h | Declares SchemaManager class for schema serialization |
| include/csv/csv_row_reader.h | Reduces ReadRow to a declaration only |
| include/csv/csv_batch_writer.h | Reduces Write/WriteHeader/WriteField to declarations, adds the final specifier |
| include/csv/csv_batch_reader.h | Reduces ReadNext to a declaration only, adds the final specifier |
| include/csv/csv.h | Updates include from schema_reader.h to schema_manager.h |
| include/core/columns/string_column.h | Refactors to use contiguous char storage with offsets/lengths and BitVector for nulls |
| include/core/columns/numeric_column.h | Updates to use BitVector for null masks |
| include/core/columns/bool_column.h | Updates to use BitVector for both data and null masks |
| include/bruh/format.h | Defines BRUH file format structures and constants |
| include/bruh/bruh_batch_writer.h | Declares BruhBatchWriter for writing batches to BRUH format |
| include/bruh/bruh_batch_reader.h | Declares BruhBatchReader for reading batches from BRUH format |
| include/bruh/bruh.h | Main include file for BRUH format functionality |
| cmake/deps.cmake | Adds DOWNLOAD_EXTRACT_TIMESTAMP and ABSL_PROPAGATE_CXX_STD options |
| benchmarks/bench_bruh.cpp | Adds benchmarks for BRUH reader/writer performance |
| benchmarks/CMakeLists.txt | Adds bench_bruh.cpp to benchmark compilation |
| apps/converter/main.cpp | Implements CSV-to-BRUH and BRUH-to-CSV converter command-line tool |
| CMakeLists.txt | Adds new source files to columnar_lib compilation |
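The table says include/util/stream_helper.h provides binary I/O for trivially copyable types. A minimal sketch of that idea, using hypothetical function names (`WriteBinary`, `ReadBinary`, `WriteVector`, `ReadVector`) that are not taken from the PR: values are copied byte for byte to and from a stream, and a `static_assert` rejects types for which a raw byte copy would be unsafe.

```cpp
#include <cassert>    // assert, handy for checking round trips
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>    // std::stringstream, used in the usage comment below
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper: write the raw bytes of a trivially copyable value.
template <typename T>
void WriteBinary(std::ostream& out, const T& value) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "WriteBinary requires a trivially copyable type");
    out.write(reinterpret_cast<const char*>(&value), sizeof(T));
    if (!out) {
        throw std::runtime_error("WriteBinary: stream write failed");
    }
}

// Hypothetical helper: read sizeof(T) bytes back into a value.
template <typename T>
T ReadBinary(std::istream& in) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "ReadBinary requires a trivially copyable type");
    T value{};
    in.read(reinterpret_cast<char*>(&value), sizeof(T));
    if (in.gcount() != static_cast<std::streamsize>(sizeof(T))) {
        throw std::runtime_error("ReadBinary: unexpected end of stream");
    }
    return value;
}

// Length-prefixed sequence of fixed-width values.
template <typename T>
void WriteVector(std::ostream& out, const std::vector<T>& values) {
    WriteBinary<std::uint64_t>(out, values.size());
    for (const T& v : values) {
        WriteBinary(out, v);
    }
}

template <typename T>
std::vector<T> ReadVector(std::istream& in) {
    const auto n = ReadBinary<std::uint64_t>(in);
    std::vector<T> values;  // no reserve(n): a corrupt count must not trigger a huge allocation
    for (std::uint64_t i = 0; i < n; ++i) {
        values.push_back(ReadBinary<T>(in));
    }
    return values;
}

// Usage (e.g. inside main):
//   std::stringstream buf;
//   WriteVector<double>(buf, {1.0, 2.5});
//   std::vector<double> back = ReadVector<double>(buf);  // {1.0, 2.5}
```

Because values are written in the host's native byte order and representation, a file produced this way is only portable between machines that agree on endianness and type sizes. The sketch does not address this, and the summary does not say whether the BRUH format does.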


Irval1337 merged commit 534b67a into main on Jan 23, 2026
3 checks passed
Irval1337 deleted the added-bruh-format branch on January 23, 2026 at 08:59


1 participant