Fast C CLI that reads maps from files or stdin, computes the largest empty square (dynamic programming), and prints them back with the largest empty square.
- Dynamic growth I/O buffer: reduces the number of
readsystem calls - Unified parsing and solving: single-pass computation combining map validation and
dpupdate - Flat memory layouts: reduce data access time, memory usage, and copy overhead
- Single-row array dynamic programming layout: improves memory footprint and L1 cache locality.
- In-place editing of the original map: avoids full map copies and checks
- One-time output: avoids multiple
writesystem calls - Integrated tests with the
make testcommand - Integrated benchmark (100 iterations) with the
make benchcommand
- A 10,000Γ10,000 map is processed in less than 80 ms
Measured on macOS (Apple M4) using
<time.h>/clock_gettime()
stdoutredirected to/dev/nullto eliminate potential shell or terminal I/O bottlenecks
| Version | Description | 10kΓ10k (ms) |
|---|---|---|
| v1.1.0 | Baseline (42 Paris Piscine version) | ~37,000 ms |
| v1.2.0 | Output optimization β Added output buffer ( char **) |
~3,800 ms |
| v1.3.0 | Output optimization β Switched output to flat buffer ( char *) |
~3,600 ms |
| v1.4.0 | General optimization β Removed initialization loops |
~3,400 ms |
| v2.0.0 | Major refactor β Simplified data structures β Unified parsing and solving β Optimized flat I/O buffers β Flattened map and DP arrays |
~320 ms |
| v2.1.0 | Output optimization β In-place map editing (no full copy) β Only updates the required characters inside the map |
~250 ms |
| v2.1.1 | QoL update β Integrated benchmark mode (10 iterations) |
~200 ms |
| v2.2.0 | Input optimization β Reworked str_grow() to dereference pointer once before the loopβ Reduced redundant memory accesses during buffer reallocation |
~190 ms |
| v2.2.1 | Input optimization β Switched to native C types during file read operations β Implemented in-place reading to remove buffer duplication and reduce latency |
~180 ms |
| v2.3.0 | Parse optimization β Optimized DP minimum computation to reduce branch-misses |
~140 ms |
| v2.4.0 | Parse optimization β Reordered parser condition checks to reduce branch mispredictions β Implemented precomputation of all possible values β Minimized dereferencing in hot loops β Increased integrated benchmark from 10 to 100 iterations |
~100 ms |
| v2.5.0 | Code cleanup & build optimization β Removed unused fields, return values and redundant casts β Inlined hot functions β Added -fomit-frame-pointer and -fno-stack-protector flagsβ Introduced optional PGO build ( make sfast) |
~100 ms |
| v3.0.0 | Code cleanup, tests implementation, bug fixes, and branchless comparison investigation β Added make test and make bench commandsβ Fixed multiple issues β See CHANGELOG.md for more details |
~100 ms |
| v3.1.0 | Parse optimization β Implemented parse_col_0() to speed up parsing and solving of the first col of each row |
~87 ms |
| v3.2.0 | Parse optimization β Implemented single-row array dp for faster updates |
~77 ms |
The BSQ (Biggest Square) is the final algorithmic project of the 42 Paris Piscine.
Its goal is to parse a text-based map - from file(s) or stdin - and compute the largest possible empty square, following the official 42 C Norm v4.
This project is a deep dive into:
- Dynamic programming for 2D optimization problems
- Execution time optimization using low-level CPU profiling and micro-optimization techniques
- Memory management and I/O efficiency
- Strict compliance with the 42 Norm
- Read maps from one or more files, or directly from stdin
- Detect invalid or corrupted maps (missing lines, inconsistent width, invalid characters, etc.)
- Efficiently compute the largest empty square, even on large maps (e.g. 10kΓ10k)
- Output the map with the square marked using the
filledcharacter
The program implements a dynamic programming approach:
- Each cell represents the size of the largest square ending at that point.
- The recurrence relation:
if (map[row][col] == EMPTY)
dp[row][col] = 1 + min(dp[row-1][col], dp[row][col-1], dp[row-1][col-1]);
else
dp[row][col] = 0;- The largest value found indicates the size and position of the biggest square.
For performance reasons, the first row and the first column of each row are computed by dedicated functions:
parse_row_0()andparse_col_0().Additionally, the
dpis implemented using a single-row array and a temporary value (representing the up-left cell:dp[row-1][col-1]).The 2D relation above is shown for clarity, as the actual implementation uses a 1D array for better memory locality.
For experimental branchless versions of the DP computation, see branchless_comparison.md.
This file documents the bitmask-based and XOR-based approaches I tested to minimize branch mispredictions, and explains why it didn't produce the expected results.
bsq/
βββ README.md # This README file
βββ bsq.subject.en.pdf # Official 42 subject (English)
βββ Makefile # Build rules and compiler settings
βββ includes/ # Header files with type definitions and prototypes
βββ srcs/ # Source files (C code)
βΒ Β βββ main.c # Entry point
βΒ Β βββ objects/ # Constructor and destructor functions for custom structs
βΒ Β βββ utils/ # Utilities
βΒ Β βββ read.c # Reads content from filepath/stdin
βΒ Β βββ parse_rules.c # Parses rules
βΒ Β βββ parse_map.c # Simultaneously parses and solves the map
βΒ Β βββ result.c # Prints result
βββ tests/ # Sample maps and performance benchmarks
# Compile the standard binary
make
# Compile the optimized (fast) binary
make fast
# Compile, run the binary to collect profiling data, and then recompile it with PGO optimizations
make sfast./bsq file_path_1 file_path_2 ... file_path_ncat tests/basic_test | ./bsqmake test- Automatically runs 100 iterations with 10kx10k map as input
- Displays average timings to
stderr
stdoutis redirected to/dev/nullto eliminate potential shell or terminal I/O bottlenecks
make bench# Only clean objects
make clean
# Clean objects and binary
make fcleanβThink out of the square β then fill it.β π§