Memory Optimization: Phase 3 - Streaming Patch Application

This document builds upon Phase 1 (Streaming Compression) and Phase 2 (Memory-Mapped IO), focusing on the final optimization to eliminate large heap allocations in bspatch.

1. Problem: The "1N" Heap Overhead

Even with the old file being memory-mapped (Phase 2), the standard bspatch algorithm still required allocating a full buffer for the new file before writing it to disk.

Baseline Memory Model: old + new ≈ 2N heap.
Phase 2 (mmap) Model: new ≈ 1N heap.
Issue: For a 100MB target file, the process still requires 100MB of contiguous physical RAM (RSS) for the new buffer, which is prohibitive for many embedded or mobile environments.

2. Solution: Streaming Patch Application

Since the new file is generated sequentially by applying diffs and extra data to the old file, we can refactor bspatch to operate in a Streaming mode.

Mechanism

Scratch Buffer: A small, fixed-size scratch buffer (128 KB) is allocated on the heap once.
Chunked Processing: Patch entries (diff and extra strings) are read from the patch_packer in 128 KB chunks.
Zero-Copy Addition: For diff data, the algorithm adds the corresponding chunk from the memory-mapped old file directly into the scratch buffer.
Incremental Writes: Each processed chunk is written immediately to the output newfile stream.
Heap Elimination: The malloc(newsize) call is completely removed.

3. Results (Node.js 68MB Dataset)

Metric	Phase 2 (mmap)	Phase 3 (Streaming)	REDUCTION
Peak Heap Memory	68.4 MB	144.2 KB	>99.8%
Memory Model	1N	~0N	~1N
Execution Time	0.51s (ZSTD)	0.54s (ZSTD)	~5.8% Overhead

Analysis

Memory: The heap usage is now constant regardless of the file size. The 144 KB peak includes the 128 KB scratch buffer plus internal state for the decompression libraries.
Speed: There is a small performance trade-off (~5%) due to the increased frequency of function calls and I/O operations (chunked vs. bulk).
Embedded Suitability: This optimization allows bspatch to apply updates to gigabyte-sized files on devices with only a few megabytes of available RAM.

4. Performance vs. Buffer Size

We tested various scratch buffer sizes to find the optimal balance between speed and memory:

Buffer Size	Execution Time (ZSTD)	Peak Heap
16 KB	67.8 ms	32 KB
64 KB	69.4 ms	80 KB
128 KB	71.3 ms	144 KB
256 KB	69.6 ms	272 KB

Note: Results show that beyond 16KB, the buffer size has negligible impact on speed. 128KB was selected as a safe, efficient default.

5. Technical Implementation

source/bspatch.c: Main loop refactored from a single-pass buffer approach to a chunked-pass approach.

6. Conclusion

With the completion of Phase 3, bspatch has evolved from a 2N memory model to a near-zero heap model. When combined with mmap, the process footprint is dominated by the OS's efficient page cache management rather than fixed heap allocations, providing maximum scalability and stability for all target environments.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory Optimization: Phase 3 - Streaming Patch Application

1. Problem: The "1N" Heap Overhead

2. Solution: Streaming Patch Application

Mechanism

3. Results (Node.js 68MB Dataset)

Analysis

4. Performance vs. Buffer Size

5. Technical Implementation

6. Conclusion

FilesExpand file tree

memory_optimization_3.md

Latest commit

History

memory_optimization_3.md

File metadata and controls

Memory Optimization: Phase 3 - Streaming Patch Application

1. Problem: The "1N" Heap Overhead

2. Solution: Streaming Patch Application

Mechanism

3. Results (Node.js 68MB Dataset)

Analysis

4. Performance vs. Buffer Size

5. Technical Implementation

6. Conclusion