Skip to content

Latest commit

 

History

History
56 lines (38 loc) · 3.28 KB

File metadata and controls

56 lines (38 loc) · 3.28 KB

Memory Optimization: Phase 3 - Streaming Patch Application

This document builds upon Phase 1 (Streaming Compression) and Phase 2 (Memory-Mapped IO), focusing on the final optimization to eliminate large heap allocations in bspatch.

1. Problem: The "1N" Heap Overhead

Even with the old file being memory-mapped (Phase 2), the standard bspatch algorithm still required allocating a full buffer for the new file before writing it to disk.

  • Baseline Memory Model: old + new2N heap.
  • Phase 2 (mmap) Model: new1N heap.
  • Issue: For a 100MB target file, the process still requires 100MB of contiguous physical RAM (RSS) for the new buffer, which is prohibitive for many embedded or mobile environments.

2. Solution: Streaming Patch Application

Since the new file is generated sequentially by applying diffs and extra data to the old file, we can refactor bspatch to operate in a Streaming mode.

Mechanism

  • Scratch Buffer: A small, fixed-size scratch buffer (128 KB) is allocated on the heap once.
  • Chunked Processing: Patch entries (diff and extra strings) are read from the patch_packer in 128 KB chunks.
  • Zero-Copy Addition: For diff data, the algorithm adds the corresponding chunk from the memory-mapped old file directly into the scratch buffer.
  • Incremental Writes: Each processed chunk is written immediately to the output newfile stream.
  • Heap Elimination: The malloc(newsize) call is completely removed.

3. Results (Node.js 68MB Dataset)

Metric Phase 2 (mmap) Phase 3 (Streaming) REDUCTION
Peak Heap Memory 68.4 MB 144.2 KB >99.8%
Memory Model 1N ~0N ~1N
Execution Time 0.51s (ZSTD) 0.54s (ZSTD) ~5.8% Overhead

Analysis

  • Memory: The heap usage is now constant regardless of the file size. The 144 KB peak includes the 128 KB scratch buffer plus internal state for the decompression libraries.
  • Speed: There is a small performance trade-off (~5%) due to the increased frequency of function calls and I/O operations (chunked vs. bulk).
  • Embedded Suitability: This optimization allows bspatch to apply updates to gigabyte-sized files on devices with only a few megabytes of available RAM.

4. Performance vs. Buffer Size

We tested various scratch buffer sizes to find the optimal balance between speed and memory:

Buffer Size Execution Time (ZSTD) Peak Heap
16 KB 67.8 ms 32 KB
64 KB 69.4 ms 80 KB
128 KB 71.3 ms 144 KB
256 KB 69.6 ms 272 KB

Note: Results show that beyond 16KB, the buffer size has negligible impact on speed. 128KB was selected as a safe, efficient default.

5. Technical Implementation

  • source/bspatch.c: Main loop refactored from a single-pass buffer approach to a chunked-pass approach.

6. Conclusion

With the completion of Phase 3, bspatch has evolved from a 2N memory model to a near-zero heap model. When combined with mmap, the process footprint is dominated by the OS's efficient page cache management rather than fixed heap allocations, providing maximum scalability and stability for all target environments.