Commit bd8c4ac
fix(processing): adapt is_padding to fix potential MemoryError

If an unknown chunk is larger than the available RAM on the system where unblob runs, the previous is_padding implementation could raise a MemoryError, because it loaded the whole chunk into memory at once. Fixed by iterating over the unknown chunk with iterate_file and returning early as soon as two different byte values have been seen.

1 parent c8c903b

File tree

1 file changed: +12 −1

python/unblob/processing.py

@@ -462,7 +462,18 @@ def _iterate_directory(self, extract_dirs, processed_paths):
 
 
 def is_padding(file: File, chunk: UnknownChunk):
-    return len(set(file[chunk.start_offset : chunk.end_offset])) == 1
+    chunk_bytes = set()
+
+    for small_chunk in iterate_file(
+        file, chunk.start_offset, chunk.end_offset - chunk.start_offset
+    ):
+        chunk_bytes.update(small_chunk)
+
+        # early return optimization
+        if len(chunk_bytes) > 1:
+            return False
+
+    return len(chunk_bytes) == 1
 
 
 def process_patterns(
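The same technique can be shown without unblob's internals. `File`, `UnknownChunk`, and `iterate_file` are unblob-specific helpers; the sketch below is a hypothetical standalone equivalent using plain file I/O, reading the chunk in fixed-size steps and bailing out as soon as a second distinct byte value appears, so memory use stays bounded regardless of chunk size:

```python
import io

READ_SIZE = 64 * 1024  # assumption: 64 KiB read steps, not unblob's actual value


def is_padding(f, start_offset, end_offset):
    """Return True if f[start_offset:end_offset] is a single repeated byte."""
    seen = set()
    f.seek(start_offset)
    remaining = end_offset - start_offset
    while remaining > 0:
        data = f.read(min(READ_SIZE, remaining))
        if not data:  # unexpected EOF
            break
        remaining -= len(data)
        seen.update(data)  # iterating bytes yields ints; set stays tiny
        # early return: two distinct byte values means it is not padding
        if len(seen) > 1:
            return False
    return len(seen) == 1


# usage: a 200 KB run of NUL bytes is padding, mixed bytes are not
print(is_padding(io.BytesIO(b"\x00" * 200_000), 0, 200_000))  # → True
print(is_padding(io.BytesIO(b"\x00" * 100 + b"\x01"), 0, 101))  # → False
```

Note that only the set of distinct byte values (at most 256 entries) is kept in memory, which is what removes the MemoryError risk of materializing the whole slice.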

0 commit comments