Refactor converter to use pyvips streaming and multiprocessing for large files #159

dt-yuhui · 2025-09-28T05:00:33Z

Hi there,

Thanks for the encouragement to create a PR on this! This pull request addresses the memory errors (such as `loci.formats.FormatException: Image plane too large`) that occur when converting very large whole-slide images (WSIs) with the Bio-Formats library.

The main changes are:

- Switched to a Streaming Approach: the core conversion logic in `Converter.py` has been refactored to use pyvips's streaming capabilities (`access="sequential"`). Instead of loading the entire image into RAM, it now processes the file in chunks, so converting a file no longer requires holding the whole image in memory. This removes the memory bottleneck for very large files.
- Added Multiprocessing for Batch Conversion: I've integrated Python's `multiprocessing` library into the `process_all` method. The script can now convert files in parallel across multiple CPU cores, which substantially reduces the time needed to convert a large directory of images.
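
The two changes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the helper names (`find_inputs`, `output_path_for`, `convert_one`, `_convert_job`), the `process_all` signature, the `.svs`/`.ndpi` extension set, and the tiled pyramidal BigTIFF output are all hypothetical. Only `pyvips.Image.new_from_file(..., access="sequential")`, `Image.tiffsave`, and `multiprocessing.Pool` are real APIs.

```python
from multiprocessing import Pool
from pathlib import Path

# Hypothetical extension set for inputs handled by pyvips (the PR names .svs and .ndpi).
PYVIPS_EXTENSIONS = {".svs", ".ndpi"}


def find_inputs(in_dir, extensions=PYVIPS_EXTENSIONS):
    """Return the slide files in in_dir whose suffix matches, sorted by path."""
    return sorted(
        p for p in Path(in_dir).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


def output_path_for(src, out_dir):
    """Map an input slide to its (assumed) .tif output path."""
    return Path(out_dir) / (Path(src).stem + ".tif")


def convert_one(src, out_dir):
    """Convert one slide while streaming it instead of loading it all into RAM."""
    import pyvips  # imported lazily so the module loads even where libvips is absent

    # access="sequential" tells libvips the pixels will be read top to bottom,
    # so it can stream the image through the pipeline in strips.
    image = pyvips.Image.new_from_file(str(src), access="sequential")
    dst = output_path_for(src, out_dir)
    # Assumed output: tiled, pyramidal BigTIFF with JPEG-compressed tiles.
    image.tiffsave(
        str(dst),
        tile=True,
        tile_width=256,
        tile_height=256,
        pyramid=True,
        compression="jpeg",
        Q=90,
        bigtiff=True,
    )
    return dst


def _convert_job(job):
    """Pool worker: convert one file and report (src, dst, error) instead of raising."""
    src, out_dir = job
    try:
        return src, convert_one(src, out_dir), None
    except Exception as exc:  # one bad slide should not abort the whole batch
        return src, None, repr(exc)


def process_all(in_dir, out_dir, processes=None):
    """Convert every matching slide in in_dir, one file per worker process."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = [(src, out_dir) for src in find_inputs(in_dir)]
    if processes == 1:
        # Serial path: handy for debugging and avoids process start-up cost.
        return [_convert_job(job) for job in jobs]
    # processes=None lets Pool start os.cpu_count() workers.
    with Pool(processes=processes) as pool:
        return list(pool.imap_unordered(_convert_job, jobs))
```

On platforms that use the `spawn` start method (the default on Windows and macOS), child processes re-import the main script, so a caller should invoke `process_all` under an `if __name__ == "__main__":` guard. Because libvips already uses several threads per image, a `processes` value below the core count may perform better in practice.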

Important Note on a Design Choice:

In implementing this, I've focused on making the primary use case (handling large WSI files like .svs, .ndpi, etc.) as robust and efficient as possible. To simplify the logic and dependencies, I have removed the fallback mechanism that used BioFormatsSlideReader.

My reasoning is that I'm not very familiar with the Bio-Formats library and, more importantly, I don't have access to many of the file types listed in `BIOFORMAT_EXTENSIONS` (e.g., .ome.tif, .lif) to properly test and validate a fallback implementation. The current pyvips-based solution already handles the most common large WSI formats, such as .svs and .ndpi, well.

Given this change, I wanted to check with you whether this contribution is still desired for the project. I'm happy to discuss it further or make any adjustments you see fit.

Thanks for your consideration!

Commit 86e6b26: Feat: Refactor converter to use pyvips streaming for large files
