
Commit 2509cde

C-Achard, deruyter92, and Copilot authored
Add lazy_imread function to read list of images with mismatched extensions (#154)
* Update __init__.py
* Add `lazy_imread` function that accepts a list of images. Image reading is now more flexible: input can be a single path, a glob pattern, or a list of image paths. Images of different sizes are stacked using Dask delayed arrays (largely inspired by napari's own default implementation). The previous implementation only accepted a glob pattern when loading multiple images, so files with different extensions could not be loaded together. Fixes issue #3160 with multiple images that have different extensions.
* Update lazy_imread. Simplifies and unifies image reading from folders and lists by reintroducing the lazy_imread function and updating read_images to handle both cases. Improves extension filtering, error handling, and metadata construction.
* Update test_reader.py
* Update _reader.py
* Refactor image reading and error handling in _reader.py. Raises an OSError when an image cannot be read, refactors the Dask delayed lambda used for image normalization, and simplifies file path sorting and variable naming for clarity and consistency.
* Improve image and annotation file handling in reader. Refactors get_folder_parser to process .h5 files directly in the loop and improves error messages. read_images now raises errors for missing or multiple matches, ensuring only a single image is processed when one is expected.
* Refactor lazy image reading and normalization logic. Introduces helper functions for image reading and normalization, refactors lazy_imread to use these helpers and partial application, and makes minor docstring and metadata improvements throughout the file.
* Improve error handling and variable naming in reader. Replaces silent failures with explicit exceptions in lazy_imread, clarifies shape ordering for OpenCV streams, and renames variables for consistency. The return value of read_video now includes an additional "image" string for clarity.
* Clarify error message for multiple image matches. Improves the ValueError message in read_images to better explain the expectation when multiple files match a pattern for non-list path inputs.
* Expand and refactor video and image reader tests. Replaces and extends previous video-related tests with more comprehensive coverage, including property checks, frame reading, error handling, and output-structure validation. Adds new tests for image reading, metadata ordering, grayscale/RGBA handling, and lazy-loading behavior.
* Refactor lazy_imread and improve metadata handling. Renames internal variables in lazy_imread for clarity and updates delayed-array creation for readability. Metadata returned by _load_superkeypoints_diagram now includes the diagram name. Updates a test comment for cv2.imwrite to clarify behavior.
* Refactor image reading logic and improve path handling. Introduces _expand_image_paths to robustly resolve input paths, directories, and globs into valid image files. Replaces lazy_imread with _lazy_imread, updating the logic for Dask-based lazy loading and stacking. Refactors read_images to use the new path-expansion and loading functions, improving consistency and error handling for single and multiple image inputs.
* Add tests for mixed image extensions and glob patterns. Verifies that the image reader correctly handles lists, tuples, and glob patterns with mixed file extensions, ensuring only supported formats are included and order is preserved.
* Update src/napari_deeplabcut/_reader.py (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
* Update src/napari_deeplabcut/_reader.py (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
* Update test_reader.py
* Refactor lazy image reading and update tests. _lazy_imread now passes first_shape and first_dtype to make_delayed_array for improved clarity and consistency. Removes an unnecessary import and assertion in test_reader, and ensures tests properly check unsupported image extensions.
* Fix image extension filtering and update tests. _expand_image_paths now only includes directory files with supported extensions. Tests use a fake unsupported extension instead of .tif, ensuring unsupported files are correctly ignored.
* Use is_numeric_dtype for image_paths type check (pandas >= 3.0). Replaces np.issubdtype with pandas' is_numeric_dtype to more robustly check whether image_paths is numeric in read_hdf, improving compatibility with different pandas index types.
* Update _reader.py
* Revert "Update _reader.py" (reverts commit 8c276c5)
* Update _reader.py
* Revert: handle missing superkeypoints diagram with FileNotFoundError again. Wraps the image loading in a try/except block and raises a FileNotFoundError with a descriptive message if the diagram for the specified super_animal is not found.
* Fix directory extension check and update test variable names for clarity

Co-authored-by: Jaap de Ruyter <deruyter92@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
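The extension filtering described in the commit message can be sketched with plain pathlib. This is a minimal, self-contained illustration; `filter_extensions` and the `SUPPORTED_IMAGES` tuple here are stand-ins mirroring the plugin's helpers, not the actual module code:

```python
# Sketch: keep only paths whose suffix (case-insensitive) is a supported
# image format, so a mixed-extension list can be loaded together.
from pathlib import Path

# Assumed constant, mirroring the plugin's SUPPORTED_IMAGES
SUPPORTED_IMAGES = (".png", ".jpg", ".jpeg")

def filter_extensions(image_paths, valid_extensions=SUPPORTED_IMAGES):
    # Path.suffix.lower() normalizes case, so "img.JPG" is kept
    return [Path(p) for p in image_paths if Path(p).suffix.lower() in valid_extensions]

mixed = ["img01.png", "img02.JPG", "notes.txt", "img03.jpeg"]
kept = filter_extensions(mixed)
print([p.name for p in kept])  # ['img01.png', 'img02.JPG', 'img03.jpeg']
```

The previous glob-only approach (`*.png`) could never produce such a mixed list, which is the limitation this PR removes.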
1 parent f8bf1d3 commit 2509cde

File tree

2 files changed: +489 −50 lines changed


src/napari_deeplabcut/_reader.py

Lines changed: 212 additions & 40 deletions
@@ -1,5 +1,5 @@
 import json
-from collections.abc import Sequence
+from collections.abc import Callable, Sequence
 from pathlib import Path

 import cv2
@@ -58,57 +58,225 @@ def get_config_reader(path):
     return read_config


+def _filter_extensions(
+    image_paths: list[str | Path],
+    valid_extensions: tuple[str] = SUPPORTED_IMAGES,
+) -> list[Path]:
+    """
+    Filter image paths by valid extensions.
+    """
+    return [Path(p) for p in image_paths if Path(p).suffix.lower() in valid_extensions]
+
+
 def get_folder_parser(path):
     if not path or not Path(path).is_dir():
         return None
-
     layers = []
-    files = Path(path).iterdir()
-    images = ""
-    for file in files:
-        if any(file.name.lower().endswith(ext) for ext in SUPPORTED_IMAGES):
-            images = str(Path(path) / f"*{Path(file.name).suffix}")
-            break
+
+    images = _filter_extensions(Path(path).iterdir(), valid_extensions=SUPPORTED_IMAGES)
+
     if not images:
-        raise OSError(f"No supported images were found in {path}.")
+        raise OSError(f"No supported images were found in {path} with extensions {SUPPORTED_IMAGES}.")

-    layers.extend(read_images(images))
+    image_layer = read_images(images)
+    layers.extend(image_layer)
     for file in Path(path).iterdir():
         if file.name.endswith(".h5"):
-            layers.extend(read_hdf(str(file)))
-            break  # one h5 per annotated video
-
+            try:
+                layers.extend(read_hdf(str(file)))
+                break  # one h5 per annotated video
+            except Exception as e:
+                raise RuntimeError(f"Could not read annotation data from {file}") from e
     return lambda _: layers


-def read_images(path):
-    if isinstance(path, list):
-        first_path = Path(path[0])
-        suffixes = first_path.suffixes
-        ext = "".join(suffixes) if suffixes else ""
-        pattern = f"*{ext}" if ext else "*"
-        path = str(first_path.parent / pattern)
-    # Retrieve filepaths exactly as parsed by pims
-    filepaths = []
-    for filepath in Path(path).parent.glob(Path(path).name):
-        relpath = Path(filepath).parts[-3:]
-        filepaths.append(str(Path(*relpath)))
+# Helper functions for lazy image reading and normalization
+# NOTE: forced keyword-only arguments for clarity
+def _read_and_normalize(*, filepath: Path, normalize_func: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
+    arr = cv2.imread(str(filepath), cv2.IMREAD_UNCHANGED)
+    if arr is None:
+        raise OSError(f"Could not read image: {filepath}")
+    return normalize_func(arr)
+
+
+def _normalize_to_rgb(arr: np.ndarray) -> np.ndarray:
+    if arr.ndim == 2:
+        return cv2.cvtColor(arr, cv2.COLOR_GRAY2RGB)
+    if arr.ndim == 3 and arr.shape[2] == 4:
+        return cv2.cvtColor(arr, cv2.COLOR_BGRA2RGB)
+    return cv2.cvtColor(arr, cv2.COLOR_BGR2RGB)
+
+
+def _expand_image_paths(path: str | Path | list[str | Path] | tuple[str | Path, ...]) -> list[Path]:
+    # Normalize input to list[Path]
+    raw_paths = [Path(p) for p in path] if isinstance(path, (list, tuple)) else [Path(path)]
+
+    expanded: list[Path] = []
+    for p in raw_paths:
+        if p.is_dir() and p.suffix.lower() != ".zarr":
+            file_matches: list[Path] = []
+            for ext in SUPPORTED_IMAGES:
+                file_matches.extend(p.glob(f"*{ext}"))
+            expanded.extend(x for x in natsorted(file_matches, key=str) if x.is_file())
+        else:
+            matches = list(p.parent.glob(p.name))
+            expanded.extend(matches or [p])
+
+    return [p for p in expanded if p.is_file() and p.suffix.lower() in SUPPORTED_IMAGES]
+
+
+# Lazy image reader that supports directories and lists of files
+def _lazy_imread(
+    filenames: str | Path | list[str | Path],
+    use_dask: bool | None = None,
+    stack: bool = True,
+) -> np.ndarray | da.Array | list[np.ndarray | da.Array]:
+    """Lazily reads one or more images with optional Dask support.
+
+    Resolves file paths using `_expand_image_paths`, ensuring consistent
+    handling of directories, glob patterns, and lists/tuples of paths.
+    Images are normalized to RGB and may be wrapped in Dask delayed
+    objects for lazy loading.
+
+    Behavior:
+        * If a single image is resolved:
+            - The image is read eagerly and returned as a NumPy array.
+        * If multiple images are resolved:
+            - The first image is read eagerly to determine shape and dtype.
+            - Subsequent images are loaded lazily via Dask unless
+              `use_dask=False`.
+            - Stacking behavior is controlled by `stack`.
+
+    Args:
+        filenames (str | Path | list[str | Path]):
+            File path(s), directory, or glob pattern(s) to load.
+        use_dask (bool | None, optional):
+            Whether to load images lazily using Dask.
+            Defaults to `True` when multiple files are found, otherwise
+            `False`.
+        stack (bool, optional):
+            If True, stack images along axis 0 into a single array.
+            If False, return a list of arrays or delayed arrays.
+            Defaults to True.
+
+    Returns:
+        np.ndarray | da.Array | list[np.ndarray | da.Array]:
+            Loaded image data. The return type depends on the number of
+            images found, the `use_dask` flag, and the `stack` option.
+
+    Raises:
+        ValueError: If no supported images are found.
+    """
+    expanded = _expand_image_paths(filenames)
+
+    if not expanded:
+        raise ValueError(f"No supported images were found for input: {filenames}")
+
+    if use_dask is None:
+        use_dask = len(expanded) > 1
+
+    images = []
+    first_shape = None
+    first_dtype = None
+
+    def make_delayed_array(fp: Path, first_shape: tuple[int, ...], first_dtype: np.dtype) -> da.Array:
+        """Create a dask array for a single file."""
+        return da.from_delayed(
+            delayed(_read_and_normalize)(filepath=fp, normalize_func=_normalize_to_rgb),
+            shape=first_shape,
+            dtype=first_dtype,
+        )
+
+    for fp in expanded:
+        if first_shape is None:
+            arr0 = _read_and_normalize(filepath=fp, normalize_func=_normalize_to_rgb)
+            first_shape = arr0.shape
+            first_dtype = arr0.dtype
+
+            if use_dask:
+                images.append(make_delayed_array(fp, first_shape, first_dtype))
+            else:
+                images.append(arr0)
+            continue
+
+        if use_dask:
+            images.append(make_delayed_array(fp, first_shape, first_dtype))
+        else:
+            images.append(_read_and_normalize(filepath=fp, normalize_func=_normalize_to_rgb))
+
+    if len(images) == 1:
+        return images[0]
+
+    try:
+        return da.stack(images) if use_dask and stack else (np.stack(images) if stack else images)
+    except ValueError as e:
+        raise ValueError(
+            "Cannot stack images with different shapes using NumPy. "
+            "Ensure all images have the same shape or set stack=False."
+        ) from e
+
+
+# Read images from a list of files or a glob/string path
+def read_images(path: str | Path | list[str | Path]):
+    """Reads one or multiple images and returns a napari Image layer.
+
+    Uses `_expand_image_paths` to resolve the input into a list of valid
+    image files. Supports single paths, glob expressions, directories,
+    and lists or tuples of such paths.
+
+    Behavior:
+        * If one file is found:
+            - Loaded using `dask_image.imread.imread`.
+        * If multiple files are found:
+            - Loaded lazily using `_lazy_imread` into a stacked image
+              layer.
+
+    Args:
+        path (str | Path | list[str | Path]):
+            Input path(s), directory, or glob pattern(s) to expand into
+            supported image files.
+
+    Returns:
+        list[LayerData]:
+            A list containing one napari layer tuple of the form
+            `(data, metadata, "image")`.
+
+    Raises:
+        OSError: If no supported images are found after expansion.
+    """
+    filepaths = _expand_image_paths(path)
+
+    if not filepaths:
+        raise OSError(f"No supported images were found in {path}")
+
+    filepaths = natsorted(filepaths, key=str)
+
+    # Multiple images → lazy-imread stack
+    if len(filepaths) > 1:
+        relative_paths = [str(Path(*fp.parts[-3:])) for fp in filepaths]
+        params = {
+            "name": "images",
+            "metadata": {
+                "paths": relative_paths,
+                "root": str(filepaths[0].parent),
+            },
+        }
+        data = _lazy_imread(filepaths, use_dask=True, stack=True)
+        return [(data, params, "image")]

+    # Single image → old behavior
+    image_path = filepaths[0]
     params = {
         "name": "images",
         "metadata": {
-            "paths": natsorted(filepaths),
-            "root": str(Path(path).parent),
+            "paths": [str(Path(*image_path.parts[-3:]))],
+            "root": str(image_path.parent),
         },
     }
-
-    # https://github.com/soft-matter/pims/issues/452
-    if len(filepaths) == 1:
-        path = next(Path(path).parent.glob(Path(path).name), None)
-        if path is None:
-            raise FileNotFoundError(f"No files found for pattern: {path}")
-    return [(imread(path), params, "image")]
+    return [(imread(str(image_path)), params, "image")]


+# Helper to populate keypoint layer metadata
 def _populate_metadata(
     header: misc.DLCHeader,
     *,
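The core pattern in `_lazy_imread` above — read the first image eagerly to learn shape and dtype, then wrap the rest in `dask.delayed` so pixels are only materialized on `compute()` — can be sketched without any file I/O. The in-memory `read_one` function and `FAKE_FILES` dict below are illustrative stand-ins for `cv2.imread` and real image files:

```python
# Sketch of the eager-first / lazy-rest Dask pattern from _lazy_imread.
import numpy as np
import dask.array as da
from dask import delayed

# Stand-in for files on disk: three uniform 4x5 RGB "images"
FAKE_FILES = {f"frame_{i}.png": np.full((4, 5, 3), i, dtype=np.uint8) for i in range(3)}

def read_one(name):
    # Stand-in for cv2.imread + normalization; only runs on compute()
    return FAKE_FILES[name]

names = sorted(FAKE_FILES)
first = read_one(names[0])  # eager read: supplies the shape/dtype template
lazy = [
    da.from_delayed(delayed(read_one)(n), shape=first.shape, dtype=first.dtype)
    for n in names
]
stack = da.stack(lazy)      # lazy (3, 4, 5, 3) stack; no pixels read yet
result = stack.compute()    # all reads happen here
```

Note the caveat this design implies: `da.from_delayed` trusts the declared shape/dtype, so if a later image differs from the first, the mismatch only surfaces at compute time — which is why the stacking step above is wrapped in a try/except with a descriptive error.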
@@ -175,6 +343,7 @@ def _load_config(config_path: str):
         return yaml.safe_load(file)


+# Read config file and create keypoint layer metadata
 def read_config(configname: str) -> list[LayerData]:
     config = _load_config(configname)
     header = misc.DLCHeader.from_config(config)
@@ -196,6 +365,7 @@ def read_config(configname: str) -> list[LayerData]:
     return [(None, metadata, "points")]


+# Read HDF file and create keypoint layers
 def read_hdf(filename: str) -> list[LayerData]:
     config_path = misc.find_project_config_path(filename)
     layers = []
@@ -228,7 +398,7 @@ def read_hdf(filename: str) -> list[LayerData]:
     nrows = df.shape[0]
     data = np.empty((nrows, 3))
     image_paths = df["level_0"]
-    if np.issubdtype(image_paths.dtype, np.number):
+    if pd.api.types.is_numeric_dtype(getattr(image_paths, "dtype", np.asarray(image_paths).dtype)):
         image_inds = image_paths.values
         paths2inds = []
     else:
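The hunk above swaps `np.issubdtype` for `pd.api.types.is_numeric_dtype`. The motivation, as the commit message notes, is robustness across pandas index/series dtypes: pandas extension dtypes (such as nullable `Int64`) are not plain NumPy dtypes, so a NumPy-only check can misbehave on them, while the pandas predicate handles both. A small sketch (the example series are illustrative):

```python
# is_numeric_dtype accepts both plain NumPy dtypes and pandas extension dtypes.
import pandas as pd

numeric = pd.Series([0, 1, 2])                    # int64 (NumPy dtype)
strings = pd.Series(["img0.png", "img1.png"])     # object dtype
nullable = pd.Series([0, 1, None], dtype="Int64") # pandas extension dtype

print(pd.api.types.is_numeric_dtype(numeric))   # True
print(pd.api.types.is_numeric_dtype(strings))   # False
print(pd.api.types.is_numeric_dtype(nullable))  # True
```

The `getattr(image_paths, "dtype", np.asarray(image_paths).dtype)` fallback in the diff additionally covers inputs that are not Series/Index objects at all.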
@@ -254,6 +424,7 @@ def read_hdf(filename: str) -> list[LayerData]:
     return layers


+# Video reader using OpenCV
 class Video:
     def __init__(self, video_path):
         if not Path(video_path).is_file():
@@ -297,13 +468,14 @@ def close(self):
 def read_video(filename: str, opencv: bool = True):
     if opencv:
         stream = Video(filename)
-        shape = stream.width, stream.height, 3
+        # NOTE construct output shape tuple in (H, W, C) order to match read_frame() data
+        shape = stream.height, stream.width, 3

         def _read_frame(ind):
             stream.set_to_frame(ind)
             return stream.read_frame()

-        lazy_imread = delayed(_read_frame)
+        lazy_reader = delayed(_read_frame)
     else:  # pragma: no cover
         from pims import PyAVReaderIndexed

@@ -313,9 +485,9 @@ def _read_frame(ind):
             raise ImportError("`pip install av` to use the PyAV video reader.") from None

         shape = stream.frame_shape
-        lazy_imread = delayed(stream.get_frame)
+        lazy_reader = delayed(stream.get_frame)

-    movie = da.stack([da.from_delayed(lazy_imread(i), shape=shape, dtype=np.uint8) for i in range(len(stream))])
+    movie = da.stack([da.from_delayed(lazy_reader(i), shape=shape, dtype=np.uint8) for i in range(len(stream))])
     elems = list(Path(filename).parts)
     elems[-2] = "labeled-data"
     elems[-1] = Path(elems[-1]).stem  # + Path(filename).suffix
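The `shape = stream.height, stream.width, 3` fix in the hunks above matters because decoded frames come back as NumPy arrays in (height, width, channels) order, while video properties report width and height as separate scalars. Declaring the transposed shape to `da.from_delayed` would produce a Dask array whose metadata disagrees with the actual frame data. A NumPy-only sketch (the 640×480 dimensions are illustrative):

```python
# Demonstrate why the declared shape must be (H, W, C), not (W, H, C).
import numpy as np

width, height = 640, 480
frame = np.zeros((height, width, 3), dtype=np.uint8)  # what a frame reader returns

declared_wrong = (width, height, 3)   # the old bug: (W, H, C)
declared_right = (height, width, 3)   # the fix:     (H, W, C)

print(frame.shape == declared_right)  # True
print(frame.shape == declared_wrong)  # False (unless the video happens to be square)
```

Because `da.from_delayed` never validates the declared shape against the computed chunk, this kind of mismatch only surfaces downstream, typically as confusing display or indexing errors.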
@@ -326,4 +498,4 @@ def _read_frame(ind):
             "root": root,
         },
     }
-    return [(movie, params)]
+    return [(movie, params, "image")]
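The final hunk makes `read_video` return a full three-element layer tuple. napari reader plugins yield `(data, metadata, layer_type)` tuples; omitting the third element leaves the layer type implicit. A dependency-free sketch of the contract (the array and dict contents are illustrative stand-ins for the dask movie stack and its params):

```python
# Sketch of the (data, metadata, layer_type) tuple the reader now returns.
import numpy as np

movie = np.zeros((10, 480, 640, 3), dtype=np.uint8)   # stand-in for the dask stack
params = {"name": "video", "metadata": {"root": "/data", "paths": []}}

layer = (movie, params, "image")  # explicit layer type, matching read_images
print(len(layer), layer[2])
```

Making the layer type explicit also keeps `read_video`'s output structurally identical to `read_images`', which simplifies the output-structure tests this PR adds.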
