Skip to content

image_dedup in Labels.merge() produce mismatched LabeledFrame, Video pair.  #239

Description

@tom21100227

Expected Behavior

When using labels.merge() to merge two slp files with ImageVideo backends, if there's intersecting annotations (same image has been labelled on both slp files). merge() should either combine, or delete one of them.

Actual Behavior

# assumes: 
this: sio.Labels; other: sio.Labels;
this.merge(other)

Annotations (LabeledFrames) from the other sleap file that already exists in this would be pointing at other images in the deduplicated ImageVideo. This does not throw an error normally but corrupts the annotation.

In special cases where len(deduped_video) < the original index of those duplicated annotation, this would throw an error when accessing those images (e.g. in case where calling .create_training_split() on the merged slp files.

for labelled_frame in merged_labels: 
    image = labelled_frame.video[labelled_frame.frame_idx]  # Would throw an out of range error. 

See graph:

Image

Steps to recreate

See graph above. I can also provide two small slp files I made to recreate the issue if that's

Screenshot

Image

bad here is Duplication case 2 where an OOB is thrown.

Hypothesis

sleap_io.model.labesl.py:L1787-L1805: /sleap_io/model/labels.py

self.videos.append(deduped_video)
video_map[other_video] = deduped_video # <== Here other_video gets mapped to deduped video
# Build frame index mapping for remaining frames
if isinstance(
    other_video.filename, list
) and isinstance(deduped_video.filename, list):
    other_basenames = [
        Path(f).name for f in other_video.filename
    ]
    deduped_basenames = [
        Path(f).name for f in deduped_video.filename
    ]
    for old_idx, basename in enumerate(other_basenames):
        if basename in deduped_basenames: # <== Here, if `basename` is not a part of deduped, it will not be added to `frame_idx_map`. 
            new_idx = deduped_basenames.index(basename)
            frame_idx_map[(other_video, old_idx)] = (
                deduped_video,
                new_idx,
            )

So that in L1860-L1879:

            for frame_idx, other_frame in enumerate(other.labeled_frames):
                if progress_callback:
                    progress_callback(
                        frame_idx,
                        total_frames,
                        f"Merging frame {frame_idx + 1}/{total_frames}",
                    )

                # Check if frame index needs remapping (for deduplicated/merged videos)
                if (other_frame.video, other_frame.frame_idx) in frame_idx_map: # <== It's not in frame_idx_map so it will go to else. 
                    mapped_video, mapped_frame_idx = frame_idx_map[
                        (other_frame.video, other_frame.frame_idx)
                    ]
                else:
                    # Map video to self
                    mapped_video = video_map.get(other_frame.video, other_frame.video) # <== map to deduped_video
                    mapped_frame_idx = other_frame.frame_idx # <== but still maintains the original index

Suggested Fix:

Delete duplicate annotation or add a potential option for user to choose if they want to merge those intersecting annotation or just pick the best one?

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions