Skip to content

Commit 9e303ac

Browse files
talmoclaude
andauthored
Implement comprehensive merging system for annotation files (#216)
* Implement comprehensive merging system for annotation files (#212) This implements a complete merging system for SLEAP-IO to handle combining multiple annotation files, with support for human-in-the-loop workflows and flexible matching strategies. ## Core Features ### Comparison Methods - Added comparison methods to all model classes (Instance, Skeleton, Track, Video, LabeledFrame) - Spatial matching, identity matching, IoU overlap, and structure matching - Methods: same_pose_as(), same_identity_as(), overlaps_with(), matches(), etc. ### Unified Matcher System (sleap_io/model/matching.py) - Configurable matchers for all data types with enum-based methods - SkeletonMatcher: exact, structure, overlap, subset matching - InstanceMatcher: spatial, identity, IoU matching - TrackMatcher: name and identity matching - VideoMatcher: path, basename, content, auto matching - Pre-configured matchers for common use cases ### Frame-Level Merging - LabeledFrame.merge() with multiple strategies: - smart: preserves user labels over predictions - keep_original: keeps instances from base frame - keep_new: keeps instances from new frame - keep_both: keeps all instances - Configurable instance matching - Conflict tracking and resolution ### Labels-Level Merging - Comprehensive Labels.merge() method - Skeleton/video/track/frame merging - Provenance tracking for merge history - Error handling modes: continue, strict, warn - Detailed merge result reporting ## Testing - 30 tests for matcher system (test_matching.py) - 12 integration tests for merging workflows (test_merging_integration.py) - Enhanced existing tests for model classes - Coverage improvements: - matching.py: 94.3% coverage (new file) - instance.py: 95.1% (was 78.3%) - skeleton.py: 91.5% (was 83.1%) - video.py: 93.3% (was 84.9%) - labeled_frame.py: 68.4% (was 35.4%) - labels.py: 93.1% (was 83.6%) ## Documentation - Comprehensive user guide in docs/merging.md - Example script in examples/merge_annotations.py - Covers HITL workflows, custom matching, and common use cases ## Key Use Cases - Human-in-the-loop: Merge predictions back into manual annotations - Multi-annotator: Combine annotations from different team members - Partial annotations: Consolidate incomplete annotations - Update predictions: Replace old predictions with new ones Fixes #212 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add comprehensive test for preventing duplicate Symmetry edges - Test direct duplicates (A,B then A,B again) - Test reversed duplicates (B,A after A,B) - Test batch operations with add_symmetries() - Test mixed new and duplicate symmetries - Test using Node objects directly - Verify Symmetry set behavior 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add missing merging features: MergeProgressBar, fallback directories, and pre-configured matchers - Add MergeProgressBar class for visual progress tracking during merge operations - Implement fallback directory logic in VideoMatcher for cross-platform path resolution - Add missing pre-configured matchers: IOU_MATCHER, IDENTITY_INSTANCE_MATCHER, OVERLAP_SKELETON_MATCHER, PATH_VIDEO_MATCHER, BASENAME_VIDEO_MATCHER - Add comprehensive tests for all new features - Fix linting issues in test_skeleton.py 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add comprehensive API documentation for merging module - Add all enums: SkeletonMatchMethod, InstanceMatchMethod, TrackMatchMethod, VideoMatchMethod, FrameStrategy, ErrorMode - Add all matcher classes: SkeletonMatcher, InstanceMatcher, TrackMatcher, VideoMatcher, FrameMatcher - Add all pre-configured matchers (13 total) - Add result and error classes: MergeResult, ConflictResolution, MergeError, SkeletonMismatchError, VideoNotFoundError - Add MergeProgressBar for progress tracking - Add Labels.merge method reference - Organize API section with clear subsections 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Remove temporary planning and example files - Remove PLAN.md (implementation planning document - no longer needed) - Remove examples/merge_annotations.py (placeholder file that was never implemented) These files were used during development but are not needed in the final implementation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Improve test coverage for matching.py from 84.6% to 87.5% - Add tests for invalid match method error handling - Add tests for edge cases in instance matching (no overlap, no bounding box) - Add tests for video matcher fallback directories - Add tests for MergeProgressBar callback functionality * Further improve test coverage for matching.py to 89.4% - Add tests for find_matches score calculation edge cases - Add tests for video matcher fallback directory scenarios - Add tests for complex relative path resolution - Overall improvement from 84.6% to 89.4% coverage * Fix linting issues with ruff - Auto-format code with ruff format - Fix import ordering - Remove unused variable assignment * Improve test coverage for labeled_frame.py from 68.4% to 95.2% - Add comprehensive tests for LabeledFrame.matches() method - Add tests for LabeledFrame.similarity_to() method covering all edge cases - Add tests for LabeledFrame.merge() method edge cases - Coverage now exceeds project threshold of 92.5% - Overall project coverage improved to 93.2% * Achieve 100% test coverage for matching.py - Add tests for all NaN points in spatial matching (lines 148-149) - Add tests for IoU calculation with no intersection (lines 170-171) - Add tests for null bounding boxes in IoU matching (lines 172-173) - Add tests for video fallback directory matching (lines 241-245) - Add tests for base_path file resolution (lines 263-267) - Add tests for exception handling in relative path resolution (lines 268-269) - Add tests for same relative path structure matching (lines 258-260) These tests cover all previously uncovered edge cases in the matching module, bringing coverage from 89.4% to 100%. * Fix test for video matcher exception handling - Use mock objects for testing None filename condition to avoid Video initialization errors - Adjust test for relative paths to use different basenames to properly test the false case * Add comprehensive tests for Labels.merge() functionality - Add 15 test functions covering all merge scenarios - Test skeleton mismatch handling (strict/warn modes) - Test progress callbacks and provenance tracking - Test video/track/instance matching with custom matchers - Test conflict resolution and frame strategies - Test suggestions merging and error handling - Achieve 97% coverage for labels.py - Follow pytest conventions (no unittest.mock) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add tests for VideoMatchMethod.RESOLVE merging scenarios - Add test for fallback directories resolution - Add test for base_path resolution - Add test for complex resolution scenarios - Tests cover previously uncovered code paths in matching.py - Use actual video files from test fixtures for realistic testing 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix test expectation for VideoMatchMethod.RESOLVE base_path test The test was expecting videos not to match, but they correctly match when the same basename exists in base_path. This is the intended behavior of the RESOLVE method. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add targeted test for VideoMatchMethod.RESOLVE coverage This test specifically targets the uncovered code paths in matching.py: - Fallback directory iteration and file existence checks - Base path file existence checks - Exception handling for relative_to failures The test ensures we hit lines 241-269 in matching.py that were previously uncovered. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix unreachable code in VideoMatchMethod.RESOLVE The fallback_directories and base_path logic (lines 241-269) was previously unreachable due to a logic error. The code was checking if basenames match only after matches_path(strict=False) returned False, but matches_path(strict=False) returns True whenever basenames match, creating a logical contradiction. This fix restructures the RESOLVE method to: 1. First check if videos are the same object 2. Then check if basenames match, and if so, try fallback resolution 3. Only check matches_path after confirming paths don't already match This makes the fallback directories and base_path code paths reachable and testable. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add additional test coverage for VideoMatchMethod.RESOLVE edge cases - Test same object matching (line 228) - Test fallback directories with non-existent file (lines 245-248) - Test base_path only without fallback_directories (line 253) - Test exception handling in relative_to (lines 273-274) - Test non-matching basenames with path check (line 278) These tests improve coverage for the previously unreachable code paths that were fixed in the previous commit. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add test coverage for skeleton and video edge cases - Add test for skeleton edge mismatch (line 731) - Add test for skeleton symmetry count mismatch (line 735) - Add test for video backend type comparison (line 494) - Test different backend types (MediaVideo vs HDF5Video vs ImageVideo) These tests improve coverage for the merging PR by testing edge cases in skeleton matching and video content comparison. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add test coverage for labeled_frame merge edge cases - Test predictions without score attributes (lines 315-316) - Test matched predictions that should be removed (lines 329-330) - Handle edge cases when merging predictions with missing scores - Test prediction replacement logic in smart merge strategy These tests improve coverage for the LabeledFrame.merge() method by testing edge cases in conflict resolution. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix logic bug in LabeledFrame.merge() method This commit fixes a logic bug in the merge method where matched instances weren't consistently added to used_indices in conflict scenarios, causing unreachable defensive code and coverage issues. ## Changes - Add used_indices.add(self_idx) for user vs user conflicts (line 293) - Add used_indices.add(self_idx) for user vs prediction conflicts (line 308) - Add documentation clarifying defensive logic is now unreachable - Add test to verify the fix works correctly - Clean up obsolete tests that targeted the previously unreachable code ## Impact - Improves test coverage for labeled_frame.py - Makes merge logic consistent and predictable - Maintains defensive programming with documented safety net - All existing tests continue to pass 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Remove unused variable in test_video.py Removed unused `video_seq2` variable from test_video_matches_path_image_sequences_strict_false test to fix linting error. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add test coverage for InstanceMatcher.find_matches() IoU edge cases This test specifically covers the missing lines (170-173) in the IoU score calculation within find_matches() method: - When bounding boxes don't intersect (score = 0.0) - When one or both instances have no valid bounding box (score = 0.0) Coverage for sleap_io/model/matching.py improved from 86.7% to 87.9%. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Remove redundant RESOLVE video matching method The RESOLVE method was functionally identical to BASENAME, creating unnecessary confusion. This change simplifies the video matching API while maintaining all functionality. Changes: - Remove VideoMatchMethod.RESOLVE enum value entirely - Update all references to use BASENAME instead - Remove misleading "backward compatibility" comments - Fix documentation to accurately describe content matching (shape + backend type) - Add comprehensive attribute documentation to all enums and classes for better API docs - Update docs/merging.md with detailed, real-world examples including directory trees The simplified matching system now offers: - PATH: Exact path matching - BASENAME: Filename-only matching (what RESOLVE used to do) - CONTENT: Video shape and backend type matching - AUTO: Smart fallback (BASENAME → CONTENT) All tests pass with 99.6% coverage on the matching module. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Refactor mkdocs.yml: remove cookie consent and analytics sections, reorganize navigation structure * Update pyproject.toml: add Python 3.13 to classifiers * Fix skeleton remapping for overlapping frames during merge When merging Labels with overlapping frames, instances from the merged frames were retaining references to their original skeleton objects instead of being remapped to the matching skeleton in the target Labels. This caused a ValueError during save because the instance skeletons weren't in the Labels.skeletons list. The fix adds remapping logic after frame-level merge to ensure all instances reference the correct skeleton from the target Labels object. - Added skeleton/track remapping for instances in merged overlapping frames - Added test to verify skeleton references are correctly updated - Verified fix with real-world data from notebook example Fixes the issue where merging predictions back into a project would fail during save with "Skeleton ... is not in list" error. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix linting issues (line length) * Remove neural recording references from merging docs Neural recordings don't make sense in the context of tracking animals, so removed those references from the video matching examples. --------- Co-authored-by: Claude <noreply@anthropic.com>
1 parent fe9ab9f commit 9e303ac

16 files changed

Lines changed: 6065 additions & 25 deletions

docs/merging.md

Lines changed: 669 additions & 0 deletions
Large diffs are not rendered by default.

mkdocs.yml

Lines changed: 6 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -116,22 +116,6 @@ markdown_extensions:
116116
- pymdownx.tilde
117117

118118
extra:
119-
consent:
120-
title: Cookie consent
121-
description: >-
122-
We use cookies to recognize your repeated visits and preferences, as well
123-
as to measure the effectiveness of our documentation and whether users
124-
find what they're searching for. With your consent, you're helping us to
125-
make our documentation better.
126-
actions:
127-
- accept
128-
- reject
129-
- manage
130-
131-
analytics:
132-
provider: google
133-
property: G-V7MWLE7LXW
134-
135119
version:
136120
provider: mike
137121

@@ -145,15 +129,16 @@ extra_css:
145129
- css/mkdocstrings.css
146130

147131
copyright: >
148-
Copyright &copy; 2022 - 2024 Talmo Lab –
149-
<a href="#__consent">Change cookie settings</a>
132+
Copyright &copy; 2022 - 2025 Talmo Lab
150133
151134
nav:
152135
- Overview: index.md
153-
- Examples: examples.md
154136
- Changelog: changelog.md
155137
- Releases: https://github.com/talmolab/sleap-io/releases
156-
- Core API:
138+
- Guides:
139+
- Examples: examples.md
140+
- Merging: merging.md
141+
- Reference:
157142
- Model: model.md
158143
- Formats: formats.md
159-
- Full API: reference/
144+
- Full API: reference/

pyproject.toml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,8 @@ classifiers = [
1717
"Programming Language :: Python :: 3.9",
1818
"Programming Language :: Python :: 3.10",
1919
"Programming Language :: Python :: 3.11",
20-
"Programming Language :: Python :: 3.12"]
20+
"Programming Language :: Python :: 3.12",
21+
"Programming Language :: Python :: 3.13"]
2122
dependencies = [
2223
"numpy",
2324
"attrs",

sleap_io/model/instance.py

Lines changed: 183 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -331,6 +331,51 @@ class Track:
331331

332332
name: str = ""
333333

334+
def matches(self, other: "Track", method: str = "name") -> bool:
335+
"""Check if this track matches another track.
336+
337+
Args:
338+
other: Another track to compare with.
339+
method: Matching method - "name" (match by name) or "identity"
340+
(match by object identity).
341+
342+
Returns:
343+
True if the tracks match according to the specified method.
344+
"""
345+
if method == "name":
346+
return self.name == other.name
347+
elif method == "identity":
348+
return self is other
349+
else:
350+
raise ValueError(f"Unknown matching method: {method}")
351+
352+
def similarity_to(self, other: "Track") -> dict[str, any]:
353+
"""Calculate similarity metrics with another track.
354+
355+
Args:
356+
other: Another track to compare with.
357+
358+
Returns:
359+
A dictionary with similarity metrics:
360+
- 'same_name': Whether the tracks have the same name
361+
- 'same_identity': Whether the tracks are the same object
362+
- 'name_similarity': Simple string similarity score (0-1)
363+
"""
364+
# Calculate simple string similarity
365+
if self.name and other.name:
366+
# Simple character overlap similarity
367+
common_chars = set(self.name.lower()) & set(other.name.lower())
368+
all_chars = set(self.name.lower()) | set(other.name.lower())
369+
name_similarity = len(common_chars) / len(all_chars) if all_chars else 0
370+
else:
371+
name_similarity = 1.0 if self.name == other.name else 0.0
372+
373+
return {
374+
"same_name": self.name == other.name,
375+
"same_identity": self is other,
376+
"name_similarity": name_similarity,
377+
}
378+
334379

335380
@attrs.define(auto_attribs=True, slots=True, eq=False)
336381
class Instance:
@@ -611,6 +656,144 @@ def replace_skeleton(
611656
self.points = new_points
612657
self.points["name"] = self.skeleton.node_names
613658

659+
def same_pose_as(self, other: "Instance", tolerance: float = 5.0) -> bool:
660+
"""Check if this instance has the same pose as another instance.
661+
662+
Args:
663+
other: Another instance to compare with.
664+
tolerance: Maximum distance (in pixels) between corresponding points
665+
for them to be considered the same.
666+
667+
Returns:
668+
True if the instances have the same pose within tolerance, False otherwise.
669+
670+
Notes:
671+
Two instances are considered to have the same pose if:
672+
- They have the same skeleton structure
673+
- All visible points are within the tolerance distance
674+
- They have the same visibility pattern
675+
"""
676+
# Check skeleton compatibility
677+
if not self.skeleton.matches(other.skeleton):
678+
return False
679+
680+
# Get visible points for both instances
681+
self_visible = self.points["visible"]
682+
other_visible = other.points["visible"]
683+
684+
# Check if visibility patterns match
685+
if not np.array_equal(self_visible, other_visible):
686+
return False
687+
688+
# Compare visible points
689+
if not self_visible.any():
690+
# Both instances have no visible points
691+
return True
692+
693+
# Calculate distances between corresponding visible points
694+
self_pts = self.points["xy"][self_visible]
695+
other_pts = other.points["xy"][other_visible]
696+
697+
distances = np.linalg.norm(self_pts - other_pts, axis=1)
698+
699+
return np.all(distances <= tolerance)
700+
701+
def same_identity_as(self, other: "Instance") -> bool:
702+
"""Check if this instance has the same identity (track) as another instance.
703+
704+
Args:
705+
other: Another instance to compare with.
706+
707+
Returns:
708+
True if both instances have the same track identity, False otherwise.
709+
710+
Notes:
711+
Instances have the same identity if they share the same Track object
712+
(by identity, not just by name).
713+
"""
714+
if self.track is None or other.track is None:
715+
return False
716+
return self.track is other.track
717+
718+
def overlaps_with(self, other: "Instance", iou_threshold: float = 0.5) -> bool:
719+
"""Check if this instance overlaps with another based on bounding box IoU.
720+
721+
Args:
722+
other: Another instance to compare with.
723+
iou_threshold: Minimum IoU (Intersection over Union) value to consider
724+
the instances as overlapping.
725+
726+
Returns:
727+
True if the instances overlap above the threshold, False otherwise.
728+
729+
Notes:
730+
Overlap is computed using the bounding boxes of visible points.
731+
If either instance has no visible points, they don't overlap.
732+
"""
733+
# Get visible points for both instances
734+
self_visible = self.points["visible"]
735+
other_visible = other.points["visible"]
736+
737+
if not self_visible.any() or not other_visible.any():
738+
return False
739+
740+
# Calculate bounding boxes
741+
self_pts = self.points["xy"][self_visible]
742+
other_pts = other.points["xy"][other_visible]
743+
744+
self_bbox = np.array(
745+
[
746+
[np.min(self_pts[:, 0]), np.min(self_pts[:, 1])], # min x, y
747+
[np.max(self_pts[:, 0]), np.max(self_pts[:, 1])], # max x, y
748+
]
749+
)
750+
751+
other_bbox = np.array(
752+
[
753+
[np.min(other_pts[:, 0]), np.min(other_pts[:, 1])],
754+
[np.max(other_pts[:, 0]), np.max(other_pts[:, 1])],
755+
]
756+
)
757+
758+
# Calculate intersection
759+
intersection_min = np.maximum(self_bbox[0], other_bbox[0])
760+
intersection_max = np.minimum(self_bbox[1], other_bbox[1])
761+
762+
if np.any(intersection_min >= intersection_max):
763+
# No intersection
764+
return False
765+
766+
intersection_area = np.prod(intersection_max - intersection_min)
767+
768+
# Calculate union
769+
self_area = np.prod(self_bbox[1] - self_bbox[0])
770+
other_area = np.prod(other_bbox[1] - other_bbox[0])
771+
union_area = self_area + other_area - intersection_area
772+
773+
# Calculate IoU
774+
iou = intersection_area / union_area if union_area > 0 else 0
775+
776+
return iou >= iou_threshold
777+
778+
def bounding_box(self) -> Optional[np.ndarray]:
779+
"""Get the bounding box of visible points.
780+
781+
Returns:
782+
A numpy array of shape (2, 2) with [[min_x, min_y], [max_x, max_y]],
783+
or None if there are no visible points.
784+
"""
785+
visible = self.points["visible"]
786+
if not visible.any():
787+
return None
788+
789+
pts = self.points["xy"][visible]
790+
return np.array(
791+
[
792+
[np.min(pts[:, 0]), np.min(pts[:, 1])],
793+
[np.max(pts[:, 0]), np.max(pts[:, 1])],
794+
]
795+
)
796+
614797

615798
@attrs.define(eq=False)
616799
class PredictedInstance(Instance):

0 commit comments

Comments
 (0)