Skip to content

Labels.update() skeleton dedup uses identity, not structure (direct-construction edge) #427

@talmo

Description

@talmo

Summary

Labels.merge() correctly dedupes skeletons structurally via SkeletonMatcher (default method="structure"), so the standard inference→merge flow used by sleap 1.5+ (sleap/gui/learning/runners.py:767) and similar consumers produces correctly-canonicalized projects.

However, Labels.update() (triggered by __attrs_post_init__ on direct Labels(...) construction, and by manual labeled_frames mutation) still dedupes by Python-object identity, which leaks content-equal-but-distinct Skeleton objects into labels.skeletons.

Where

sleap_io/model/labels.py:446-447, same pattern at :847-848 and :876-877:

for inst in lf:
    if inst.skeleton not in self.skeletons:
        self.skeletons.append(inst.skeleton)

Skeleton is @define(eq=False) (sleap_io/model/skeleton.py:96), so __eq__ falls back to object.__eq__ and in is identity. Two content-equal Skeletons both get appended even though Skeleton.matches() (skeleton.py:691) would identify them as structurally equivalent.

Reproduction

skel_a == skel_b       -> False   # identity __eq__
skel_a in [skel_b]     -> False
skel_a.matches(skel_b) -> True
len(labels.skeletons)  -> 2       # expected 1 after Labels.update()

Who's affected

  • Direct Labels(labeled_frames=[...]) construction (post-init → update())
  • Manual labels.labeled_frames.append(lf) followed by labels.update()
  • Notebook/script users assembling inference outputs without merge()

Not affected: the canonical Labels.merge() path used by sleap GUI for inference output ingestion (verified at sleap/gui/learning/runners.py:767-769).

Suggested fix

Mirror what merge() already does internally. In update() and the two parallel paths, use a matches()-based structural lookup and canonicalize inst.skeleton:

canonical = next((s for s in self.skeletons if s.matches(inst.skeleton)), None)
if canonical is None:
    self.skeletons.append(inst.skeleton)
elif canonical is not inst.skeleton:
    inst.skeleton = canonical  # safe — Instance.points is positional

Reassignment is safe because Instance.points is a positional PointsArray ndarray (instance.py:413), not the legacy node-identity-keyed dict that blocked talmolab/sleap#2075.

Priority

Lower than it would have looked from talmolab/sleap#2025 (the user-visible bug there is the inference→merge path, since fixed by talmolab/sleap#2282). This is principled-correctness hygiene; the leak only surfaces for direct-construction edge cases.

Touch sites: labels.py:435-450, :833-850, :870-880. Related: a mirror edge in the sleap-io.js SLP loader is being filed separately.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions