
Commit fb6f1a3

feat: virtual on-read video cropping (CropVideoBackend) (#460)
Virtual on-read video crops via CropVideoBackend (Video.crop/from_crop), byte-identical to crop_frame, with mosaic shared-decode, SLP round-trip (/video_crops, format 2.3), crop-aware matching, and HDF5 pushdown. Adds apply_crop/apply_crops + 'sio apply-crops' to materialize crops, and imports DLC video_sets crops as provenance (closes #424 item 2). Uncropped files stay byte-identical; old readers degrade gracefully. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
1 parent 18aba05 commit fb6f1a3

27 files changed

Lines changed: 4614 additions & 102 deletions

docs/cli.md

Lines changed: 21 additions & 0 deletions
@@ -72,6 +72,7 @@ sio render --help
 sio trim --help
 sio reencode --help
 sio transform --help
+sio apply-crops --help

 # Check version and installed plugins
 sio --version
@@ -2523,6 +2524,26 @@ See the [Transforms Guide](transforms.md#config-file-format) for config file for

 ---

+## Apply Crops
+
+`sio apply-crops` materializes [virtual crops](cropping.md) (created via `Video.crop` and
+stored in a `.slp`'s `/video_crops`) into real video files, updating the labels to point at
+the baked files. Unlike `sio transform --crop` (which applies a *new* crop and adjusts
+coordinates), this bakes an *existing* virtual crop and is coordinate-neutral.
+
+```bash
+# Bake every virtually-cropped video; baked files go next to the output SLP.
+sio apply-crops mosaic.slp -o baked.slp
+
+# Choose the output video directory and filename suffix.
+sio apply-crops mosaic.slp -o baked.slp --video-dir baked_videos/ --suffix _crop
+```
+
+Each baked video keeps `source_video` provenance to the uncropped original; uncropped videos
+are left untouched. See the [Virtual cropping guide](cropping.md#applying-baking-a-crop-to-disk).
+
+---
+
 ## Use Cases

 ### Inspecting an Unknown Labels File

docs/cropping.md

Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
# Virtual cropping

sleap-io can expose a **virtual, on-read crop** of a video — a cropped view whose
frames are produced by decoding the source and slicing in memory, without copying or
re-encoding any pixels on disk. It is the lazy, non-destructive counterpart of the
materializing [Transforms](transforms.md) pipeline: a virtually-cropped frame is
byte-identical to `crop_frame` applied to the decoded source frame (a baked file can
still differ slightly, since re-encoding may be lossy and may pad).

---

## Quick start

```python
import sleap_io as sio

full = sio.load_video("session.mp4")  # (1000, 1080, 1920, 3)

# A cropped view. crop = (x1, y1, x2, y2), with x2/y2 EXCLUSIVE.
view = full.crop((320, 200, 576, 456))
view.shape  # (1000, 256, 256, 3) -- cropped
view[0].shape  # (256, 256, 3) -- a cropped frame
view.is_cropped, view.crop_rect  # True, (320, 200, 576, 456)
view.source_video is full  # True -- provenance to the uncropped original
```

`Video.from_crop` opens a file and crops it in one call:

```python
view = sio.Video.from_crop("session.mp4", crop=(320, 200, 576, 456))
```

The returned object is a normal [`Video`](model/video.md): `shape`, `len()`, `grayscale`,
and NumPy-style indexing all reflect the **cropped** view, and matching is crop-aware.

---

## The crop convention

A crop is `(x1, y1, x2, y2)` in **source pixel coordinates**, with `x2`/`y2`
**exclusive** — exactly the convention used by [`Transform`](transforms.md) and
`crop_frame`. The cropped frame size is `(y2 - y1, x2 - x1)`, i.e. `(height, width)`.

Coordinates may be **negative or extend past the source** — out-of-bounds regions are
**padded** with `fill` (default `0`), never clamped, so the output shape is always
exactly `(y2 - y1, x2 - x1)`. This makes fixed-size, centroid-following windows easy:

```python
# Fixed 128x128 window centered on a point (may run off the frame edge -> padded).
view = full.crop(center=(cx, cy), size=(128, 128), fill=0)
view.shape  # (n_frames, 128, 128, 3)
```
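To make the pad-not-clamp rule concrete, here is a minimal NumPy sketch of the documented semantics. `padded_crop` is a hypothetical helper written for illustration, not sleap-io's `crop_frame`; it assumes a frame array shaped `(height, width)` or `(height, width, channels)`.

```python
import numpy as np


def padded_crop(frame: np.ndarray, crop: tuple, fill=0) -> np.ndarray:
    """Crop `frame` to (x1, y1, x2, y2), x2/y2 exclusive, padding instead of clamping.

    The result always has shape (y2 - y1, x2 - x1) + frame.shape[2:].
    """
    x1, y1, x2, y2 = crop
    height, width = frame.shape[:2]
    out = np.full((y2 - y1, x2 - x1) + frame.shape[2:], fill, dtype=frame.dtype)

    # Intersect the crop rect with the frame, in source coordinates.
    sx1, sy1 = max(x1, 0), max(y1, 0)
    sx2, sy2 = min(x2, width), min(y2, height)
    if sx1 < sx2 and sy1 < sy2:
        # Copy the in-bounds part to its offset inside the output window.
        out[sy1 - y1 : sy2 - y1, sx1 - x1 : sx2 - x1] = frame[sy1:sy2, sx1:sx2]
    return out
```

For a 3×4 frame, the rect `(-1, -1, 2, 2)` yields a 3×3 window whose top row and left column are `fill`, with the frame's top-left 2×2 block in the lower-right corner.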
`Video.crop` accepts one region spec — an explicit `crop` rect, a `bbox=(x1,y1,x2,y2)`,
an `roi` (anything exposing shapely-style `.bounds`, expanded by `margin`), or a
`center`/`size` pair:

```python
full.crop((x1, y1, x2, y2))  # explicit rect
full.crop(bbox=(x1, y1, x2, y2))  # same, named
full.crop(roi=my_roi, margin=8)  # axis-aligned bounds of an ROI + margin
full.crop(center=(cx, cy), size=(w, h))  # fixed-size window
```
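The exact rounding `Video.crop` applies to float centers and ROI bounds is not stated here. As a sketch under stated assumptions, the hypothetical helpers below (not sleap-io API) show one way the `center`/`size` and `roi`/`margin` specs could reduce to an integer `(x1, y1, x2, y2)` rect: a centered window puts its top-left at `center - size // 2`, and ROI bounds (shapely order: `minx, miny, maxx, maxy`) are floored/ceiled outward and then grown by `margin`.

```python
import math


def rect_from_center(center, size):
    """Hypothetical: a (w, h) window whose top-left corner is center - size // 2."""
    cx, cy = center
    w, h = size
    x1 = int(round(cx)) - w // 2
    y1 = int(round(cy)) - h // 2
    # Always exactly w x h, even when the window runs off the frame (it gets padded).
    return (x1, y1, x1 + w, y1 + h)


def rect_from_roi(roi, margin=0):
    """Hypothetical: integer rect covering roi.bounds, grown by `margin` on every side."""
    minx, miny, maxx, maxy = roi.bounds
    return (
        math.floor(minx) - margin,
        math.floor(miny) - margin,
        math.ceil(maxx) + margin,  # x2/y2 are exclusive, so round the far edge up
        math.ceil(maxy) + margin,
    )
```

Because the window size is fixed by `size` rather than clipped to the frame, a center near the edge simply produces negative or out-of-range coordinates, which the padding rule above absorbs.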
---

## Coordinates

A crop is a pure integer translation by `(x1, y1)`, so mapping landmark coordinates
between source and cropped frames is exact and NaN-preserving:

```python
pts_crop = view.to_crop_coords(pts_source)  # subtract (x1, y1)
pts_source = view.to_source_coords(pts_crop)  # add (x1, y1)
```

On an uncropped video these are identity passthroughs, so the same call works
regardless of whether a video happens to be cropped. The underlying functions live in
`sleap_io.transform.points` as `crop_points` / `uncrop_points`.

!!! note "Coordinates are never rewritten on disk"
    Virtual cropping never mutates stored `instance.points`. These helpers are
    read-time conveniences for presenting/ingesting coordinates in cropped-frame space.
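The translation is simple enough to write out in NumPy. This is a minimal sketch of the subtract/add rule, not the actual `crop_points` / `uncrop_points` implementations (their signatures are not shown here); the `_sketch` function names are hypothetical, and points are assumed to be an array-like of shape `(..., 2)` holding `(x, y)`.

```python
import numpy as np


def crop_points_sketch(points, crop):
    """Source-frame -> cropped-frame coordinates: subtract the crop origin (x1, y1).

    NaN entries (missing landmarks) stay NaN because NaN - c is NaN.
    """
    x1, y1 = crop[:2]
    return np.asarray(points, dtype=float) - np.array([x1, y1], dtype=float)


def uncrop_points_sketch(points, crop):
    """Cropped-frame -> source-frame coordinates: add the crop origin (x1, y1)."""
    x1, y1 = crop[:2]
    return np.asarray(points, dtype=float) + np.array([x1, y1], dtype=float)
```

With the quick-start rect `(320, 200, 576, 456)`, a source point at `(400, 300)` maps to `(80, 100)` in the cropped frame and back again.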
---

## Mosaics: many crops, one decode

Multiple differently-cropped views of one physical file can share a single decoder, so
the source frame is decoded once per read rather than once per tile:

```python
full = sio.load_video("session.mp4")
tiles = [
    full.crop((x, y, x + 128, y + 128))  # share_decode=True (default)
    for y in range(0, 1080 - 128 + 1, 128)
    for x in range(0, 1920 - 128 + 1, 128)  # "+ 1" keeps the last full tile (x = 1792)
]
labels = sio.Labels(videos=tiles)
```

Each tile reuses `full`'s backend as its inner reader. The tiles do **not** own that
shared decoder, so closing one tile does not tear down its siblings; the owning source
`Video` manages the decoder's lifetime. (Decoder sharing is intentionally not preserved
across `pickle`/`deepcopy`/`open()` — each reconstruction rebuilds its own reader.)

Views of the same file with **different** crop rects are kept distinct through merge,
append, and matching; views with the **same** rect deduplicate to one video.
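How the shared reader avoids redundant decodes is not spelled out above. One plausible mechanism, shown as a hypothetical sketch (`SharedDecoder` and `CropTile` are illustrative classes, not sleap-io API), is a reader that caches its most recent decode, so reading the same frame index through every tile decodes the source only once.

```python
import numpy as np


class SharedDecoder:
    """Hypothetical stand-in for a source backend that caches its last decode."""

    def __init__(self, decode_fn):
        self._decode_fn = decode_fn
        self._cached = None  # (frame_idx, frame) of the most recent decode
        self.decode_count = 0

    def get_frame(self, idx):
        # Decode only when a different frame index is requested.
        if self._cached is None or self._cached[0] != idx:
            self._cached = (idx, self._decode_fn(idx))
            self.decode_count += 1
        return self._cached[1]


class CropTile:
    """Hypothetical crop view: borrows (does not own) a SharedDecoder."""

    def __init__(self, decoder, rect):
        self.decoder = decoder
        self.rect = rect

    def __getitem__(self, idx):
        x1, y1, x2, y2 = self.rect
        # In-bounds rects only; off-frame crops would need the padding rule.
        return self.decoder.get_frame(idx)[y1:y2, x1:x2]
```

The tiles hold a reference to the decoder but never close it, which mirrors the ownership rule described above: only the source `Video` should release it.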
---

## Saving & loading (SLP round-trip)

Crops round-trip through `.slp` without breaking older readers:

```python
sio.save_file(labels, "mosaic.slp")
labels2 = sio.load_file("mosaic.slp")
labels2.videos[0].crop_rect  # (0, 0, 128, 128) -- preserved
labels2.videos[0].shape  # (1000, 128, 128, 3)
labels2.videos[0].source_video.shape  # (1000, 1080, 1920, 3)
len(labels2.videos)  # all tiles preserved (not collapsed)
```

- The crop rects are stored in a dedicated top-level `/video_crops` dataset, written
  **only when a crop is present**; the `videos_json` entry describes the **uncropped
  source**.
- An older reader that does not understand `/video_crops` simply loads the uncropped
  source video — a graceful, lossy degrade, never an error.
- Files with no crops are byte-identical to before this feature existed (no
  `/video_crops`, no format-version bump).

---

## Applying (baking) a crop to disk

A virtual crop can be **materialized** to a real video file — the cropped pixels become
physical and the crop is no longer a read-time view. This is coordinate-neutral: a virtual
crop already presents cropped-frame coordinates, so baking the pixels leaves all point
coordinates unchanged.

`Video.apply_crop` bakes one cropped video and returns a new `Video` for the baked file,
preserving provenance (`source_video` is the uncropped original):

```python
view = full.crop((320, 200, 576, 456))
baked = view.apply_crop("crop.mp4")
baked.shape  # (1000, 256, 256, 3) -- cropped, now physical
baked.source_video.shape  # (1000, 1080, 1920, 3) -- uncropped original
baked.is_cropped  # False -- the crop is materialized, not virtual
```

`Labels.apply_crops` bakes every virtually-cropped video in a `Labels` and rewires all
references (labeled frames, ROIs, suggestions) to the baked files; uncropped videos are
untouched and coordinates are unchanged:

```python
labels.apply_crops(video_dir="baked_videos/")  # one file per tile, unique names
```

From the command line, `sio apply-crops` materializes every virtual crop in an SLP,
writing baked videos next to the output SLP (or into `--video-dir`, if given) and
updating the references:

```bash
sio apply-crops mosaic.slp -o baked.slp --video-dir baked_videos/
```

!!! note "`apply_crop` vs `sio transform --crop`"
    `apply_crop` materializes an **existing** virtual crop (no coordinate change).
    `sio transform --crop` applies a **new** crop and adjusts coordinates — that is the
    materializing [`transform_video`](transforms.md) / `transform_labels` path:

    ```python
    sio.transform_video(full, "baked.mp4", sio.Transform(crop=(320, 200, 576, 456)))
    ```

!!! info "Encoder padding"
    The H.264 encoder pads frame dimensions up to a multiple of 16 (bottom/right only,
    preserving the top-left content and coordinate alignment). A baked video whose cropped
    width/height are not multiples of 16 is padded on those edges.
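To predict whether a bake will be padded, you can round the crop size up to the block size yourself. A minimal sketch, assuming the multiple-of-16 rule stated in the note; `encoded_dims` is a hypothetical helper, not part of sleap-io.

```python
def encoded_dims(height: int, width: int, block: int = 16) -> tuple:
    """Round each dimension up to the next multiple of `block`."""

    def round_up(n: int) -> int:
        # Ceiling division via negated floor division, then back to a multiple.
        return -(-n // block) * block

    return (round_up(height), round_up(width))
```

For example, a 250×300 (height × width) crop would be written as 256×304, with the extra rows and columns on the bottom and right, while the 256×256 crop `(320, 200, 576, 456)` used above needs no padding.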
---

## Performance expectations

The crop is applied **after** a full-frame decode for every backend except raw,
sub-frame-chunked HDF5, where it can push the region read down to the storage layer:

| Backend | Strategy | I/O effect |
|---|---|---|
| `MediaVideo` (mp4/H.264/…) | decode full frame, slice | **No decode/I/O savings** — inter-frame codecs must decode the whole frame; the slice is a free in-memory view. Saves resident array size only. |
| `HDF5Video` raw rank-4, **sub-frame chunked** | hyperslab region read (`ds[i, y1:y2, x1:x2, :]`) | **Real I/O reduction** — only the overlapping chunks are read/decompressed. The one case where a crop saves disk work. |
| `HDF5Video` raw rank-4, per-frame chunked | region read (whole chunk still fetched) | Modest — skips chunk reassembly, not I/O. |
| `HDF5Video` embedded PNG/JPEG (`.pkg.slp`) | decode full image, slice | **No savings** — the whole image must be decoded before any spatial selection. |
| `ImageVideo`, `TiffVideo`, `SeqVideo` | decode full frame, slice | **No savings** with the current decoders. |

Pushdown for raw HDF5 is automatic and gated on the dataset's actual chunking; it falls
back to a full decode plus slice (byte-identical) whenever it would not help.
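The table hinges on whether chunks are smaller than a frame. As a rough illustration of that gate (the function name and exact rule are assumptions, not sleap-io's actual check), the test can be phrased over h5py's `Dataset.chunks`, which is a tuple for chunked datasets and `None` for contiguous ones; when it passes, a read such as `ds[i, y1:y2, x1:x2, :]` touches only the overlapping chunks.

```python
def can_push_down(chunks, frame_hw):
    """Hypothetical gate: does HDF5 chunking split each frame spatially?

    `chunks` follows h5py's `Dataset.chunks`: e.g. (1, 64, 64, 3) for a chunked
    (n_frames, height, width, channels) dataset, or None for contiguous storage.
    """
    if chunks is None or len(chunks) != 4:
        return False
    _, chunk_h, chunk_w, _ = chunks
    frame_h, frame_w = frame_hw
    # Only chunks smaller than the frame let a region read skip stored data.
    return chunk_h < frame_h or chunk_w < frame_w
```

A per-frame chunk shape such as `(1, 1080, 1920, 3)` fails the gate, matching the "Modest" row above: the whole chunk is fetched either way.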
---

## Non-goals

Virtual cropping is a pure translate-and-clip view. It deliberately does **not** do:

- **Rotation, scale, pad, or flip on read** — those remain the domain of the
  materializing [`Transform`](transforms.md) pipeline.
- **Decode-cost savings for compressed video** — only sub-frame-chunked raw HDF5 sees
  real I/O savings; everywhere else the crop is a free post-decode view.
- **Lossless export through non-SLP writers** (NWB, COCO, JABS, Ultralytics) — those
  formats have no crop concept; exporting a cropped `Labels` through them is acceptably
  lossy (the cropped frame and its coordinates are emitted as-is).
- **Rewriting on-disk point coordinates** — the source labels are never mutated.

---

## See also

- [Transforms](transforms.md): the materializing crop/scale/rotate/pad/flip pipeline.
- [Video](model/video.md): the `Video` facade and its backends.

docs/examples.md

Lines changed: 67 additions & 0 deletions
@@ -1007,6 +1007,73 @@ sio.save_video(sio.load_video("input.mp4"), "output.mp4")
 !!! note "See also"
     [`save_video`](formats/#sleap_io.save_video): Video saving options and codec settings

+### Virtual cropping and batch autocrop
+
+Expose a virtual, on-read crop of a video — frames are decoded and sliced in memory, with no pixels copied or re-encoded ([`Video.crop`](model/video.md#sleap_io.Video.crop) / [`Video.from_crop`](model/video.md#sleap_io.Video.from_crop)). The crop is `(x1, y1, x2, y2)` in source pixels (`x2`/`y2` exclusive); out-of-bounds regions are padded.
+
+```python title="virtual_crop.py" linenums="1"
+import sleap_io as sio
+from shapely.geometry import box
+
+full = sio.load_video("session.mp4")  # (1000, 1080, 1920, 3)
+view = full.crop((320, 200, 576, 456))  # virtual view, no decode yet
+view.shape  # (1000, 256, 256, 3)
+view.is_cropped, view.crop_rect  # True, (320, 200, 576, 456)
+view.source_video is full  # True - provenance preserved
+frame = view[0]  # decode-then-slice (256, 256, 3)
+
+# Other region specs: a bbox, an ROI (+ margin), or a fixed-size centered window.
+view = full.crop(bbox=(320.0, 200.0, 576.0, 456.0))
+my_shapely_poly = box(320, 200, 576, 456)  # anything with shapely-style .bounds
+view = full.crop(roi=my_shapely_poly, margin=8)
+cx, cy = 960, 540  # window center in source pixels
+view = full.crop(center=(cx, cy), size=(128, 128))  # fixed shape; off-frame is padded
+```
+
+**Batch autocrop (e.g. a multi-chamber rig).** Apply a fixed set of per-chamber rects across many recordings and write one cropped file per `(video x chamber)`. `apply_crop` bakes the virtual crop to disk and keeps `source_video` pointing at the uncropped original.
+
+```python title="batch_autocrop.py" linenums="1"
+from pathlib import Path
+
+import sleap_io as sio
+
+# Chamber layout, defined once (x1, y1, x2, y2). 16-aligned dims avoid encoder padding.
+chambers = {
+    "A": (0, 0, 640, 480),
+    "B": (640, 0, 1280, 480),
+    "C": (0, 480, 640, 960),
+    "D": (640, 480, 1280, 960),
+}
+
+out_dir = Path("crops")
+out_dir.mkdir(exist_ok=True)
+for path in Path("recordings").glob("*.mp4"):
+    full = sio.load_video(path.as_posix())
+    for name, rect in chambers.items():
+        # Virtual view that shares `full`'s decoder, then baked to its own file.
+        crop = full.crop(rect)
+        crop.apply_crop((out_dir / f"{path.stem}_{name}.mp4").as_posix())
+    full.close()  # release the decoder before the next recording
+```
+
+Prefer to stay lazy (no re-encode) and carry the crops in a labels file? Build the views into a `Labels`, save (crops ride a `/video_crops` dataset; pixels are untouched), and bake them all later in one call with [`Labels.apply_crops`](model/labels.md#sleap_io.Labels.apply_crops):
+
+```python title="virtual_crop_slp.py" linenums="1"
+import sleap_io as sio
+
+# Same chamber rects as in batch_autocrop.py.
+chambers = {
+    "A": (0, 0, 640, 480), "B": (640, 0, 1280, 480),
+    "C": (0, 480, 640, 960), "D": (640, 480, 1280, 960),
+}
+
+full = sio.load_video("session.mp4")
+tiles = [full.crop(rect) for rect in chambers.values()]
+sio.save_file(sio.Labels(videos=tiles), "session.slp")  # virtual; no re-encode
+
+# Later - materialize every virtual crop to real files, then save the rewired labels:
+labels = sio.load_file("session.slp")
+labels.apply_crops(video_dir="crops/")
+sio.save_file(labels, "session_baked.slp")
+```
+
+The same step is available from the command line for an SLP that already carries virtual crops:
+
+```bash
+sio apply-crops session.slp -o baked.slp --video-dir crops/
+```
+
+!!! note "See also"
+    - [Virtual cropping guide](cropping.md): conventions, mosaics, coordinates, performance, and non-goals.
+    - [`Video.apply_crop`](model/video.md#sleap_io.Video.apply_crop) / [`Labels.apply_crops`](model/labels.md#sleap_io.Labels.apply_crops): materialize virtual crops to disk.
+    - [Transforms](transforms.md): the materializing crop/scale/rotate/pad/flip pipeline (`sio transform --crop` applies a *new* crop and adjusts coordinates).
+
 ### Switch video and image backends

 Control which backend is used for video reading and embedded frame encoding.

docs/formats/dlc.md

Lines changed: 25 additions & 10 deletions
@@ -16,17 +16,32 @@ up from the CSV — the following extra metadata is imported:
 Pass `config=False` to disable config use entirely and reproduce the legacy,
 config-free output.

-!!! note "Cropping is not yet applied"
+!!! note "Cropping (`video_sets[...].crop`)"
     DeepLabCut's `video_sets[...].crop` is a *virtual* read-time crop (an ROI
-    that DLC's video reader slices out of each full frame on the fly). When a
-    project uses cropping, the images under `labeled-data/<video>/` are the
-    cropped region and the labels are stored in **cropped-frame coordinates**,
-    whereas the linked `source_video` points at the original, **uncropped**
-    video. sleap-io does not yet apply this crop, so for cropped projects the
-    labels are offset from the source video by the crop origin `(x1, y1)`.
-    Reconciling the two requires virtual ROI-cropping of a `Video` on read,
-    which is planned future work. For the common case of no cropping (the DLC
-    default is the full frame), there is no offset and the link is exact.
+    that DLC's video reader slices out of each full frame). The images under
+    `labeled-data/<video>/` are the cropped region and the labels are stored in
+    **cropped-frame coordinates**, while the linked `source_video` is the
+    original, **uncropped** video. sleap-io now imports this crop:
+
+    - The crop rect is parsed from `video_sets` (DLC stores it width-range-first
+      as `x1, x2, y1, y2`; sleap-io reorders it to its `(x1, y1, x2, y2)`
+      convention, `x2`/`y2` exclusive) and recorded under
+      `labels.provenance["dlc_crops"]`, keyed by source-video path. This record
+      **persists through an SLP round-trip**.
+    - Labels are left **verbatim in cropped-frame coordinates** on the
+      `labeled-data` `ImageVideo` (which carries no virtual crop): no offset is
+      applied, and the already-cropped images are never cropped again. To map a
+      label into the full source frame, call
+      [`Video.to_source_coords`](../model/video.md#sleap_io.Video.to_source_coords)
+      on a video cropped to the recorded rect (it adds the crop origin `(x1, y1)`).
+    - When the source video file is available, `source_video` is set to a
+      [`Video.from_crop`](../model/video.md#sleap_io.Video.from_crop) view of it,
+      so `source_video.crop_rect` / `to_source_coords` work in memory (this view's
+      crop is in-memory only; the persistent record is `provenance["dlc_crops"]`).
+      When the source is absent, `source_video` is a closed `Video` as before.
+    - Identity crops at the origin (`0, W, 0, H` — the DLC no-cropping default)
+      record no crop and leave the link exact.
+
+    See the [virtual cropping guide](../cropping.md) for the crop conventions.
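The reorder described in the first bullet is small enough to show directly. A minimal sketch, assuming the crop arrives as DLC's comma-separated `config.yaml` string; `dlc_crop_to_rect` and `is_identity_crop` are hypothetical names, not sleap-io's importer functions.

```python
def dlc_crop_to_rect(crop):
    """Reorder a DLC `video_sets[...]["crop"]` value into (x1, y1, x2, y2).

    DLC writes the crop width-range-first as "x1, x2, y1, y2", e.g. "0, 640, 0, 480".
    """
    x1, x2, y1, y2 = (int(v) for v in str(crop).split(","))
    return (x1, y1, x2, y2)


def is_identity_crop(rect, width, height):
    """True for the no-cropping default: the rect covers the whole W x H frame."""
    return rect == (0, 0, width, height)
```

For a 640×480 video, `"0, 640, 0, 480"` reorders to `(0, 0, 640, 480)` and counts as an identity crop, so no crop would be recorded.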

 ```python
 import sleap_io as sio
