Commit 64207d1
refactor: extract per-episode video recorder into shared module (#52)
* refactor: extract per-episode video recorder into shared module
#53 (RoboMME) wires episode mp4 saving directly into `RoboMMEBenchmark`
— a buffered list of frames + an `imageio.mimsave` at the end of every
episode. That pattern is going to recur in every benchmark that wants
agent-view captures (failure-case debugging, demo browsing, public
showcase, qualitative analysis), so lift it out of RoboMME and into
`vla_eval.benchmarks.recording.EpisodeVideoRecorder`.
Improvements over the inline implementation in #53:
- **Streaming write** via `imageio.get_writer().append_data()`:
O(1) memory regardless of episode length. A 1300-step 256×256×3
episode that previously buffered ~250 MB of pending frames now holds only the current one.
A side benefit: a partially-written mp4 is left on disk if the
process is killed mid-episode, playable up to the last completed
frame, useful for debugging crashes.
- **Atomic finalize** via `tempfile.mkstemp` in the output dir +
`os.replace` on `save()`. Concurrent jobs sharing a directory
don't collide on partially-written files.
- **Logging-style filename templating**: `str.format` template (or
callable) over a context dict the caller passes at `start()`, plus
a `status` key injected at `save()`. `filename` and
`required_context` are required at construction time — every
benchmark identifies tasks differently (`env_id`, `task_id`,
`suite/task` …) so there is no universal default. Required-context
validation catches mismatched keys at `start()` rather than as a
silently dropped mp4 at `save()` time.
- **Latched failure**: first `record()` failure disables the recorder
for the rest of the episode so a wedged writer subprocess doesn't
flood the log with one warning per step.
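For scale, buffering that 1300-step episode costs 1300 × 256 × 256 × 3 bytes ≈ 255.6 MB of uint8 frames, which matches the ~250 MB figure above; streaming keeps only the frame being appended. The sketch below illustrates the streaming-plus-latch pattern from the list. It is hypothetical, not the `EpisodeVideoRecorder` code: the class name, the `finish()` method, and the `writer_factory` parameter are assumptions. `writer_factory` stands in for something like `imageio.get_writer(path)` and is injected only so the sketch runs without imageio.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class StreamingRecorderSketch:
    """Hypothetical sketch: stream frames to a writer, latch on the first failure."""

    def __init__(self, writer_factory: Callable[[str], Any]) -> None:
        # Assumption: the factory returns an object with append_data(frame) and close(),
        # the same shape as the writer returned by imageio.get_writer().
        self._writer_factory = writer_factory
        self._writer: Optional[Any] = None
        self._failed = False

    def start(self, path: str) -> None:
        if self._writer is not None:
            self.finish()  # close a writer left open by an unfinished episode
        self._failed = False  # the latch is scoped to one episode
        try:
            self._writer = self._writer_factory(path)
        except Exception:
            logger.warning("could not open video writer for %s", path, exc_info=True)
            self._writer = None
            self._failed = True

    def record(self, frame: Any) -> None:
        # No-op before start() and after the first failure in this episode.
        if self._writer is None or self._failed:
            return
        try:
            # The frame goes straight to the encoder; nothing is buffered here.
            self._writer.append_data(frame)
        except Exception:
            # Warn once, then stay silent for the rest of the episode.
            logger.warning("video writer failed; disabling for this episode", exc_info=True)
            self._failed = True

    def finish(self) -> bool:
        # Close the writer (if any); return True only if every frame was written.
        writer, self._writer = self._writer, None
        if writer is not None:
            try:
                writer.close()
            except Exception:
                logger.warning("closing video writer failed", exc_info=True)
                self._failed = True
        return writer is not None and not self._failed
```

Resetting `_failed` in `start()` is what scopes the latch to "the rest of the episode" rather than the rest of the run.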
`RoboMMEBenchmark` now constructs an `EpisodeVideoRecorder` (when
`save_episode_video=True`) and routes the same `reset → step → save`
hooks through it. The previous public flags (`save_episode_video`,
`video_dir`) and behavior are preserved; the internal
`_episode_frames` buffer + `_save_episode_video` method are removed.
`imageio[ffmpeg]` is added as a runtime dep so the recorder works on
a base install (it was previously an undeclared transitive dependency,
present only in the RoboMME image).
`tests/test_recording.py` (16 tests, 0.8 s) covers the lifecycle:
happy path, str + callable filename templating, subdirectory
templates, missing template key at save, required-context
validation, no-op semantics for record/save/discard before start,
writer-open failure, mid-episode start-again cleanup, lazy
output-dir creation, str path support.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(recording): cut overengineering, fix UX papercuts
Followups after a self-review pass and reviewer feedback:
## Drop overclaimed atomic-write machinery
The previous code used `tempfile.mkstemp` + scattered cleanup paths
across every save() / discard() failure mode, justified by claims
that mostly didn't hold:
- "Concurrent jobs sharing a directory don't collide" — atomic write
doesn't actually prevent collisions when two writers target the same
final filename; it just controls *when* the overwrite happens.
- "Partially-written mp4 left on disk if killed mid-episode" —
arguably worse than the alternative: the leftover is a random
`.recorder-XXXX.mp4` with no indication of which episode it was.
- "Atomic vs readers" — there are no concurrent readers; eval mp4s
are post-run artifacts.
Renaming at save() *is* still necessary because the filename
encodes status (success/fail), which only resolves at episode end.
That part is kept — but the working file is now a single
deterministic per-instance path (`.recorder-<uid>.mp4` set at
__init__) and cleanup goes through a single `_safe_unlink` helper.
Result: ~30 fewer lines, simpler error paths, same behaviour for
the cases that actually matter.
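A minimal sketch of the single-working-file scheme, assuming the working file sits directly in the output directory and that `<uid>` is a `uuid4` hex string (both assumptions; the text only says `.recorder-<uid>.mp4` is set at `__init__`). `_safe_unlink` reuses the helper name from the text, but this body is illustrative, and `WorkingFileSketch` is hypothetical.

```python
import os
import uuid
from pathlib import Path


def _safe_unlink(path: Path) -> None:
    # Best-effort removal: a working file that is already gone is not an error.
    try:
        path.unlink()
    except FileNotFoundError:
        pass


class WorkingFileSketch:
    """Hypothetical sketch: one fixed working path per recorder instance."""

    def __init__(self, output_dir) -> None:
        self.output_dir = Path(output_dir)
        # Chosen once at construction, so every episode reuses the same working file.
        self.working_path = self.output_dir / f".recorder-{uuid.uuid4().hex}.mp4"

    def save(self, final_name: str) -> Path:
        # The final name (which encodes status) is only known at episode end, hence the rename.
        final_path = self.output_dir / final_name
        final_path.parent.mkdir(parents=True, exist_ok=True)  # supports subdirectory templates
        os.replace(self.working_path, final_path)
        return final_path

    def discard(self) -> None:
        _safe_unlink(self.working_path)
```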
## Real collision detection (overwrite=False)
Previously, two episodes that resolved to the same final filename
silently overwrote each other, so a programmer error (e.g. a duplicate
`episode_idx`) lost a recording with no warning. `save()` now checks
the final path before renaming and raises `FileExistsError` by
default, leaving the working file on disk so the caller can
recover. `EpisodeVideoRecorder(..., overwrite=True)` keeps the
old behaviour for callers that want it.
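The collision check can be sketched as a guard in front of the rename. `finalize` is a hypothetical helper name; only `overwrite`, `FileExistsError`, and "leave the working file on disk" come from the text above.

```python
import os
from pathlib import Path


def finalize(working_path: Path, final_path: Path, overwrite: bool = False) -> Path:
    """Hypothetical helper: rename the working file, refusing to clobber by default."""
    if not overwrite and final_path.exists():
        # Raise before touching anything, so working_path stays on disk for recovery.
        raise FileExistsError(f"refusing to overwrite existing video: {final_path}")
    final_path.parent.mkdir(parents=True, exist_ok=True)
    os.replace(working_path, final_path)  # replaces final_path if it exists
    return final_path
```

The check-then-rename is not atomic: two processes racing for the same name could both pass `exists()`. That fits the point made earlier that this guards against programmer error such as a duplicate `episode_idx`, not against concurrent writers.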
## Auto-derive `required_context` for str templates
Repeating `("env_id", "episode_idx")` next to a
`"{env_id}_ep{episode_idx:04d}_{status}.mp4"` template was pure
duplication. When `required_context` is omitted on a str template,
it's now derived from the template's field names via
`string.Formatter().parse()` (status excluded; format specs and
attribute/index access stripped). Callable templates still
require an explicit value — there's no way to introspect a
callable's key dependencies.
Explicit `required_context` is still allowed and is treated as
deliberate, so the existing semantics still work: listing only a
subset of the template's keys marks the rest as optional.
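A minimal sketch of the derivation. `fields_from_template` is a hypothetical stand-in for `_fields_from_template`; returning a tuple in first-seen order is an assumption. `string.Formatter().parse()` yields `(literal_text, field_name, format_spec, conversion)` tuples, so format specs such as `04d` are already split off; attribute and index access are stripped by cutting the field name at the first `.` or `[`. Nested fields inside a format spec (e.g. `{x:{width}}`) are not handled here.

```python
import re
import string
from typing import List, Tuple


def fields_from_template(template: str) -> Tuple[str, ...]:
    """Hypothetical sketch: top-level context keys a str.format template needs, minus status."""
    keys: List[str] = []
    for _literal, field_name, _spec, _conversion in string.Formatter().parse(template):
        if not field_name:
            continue  # None for trailing literal text, "" for a bare positional "{}"
        # "{task.name}" and "{ctx[0]}" depend only on the top-level key.
        key = re.split(r"[.\[]", field_name, maxsplit=1)[0]
        if key != "status" and key not in keys:  # status is injected at save()
            keys.append(key)
    return tuple(keys)
```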
## Filename UX papercuts
- Zero-pad `episode_idx` in the docstring example and the RoboMME
wiring (`{episode_idx:04d}`, not `{episode_idx}`). Without this,
ep10 sorts before ep2 in any lexicographically sorted listing (e.g. `ls`).
- New "Filename layout" section calling out the two field-ordering
cases that bite users: episode-first for multi-camera
(front/wrist of the same episode adjacent), task-first for
multi-task single-camera.
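The sorting effect is easy to reproduce with Python's plain lexicographic `sorted()`. The camera names `front` and `wrist` below are illustrative placeholders, not names taken from the recorder.

```python
episodes = (1, 2, 10)

# Unpadded indices sort character by character, so "ep10" lands before "ep2".
unpadded = sorted(f"ep{i}_success.mp4" for i in episodes)
# -> ['ep10_success.mp4', 'ep1_success.mp4', 'ep2_success.mp4']

# Zero-padding makes lexicographic order match numeric order.
padded = sorted(f"ep{i:04d}_success.mp4" for i in episodes)
# -> ['ep0001_success.mp4', 'ep0002_success.mp4', 'ep0010_success.mp4']

# Episode-first layout keeps both cameras of one episode adjacent.
multi_cam = sorted(f"ep{ep:04d}_{cam}_success.mp4" for cam in ("wrist", "front") for ep in (2, 1))
# -> ['ep0001_front_success.mp4', 'ep0001_wrist_success.mp4',
#     'ep0002_front_success.mp4', 'ep0002_wrist_success.mp4']
```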
## Tests
- Updated existing tests to use `:04d` template + auto-derived
required_context where appropriate.
- New tests: `_fields_from_template` direct tests, callable
filename rejects `required_context=None`, save raises on
collision by default, save overwrites with `overwrite=True`.
- 20 tests passing in 1.05 s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(recording): drop writer_kwargs YAGNI; widen docstring wraps
## Drop `writer_kwargs`
The constructor accepted ``writer_kwargs: Mapping[str, Any]`` and
forwarded it verbatim to ``imageio.get_writer``. Two reasons to
remove it:
1. **It's the only imageio-shaped knob in the public API.** Every
other part of the recorder's surface (`start`, `record`, `save`, `discard`) is
library-agnostic; only ``writer_kwargs`` would break callers if
we ever swapped the encode backend. Removing it makes the
surface fully encapsulated, so the imageio dep is a pure
implementation detail.
2. **Currently unused.** RoboMME's wiring doesn't pass it; tests
don't pass it; no caller in the tree depends on it. Speculative
API surface that doesn't carry weight.
If a real codec/quality-tuning need lands later, it's better added
back with a deliberate design (e.g. a backend selector + a
neutralized kwarg shape) than left as an imageio leak.
## Widen docstring line wraps
Project ``[tool.ruff].line-length`` is 119 but the recording.py
docstrings were wrapping at ~70 chars (commit-message habit). That
made the docstrings noisier than they needed to be — every paragraph
broke 3-4 times where it could have been 1-2. Reflowed to ~100
chars (well under the 119 limit, leaving a margin for further
edits). Same content, ~30% fewer lines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4e89c6d commit 64207d1
4 files changed
Lines changed: 666 additions & 34 deletions