fix(cli): fix reencode deadlock at 0 frames; default .mp4 output + --replace#507

Merged
talmo merged 5 commits into main from fix/reencode-ffmpeg-stderr-deadlock on Jun 23, 2026

talmo (Contributor) commented Jun 23, 2026

Summary

Fixes a hard hang in sio reencode and adds two requested UX enhancements. Three logical changes (separate commits):

  1. fix: sio reencode (ffmpeg path) could hang forever at 0/X frames — spinner still animating, Ctrl+C unresponsive.
  2. feat: default output is now always {stem}.reencoded.mp4 (no longer keeps .mov/.mkv/.avi).
  3. feat: new --replace flag for destructive in-place reencode.

1. The deadlock (root cause)

_run_ffmpeg_with_progress ran ffmpeg with stderr=subprocess.PIPE but, during the encode, only read stdout (-progress pipe:1); stderr was read only after the process exited. ffmpeg still writes its banner, stream info, and periodic frame= fps= stats to stderr even with -progress pipe:1. Once the stderr pipe buffer fills, ffmpeg blocks on the stderr write → it stops emitting stdout progress → stdout.readline() blocks forever. The spinner keeps spinning because rich animates on a separate thread, masking the dead main thread; Ctrl+C can't be delivered while the main thread is parked in a blocking pipe read.

Measured (real drone-clip.mov, HEVC 1080p60, 11.46s)

| Probe | Value |
|---|---|
| Windows anonymous-pipe buffer | 4096 bytes |
| ffmpeg startup stderr (banner + input info) | 2443 bytes |
| Total stderr for the full encode | 5838 bytes (> 4096 → blocks before first progress flush → stuck at frame 0) |
| Old code on the real file (watchdog) | DEADLOCK at frame 0 |
| Fixed code | ✅ 688 frames in ~2.7s |

This is hit on essentially any real-length video and is worse on Windows (4 KB pipe buffer vs. 64 KB on Linux), which is why the short CI fixtures never tripped it.
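A minimal reproduction sketch of the failure mode, assuming a Python stand-in child rather than real ffmpeg (the names and sizes are illustrative, not the project's test): the child writes about 1 MB to stderr before printing any progress, and the parent reads only stdout while the child runs, as the old code did. Since 1 MB exceeds both the 4 KB and 64 KB buffers, the child blocks on its stderr write and the stdout read never returns. The read runs on a helper thread here only so the hang can be detected with a timeout.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for ffmpeg: ~1 MB of stderr output (far more than a
# 4 KB or 64 KB pipe buffer) is written before the first stdout progress line.
CHILD_SOURCE = (
    "import sys\n"
    "sys.stderr.write('x' * 1_000_000)\n"
    "sys.stderr.flush()\n"
    "print('frame=1', flush=True)\n"
)


def reads_stdout_only(timeout: float = 3.0) -> bool:
    """Mimic the old pattern and return True if the stdout read hangs."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # piped, but never read while the child runs
        text=True,
    )
    lines = []
    # The blocking stdout read runs on a helper thread only so that the hang
    # can be observed with a timeout instead of freezing this process.
    reader = threading.Thread(target=lambda: lines.extend(proc.stdout), daemon=True)
    reader.start()
    reader.join(timeout)
    hung = reader.is_alive()
    proc.kill()  # terminate the stuck child; its stdout then reaches EOF
    proc.wait()
    reader.join()
    proc.stdout.close()
    proc.stderr.close()
    return hung
```

With this stand-in, reads_stdout_only() is expected to return True, because the child can never get past its stderr write.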

Fix

  • Split out _run_subprocess_with_progress (testable core) from _run_ffmpeg_with_progress (which just inserts the progress flags and delegates).
  • Drain stderr concurrently on a background thread (kept for error reporting) so ffmpeg never blocks.
  • Tear down the child on KeyboardInterrupt (+ finally safety net) so Ctrl+C aborts cleanly instead of orphaning ffmpeg.
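A minimal sketch of the fixed pattern, assuming a hypothetical run_subprocess_with_progress modeled on the description above (it is not the project's actual _run_subprocess_with_progress, and the on_frame callback is an illustrative stand-in for the rich progress update). stderr is drained on a daemon thread and kept for error reporting, stdout is parsed for the frame=N key/value lines that ffmpeg's -progress output emits, and the child is killed on KeyboardInterrupt, with a finally block as a safety net for any other exit path.

```python
import subprocess
import sys
import threading
from typing import Callable, List, Optional


def run_subprocess_with_progress(
    cmd: List[str],
    on_frame: Optional[Callable[[int], None]] = None,
) -> str:
    """Run cmd, report frame=N lines from stdout, and drain stderr concurrently.

    Hypothetical sketch: returns the collected stderr text and raises
    CalledProcessError if the child exits with a non-zero code.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        errors="replace",
    )
    stderr_chunks: List[str] = []

    def drain_stderr() -> None:
        # Keep emptying the pipe so the child never blocks on a stderr write.
        for chunk in proc.stderr:
            stderr_chunks.append(chunk)

    drainer = threading.Thread(target=drain_stderr, daemon=True)
    drainer.start()
    try:
        for line in proc.stdout:
            key, _, value = line.strip().partition("=")
            if key == "frame" and value.isdigit() and on_frame is not None:
                on_frame(int(value))
        proc.wait()
    except KeyboardInterrupt:
        proc.kill()  # Ctrl+C: abort the child instead of orphaning it
        raise
    finally:
        if proc.poll() is None:  # safety net for any other early exit
            proc.kill()
        proc.wait()
        drainer.join()
        proc.stdout.close()
        proc.stderr.close()
    stderr_text = "".join(stderr_chunks)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd, stderr=stderr_text)
    return stderr_text


if __name__ == "__main__":
    # Stand-in child: 1 MB of stderr noise interleaved with frame=N progress.
    child = (
        "import sys\n"
        "for i in range(1, 6):\n"
        "    sys.stderr.write('x' * 200_000)\n"
        "    sys.stderr.flush()\n"
        "    print(f'frame={i}', flush=True)\n"
    )
    run_subprocess_with_progress([sys.executable, "-c", child], on_frame=print)
```

Because the drainer keeps emptying the stderr pipe, the child can write any amount of diagnostic output without stalling the stdout progress stream, and the text is still available for the error message on failure.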

2. Default output is always .mp4

Reencoding always produces H.264/MP4, so keeping the source container extension on the output was misleading. The default is now {stem}.reencoded.mp4 regardless of input (the .reencoded infix keeps it distinct even for .mp4 inputs). Explicit -o is still honored as-is.
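A short sketch of the naming rule, assuming a hypothetical helper default_reencode_output (the project's real function name and signature may differ): an explicit output path wins; otherwise the input's stem gets the .reencoded.mp4 suffix next to the input, whatever the source container was.

```python
from pathlib import Path
from typing import Optional


def default_reencode_output(input_path: str, output: Optional[str] = None) -> Path:
    """Return the reencode destination (hypothetical helper).

    An explicit output path is honored as-is; otherwise the result is always
    {stem}.reencoded.mp4 in the input's directory.
    """
    if output is not None:
        return Path(output)
    src = Path(input_path)
    return src.with_name(f"{src.stem}.reencoded.mp4")
```

The .reencoded infix is what keeps foo.mp4 from mapping onto itself: it becomes foo.reencoded.mp4, just as foo.mov does.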

3. --replace (destructive in-place reencode)

```
sio reencode foo.mov --replace   -> foo.mp4   (deletes foo.mov)
sio reencode foo.mp4 --replace   -> foo.mp4   (replaced in place)
```

Encodes to a temp file in the destination directory, then atomically moves it over {stem}.mp4 with os.replace, deleting the original when the extension changes (the temp file is needed because ffmpeg can't read and write the same file in one pass). The temp file is cleaned up on failure or interrupt. --replace is mutually exclusive with -o/--output and unsupported for SLP inputs; clobbering an unrelated existing {stem}.mp4 still requires --overwrite.
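A minimal sketch of the temp-file-then-rename flow, assuming a hypothetical reencode_replace function and an encode(src, dst) callable standing in for the ffmpeg call (neither is the project's API). It also assumes that .mp4 inputs, in any letter case, keep their own name. The temp file comes from tempfile.mkstemp in the target's directory so os.replace is a same-filesystem rename; because mkstemp creates an empty file, the encoder must be allowed to overwrite it (for ffmpeg, the -y flag).

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def reencode_replace(
    src: Path,
    encode: Callable[[Path, Path], None],
    overwrite: bool = False,
) -> Path:
    """Hypothetical --replace sketch: encode to a temp file, then os.replace it."""
    src = Path(src)
    # .mp4 inputs keep their own name; other containers become {stem}.mp4.
    target = src if src.suffix.lower() == ".mp4" else src.with_suffix(".mp4")
    extension_changes = target != src
    # Clobbering an unrelated, pre-existing {stem}.mp4 requires overwrite=True.
    if extension_changes and target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True")
    # Create the temp file next to the target so os.replace is a same-filesystem
    # rename. The encoder must be allowed to overwrite it (e.g. ffmpeg -y).
    fd, tmp_name = tempfile.mkstemp(suffix=".mp4", dir=target.parent)
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        encode(src, tmp)  # read src and write tmp as two separate files
        os.replace(tmp, target)  # atomic rename over the final name
    except BaseException:
        tmp.unlink(missing_ok=True)  # remove the temp on failure or Ctrl+C
        raise
    if extension_changes:
        src.unlink()  # e.g. foo.mov -> foo.mp4 also removes foo.mov
    return target
```

Catching BaseException rather than Exception is deliberate here: it makes a KeyboardInterrupt during the encode also remove the half-written temp file before propagating.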

Testing

  • Deadlock regression (test_run_subprocess_with_progress_*): feed a stand-in child that emits more stderr than any OS pipe buffer while streaming frame=N, and run it in a guard thread so that a regression fails the test instead of hanging it. Verified that these deadlock under the old code and pass under the fix.
  • --replace / .mp4 default: dry-run path/messaging, -o conflict, SLP rejection, existing-target --overwrite guard, plus real in-place (.mp4) and extension-change (.mov → .mp4, original deleted) encodes. The two real encodes are marked skipif(win32) like the other slow video tests, but were verified manually on Windows here.
  • Full reencode suite green; lint clean. (One unrelated CSV-export test fails locally on a terminal-width rich-panel quirk — pre-existing on main.)
  • End-to-end re-verified on the real drone-clip.mov: 688/688 frames in ~3s.
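A sketch of the guard-thread harness described in the first bullet, assuming hypothetical helpers stand_in_child_cmd and run_with_deadline (not the project's test code). The stand-in child floods stderr with about 1 MB while printing frame=N lines, and the guard runs the call under test on a daemon thread, turning a hang into an AssertionError. As a known-good baseline this uses subprocess.run with capture_output=True, which reads both pipes concurrently; a real regression test would call the runner under test instead.

```python
import subprocess
import sys
import threading
from typing import Any, Callable, List


def stand_in_child_cmd(stderr_bytes: int = 1_000_000, frames: int = 5) -> List[str]:
    """Command for a fake ffmpeg (hypothetical): floods stderr, streams frame=N."""
    per_frame = stderr_bytes // frames
    code = (
        "import sys\n"
        f"for i in range(1, {frames} + 1):\n"
        f"    sys.stderr.write('x' * {per_frame})\n"
        "    sys.stderr.flush()\n"
        "    print(f'frame={i}', flush=True)\n"
    )
    return [sys.executable, "-c", code]


def run_with_deadline(fn: Callable[[], Any], timeout: float = 10.0) -> Any:
    """Run fn on a daemon guard thread; fail instead of hanging the test run."""
    result: List[Any] = []
    errors: List[BaseException] = []

    def target() -> None:
        try:
            result.append(fn())
        except BaseException as exc:  # re-raised below in the calling thread
            errors.append(exc)

    guard = threading.Thread(target=target, daemon=True)
    guard.start()
    guard.join(timeout)
    if guard.is_alive():
        raise AssertionError(f"call did not finish within {timeout}s (deadlock?)")
    if errors:
        raise errors[0]
    return result[0]


if __name__ == "__main__":
    done = run_with_deadline(
        lambda: subprocess.run(stand_in_child_cmd(), capture_output=True, text=True)
    )
    print(done.stdout.split())
```

Making the guard a daemon thread means a genuinely deadlocked call cannot keep the test process alive after the assertion has been reported.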

Design decisions

  • Drain (not suppress) stderr: keeps ffmpeg's error text available on failure; we just read it concurrently. -nostats/-loglevel error would only reduce, not eliminate, the deadlock risk.
  • Refactor for testability: the helper takes an already-built command, so tests can inject a fake frame=-emitting child without the ffmpeg-specific flag insertion corrupting the command; no mocking/monkeypatching needed.
  • Separate --replace (not overloading --overwrite): --overwrite keeps its existing "clobber an existing output" meaning; --replace is the distinct in-place operation.
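To illustrate the split between the thin ffmpeg wrapper and the testable core, here is a sketch of the flag-insertion step, assuming a hypothetical with_progress_flags helper; where the real _run_ffmpeg_with_progress places the flags is not stated in this PR. Since -progress is a global ffmpeg option, putting it directly after the executable is valid, and everything else in the argv is left untouched.

```python
from typing import List


def with_progress_flags(cmd: List[str]) -> List[str]:
    """Return a copy of an ffmpeg argv that reports progress on stdout.

    Hypothetical helper: "-progress pipe:1" is a global ffmpeg option, so it
    is inserted right after the executable; the input list is not modified.
    """
    if not cmd:
        raise ValueError("empty command")
    return [cmd[0], "-progress", "pipe:1", *cmd[1:]]
```

Under this split, only the wrapper knows about ffmpeg flags, while the core runner just receives a finished command, which is what lets a test pass it a non-ffmpeg stand-in child directly.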

🤖 Generated with Claude Code

`sio reencode <video>` (ffmpeg fast path) could hang forever at `0/X frames`
with the rich spinner still animating and Ctrl+C unresponsive.

Root cause: `_run_ffmpeg_with_progress` ran ffmpeg with `stderr=PIPE` but only
read stdout (`-progress pipe:1`) during the encode, draining stderr only after
the process exited. ffmpeg still writes its banner, stream info, and periodic
stats to stderr; once that pipe buffer fills (~4 KB on Windows, ~64 KB on
Linux) ffmpeg blocks on the stderr write, stops emitting stdout progress, and
`stdout.readline()` parks forever. The spinner kept spinning because rich
animates on its own thread, masking the dead main thread.

Fix:
- Split the run logic into `_run_subprocess_with_progress`, which drains stderr
  concurrently on a background thread (collected for error reporting) so the
  child never blocks. `_run_ffmpeg_with_progress` now just inserts the progress
  flags and delegates.
- Tear down the ffmpeg child on `KeyboardInterrupt` (and via a `finally` safety
  net) so Ctrl+C aborts cleanly instead of orphaning ffmpeg.

Add regression tests that feed a stand-in child emitting >pipe-buffer stderr
with `frame=N` progress; these deadlock under the old code and pass now.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KpHzNYmZ7gzJmhck23ZS2M
codecov Bot commented Jun 23, 2026
Codecov Report

❌ Patch coverage is 83.76068% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.98%. Comparing base (f13450c) to head (4ef4c25).

| File with missing lines | Patch % | Lines |
|---|---|---|
| sleap_io/io/cli.py | 83.76% | 18 missing and 1 partial ⚠️ |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #507      +/-   ##
==========================================
- Coverage   93.04%   92.98%   -0.06%     
==========================================
  Files          54       54              
  Lines       19532    19588      +56     
  Branches     4412     4424      +12     
==========================================
+ Hits        18173    18214      +41     
- Misses        647      661      +14     
- Partials      712      713       +1     
```


github-actions Bot pushed a commit that referenced this pull request Jun 23, 2026
github-actions Bot (Contributor) commented Jun 23, 2026

Docs Preview

Preview has been removed.

talmo and others added 2 commits June 23, 2026 11:07
…ace)

Two reencode UX enhancements:

- Default output is now always `{stem}.reencoded.mp4` regardless of the input
  container. Reencoding always produces H.264/MP4, so keeping a `.mov`/`.mkv`/
  `.avi` extension on the output was misleading; the `.reencoded` infix keeps it
  distinct from the source even when the input is already `.mp4`.

- New `--replace` flag for a destructive in-place reencode: encodes to a temp
  file in the destination directory, atomically moves it over `{stem}.mp4`, and
  deletes the original if the extension changed (e.g. `foo.mov` -> `foo.mp4`,
  removing `foo.mov`). ffmpeg can't read and write the same file in one pass, so
  the temp+rename is required; the temp is cleaned up on failure/interrupt.
  `--replace` is mutually exclusive with `-o/--output` and unsupported for SLP
  inputs. Clobbering an unrelated existing `{stem}.mp4` still requires
  `--overwrite`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KpHzNYmZ7gzJmhck23ZS2M
@talmo talmo changed the title fix(cli): drain ffmpeg stderr to stop reencode deadlocking at 0 frames fix(cli): fix reencode deadlock at 0 frames; default .mp4 output + --replace Jun 23, 2026
github-actions Bot pushed a commit that referenced this pull request Jun 23, 2026
github-actions Bot pushed a commit that referenced this pull request Jun 23, 2026
github-actions Bot pushed a commit that referenced this pull request Jun 23, 2026
@talmo talmo merged commit 943d53d into main Jun 23, 2026
14 of 15 checks passed
github-actions Bot added a commit that referenced this pull request Jun 23, 2026