fix(pptx): add retry, timeout, and lock cleanup to thumbnail soffice conversion #982

Open
voidborne-d wants to merge 1 commit into anthropics:main from voidborne-d:fix/pptx-thumbnail-soffice-retry

Summary

Fixes #886. In thumbnail.py, convert_to_images() calls soffice --headless --convert-to pdf once, with no retry, no timeout, no --norestore flag, and no stderr capture. In containerized environments (Docker, sandboxed VMs), LibreOffice fails intermittently because of stale lock files, concurrent processes, or profile corruption. When it does, the only error message is a bare "PDF conversion failed" with no diagnostics.

Root Cause

xlsx/scripts/recalc.py in this same repo already uses --norestore and timeouts for its soffice calls, but pptx/scripts/thumbnail.py was written without these safeguards. As a result, thumbnail generation is fragile in exactly the environments where the PPTX skill is most commonly used (containerized Claude Code sessions).

Changes

skills/pptx/scripts/thumbnail.py

  1. Retry loop (3 attempts): on soffice failure or timeout, clean up stale lock files and retry before giving up
  2. --norestore flag: prevents LibreOffice from trying to recover a previous session (parity with recalc.py)
  3. 120s timeout: prevents soffice from hanging indefinitely on corrupted profiles or deadlocked processes; also added to the pdftoppm call
  4. stderr in error messages: RuntimeError messages now include the actual soffice/pdftoppm stderr output for debugging
  5. _cleanup_soffice_lock() helper: removes stale .~lock.* files from both the temp dir and the default LibreOffice user profile between retries
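
For reviewers who want the shape of the change without opening the diff, the five items above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the names `MAX_ATTEMPTS`, `TIMEOUT_SECONDS`, `cleanup_lock_files`, and `convert_with_retry` are hypothetical, the injectable `run` parameter is an assumption made so the loop can be exercised without soffice installed, and the sketch only cleans the output directory (the PR's helper also covers the default LibreOffice user profile).

```python
import subprocess
from pathlib import Path

MAX_ATTEMPTS = 3       # hypothetical name; the PR describes "3 attempts"
TIMEOUT_SECONDS = 120  # hypothetical name; the PR describes a "120s timeout"


def cleanup_lock_files(directory):
    """Delete `.~lock.*` files in `directory` and return how many were removed."""
    removed = 0
    for lock in Path(directory).glob(".~lock.*"):
        try:
            lock.unlink()
            removed += 1
        except OSError:
            pass  # best effort: an undeletable lock should not abort the retry
    return removed


def convert_with_retry(pptx_path, out_dir, run=subprocess.run):
    """Convert a deck to PDF, retrying on a non-zero exit code or a timeout.

    Returns the attempt number that succeeded, or raises RuntimeError that
    carries the last stderr text (or timeout note) once attempts run out.
    """
    cmd = [
        "soffice", "--headless", "--norestore",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(pptx_path),
    ]
    last_error = "no attempts made"
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            result = run(cmd, capture_output=True, text=True, timeout=TIMEOUT_SECONDS)
        except subprocess.TimeoutExpired:
            last_error = f"timed out after {TIMEOUT_SECONDS}s"
        else:
            if result.returncode == 0:
                return attempt
            last_error = (result.stderr or "").strip() or f"exit code {result.returncode}"
        # Clear stale locks before the next attempt (or before giving up).
        cleanup_lock_files(out_dir)
    raise RuntimeError(
        f"PDF conversion failed after {MAX_ATTEMPTS} attempts: {last_error}"
    )
```

Passing a stub as `run` mirrors the stub-based approach the tests describe: a fake that fails once and then succeeds should make `convert_with_retry` return 2.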

skills/pptx/scripts/tests/test_thumbnail_soffice_retry.py (new)

18 regression tests organized in 3 test classes:

  • TestSourceAudit (7 tests): AST-level verification that --norestore is present, timeout= is set on both subprocess calls, retry/timeout constants are defined, lock cleanup function exists, no bare error messages without stderr detail remain, and pdftoppm has timeout
  • TestConvertToImagesRetry (8 tests): functional tests via stubbed module loading — success on first/second attempt, all retries exhausted with stderr detail, timeout triggers retry, all timeouts exhausted, pdftoppm timeout/error with stderr, --norestore flag presence in actual command list
  • TestCleanupLockFunction (3 tests): lock file removal, no-op on empty dirs, non-lock file preservation

All tests use lightweight stubs (no PIL/defusedxml/soffice required), suitable for CI.
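
To illustrate what an AST-level source audit of the kind listed under TestSourceAudit might look like, here is a minimal sketch. It is an assumption about the general technique, not a copy of the tests in this PR: the function name `subprocess_calls_missing_timeout` is hypothetical, and it only recognizes calls written literally as `subprocess.run(...)`.

```python
import ast


def subprocess_calls_missing_timeout(source):
    """Return line numbers of `subprocess.run(...)` calls with no `timeout=` keyword."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_subprocess_run = (
            isinstance(func, ast.Attribute)
            and func.attr == "run"
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
        )
        # kw.arg is None for `**kwargs` expansions, which this check treats
        # as "no explicit timeout".
        if is_subprocess_run and not any(kw.arg == "timeout" for kw in node.keywords):
            missing.append(node.lineno)
    return sorted(missing)
```

A test could read the script's source text, pass it to this function, and assert that the result is an empty list. Because the check parses the file rather than importing it, it needs neither soffice nor the script's third-party dependencies.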

Testing

python3 -m pytest skills/pptx/scripts/tests/test_thumbnail_soffice_retry.py -v
# 18 passed

…conversion

thumbnail.py's convert_to_images() called soffice once with no retry,
no timeout, no --norestore, and no stderr capture. In containerized
environments (Docker, sandboxed VMs), LibreOffice fails intermittently
due to stale lock files, concurrent processes, or profile corruption.
The only error message was 'PDF conversion failed' with zero diagnostics.

Changes:
- Add retry loop (3 attempts) with lock file cleanup between retries
- Add --norestore flag (parity with xlsx/scripts/recalc.py)
- Add 120s timeout to both soffice and pdftoppm subprocess calls
- Surface stderr in RuntimeError messages for debugging
- Add _cleanup_soffice_lock() helper to remove stale .~lock.* files

New tests: 18 regression tests covering retry logic, timeout handling,
lock file cleanup, error message quality, and source-level audits.

Closes anthropics#886

Development

Successfully merging this pull request may close these issues.

pptx skill: LibreOffice PDF conversion fails intermittently with no retry
