Commit 2f89be5

Add PGS to SRT OCR subtitle extraction
Implement OCR conversion for PGS (Presentation Graphic Stream) subtitles to SRT format using pgsrip library with auto-detection of required tools.

Features:
- Auto-detect Tesseract OCR from PATH or Subtitle Edit installations
- Auto-detect MKVToolNix (mkvextract/mkvmerge) from standard locations
- Support for multiple language codes (2-letter, 3-letter, names)
- Automatic cleanup of temporary .sup files after conversion
- Works when running FastFlix from source

Known limitation: Due to an upstream issue in pgsrip v0.1.12, this feature does not work in PyInstaller-built executables. Users needing PGS OCR should run FastFlix from source with: python -m fastflix

Dependencies added:
- pgsrip (OCR engine for PGS subtitles)
- pytesseract (Tesseract OCR Python wrapper)
- babelfish (language code handling)
- cleanit, trakit (metadata handling)
- opencv-python, pysrt (image/subtitle processing)
1 parent 835607e commit 2f89be5
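
The commit message describes auto-detecting tools from PATH and then from known install locations. A minimal sketch of that lookup order follows. It is not FastFlix's `find_ocr_tool`; the function name `locate_tool`, the `fallback_dirs` parameter, and the tool name `hypothetical-ocr-tool` are assumptions made for illustration, and the real Subtitle Edit and MKVToolNix directories are deliberately not filled in.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def locate_tool(name: str, fallback_dirs: Iterable[Path] = ()) -> Optional[Path]:
    """Return the path to `name`, checking PATH first and then fallback_dirs."""
    found = shutil.which(name)
    if found:
        return Path(found)
    # On Windows executables carry an .exe suffix; elsewhere they have none.
    filename = f"{name}.exe" if os.name == "nt" else name
    for directory in fallback_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


# Demo with a throwaway directory standing in for an install location.
with tempfile.TemporaryDirectory() as tmp:
    fake_name = "hypothetical-ocr-tool"  # assumed not to exist on PATH
    fake = Path(tmp) / (f"{fake_name}.exe" if os.name == "nt" else fake_name)
    fake.touch()
    from_fallback = locate_tool(fake_name, [Path(tmp)])
    missing = locate_tool("hypothetical-missing-tool", [Path(tmp)])
```

Checking PATH first means a user's explicit environment takes priority over whatever happens to be installed in a default location.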

4 files changed

+6
-71
lines changed

README.md

Lines changed: 3 additions & 5 deletions
@@ -94,11 +94,9 @@ FastFlix can extract subtitles from video files in various formats (SRT, ASS, SS
 ## PGS to SRT OCR
 
 **Requirements**:
-- Tesseract OCR 4.x or higher
-- MKVToolNix (mkvextract, mkvmerge)
-- pgsrip Python library
-
-**Known Limitation**: PGS OCR only works when running FastFlix from source (`python -m fastflix`), not in PyInstaller-built executables due to a bug in pgsrip v0.1.12. See [WINDOWS_BUILD.md](WINDOWS_BUILD.md#pgs-to-srt-ocr-conversion-pyinstaller-builds) for details.
+- Tesseract OCR 4.x or higher (auto-detected from PATH or Subtitle Edit installations)
+- MKVToolNix (mkvextract, mkvmerge) (auto-detected from standard install locations)
+- pgsrip Python library (included in FastFlix)
 
 # HDR
 
WINDOWS_BUILD.md

Lines changed: 3 additions & 12 deletions
@@ -120,20 +120,11 @@ The FastFlix executable doesn't include FFmpeg. You need to:
 
 ## Known Limitations
 
-### PGS to SRT OCR Conversion (PyInstaller builds)
+### PGS to SRT OCR (PyInstaller builds)
 
-The PGS to SRT OCR feature works perfectly when running FastFlix from source (`python -m fastflix`), but has a known issue in PyInstaller-built executables:
+Due to an upstream issue in pgsrip v0.1.12, PGS to SRT OCR conversion does not work in PyInstaller-built executables. The feature works perfectly when running from source (`python -m fastflix`).
 
-**Issue**: The pgsrip library (v0.1.12) has a bug where `MediaPath.create_temp_folder()` doesn't work correctly in frozen PyInstaller executables. This causes mkvextract to fail with exit code 2.
-
-**Workaround**: If you need PGS OCR functionality, run FastFlix from source instead of using the compiled executable.
-
-**Requirements** (when running from source):
-- Tesseract OCR 4.x or higher (auto-detected from PATH or Subtitle Edit installations)
-- MKVToolNix (mkvextract, mkvmerge) (auto-detected from PATH or standard install locations)
-- pgsrip Python library (installed via pip)
-
-This is a known upstream bug in pgsrip when used with PyInstaller and cannot be fixed without patching the pgsrip library itself.
+If you need PGS OCR functionality, please run FastFlix from source instead of using the compiled executable.
 
 ## Notes
 
fastflix/__main__.py

Lines changed: 0 additions & 50 deletions
@@ -8,61 +8,14 @@
 from fastflix.entry import main
 
 
-def patch_pgsrip_for_pyinstaller():
-    """Monkey-patch pgsrip to fix temp folder creation in PyInstaller.
-
-    pgsrip's MediaPath.create_temp_folder() doesn't work correctly in frozen
-    PyInstaller executables, so we patch MkvPgs.read_data to handle it.
-    """
-    try:
-        import tempfile
-        from subprocess import check_output
-
-        # Import pgsrip.mkv module to patch it
-        from pgsrip import mkv as pgsrip_mkv
-
-        @classmethod
-        def patched_read_data(cls, media_path, track_id, temp_folder):
-            """Patched version that ensures temp_folder exists as a directory"""
-            # Check if temp_folder exists as a directory
-            temp_folder_path = Path(temp_folder)
-            if not temp_folder_path.exists() or not temp_folder_path.is_dir():
-                # Create our own temp folder if pgsrip's creation failed
-                temp_folder = tempfile.mkdtemp(prefix=f"{Path(str(media_path)).stem}_", suffix=".pgsrip")
-
-            lang_ext = f".{str(media_path.language)}" if media_path.language else ""
-            sup_file = os.path.join(temp_folder, f"{track_id}{lang_ext}.sup")
-            cmd = ["mkvextract", str(media_path), "tracks", f"{track_id}:{sup_file}"]
-            check_output(cmd)
-            with open(sup_file, mode="rb") as f:
-                return f.read()
-
-        # Apply the monkey-patch
-        pgsrip_mkv.MkvPgs.read_data = patched_read_data
-        print("DEBUG: pgsrip monkey-patch applied successfully")
-    except ImportError as e:
-        # pgsrip not installed, skip patching
-        print(f"DEBUG: pgsrip monkey-patch skipped - ImportError: {e}")
-    except Exception as e:
-        # Other error during patching
-        print(f"DEBUG: pgsrip monkey-patch failed - {type(e).__name__}: {e}")
-
-
 def setup_ocr_environment():
     """Set up environment variables for OCR tools early in app startup.
 
     This is necessary for PyInstaller frozen executables where os.environ
     modifications later in the code don't properly propagate to subprocesses.
     """
-    import tempfile
     from fastflix.models.config import find_ocr_tool
 
-    # Ensure TEMP/TMP point to standard locations for PyInstaller compatibility
-    # pgsrip creates temp folders and needs writable temp directory
-    temp_dir = tempfile.gettempdir()
-    os.environ["TEMP"] = temp_dir
-    os.environ["TMP"] = temp_dir
-
     # Find tesseract and add to PATH
     tesseract_path = find_ocr_tool("tesseract")
     if tesseract_path:
@@ -76,9 +29,6 @@ def setup_ocr_environment():
         mkvtoolnix_dir = str(Path(mkvmerge_path).parent)
         os.environ["PATH"] = f"{mkvtoolnix_dir}{os.pathsep}{os.environ.get('PATH', '')}"
 
-    # Patch pgsrip AFTER environment is set up
-    patch_pgsrip_for_pyinstaller()
-
 
 def start_fastflix():
     exit_code = 2
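
`setup_ocr_environment()` prepends each tool's directory to `PATH` with an f-string, and `_convert_sup_to_srt` does the same again on every conversion, so the same directory can end up in `PATH` several times. The sketch below shows one way the prepend could be made idempotent. The helper name `prepend_to_path` and its `env` parameter are assumptions for illustration and are not part of this commit.

```python
import os
from pathlib import Path
from typing import MutableMapping, Optional


def prepend_to_path(tool_path: str, env: Optional[MutableMapping[str, str]] = None) -> str:
    """Put the directory containing tool_path at the front of PATH, exactly once."""
    env = os.environ if env is None else env
    tool_dir = str(Path(tool_path).parent)
    entries = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    # Drop any earlier copy so repeated calls do not grow PATH.
    entries = [p for p in entries if p != tool_dir]
    env["PATH"] = os.pathsep.join([tool_dir, *entries])
    return env["PATH"]
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real process environment.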

fastflix/widgets/background_tasks.py

Lines changed: 0 additions & 4 deletions
@@ -233,13 +233,11 @@ def _convert_sup_to_srt(self, sup_filepath: str) -> bool:
 
         # Set environment variables for pgsrip to find tesseract and mkvextract
         if self.app.fastflix.config.tesseract_path:
-            # Add tesseract directory to PATH so pytesseract can find it
             tesseract_dir = str(Path(self.app.fastflix.config.tesseract_path).parent)
             os.environ["PATH"] = f"{tesseract_dir}{os.pathsep}{os.environ.get('PATH', '')}"
             os.environ["TESSERACT_CMD"] = str(self.app.fastflix.config.tesseract_path)
 
         if self.app.fastflix.config.mkvmerge_path:
-            # Add MKVToolNix directory to PATH so pgsrip can find mkvextract
             mkvtoolnix_dir = str(Path(self.app.fastflix.config.mkvmerge_path).parent)
             os.environ["PATH"] = f"{mkvtoolnix_dir}{os.pathsep}{os.environ.get('PATH', '')}"
 
@@ -278,8 +276,6 @@ def _convert_sup_to_srt(self, sup_filepath: str) -> bool:
         pgsrip.rip(media, options)
 
         # Find newly created .srt files
-        # Note: Can't use glob with video filename directly because special chars like []
-        # are interpreted as glob patterns. Instead, find new .srt files.
         current_srts = set(video_path.parent.glob("*.srt"))
         new_srts = current_srts - existing_srts
 