Troubleshooting

This document covers common issues: GPU OOM, HuggingFace gated models, provider API keys, and dependency conflicts. See also Optional dependency conflicts for module-specific workarounds.

1. GPU out-of-memory (OOM)

Symptoms: CUDA/ROCm out-of-memory errors, crash during detection/OCR/inpainting, or "CUDA error: out of memory".

What to try	Notes
Reduce batch / size	Use smaller detect_size (e.g. 1024), inpaint_size, or disable parallel translation. Config → DL Module → set device or module params.
Tiled inpainting	Config → Inpainting → Inpaint tile size (e.g. 512 or 1024) and overlap (e.g. 64). Reduces peak VRAM for large images.
Load model on demand	Config → DL Module → Load model on demand. Models load only when a pipeline runs; frees VRAM when idle.
Unload after idle	Config → DL Module → Unload models after idle (e.g. 5–10 min). Frees VRAM when you leave the app idle.
PYTORCH_ALLOC_CONF	Already set by `launch.py` to `max_split_size_mb:512` to reduce fragmentation. To tune it, set the variable before launch: `set PYTORCH_ALLOC_CONF=max_split_size_mb:512` (Windows CMD) or `export PYTORCH_ALLOC_CONF=max_split_size_mb:512` (Linux/macOS). Older PyTorch releases read `PYTORCH_CUDA_ALLOC_CONF` instead.
Close other GPU apps	Browsers, other ML tools, or games can hold VRAM. Close them and retry.
CPU for some stages	Set device to CPU for detector, OCR, or inpainter in Config → DL Module to move that stage off GPU. On 11 GB GPUs with GPU-heavy OCR (e.g. qwen35), running the detector (hf_object_det) on CPU avoids OOM; the detector will also auto-fallback to CPU for YOLO if GPU runs out of memory during load or predict.
Inpaint full image	If per-block inpainting OOMs, try Config → Inpainting → Inpaint full image (uses one big pass; can still be heavy but sometimes more stable).
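
To illustrate why tiled inpainting lowers peak VRAM, the sketch below splits an image into overlapping tiles so that each pass only has to hold one tile. It is a minimal illustration, not the app's implementation: `tile_boxes` and its parameters are hypothetical names, and the defaults simply reuse the example values from the table (tile size 512, overlap 64).

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes covering the image with overlapping tiles."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than the tile size")
    stride = tile - overlap

    def starts(length):
        # One tile is enough when the side already fits inside a tile.
        if length <= tile:
            return [0]
        # Step by the stride, then add a final tile flush with the far edge.
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(2048, 2048)
largest = max((right - left) * (bottom - top) for left, top, right, bottom in boxes)
print(len(boxes), "tiles; largest tile:", largest, "px; full image:", 2048 * 2048, "px")
```

Each pass works on at most `tile × tile` pixels regardless of the page size, which is why a smaller tile size trades speed for lower peak memory.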

2. HuggingFace gated models

Symptoms: "401 Unauthorized", "Repository not found", or prompt to log in when downloading models (e.g. LLaMA, some OCR/detector models).

What to do	Notes
Accept terms	On huggingface.co, open the model page (e.g. `meta-llama/Llama-2-7b`) and accept the license/terms if required.
HF token	Create a token: Hugging Face → Settings → Access Tokens → New. Use Read (or higher if you need write).
Where to set	Config → General → HuggingFace token, or set env var `HF_TOKEN` (or `HUGGING_FACE_HUB_TOKEN`) before running. Prefer env var so the token is not stored in config.
Faster downloads (Xet)	If `HF_TOKEN` is set, `launch.py` can enable `HF_XET_HIGH_PERFORMANCE=1` for faster first-time downloads. See `utils/model_manager.py`.
Gated model in optional module	Some detectors/OCRs (e.g. HF object detection, certain VLMs) pull gated models. Same steps: accept terms, set token, then run again.
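
As a rough sketch of the "prefer env var" advice above, the helper below looks for a token in the two environment variables first and only then falls back to a value from config. The function name `resolve_hf_token` and the exact precedence are assumptions for illustration; the app's own lookup may differ.

```python
import os


def resolve_hf_token(config_token=None, env=None):
    """Pick a HuggingFace token, preferring environment variables over config."""
    env = os.environ if env is None else env
    # Check the current variable name first, then the legacy one.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = (env.get(name) or "").strip()
        if value:
            return value
    # Fall back to the config value (hypothetical parameter), if any.
    value = (config_token or "").strip()
    return value or None


if __name__ == "__main__":
    token = resolve_hf_token()
    print("token found" if token else "no token set")
```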

2.1. PaddleOCR-VL: no network or "model not in cache"

Symptoms: OCR fails with "getaddrinfo failed", "No model hoster is available", or "Can't load processor for 'PaddlePaddle/PaddleOCR-VL'" when using paddleocr_vl_hf and HuggingFace is unreachable (offline, firewall, or no DNS).

The app will retry using the local HuggingFace cache only. If the model was never downloaded, you get a clear error.

What to do	Notes
Download once when online	Connect to the internet and run OCR on one page so the app can download PaddleOCR-VL; after that you can run offline.
Use another OCR	Config → DL Module → OCR → choose e.g. mit48px, manga_ocr, or easyocr if you need to work offline without the HF model cached.
Force offline (cache only)	Set env var `HF_HUB_OFFLINE=1` so the app never tries the network; useful if you already have the model cached.
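
The "getaddrinfo failed" symptom can be checked up front. The sketch below tries to resolve the hub host and, if that fails, sets `HF_HUB_OFFLINE=1` so only the local cache is used. The helper names are hypothetical and this is not how the app itself decides; it also assumes the variable is set before `huggingface_hub` or `transformers` is imported, since those libraries read it at import time.

```python
import os
import socket


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if DNS resolution for host succeeds."""
    try:
        resolver(host, 443)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def enable_offline_if_unreachable(host="huggingface.co", env=None,
                                  resolver=socket.getaddrinfo):
    """Set HF_HUB_OFFLINE=1 when the host cannot be resolved; return the offline state."""
    env = os.environ if env is None else env
    if not can_resolve(host, resolver):
        env["HF_HUB_OFFLINE"] = "1"
    return env.get("HF_HUB_OFFLINE") == "1"
```

Call `enable_offline_if_unreachable()` at the very start of a wrapper script, before any model library is imported.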

3. Provider API keys and translator/OCR

Symptoms: "Invalid API key", "401", "403 Forbidden", or "quota exceeded" when using a translator or cloud OCR.

Provider / area	What to check
Where to set	Config → DL Module → Translator (or OCR) → open the module (e.g. LLM_API_Translator, ChatGPT, Google) → fill API key / Key in params. Keys are stored in `config.json`; keep the file private.
Format	Paste the key as given (no extra spaces). OpenAI keys often start with `sk-`; Google/others vary. If the UI has a "Test" button (e.g. Test translator), use it to verify.
OpenAI	Key from platform.openai.com. Ensure the account has credits and the model you chose is available.
Google / Gemini	Use an API key from Google AI Studio or Cloud. Check quota and that the model name in the dropdown matches your access.
OpenRouter	Key from openrouter.ai. Free models (ID ending in `:free`) have strict rate limits; see §4 OpenRouter free-tier 429 below.
DeepL / others	Key from the provider’s dashboard; set in the same Translator/OCR params.
Proxy	If behind a proxy: Config → DL Module → Translator (or the relevant module) → Proxy (e.g. `http://127.0.0.1:7897`). See README "Translation context" for proxy format.
Rate limits / quota	"Too many requests" or "quota exceeded" → wait, or switch model/provider, or check the provider’s usage page. For OpenRouter free models, see §4.

4. OpenRouter free-tier 429 (rate limit)

Symptoms: Error code: 429, "temporarily rate-limited upstream", or "Provider returned error" when using an OpenRouter model whose ID ends in :free (e.g. meta-llama/llama-3.3-70b-instruct:free).

Documented limits (as of 2024–2025):

Limit	Value	Source
Requests per minute (RPM)	20	OpenRouter docs
Requests per day (RPD)	200 (no credits); higher if you have purchased credits	Same
429 from "upstream"	Even under 20 RPM, the upstream provider (e.g. Venice, Z.AI) can be globally rate-limited; OpenRouter returns 429 in that case too	OpenRouter examples #11

Recommended LLM_API_Translator settings for free models:

Param	Recommended	Reason
Delay	3.5–5 s	60 ÷ 20 RPM = 3 s minimum between requests; 3.5–5 s leaves headroom and reduces upstream 429s.
Max requests per minute	6–10	Stays under OpenRouter’s 20 RPM; lower = safer.
Rate limit delay	60–90 s	When you get 429, wait this long before retry so upstream can recover.

Where to set: Config → DL Module → Translator → LLM_API_Translator → Delay, Max requests per minute, Rate limit delay.
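
The arithmetic behind these settings can be sketched as follows. This is an illustration under stated assumptions, not the LLM_API_Translator code: `min_delay_seconds`, `call_with_rate_limit_retry`, and the `send` callable (which returns a `(status, body)` pair) are hypothetical names.

```python
import time


def min_delay_seconds(requests_per_minute):
    """Smallest spacing between requests that stays within an RPM limit."""
    return 60.0 / requests_per_minute


def call_with_rate_limit_retry(send, rate_limit_delay=60.0, max_retries=3,
                               sleep=time.sleep):
    """Call send() and, on HTTP 429, wait rate_limit_delay seconds before retrying."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(rate_limit_delay)  # give the upstream provider time to recover
    return status, body


print(min_delay_seconds(20))  # spacing implied by a 20 RPM limit
```

A Delay of 3.5–5 s sits above the value printed here, which is the headroom the table refers to.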

Other options: Use a paid model (no :free), add your own OpenRouter key and purchase credits for higher limits, or use model fallbacks (multiple models) so the app can switch when one is rate-limited.

5. Dependency conflicts

Symptoms: pip reports conflicting versions when installing, or a specific detector/OCR/inpainter fails to import or run.

What to do	Notes
Optional modules	See docs/OPTIONAL_DEPENDENCIES.md for known conflicts (e.g. craft_det with opencv, simple_lama with Pillow). Use the suggested alternatives or a separate venv.
Don’t install everything	Install only the dependencies for the modules you use. Extra pip packages (e.g. `craft-text-detector`, `simple-lama-inpainting`) can conflict with main `requirements.txt`.
Fresh venv	`python -m venv venv`, activate, then `pip install -r requirements.txt` and `python launch.py`. Reduces conflicts from other projects.
Torch version	`launch.py` installs PyTorch (CUDA or ROCm) automatically. To force a version, set TORCH_COMMAND (e.g. `pip install torch==... torchvision==... --index-url ...`) before running. See README or "Portable setup" for platform notes.

Windows NVIDIA: torch/torchvision entry-point errors after auto-install

Symptoms: First launch detects an NVIDIA GPU, installs CUDA PyTorch, then Windows shows "python.exe - Entry Point Not Found" dialogs mentioning `torch_cuda.dll`, `torchvision\_C.pyd`, or "operator torchvision::nms does not exist".

Cause: A CPU-only or mismatched PyTorch/torchvision wheel was already installed in the active Python. Older launchers imported torch while checking CUDA, so Windows could keep PyTorch DLLs loaded during pip replacement and leave a mixed install for the rest of that run.

Fix: Use the current launcher, which probes PyTorch in a child process before reinstalling. If the environment is already mixed from a previous run, close BallonsTranslator-Pro and run:

python -m pip uninstall -y torch torchvision torchaudio
python -m pip install --upgrade --force-reinstall torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu118
python launch.py

Delete any leftover ~orch, ~orchvision, or ~umpy temporary folders under Lib\site-packages only after Python is closed; pip may leave these behind when files were locked.
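
The child-process probe mentioned in the fix above can be sketched like this. Running the import in a separate interpreter means the parent process never loads the PyTorch DLLs, so pip can replace them afterwards. `probe_in_child` and `TORCH_PROBE` are hypothetical names; the current launcher's actual code may look different.

```python
import subprocess
import sys

# Code executed in the child interpreter; it fails if torch is missing or broken.
TORCH_PROBE = "import torch; print(torch.__version__, torch.cuda.is_available())"


def probe_in_child(code, timeout=120):
    """Run code in a fresh Python process; return its stdout, or None on failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


if __name__ == "__main__":
    result = probe_in_child(TORCH_PROBE)
    print("torch probe:", result if result is not None else "not importable")
```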

6. First run seems stuck or very slow

Symptoms: After "Choose models to download" you see "Checking connectivity to the model hosters..." and the app appears to hang for one or more minutes; or downloads are slow.

What to do	Notes
Normal on first run	The first launch downloads model files (hundreds of MB to over 1 GB depending on packages). You should see progress lines like "downloading data/models/...". Let it finish.
Skip connectivity check	If the connectivity check takes too long (e.g. firewalled or slow DNS), set DISABLE_MODEL_SOURCE_CHECK=True before running: `set DISABLE_MODEL_SOURCE_CHECK=True` (Windows CMD) or `export DISABLE_MODEL_SOURCE_CHECK=True` (Linux/macOS), then `python launch.py`. Some download backends use this to skip pre-download reachability checks.
Text style warning	If you see "Text style does not exist" on first run, it is harmless: the app creates `config/textstyles/default.json` and continues.

7. Pipeline caches, CBR, batch report, manual mode

OCR and translation caches (Config → DL Module):

Option	What it does
Enable OCR cache	Reuses OCR results for the same image/model/language in the current session. Reduces redundant OCR runs when re-running or changing only translation.
Translation cache	Reuses translation results for the same source text and settings (when deterministic). Saves API cost on re-runs.
Clear OCR and translation caches	Tools → Models → Clear OCR and translation caches. Clears in-session caches so the next run recomputes.
Release model caches	Tools → Models → Release model caches. Unloads detector/OCR/inpainter/translator models and frees GPU/RAM.
Release model caches after batch	Config → General. When on, models are unloaded automatically after each full pipeline run.
Manual mode	Config → General. When on, Run processes only the current page (comic-translate style).
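
To show what "same image/model/language" means for the OCR cache, here is a minimal in-session cache keyed on those three values. It is only a sketch: the class name `OcrCache`, the hashing of raw image bytes, and the `run_ocr` callable are assumptions, not the app's implementation.

```python
import hashlib


class OcrCache:
    """In-memory cache keyed by image content, OCR model, and language."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(image_bytes, model, language):
        # Hash the image content so identical pages map to the same entry.
        return (hashlib.sha256(image_bytes).hexdigest(), model, language)

    def get_or_run(self, image_bytes, model, language, run_ocr):
        cache_key = self.key(image_bytes, model, language)
        if cache_key not in self._store:
            self._store[cache_key] = run_ocr(image_bytes)
        return self._store[cache_key]

    def clear(self):
        """Drop all entries so the next run recomputes."""
        self._store.clear()
```

Changing the model or language produces a different key, so the OCR step runs again; changing only translation settings does not.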

Opening CBR (RAR comic archives): Use File → Open CBR ... for .cbr/.rar files. Requires pip install rarfile and WinRAR or 7-Zip (with UnRAR) in your system PATH. If it fails, the app shows a message with these requirements.
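
A quick way to check both requirements before opening a CBR is sketched below. The function name and the list of executable names are assumptions (the executable name depends on how WinRAR or 7-Zip was installed), so adjust `CANDIDATE_TOOLS` to match your system.

```python
import importlib.util
import shutil

# Assumed executable names; your install may use a different one.
CANDIDATE_TOOLS = ("unrar", "UnRAR", "7z")


def missing_cbr_requirements(find_spec=importlib.util.find_spec,
                             which=shutil.which):
    """Return a list of human-readable requirements that appear to be missing."""
    missing = []
    if find_spec("rarfile") is None:
        missing.append("rarfile (pip install rarfile)")
    if not any(which(tool) for tool in CANDIDATE_TOOLS):
        missing.append("WinRAR or 7-Zip (with UnRAR) on PATH")
    return missing


if __name__ == "__main__":
    problems = missing_cbr_requirements()
    print("CBR support looks OK" if not problems else "Missing: " + ", ".join(problems))
```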

Batch report: If pages were skipped during a run (e.g. soft translation failure), a Batch report may open automatically. Use Tools → Project → Show last batch report to open it again; double-click a row to jump to that page.

Run OCR, translation, or inpainting on selected pages: In the page list (left), right-click selected pages → Run OCR on selected pages, Run translation on selected pages, or Run inpainting on selected pages. Each command runs only that stage on the selected pages and uses the caches.

8. Tips: comic-style bubbles and detector

For comic-style speech bubbles (bubble + text regions), you can use the Hugging Face object-detection detector with a model that outputs both bubbles and text:

Config → DL Module → Detector → choose hf_object_det (or similar).
Set Model ID to e.g. ogkalu/comic-text-and-bubble-detector (or another model that predicts both bubble and text regions).
In the detector params, set "Labels to include" so that it covers both bubble and text_bubble (or the model’s own label names). This lets the pipeline treat bubbles as first-class regions for layout and inpainting.
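
Conceptually, a label filter like the one configured in the last step keeps only detections whose label is in the allowed set. The sketch below assumes detections are plain dicts with a `label` key; that shape, the function name `keep_labels`, and the `other` label are hypothetical and only serve the illustration.

```python
def keep_labels(detections, labels_to_include):
    """Keep detections whose label matches one of the requested labels (case-insensitive)."""
    wanted = {label.strip().lower() for label in labels_to_include}
    return [d for d in detections if d["label"].lower() in wanted]


detections = [
    {"label": "bubble", "score": 0.97, "box": (10, 10, 200, 150)},
    {"label": "text_bubble", "score": 0.93, "box": (30, 40, 180, 120)},
    {"label": "other", "score": 0.55, "box": (300, 20, 340, 60)},  # hypothetical label
]
kept = keep_labels(detections, ["bubble", "text_bubble"])
print([d["label"] for d in kept])
```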

See COMIC_TRANSLATE_RESEARCH.md for more detector and layout notes.

Quick reference

Issue	First step
OpenRouter 429 / free tier	Config → DL Module → Translator → LLM_API_Translator: Delay 3.5–5 s, Max requests per minute 6–10, Rate limit delay 60–90 s. See §4.
Translation overflows bubble	Config → General → Typesetting: Text in box = Auto fit to box, Auto layout on. See §9.
GPU OOM	Load model on demand, unload after idle, or lower detect_size / inpaint_size / use tiled inpainting.
HF 401 / gated	Accept model terms on huggingface.co, create HF token, set in Config → General or `HF_TOKEN`.
Translator/OCR "invalid key"	Set API key in Config → DL Module → that module’s params; use Test button; check proxy if needed.
Pip conflict / import error	See OPTIONAL_DEPENDENCIES.md; use a clean venv and only install deps for modules you use.
First run "stuck" / slow	Downloads take several minutes; set `DISABLE_MODEL_SOURCE_CHECK=True` to skip long connectivity check if needed. See §6 above.
CBR open fails	Run `pip install rarfile` and add WinRAR or 7-Zip (UnRAR) to PATH. See §7.
Batch report / skipped pages	Tools → Project → Show last batch report; double-click row to open page. See §7.

9. Translation text overflows bubble or formats badly

Symptoms: After translation, the text box resizes and extends outside the speech bubble, or text is poorly formatted (wrong line breaks, too big/small). Text may also be cropped at the bottom or form a narrow vertical column.

What helps: Use Auto layout and Text in box = Auto fit to box so layout uses the balloon region for line breaks and font scaling. The layout system prefers fewer, longer lines and fuller width usage; when Constrain text box to bubble is on, it scales font down if needed so text fits without cropping.
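
The "scale font down until it fits" idea can be sketched as below. This is a deliberately simplified model, not the app's layout engine: it assumes every character is `char_aspect × font size` wide and every line is `line_spacing × font size` tall, and the function name and all parameters are hypothetical.

```python
import textwrap


def fit_font_size(text, box_w, box_h, max_size=40, min_size=8,
                  char_aspect=0.6, line_spacing=1.2):
    """Return (font_size, lines) for the largest size whose wrapped text fits the box."""
    for size in range(max_size, min_size - 1, -1):
        chars_per_line = int(box_w // (size * char_aspect))
        if chars_per_line < 1:
            continue
        # Keep words whole; a word longer than the line means this size does not fit.
        lines = textwrap.wrap(text, width=chars_per_line, break_long_words=False)
        fits_width = all(len(line) <= chars_per_line for line in lines)
        fits_height = len(lines) * size * line_spacing <= box_h
        if fits_width and fits_height:
            return size, lines
    # Nothing fits: fall back to the smallest size and accept some overflow.
    width = max(1, int(box_w // (min_size * char_aspect)))
    return min_size, textwrap.wrap(text, width=width)


size, lines = fit_font_size("Hello world", box_w=200, box_h=50)
print(size, lines)
```

Trying the largest size first means a wide bubble ends up with fewer, longer lines, while a narrow or short bubble forces the size down.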

Settings that help:

Where	Setting	Recommendation
Config → General → Typesetting	Text in box	Set to Auto fit to box so the program scales font size to fit the balloon.
Config → General → Typesetting	Auto layout	Leave on so translation is split into lines according to the balloon region.
Config → General → Typesetting	Font Size	"Decide by program" lets layout choose the font size; choose "use global setting" only if you want a fixed size.
Config → General → Typesetting	Constrain text box to bubble	Keep on so the box stays inside the bubble; layout will scale font down if content would overflow.
config.json (optional)	`module.layout_optimal_breaks`	Keep `true` (default) for better line breaks (fewer, longer lines).
config.json (optional)	`module.layout_collision_check`	Keep `true` (default) so layout retries when text would overflow.

Per-block: Select one or more text blocks → right-click → Format → Auto fit font size to box to scale font so text fits the current box.

If it still overflows: The bubble region comes from the detector mask. Try a different Text detection module or increase box_padding slightly so the detected region fully contains the bubble; then re-run Detect and Translate.

10. Text boxes in wrong position, stacked at top-left, or outside the image

Symptoms: After translation or layout, text boxes are all at the top-left, or some boxes appear far outside the image.

What to check	Notes
Constrain text box to bubble	Config → General → Typesetting → Constrain text box to bubble. When on, the box is forced to the detected bubble region (with correct image coordinates). If issues persist, try turning it off to see whether layout works without the constraint.
Initial upscale	Config → General → Initial upscale (image_upscale_initial). When on, detection runs on a 2× (or larger) image; block coordinates are scaled back at the end of the pipeline. If the run was interrupted (e.g. before inpainting finished), blocks can stay in upscaled coordinates and appear in the wrong place. Try turning Initial upscale off to test whether that fixes the positions.
center_text_in_bubble	If present in `config.json` under `module`, it is ignored (the feature was removed). Safe to delete or leave as-is.
Merge gap ratio	In `config.json`, `module.merge_nearby_blocks_gap_ratio` should be a normal value (e.g. `1.5`). A value like `0.999...` can be a float artifact; set to `1.5` if you use merge nearby blocks.

Code safeguard: When Constrain text box to bubble is on, the layout now clamps the text box to the image bounds so it never extends outside the panel, even if the bubble region is wrong or there is an upscale/coordinate mismatch.
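
Both ideas in this section, scaling upscaled coordinates back and clamping a box to the image, come down to a few lines of arithmetic. The sketch below uses `(x, y, width, height)` boxes; the function names and the box format are assumptions for illustration, not the app's code.

```python
def scale_back(box, upscale):
    """Convert a box from upscaled-image coordinates to original-image coordinates."""
    x, y, w, h = box
    return (x / upscale, y / upscale, w / upscale, h / upscale)


def clamp_to_image(box, image_w, image_h):
    """Shrink and shift a box so it lies fully inside the image bounds."""
    x, y, w, h = box
    w = min(w, image_w)
    h = min(h, image_h)
    x = min(max(x, 0), image_w - w)
    y = min(max(y, 0), image_h - h)
    return (x, y, w, h)


# A block left in 2x coordinates lands far outside a 100x100 page until scaled back.
stale = (180, 160, 60, 40)
print(clamp_to_image(stale, 100, 100), clamp_to_image(scale_back(stale, 2), 100, 100))
```

Clamping alone keeps a stale box on the page but in the wrong spot, which is why an interrupted run with Initial upscale on can still show misplaced text.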
