You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This document maps the curated list of 20+ models per category (from community recommendations) to what BallonsTranslator already supports and what is optional or potential. Use it to choose modules and plan integrations.
Target: Windows + NVIDIA GPU. VRAM notes are approximate. For quality/accuracy rankings (best to worst) of all detection, OCR, and translation modules, see docs/QUALITY_RANKINGS.md.
For tier labels (Stable/Beta/Experimental/External dependency heavy) and cross-platform compatibility, use the canonical matrix: docs/MODULE_COMPATIBILITY_MATRIX.md.
Would require a new detector that loads this Hugging Face model. ysgyolo can load Ultralytics RT-DETR .pt checkpoints if the model path contains rtdetr (e.g. ysgyolo_rtdetr_something.pt in data/models), but the ogkalu transformer-based “comic-text-and-bubble-detector” is not a drop-in .pt.
2) ogkalu/comic-speech-bubble-detector-yolov8m
✅ Supported via ysgyolo
Download the YOLOv8 medium weights from Hugging Face, save as data/models/ysgyolo_comic_speech_bubble_v8m.pt (or any name starting with ysgyolo), then select that path in ysgyolo detector. This is not an YSG (淫書館) model; YSG refers only to YSGforMTL/YSGYoloDetector.
3) mayocream/comic-text-detector-onnx
⚠️Optional
Use ctd with device=CPU and custom_onnx_path to your ONNX file. Default CTD ONNX when empty.
Use model ysgyolo_comic_speech_bubble_v8m.pt in data/models (from ogkalu; not YSG series). YSG (淫書館) = YSGforMTL/YSGYoloDetector only.
CTD (ComicTextDetector)
✅ ctd
Built-in. detect_size up to 2400. Optional custom_onnx_path for alternate ONNX (e.g. mayocream).
PaddleOCR det
✅ paddle_det, paddle_det_v5
Full pipeline with paddle_rec / paddle_rec_v5.
Surya detection
✅ surya_det
Pair with surya_ocr.
EasyOCR detection
✅ easyocr_det
Pair with easyocr_ocr.
MMOCR (DBNet etc.)
✅ mmocr_det
Pair with mmocr_ocr. Same deps: mim install mmengine mmcv mmdet mmocr.
HunyuanOCR spotting
✅ hunyuan_ocr_det
Full-image spotting; use with none_ocr or hunyuan_ocr.
Stariver (API)
✅ stariver_ocr
Detector returns boxes+text; use with none_ocr.
Magi, TextMamba, YSG (淫书馆)
✅ magi_det, textmamba_det, ysgyolo
textmamba_det: stub (official code not yet released; raises clear error). magi_det, ysgyolo: detection only; pair with any OCR. YSG (淫书馆) = YSGforMTL/YSGYoloDetector by lhj5426 (19 months from data to training). Same author’s earlier YOLOv8: ogkalu/manga-text-detector-yolov8s. The ysgyolo detector can also load other comic YOLO .pt (ogkalu, Kiuyha, etc.); those are not "YSG series".
OpenRouter vision models for LLM OCR: When using llm_ocr with provider OpenRouter (API key from openrouter.ai), you can pick any vision-capable model. Free vision models (image in, text out, $0; full list): openrouter/free, google/gemma-3-4b-it:free, google/gemma-3-12b-it:free, google/gemma-3-27b-it:free, mistralai/mistral-small-3.1-24b-instruct:free, nvidia/nemotron-nano-12b-v2-vl:free, qwen/qwen3-vl-30b-a3b-thinking, qwen/qwen3-vl-235b-a22b-thinking. Paid examples: openai/gpt-4o, openai/gpt-4o-mini, google/gemini-2.0-flash-001, google/gemini-1.5-flash, google/gemini-1.5-pro, qwen/qwen2.5-vl-72b-instruct, qwen/qwen3.5-flash-02-23, anthropic/claude-sonnet-4, anthropic/claude-3-5-sonnet. Image inputs, API models.
3. Translation
Recommended model
In BallonsTranslator
Notes
GPT-4o / OpenAI
✅ LLM_API_Translator (provider OpenAI)
Best contextual translation; API key.
Claude / Gemini
✅ LLM_API_Translator (OpenRouter) or ChatGPT
Use OpenRouter or provider endpoints. Free models: provider OpenRouter, then pick a free model from the dropdown (e.g. openrouter/free, meta-llama/llama-3.3-70b-instruct:free, stepfun/step-3.5-flash:free). Full list.
Google Translate API
✅ google
DeepL
✅ DeepL, DeepL Free, DeepLX API
M2M-100
✅ m2m100
Local CTranslate2; many languages.
Sakura
✅ Sakura
Japanese↔English.
Sugoi
✅ Sugoi
NLLB-200 / OPUS-MT / T5 MT
✅ nllb200, opus_mt, t5_mt
nllb200: 200 languages (HF); opus_mt: Helsinki-NLP per-pair; t5_mt: prompt-based T5.
qwen_image_edit: Diffusers. repaint: RePaint DDPM. mat: MAT (CVPR 2022) via repo + checkpoint (github.com/fenglinglwb/MAT). SAM3 not integrated.
5. Recommendation strategy
Detection (priority): Use ctd or paddle_det_v5 for manga; surya_det for general docs; ysgyolo with comic bubble model for balloon-only. For SOTA spotting, swintextspotter_v2 or hunyuan_ocr_det + none_ocr (when compatible).
OCR (priority):paddle_rec_v5 or hunyuan_ocr for quality; surya_ocr, florence2_ocr, internvl2_ocr for alternatives. Use none_ocr only with spotters that fill text.
Translation (priority):LLM_API_Translator with GPT-4o/Claude/Gemini for best context; Sakura for JP↔EN; DeepL or google for API; m2m100 for local multilingual.
Inpainting:lama_large_512px for most manga; flux_fill or aot if you prefer.
6. Adding new models
Detectors: Implement TextDetectorBase, _detect() returning (mask, blk_list). Register with @register_textdetectors('name'). See modules/textdetector/detector_*.py and docs/INSTALL_EXTRA_DETECTORS.md.
OCR: Implement OCRBase, _ocr_blk_list() and optionally ocr_img(). Register with @register_OCR("name"). See modules/ocr/ocr_*.py.
Translators: Implement BaseTranslator, _translate(). Register with @register_translator('name'). See doc/how_to_add_new_translator.md.
Inpainters: Implement InpainterBase. Register with @register_inpainter('name'). See modules/inpaint/.
VRAM: small OCR/detection ~2–6 GB; large VLMs (Qwen2-VL 7B, InternVL 8B) ~16–24 GB; translation LLMs depend on size and quantization.
7. Not integrated (reference)
The following remain not integrated by design or feasibility:
Item
Reason
ogkalu/comic-text-and-bubble-detector
✅ Integrated as default model_id for hf_object_det.
TextHawk2, TextMonkey
LVLM/OCR-free spotters; would require new spotter/OCR modules and different API.
MAT (inpainting)
✅ Integrated as mat (set repo_path + checkpoint_path to MAT repo and .pth).
SAM3 (inpainting)
Segmentation model, not a drop-in inpainter; no integration.
SAM-backboned, DocLAYNET (detection)
No detector module yet. TextSnake via mmocr_det (det_model=TextSnake).
Manga OCR Mobile
✅ Integrated as manga_ocr_mobile (TFLite; optional deps).
Nemotron Parse, NuMarkdown
Nemotron ✅ Integrated as nemotron_ocr (full-page, assigns by bbox overlap). NuMarkdown = full-page doc→Markdown; not integrated.