MarkPoly
is a smart polyglot Markdown translation that preserves meaning in text and code.
- Handles Markdown documents
- Supports multiple languages
- Preserves formatting while translating
- Intelligently handles code blocks and comments
In short, an LLM powered Markdown translator that understands both prose and code.
Features:
-
Code Block Translation:
- Handles Markdown paragraphs, lists, tables and code blocks
- Handles code blocks separately from other markdown content
- Extracts and identifies code comments
- Preserves code structure and only translates comments
- Maintains comment markers and indentation
- Supports both single-line and multi-line comments
-
Regular Markdown Translation:
- Different handling for tables, lists, headers, and paragraphs
- Preserves markdown formatting for each type
- Maintains special characters and structure
- Translate between two languages
-
Language Detection and Handling:
- Uses specified source language if provided
- Falls back to automatic detection
- Skips translation for English content
- Handles unknown language cases
- Fallback to original content on error
- Preservation of markdown formatting
As prerequisite a polyglot LLM like granite3.1-dense:latest
should be available over an OpenAI compatible API endpoint (e.g. Ollama, vLLM, llama_cpp, etc.)
# example translating markdown file from Spanish to English
python markpoly.py --source-lang es ejemplo.md
# after running it will create a translated file `ejemplo_translated.md`
Additional arguments and their usage:
markpoly.py [-h] [-o OUTPUT] [--source-lang SOURCE_LANG] [--api-key API_KEY] [--base-url BASE_URL] [--model MODEL] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--quiet] input_file
Translate markdown files between languages using AI.
positional arguments:
input_file Path to the input markdown file
options:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Path to save the translated markdown file. If not provided, will use input_file_translated.md (default: None)
--source-lang SOURCE_LANG
Source language code or name (e.g., "es" or "spanish") (default: None)
--target-lang TARGET_LANG
Target language code or name (e.g., "fr" or "french"), defaults to English (default: en)
--api-key API_KEY API key for the translation service (default: no-key)
--base-url BASE_URL Base URL for the API endpoint (default: http://localhost:11434/v1/)
--model MODEL Model name to use for translation (default: granite3.1-dense:latest)
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
Set the logging level (default: INFO)
--quiet Suppress progress information (default: False)
--list-languages List all supported languages and exit (default: False)
- It has been developed and tested with
granite-3.1
. - It can work with
Phi4
,qwen2
, and other multilingual models that supportsystem
anduser
roles but performance varies. - It DOES NOT work with
Mistral
/Mixtral
family of models as they do not use asystem
prompt role and the logic for combining thesystem
anduser
prompts needs to be coded.