MediaWiki extension for importing documents/webpages into wiki pages and exporting wiki pages to external formats — powered by Pandoc.
- Import: convert DOCX, ODT, PDF, DOC, or a webpage URL into a wiki page (with images)
- Export: download wiki pages as DOCX, ODT, EPUB, PDF, HTML, RTF, or TXT
- AI cleanup: optional LLM-powered post-conversion wikitext polish (OpenAI or Claude)
- Confluence migration: mass-import an entire Confluence space (Cloud or Server) into the wiki
MediaWiki page: https://www.mediawiki.org/wiki/Extension:PandocUltimateConverter
Supported on MediaWiki 1.42–1.45, Windows and Linux.
- Install Pandoc
- Download the extension into your `extensions/` folder
- Add to `LocalSettings.php`:
```php
wfLoadExtension( 'PandocUltimateConverter' );
$wgEnableUploads = true;
$wgFileExtensions[] = 'docx';
$wgFileExtensions[] = 'odt';
$wgFileExtensions[] = 'pdf';
$wgFileExtensions[] = 'doc';

// Only needed if Pandoc is not in PATH:
// $wgPandocUltimateConverter_PandocExecutablePath = 'C:\Program Files\Pandoc\pandoc.exe';
```

Optional dependencies (only needed for specific formats):
- PDF import: poppler (`pdftohtml`) — see Installing poppler
- Scanned PDF / OCR: Tesseract — see Installing Tesseract
- DOC import and PDF export: LibreOffice — see Installing LibreOffice
Go to Special:PandocUltimateConverter to convert a file or URL into a wiki page.
- Choose source: file upload or URL
- Enter the target page name
- Click convert — you'll be redirected to the new page
What happens during conversion:
- Images are extracted and uploaded to the wiki automatically (duplicates are skipped)
- The uploaded source file is removed after conversion
- Temporary files are cleaned up
A legacy (non-Codex) form is available at `Special:PandocUltimateConverter?codex=0`.
The extension can optionally run an LLM (OpenAI or Claude) to clean up wikitext after conversion — fixing formatting issues, removing artefacts, and improving readability.
Add to LocalSettings.php:
```php
$wgPandocUltimateConverter_LlmProvider = 'openai'; // or 'claude'
$wgPandocUltimateConverter_LlmApiKey = 'sk-...';

// Optional: override the default model
// $wgPandocUltimateConverter_LlmModel = 'gpt-5.4-nano'; // OpenAI default; or 'claude-3-5-haiku-20241022' for Claude
```

There are two ways to use AI cleanup:
- Batch mode — check the "Polish with AI" checkbox before clicking Convert all. Each item is converted first, then automatically queued for AI cleanup. The conversion queue and the AI cleanup queue run in parallel.
- Per-item — click the ✨ button on any already-converted item to run AI cleanup on demand.
If AI cleanup fails, a per-item error is shown with a Retry button.
| Parameter | Default | Description |
|---|---|---|
| `PandocUltimateConverter_LlmProvider` | `null` | `"openai"` or `"claude"`. Leave `null` to disable. |
| `PandocUltimateConverter_LlmApiKey` | `null` | API key for the configured provider. |
| `PandocUltimateConverter_LlmModel` | `null` | Model override. Defaults to `gpt-5.4-nano` (OpenAI) or `claude-3-5-haiku-20241022` (Claude). |
| `PandocUltimateConverter_LlmPrompt` | `null` | Custom system prompt for the cleanup step. |
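A combined example, assuming Claude as the provider (values are placeholders; the prompt shown is illustrative, not the extension's built-in default):

```php
$wgPandocUltimateConverter_LlmProvider = 'claude';
$wgPandocUltimateConverter_LlmApiKey   = getenv( 'LLM_API_KEY' ); // keep the secret out of version control
$wgPandocUltimateConverter_LlmModel    = 'claude-3-5-haiku-20241022';
$wgPandocUltimateConverter_LlmPrompt   = 'Clean up this MediaWiki wikitext. Fix formatting artefacts; do not change the content.';
```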
Export one or more wiki pages to an external document format.
Go to Special:PandocExport or use the Export action in the page tools menu (the same menu where "Delete" and "Move" appear).
Supported export formats: DOCX, ODT, EPUB, PDF, HTML, RTF, TXT.
Features:
- Export a single page or multiple pages into one document
- Export entire categories (subcategories are resolved recursively)
- "Separate files" option bundles each page as an individual file in a ZIP archive
- Images referenced in wikitext are embedded into the output document
- PDF export uses a Pandoc → DOCX → LibreOffice pipeline (no LaTeX required)
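That pipeline can be sketched on the command line as follows (illustrative function and file names; assumes `pandoc` and `soffice` are on PATH):

```shell
# Illustrative only: reproduce the extension's PDF pipeline by hand.
wiki2pdf() {
  local src="$1"                                  # wikitext file, e.g. page.wiki
  local docx="${src%.*}.docx"
  pandoc -f mediawiki -t docx -o "$docx" "$src"   # step 1: wikitext -> DOCX via Pandoc
  soffice --headless --convert-to pdf "$docx"     # step 2: DOCX -> PDF via LibreOffice (no LaTeX)
}
```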
Mass-migrate an entire Confluence space to this wiki in one operation.
Go to Special:ConfluenceMigration and fill in:
| Field | Description |
|---|---|
| Confluence URL | Base URL of your Confluence instance (see below) |
| Space key | Key of the Confluence space to migrate (e.g. DOCS, DEV) |
| Email / Username | Your Confluence login email (Cloud) or username (Server) |
| API token / Password | API token (Cloud) or password / personal access token (Server) |
| Target page prefix | Optional prefix prepended to every page title, e.g. Confluence/DOCS |
| Overwrite existing pages | When checked, existing wiki pages are replaced |
| Auto-categorize | Creates MediaWiki categories mirroring the Confluence page hierarchy (checked by default) |
| | Confluence Cloud | Confluence Server / Data Center |
|---|---|---|
| Base URL | `https://yourcompany.atlassian.net` | `https://confluence.yourcompany.com` |
| Username field | Your Atlassian account email | Your Confluence username |
| Token field | Atlassian API token | Password or Personal Access Token |
- All pages in the specified space are fetched via the Confluence REST API v1.
- Page content (Confluence "storage format" HTML) is converted to MediaWiki wikitext using Pandoc.
- Common Confluence macros (code blocks, info/note/warning/tip panels) are converted to their MediaWiki equivalents.
- File attachments are downloaded from Confluence and uploaded to the MediaWiki file repository.
- Pages are created with the edit summary "Imported from Confluence".
- When auto-categorize is enabled, pages with sub-pages get a matching category; nested sub-pages produce nested categories.
The migration is processed as a background job via the MediaWiki job queue. You do not have to keep your browser open. When the migration finishes you receive an Echo notification (requires the Echo extension).
Jobs are processed by `maintenance/runJobs.php` or automatically during regular wiki requests if `$wgJobRunRate` > 0 (the default).
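On wikis where `$wgJobRunRate` is set to 0, the queue has to be driven externally. A system cron entry along these lines is a common setup (paths, schedule, and user are placeholders):

```
# /etc/cron.d/mediawiki-jobs: run queued jobs every 5 minutes (illustrative paths)
*/5 * * * * www-data /usr/bin/php /var/www/mediawiki/maintenance/runJobs.php --maxtime 240
```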
```php
// LocalSettings.php
$wgPandocUltimateConverter_EnableConfluenceMigration = false;
```

Setting this to `false` hides Special:ConfluenceMigration entirely and displays a notice to users who navigate to it directly.
Supports everything Pandoc supports. Tested: DOCX, ODT, PDF, DOC.
| Format | Pipeline | Extra dependency |
|---|---|---|
| DOCX, ODT | Pandoc → wikitext | — |
| DOC | LibreOffice → DOCX → Pandoc | LibreOffice |
| PDF (text) | pdftohtml → HTML → Pandoc | poppler |
| PDF (scanned) | pdftoppm → Tesseract OCR → wikitext | poppler + Tesseract |
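The scanned-PDF row can be approximated manually like this (a sketch; assumes poppler's `pdftoppm` and Tesseract are on PATH, filenames are illustrative):

```shell
# Illustrative only: OCR a scanned PDF roughly the way the extension does.
ocr_pdf() {
  local pdf="$1"
  pdftoppm -r 300 -png "$pdf" page          # rasterize pages to page-1.png, page-2.png, ...
  local img
  for img in page-*.png; do
    tesseract "$img" "${img%.png}" -l eng   # OCR each image to page-N.txt
  done
  cat page-*.txt                            # combined recognized text
}
```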
All parameters are set in LocalSettings.php with the $wg prefix.
| Parameter | Default | Description |
|---|---|---|
| `PandocUltimateConverter_PandocExecutablePath` | `null` | Path to the Pandoc binary. Not needed if Pandoc is in PATH. |
| `PandocUltimateConverter_TempFolderPath` | `null` | Temp folder for conversion files. Uses system default if not set. |
| `PandocUltimateConverter_PdfToHtmlExecutablePath` | `null` | Path to poppler's `pdftohtml`. Not needed if in PATH. |
| `PandocUltimateConverter_LibreOfficeExecutablePath` | `null` | Path to `soffice`/`libreoffice`. Not needed if in PATH. |
| `PandocUltimateConverter_TesseractExecutablePath` | `null` | Path to the Tesseract OCR binary. Not needed if in PATH. |
| `PandocUltimateConverter_OcrLanguage` | `"eng"` | Tesseract language code(s). Use `+` for multiple, e.g. `"eng+deu"`. |
| `PandocUltimateConverter_PandocCustomUserRight` | `""` | Restrict access to a specific user right. |
| `PandocUltimateConverter_MediaFileExtensionsToSkip` | `[]` | File extensions to skip during image upload (e.g. `["emf"]`). |
| `PandocUltimateConverter_FiltersToUse` | `[]` | Custom Pandoc Lua filters to apply. Must be in the `filters/` folder. |
| `PandocUltimateConverter_UseColorProcessors` | `false` | Preserve text/background colors from DOCX/ODT files. |
| `PandocUltimateConverter_ShowExportInPageTools` | `true` | Show "Export" in the page Actions menu. |
| `PandocUltimateConverter_LlmProvider` | `null` | LLM provider: `"openai"` or `"claude"`. |
| `PandocUltimateConverter_LlmApiKey` | `null` | API key for the LLM provider. |
| `PandocUltimateConverter_LlmModel` | `null` | Model name override. |
| `PandocUltimateConverter_LlmPrompt` | `null` | Custom system prompt for AI cleanup. |
| `PandocUltimateConverter_EnableConfluenceMigration` | `true` | Set to `false` to disable Special:ConfluenceMigration. |
Filters are placed in the `filters/` subfolder. Add them via:

```php
$wgPandocUltimateConverter_FiltersToUse[] = 'increase_heading_level.lua';
```

| Filter | Description |
|---|---|
| `increase_heading_level.lua` | Increase heading levels by 1 (useful when documents start at H1) |
| `colorize_mark_class.lua` | Highlight "mark" classes with yellow background |
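For orientation, a Pandoc Lua filter is only a few lines. A filter equivalent in spirit to increase_heading_level.lua might look like this (a sketch, not the shipped file):

```lua
-- Bump every heading one level down (H1 -> H2), capping at H6.
function Header(el)
  el.level = math.min(el.level + 1, 6)
  return el
end
```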
Required for PDF import. If not installed, PDF files will fail to convert — all other formats work normally.
Linux:

```shell
sudo apt install poppler-utils   # Debian/Ubuntu
sudo dnf install poppler-utils   # RHEL/Fedora
```

Windows:

```shell
choco install poppler
```

Or download manually from https://github.com/oschwartz10612/poppler-windows/releases and add `bin/` to PATH, or set:

```php
$wgPandocUltimateConverter_PdfToHtmlExecutablePath = 'C:\poppler\Library\bin\pdftohtml.exe';
```

Required for scanned PDF OCR. Also requires poppler (`pdftoppm`, installed with `pdftohtml`).
Linux:

```shell
sudo apt install tesseract-ocr       # Debian/Ubuntu
sudo apt install tesseract-ocr-deu   # additional languages
sudo dnf install tesseract           # RHEL/Fedora
```

Windows:

```shell
choco install tesseract
```

Or download from https://github.com/UB-Mannheim/tesseract/wiki and add to PATH, or set:

```php
$wgPandocUltimateConverter_TesseractExecutablePath = 'C:\Program Files\Tesseract-OCR\tesseract.exe';
```

Required for DOC import and PDF export.
Linux:

```shell
sudo apt install libreoffice   # Debian/Ubuntu
sudo dnf install libreoffice   # RHEL/Fedora
```

Windows: Download from https://www.libreoffice.org/download/download/ and add the `program/` folder to PATH, or set:

```php
$wgPandocUltimateConverter_LibreOfficeExecutablePath = 'C:\Program Files\LibreOffice\program\soffice.exe';
```

The extension exposes three API modules. Write operations (`pandocconvert`, `pandocllmpolish`) require a CSRF token and POST.
Obtain a CSRF token first:

```
GET /api.php?action=query&meta=tokens&format=json
```
Converts a file or URL to a wiki page. Requires a CSRF token and POST.
```
POST /api.php
action=pandocconvert&pagename=My Article&url=https://example.com&forceoverwrite=1&token=<csrf>&format=json
```

Response:

```json
{ "pandocconvert": { "result": "success", "pagename": "My Article" } }
```

| Parameter | Required | Description |
|---|---|---|
| `pagename` | yes | Target wiki page title |
| `filename` | one of | Uploaded file name (mutually exclusive with `url`) |
| `url` | one of | http/https URL to fetch (mutually exclusive with `filename`) |
| `forceoverwrite` | no | `1` to overwrite an existing page (default: `0`) |
| `token` | yes | CSRF token |
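Put together, a client-side helper might look like this (a sketch; assumes `curl` and `jq`, and the wiki URL is a placeholder):

```shell
WIKI="https://wiki.example.com/api.php"   # placeholder: your api.php endpoint

# Illustrative helper: import a URL as a wiki page via the API.
pandoc_convert() {
  local page="$1" url="$2" token
  # 1. Fetch a CSRF token, keeping the session in a cookie jar.
  token=$(curl -s -c cookies.txt "$WIKI?action=query&meta=tokens&format=json" \
          | jq -r '.query.tokens.csrftoken')
  # 2. POST the conversion request with that token.
  curl -s -b cookies.txt "$WIKI" \
       --data-urlencode "action=pandocconvert" \
       --data-urlencode "pagename=$page" \
       --data-urlencode "url=$url" \
       --data-urlencode "token=$token" \
       --data-urlencode "format=json"
}
```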
Runs LLM AI cleanup on an existing wiki page's wikitext. Requires a CSRF token and POST. The LLM provider must be configured.
```
POST /api.php
action=pandocllmpolish&pagename=My Article&token=<csrf>&format=json
```

Response:

```json
{ "pandocllmpolish": { "result": "success", "pagename": "My Article" } }
```

| Parameter | Required | Description |
|---|---|---|
| `pagename` | yes | Title of existing wiki page to polish |
| `token` | yes | CSRF token |
Fetches remote URLs and extracts their HTML `<title>` tags. Used internally by the Codex UI to suggest page names for URL imports. GET request, no token required.
```
GET /api.php?action=pandocurltitle&urls=https://example.com&format=json
```

Response:

```json
{ "pandocurltitle": { "results": [ { "url": "https://example.com", "title": "Example Domain" } ] } }
```

Accepts multiple URLs (pipe-separated). Only http/https URLs are accepted.

| Parameter | Required | Description |
|---|---|---|
| `urls` | yes | One or more URLs (pipe-separated) to fetch titles from |
`pandocconvert`:

| Code | Meaning |
|---|---|
| `nosource` | Neither `filename` nor `url` supplied |
| `multiplesource` | Both `filename` and `url` supplied |
| `invalidurlscheme` | URL is not http/https |
| `pageexists` | Page exists and `forceoverwrite` not set |
`pandocllmpolish`:

| Code | Meaning |
|---|---|
| `pagenotfound` | The specified page does not exist |
| `notconfigured` | LLM provider is not configured on this wiki |
| `notwikitext` | The page content is not wikitext |
Add to LocalSettings.php:
```php
$wgShowExceptionDetails = true;
$wgDebugLogGroups['PandocUltimateConverter'] = '/var/log/mediawiki/pandoc.log';
```

The extension logs diagnostic messages to the `PandocUltimateConverter` log group.


