Convert infographic-style PNGs into editable PowerPoint slides. This works surprisingly well, but it is not perfect: keep GIMP or Photoshop handy for touch-ups.
png2pptx keeps the source image as the slide background, OCRs text out of the PNG, and recreates that text as editable PowerPoint text boxes so you can update labels and copy without rebuilding the slide by hand.
Status: v0.1 beta. The current release is aimed at infographic and diagram-style images with reasonably clear text. It works well on many layouts, but it is not a promise of perfect OCR or perfect visual matching on every design.
Features:
- editable text overlays placed near their original positions
- sampled text colors taken from the source image
- background text removal via inpainting
- aggressive OCR enabled by default for better recall
- batch conversion: one input PNG per slide
- this was "agentically engineered", so it is what it is
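The color-sampling method is not described in detail here. One plausible approach, shown as a stdlib-only sketch, treats the most common color on the edge of a word's bounding-box crop as the background and takes the per-channel median of the pixels that contrast with it. sample_text_color and its threshold are hypothetical, and the crop is assumed to be a list of RGB rows rather than a real image object.

```python
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def _dist2(a: Pixel, b: Pixel) -> int:
    """Squared Euclidean distance between two RGB colors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def sample_text_color(crop: List[List[Pixel]], threshold: int = 60 ** 2) -> Pixel:
    """Estimate the text color inside one word's bounding-box crop (hypothetical)."""
    h, w = len(crop), len(crop[0])
    # Assumption: the crop's border pixels are mostly background.
    border = [crop[y][x] for y in range(h) for x in range(w)
              if y in (0, h - 1) or x in (0, w - 1)]
    background = Counter(border).most_common(1)[0][0]

    pixels = [p for row in crop for p in row]
    ink = [p for p in pixels if _dist2(p, background) > threshold]
    if not ink:
        # No clear contrast: fall back to the pixel farthest from the background.
        return max(pixels, key=lambda p: _dist2(p, background))
    # A per-channel median resists anti-aliased edge pixels better than a mean.
    return tuple(sorted(p[i] for p in ink)[len(ink) // 2] for i in range(3))
```

A median is used instead of a mean so that a few blended edge pixels do not pull the estimate toward the background color.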
Tesseract OCR must be installed on your system:
- Windows: download the installer from UB Mannheim and add it to PATH
- macOS: brew install tesseract
- Linux (Debian/Ubuntu): sudo apt install tesseract-ocr
If you are working in PowerShell on Windows and Tesseract is installed but not on PATH, this session-only fix is often enough:
$env:path="C:\Program Files\Tesseract-OCR;$env:path"
Then install png2pptx from the repository root:
pip install -e .
OpenCV is installed as part of the package and is used for inpainting and for the higher-recall OCR path.
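To check from Python that Tesseract is reachable, shutil.which performs the same PATH lookup that launching the tesseract executable would. The helper below is a hypothetical convenience, not part of png2pptx; passing extra_dir prepends that directory to PATH for the current process only, much like the PowerShell fix above.

```python
import os
import shutil
from typing import Optional


def find_tesseract(extra_dir: Optional[str] = None) -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it is not found.

    Hypothetical helper: if extra_dir is given, it is prepended to PATH for this
    process (and any child processes it starts), not system-wide.
    """
    if extra_dir:
        os.environ["PATH"] = extra_dir + os.pathsep + os.environ.get("PATH", "")
    return shutil.which("tesseract")
```

On Windows, find_tesseract(r"C:\Program Files\Tesseract-OCR") mirrors the session-only fix; a None result means Tesseract still is not discoverable.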
Convert the bundled example:
png2pptx convert examples/sample_input.png -o sample_output.pptx
By default, png2pptx uses:
- --ocr-mode aggressive
- --remove-text
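With --remove-text on, detected text has to be masked before it can be inpainted. The install notes say OpenCV handles inpainting, but the exact mask construction is not documented, so this is a hypothetical sketch: text_mask marks each word box, given as (left, top, width, height) like Tesseract reports it, plus a small padding for anti-aliased glyph edges.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def text_mask(width: int, height: int, boxes: List[Box], pad: int = 2) -> List[List[int]]:
    """Build a binary mask: 255 marks pixels to repaint, 0 marks pixels to keep."""
    mask = [[0] * width for _ in range(height)]
    for left, top, w, h in boxes:
        # Grow each box by `pad` pixels and clip it to the image bounds.
        x0, y0 = max(0, left - pad), max(0, top - pad)
        x1, y1 = min(width, left + w + pad), min(height, top + h + pad)
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = 255
    return mask
```

Converted to an 8-bit NumPy array, a mask like this is the form OpenCV's cv2.inpaint(image, mask, radius, flags) expects, with non-zero pixels marking the area to fill. Padding too generously can smear nearby artwork, which is one reason inpainting quality varies between images.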
If you want to keep the original source text in the background image, disable inpainting explicitly:
png2pptx convert examples/sample_input.png -o sample_output.pptx --no-remove-text
Batch convert multiple PNGs into one deck:
png2pptx convert slide1.png slide2.png slide3.png -o deck.pptx
Options:
| Flag | Default | Description |
|---|---|---|
| -o, --output | output.pptx | Output PPTX path |
| --confidence | 40 | Minimum OCR confidence (0-100) |
| --lang | eng | Tesseract language code |
| --ocr-mode | aggressive | aggressive is the default higher-recall path; fast is quicker but may miss more text |
| --remove-text / --no-remove-text | --remove-text | Remove detected text from the background image before adding editable text |
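For scripted batch runs, one option is to shell out to the CLI. This sketch uses only the flags and defaults listed in the table; build_convert_command and convert are hypothetical helpers, and the sketch assumes aggressive and fast are the only --ocr-mode values.

```python
import subprocess
from typing import List, Sequence


def build_convert_command(
    inputs: Sequence[str],
    output: str = "output.pptx",
    confidence: int = 40,
    lang: str = "eng",
    ocr_mode: str = "aggressive",
    remove_text: bool = True,
) -> List[str]:
    """Assemble a png2pptx convert command line from the documented flags."""
    if not inputs:
        raise ValueError("at least one input PNG is required")
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if ocr_mode not in ("aggressive", "fast"):
        raise ValueError("ocr_mode must be 'aggressive' or 'fast'")
    return [
        "png2pptx", "convert", *inputs,
        "-o", output,
        "--confidence", str(confidence),
        "--lang", lang,
        "--ocr-mode", ocr_mode,
        "--remove-text" if remove_text else "--no-remove-text",
    ]


def convert(inputs: Sequence[str], **options) -> None:
    """Run png2pptx; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_convert_command(inputs, **options), check=True)
```

For example, convert(["slide1.png", "slide2.png", "slide3.png"], output="deck.pptx", ocr_mode="fast") builds one deck with one slide per input. Passing every flag explicitly keeps the command readable in logs even when the values match the defaults.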
How it works:
- OCR: Tesseract extracts word boxes from the PNG.
- Layout grouping: nearby words are grouped into text blocks.
- Color sampling: the likely text color is estimated from the source image.
- Background cleanup: detected text can be inpainted out of the background image.
- PPTX generation: the PNG becomes the slide background, and editable text is layered on top.
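The OCR and layout-grouping steps could look roughly like the sketch below. It is not png2pptx's code: it assumes word boxes arrive as Tesseract TSV (the string pytesseract.image_to_data returns by default), treats the --confidence threshold as a cut-off on Tesseract's per-word conf column, and uses a simple single-line grouping heuristic. parse_words, group_words, and gap_factor are hypothetical names.

```python
import csv
import io
from typing import List


def parse_words(tsv: str, min_conf: float = 40) -> List[dict]:
    """Return word-level rows from Tesseract TSV output with conf >= min_conf."""
    words = []
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        text = (row.get("text") or "").strip()
        # Page/block/paragraph/line rows (levels 1-4) carry conf -1 and no text.
        if row["level"] != "5" or not text or float(row["conf"]) < min_conf:
            continue
        words.append({"text": text, "left": int(row["left"]), "top": int(row["top"]),
                      "width": int(row["width"]), "height": int(row["height"])})
    return words


def group_words(words: List[dict], gap_factor: float = 1.0) -> List[str]:
    """Merge words into single-line runs, returned top to bottom.

    A word joins an existing run if the two overlap vertically and the
    horizontal gap is at most gap_factor times the run's height.
    """
    runs: List[dict] = []
    # Visiting words left to right appends each run's words in reading order.
    for w in sorted(words, key=lambda w: w["left"]):
        top, bottom = w["top"], w["top"] + w["height"]
        for run in runs:
            overlaps = top < run["bottom"] and bottom > run["top"]
            gap = w["left"] - run["right"]
            if overlaps and gap <= gap_factor * (run["bottom"] - run["top"]):
                run["texts"].append(w["text"])
                run["right"] = max(run["right"], w["left"] + w["width"])
                run["top"], run["bottom"] = min(run["top"], top), max(run["bottom"], bottom)
                break
        else:
            runs.append({"texts": [w["text"]], "top": top, "bottom": bottom,
                         "right": w["left"] + w["width"]})
    runs.sort(key=lambda r: r["top"])
    return [" ".join(r["texts"]) for r in runs]
```

Merging stacked runs into multi-line paragraphs would need a second pass over vertical spacing; that step is left out to keep the sketch short.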
png2pptx works best when the source image looks like a presentation graphic:
- headings, labels, and body text are visually distinct
- text is mostly horizontal
- background art is not too dense around the copy
Current weak spots:
- font family matching is still approximate
- decorative icons and stylized typography can still create OCR mistakes
- dense or low-contrast images may miss text or produce noisy fragments
- different visual styles can behave very differently even with aggressive OCR enabled
The current beta intentionally keeps scope tight. Deferred work that should be picked back up after launch includes:
- missed-region recovery for text-like areas Tesseract skips entirely
- optional second OCR backend support if Tesseract tops out on harder layouts
- better font inference for closer visual matching
- richer diagnostics for tuning difficult images
- better inpainting (see examples)
Install development dependencies and run the test suite:
pip install -e ".[dev]"
pytest -q
Generate a repeatable quality baseline from the checked-in examples:
png2pptx quality-loop --examples-dir examples --output-dir quality_output
This writes per-example PPTX files, the current cleaned (text-removed) images, overlay review images, and a summary.json file for before/after comparisons while tuning OCR, layout, or inpainting behavior.
Before opening a public-facing PR, make sure docs stay aligned with real CLI behavior and avoid committing local/generated artifacts such as virtualenvs, build outputs, or ad-hoc debug exports.